[MySQL] Fix mysql data corruption workflow

Problem:
Mysql cannot start normally and tried some recovery but failed

Log messages:

root@box71:/data/mysql# mysqld  
200908 11:50:17 [Warning] 'debug' is disabled in this build 
200908 11:50:17 [Note] InnoDB: Using mutexes to ref count buffer pool pages 
200908 11:50:17 [Note] InnoDB: The InnoDB memory heap is disabled 
200908 11:50:17 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 
200908 11:50:17 [Note] InnoDB: Compressed tables use zlib 1.2.3.4 
200908 11:50:17 [Note] InnoDB: Using Linux native AIO 
200908 11:50:17 [Note] InnoDB: Using CPU crc32 instructions 
200908 11:50:17 [Note] InnoDB: Initializing buffer pool, size = 2.0G 
200908 11:50:17 [Note] InnoDB: Completed initialization of buffer pool 
200908 11:50:17 [Note] InnoDB: Highest supported file format is Barracuda. 
200908 11:50:17 [Note] InnoDB: Log scan progressed past the checkpoint lsn 1599016860192 
200908 11:50:17 [Note] InnoDB: Database was not shutdown normally! 
200908 11:50:17 [Note] InnoDB: Starting crash recovery. 
200908 11:50:17 [Note] InnoDB: Reading tablespace information from the .ibd files... 
200908 11:50:17 [Note] InnoDB: Restoring possible half-written data pages 
200908 11:50:17 [Note] InnoDB: from the doublewrite buffer... 
InnoDB: Doing recovery: scanned up to log sequence number 1599016937136 
InnoDB: 1 transaction(s) which must be rolled back or cleaned up 
InnoDB: in total 1455 row operations to undo 
InnoDB: Trx id counter is 349966336 
200908 11:50:18 [Note] InnoDB: Starting an apply batch of log records to the database... 
InnoDB: Progress in percent: 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 2020-09-08 11:50:19 7fca6f7fc700  InnoDB: Assertion failure in thread 140507430766336 in file btr0cur.cc line 288 
InnoDB: Failing assertion: btr_page_get_next(get_block->frame, mtr) == page_get_page_no(page) 
InnoDB: We intentionally generate a memory trap. 
InnoDB: Submit a detailed bug report to http://bugs.mysql.com. 
InnoDB: If you get repeated assertion failures or crashes, even 
InnoDB: immediately after the mysqld startup, there may be 
InnoDB: corruption in the InnoDB tablespace. Please refer to 
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html 
InnoDB: about forcing recovery. 
200908 11:50:19 [ERROR] mysqld got signal 6 ; 
This could be because you hit a bug. It is also possible that this binary 
or one of the libraries it was linked against is corrupt, improperly built, 
or misconfigured. This error can also be caused by malfunctioning hardware. 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs 
We will try our best to scrape up some info that will hopefully help 
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail. 
Server version: 10.0.13-MariaDB-1~precise-wsrep-log 
key_buffer_size=134217728 
read_buffer_size=2097152 
max_used_connections=0 
max_threads=302 
thread_count=0 
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1992930 K  bytes of memory 
Hope that's ok; if not, decrease some variables in the equation. 
Thread pointer: 0x0x0 
Attempting backtrace. You can use the following information to find out 
where mysqld died. If you see no messages after this, something went 
terribly wrong... 
stack_bottom = 0x0 thread_stack 0x48000 
addr2line: 'mysqld': No such file 
mysqld(my_print_stacktrace+0x2b)[0xb933fb] 
mysqld(handle_fatal_signal+0x398)[0x743538] 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fcb0ba5a330] 
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fcb0a012c37] 
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fcb0a016028] 
mysqld[0x99c501] 
mysqld[0x99d34b] 
mysqld[0x9aa186] 
mysqld[0x8b6c71] 
mysqld[0x8be17d] 
mysqld[0x9bf85b] 
mysqld[0xa07976] 
mysqld[0x95c285] 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7fcb0ba52184] 
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fcb0a0d9ffd] 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains 
information that should help you find out what is causing the crash.

The workflow to fix:

Backup the original data files at first
cp /data/mysql /data/mysql.bak
Set innodb_force_recovery level from 1 to 6 in /etc/mysql/my.cnf then trying to auto fix data corruption by restart
(Beware if we set level > 3, tables will no longer be able to write and need to prepare dump and restore data)

If innodb_force_recovery has set to 6 and still cannot recovery, trying to recover tables by idb recovery script
This script is running ALTER TABLE DISCARD TABLESPACE; and ALTER TABLE IMPORT TABLESPACE;

mysql_fix.sh

#!/bin/bash
PATH_FROM=/data/mysql.bak20200908
PATH_TO=/data/mysql

for data in $(find $PATH_FROM -name '*.ibd' | sort -n -t/)
do
COMPONENT=$(echo $data | cut -d'/' -f 4)
IBD_FILENAME=$(echo $data | cut -d'/' -f 5)
TABLE_NAME=$(echo $IBD_FILENAME | cut -d'.' -f 1)

echo -e "$data \c"
#echo $COMPONENT $IBD_FILENAME $TABLE_NAME
if [ ! -f ${PATH_TO}/${COMPONENT}/$IBD_FILENAME ]; then
    echo "WARNING: ibd file not found on target mysql"
    continue
fi
if [[ $1 == "--dry-run" ]]; then
    echo "...ok"
    continue
fi

mysql -uroot -p -e "SET foreign_key_checks = 0; use $COMPONENT; ALTER TABLE $TABLE_NAME discard tablespace;"
cp ${PATH_FROM}/${COMPONENT}/${IBD_FILENAME} ${PATH_TO}/${COMPONENT}/${IBD_FILENAME}
chown mysql:mysql ${PATH_TO}/${COMPONENT}/${IBD_FILENAME}
mysql -uroot -p -e "use $COMPONENT; ALTER TABLE $TABLE_NAME import tablespace; SET foreign_key_checks = 1;"

echo "...done"
done

It used to recover data from .ibd file

Make sure we have backup the corruption data to /data/mysql.bak
cleanup /data/mysql
Set the PATH_FROM=/data/mysql.bak and PATH_TO=/data/mysql in mysql_fix.sh
Initialize tables
run ./mysql_fix.sh --dry-run to check if the path is correct, if there are some table
run ./mysql_fix.sh and see the output, check if there are waring message

use mysqldump to check all data point is normal
mysqldump -u root -p --all-databases > /data/backup.sql
If there still have corrupted data, will see some error messages.
mysqldump: Couldn't execute 'SELECT /*!40001 SQL_NO_CACHE */ * FROM classlists': Lost connection to MySQL server during query (2013)
Try to fix by:
mysql > CHECK TABLE classlists;
mysql > OPTIMIZE TABLE classlists;
After recovered from .ibd file, it's better to do mysqldump once to check if there is any data corrupt. If it happens, reset db and restore from the dump files

Charlie27's Blog

some note of becoming fullstack engineer

[MySQL] Fix mysql data corruption workflow