Notes on recovering a database after a server crash

Symptoms

The symptom was simple: the database server went down, and of course the database service had not been stopped beforehand.

After the machine rebooted, I tried to restart the MySQL service, without success. The error log showed:

170920  0:30:17  InnoDB: Assertion failure in thread 140107687212800 in file /export/home/pb2/build/sb_0-2629600-1291399482.5/mysql-5.5.10/storage/innobase/include/fut0lst.ic line 83
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
170920 0:30:17 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=0
max_threads=500
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 406067 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x39)[0x916839]
/usr/local/mysql/bin/mysqld(handle_segfault+0x359)[0x4fc0d9]
/lib64/libpthread.so.0(+0xf4a0)[0x7f6d5ca9f4a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f6d5be4a885]
/lib64/libc.so.6(abort+0x175)[0x7f6d5be4c065]
/usr/local/mysql/bin/mysqld[0x7d5601]
/usr/local/mysql/bin/mysqld[0x7ca012]
/usr/local/mysql/bin/mysqld[0x7ca357]
/usr/local/mysql/bin/mysqld[0x7cce1a]
/usr/local/mysql/bin/mysqld[0x7b89e8]
/usr/local/mysql/bin/mysqld[0x78d97d]
/usr/local/mysql/bin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x6683a8]
/usr/local/mysql/bin/mysqld[0x57ddba]
/usr/local/mysql/bin/mysqld(_Z11plugin_initPiPPci+0xb5d)[0x581cbd]
/usr/local/mysql/bin/mysqld[0x50212c]
/usr/local/mysql/bin/mysqld(_Z11mysqld_mainiPPc+0x3c2)[0x504742]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f6d5be36cdd]
/usr/local/mysql/bin/mysqld[0x4fa3fa]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170920 00:30:17 mysqld_safe mysqld from pid file /usr/local/mysql/data/localhost.localdomain.pid ended
170920 01:04:55 mysqld_safe Starting mysqld daemon with databases from /usr/local/mysql/data
170920 1:04:55 [Warning] Ignoring user change to 'ser=mysql' because the user was set to 'mysql' earlier on the command line
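
A log like this is long, and only a handful of lines say what actually failed. A minimal sketch of a filter that pulls those lines out follows; the function name `show_crash_lines` and the choice of patterns are illustrative assumptions, not part of MySQL.

```shell
# Print the lines of a MySQL error log that most often point at the failure,
# each prefixed with its line number in the log.
# Usage: show_crash_lines /path/to/error.log
show_crash_lines() {
    grep -n -E 'Assertion failure|Failing assertion|got signal|corruption' "$1"
}
```

It could be called as `show_crash_lines /usr/local/mysql/data/localhost.localdomain.err`, where the file name is a guess based on the pid file name in the log. On the log above it would surface the `fut0lst.ic` assertion and the `signal 6` line, which point at InnoDB rather than at the memory settings.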

Resolution

At first I focused on this part of the log:

It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 406067 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

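I assumed some MySQL parameters were misconfigured and, going by what Google turned up, edited /etc/my.cnf, again without success. Looking back after the problem was solved, MySQL had been running fine before the crash, so the configuration was unlikely to be at fault; at the time I was simply grasping at straws.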
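
For reference, the formula in that message is plain arithmetic and can be evaluated by hand. The sketch below does so in shell; `mem_estimate_kb` is a hypothetical helper, and because sort_buffer_size does not appear in the log excerpt, the value used for it is a placeholder, so the result is not expected to reproduce the 406067 K figure.

```shell
# Evaluate key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads, in KiB.
# Usage: mem_estimate_kb KEY_BUFFER READ_BUFFER SORT_BUFFER MAX_THREADS   (sizes in bytes)
mem_estimate_kb() {
    echo $(( ($1 + ($2 + $3) * $4) / 1024 ))
}

# key_buffer_size, read_buffer_size and max_threads come from the log above;
# the 2 MiB sort_buffer_size is a placeholder, not a value from the log.
mem_estimate_kb 16777216 262144 2097152 500
```

The message is informational: it is printed as part of the crash report and says nothing about why the server crashed.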

1. Forcing InnoDB Recovery

Start MySQL in recovery mode by adding the following to /etc/my.cnf:

[mysqld]
innodb_force_recovery = 1

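Start with a value of 1. If MySQL still cannot start, raise the value step by step to 2, 3, 4 and so on, stopping at the first value that lets MySQL start.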
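
Since the level may have to be changed several times, editing the file by hand gets error-prone. The following is a minimal sketch of a helper that rewrites the setting in place; `set_force_recovery` is a hypothetical name, and the sketch assumes the file already contains a `[mysqld]` section header on its own line.

```shell
# Set innodb_force_recovery to LEVEL in the [mysqld] section of a my.cnf-style file.
# Assumes a "[mysqld]" header line exists; if it does not, nothing is added.
# Usage: set_force_recovery /etc/my.cnf 1
set_force_recovery() {
    cnf=$1
    level=$2
    case $level in
        [1-6]) ;;
        *) echo "level must be between 1 and 6" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    # Drop any existing setting, then print the new one right after the [mysqld] header.
    awk -v lvl="$level" '
        /^[ \t]*innodb_force_recovery[ \t]*=/ { next }
        { print }
        /^\[mysqld\][ \t]*$/ { print "innodb_force_recovery = " lvl }
    ' "$cnf" > "$tmp" && cat "$tmp" > "$cnf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Writing the result back with `cat` rather than `mv` keeps the ownership and permissions of the original file.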

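Forcing InnoDB Recovery provides six recovery levels. Note that values greater than 3 can permanently corrupt the data files, and that damage cannot be undone. The six levels, quoted from the documentation, are: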

  1. SRV_FORCE_IGNORE_CORRUPT
    Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM tbl_name jump over corrupt index records and pages, which helps in dumping tables.
  2. SRV_FORCE_NO_BACKGROUND
    Prevents the master thread and any purge threads from running. If a crash would occur during the purge operation, this recovery value prevents it.
  3. SRV_FORCE_NO_TRX_UNDO
    Does not run transaction rollbacks after crash recovery.
  4. SRV_FORCE_NO_IBUF_MERGE
    Prevents insert buffer merge operations. If they would cause a crash, does not do them. Does not calculate table statistics. This value can permanently corrupt data files. After using this value, be prepared to drop and recreate all secondary indexes.
  5. SRV_FORCE_NO_UNDO_LOG_SCAN
    Does not look at undo logs when starting the database: InnoDB treats even incomplete transactions as committed. This value can permanently corrupt data files.
  6. SRV_FORCE_NO_LOG_REDO
    Does not do the redo log roll-forward in connection with recovery. This value can permanently corrupt data files. Leaves database pages in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures.

Starting MySQL in recovery mode

/usr/local/mysql/bin/mysqld_safe --user=mysql &

Once the server has started, check that the database accepts connections: mysql -uroot -p123456
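
The server can take a while to come up in recovery mode, so a single connection attempt may fail simply because it was made too early. A small retry loop is one way to handle that; `retry` below is a generic hypothetical helper, not a MySQL tool.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command exits with status 0.
        "$@" >/dev/null 2>&1 && return 0
        i=$((i + 1))
        # Only sleep if another attempt is coming.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `retry 10 2 /usr/local/mysql/bin/mysqladmin -uroot -p123456 ping` would probe the server up to ten times, two seconds apart; `mysqladmin ping` exits with status 0 when the server is running. The path to `mysqladmin` is assumed from the install prefix used elsewhere in this post.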

Backing up the data

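In recovery mode the database is read-only, although the exact restrictions depend on the recovery level.
The next step is to back up the data, then clear out the damaged data, and finally restore from the backup.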

mysqldump -uroot -p123456 --all-databases  > all_mysql_backup.sql
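
When tables are damaged, mysqldump may abort partway through and leave a truncated file, so the dump is worth checking before the original data is touched. With default options mysqldump ends its output with a `-- Dump completed` comment; the sketch below checks for it. `dump_is_complete` is a hypothetical helper, and the check does not apply if comments were disabled, for example with `--skip-comments`.

```shell
# Succeed only if the last non-empty line of a mysqldump file is the
# "-- Dump completed" trailer that mysqldump writes when it finishes normally.
# Usage: dump_is_complete all_mysql_backup.sql
dump_is_complete() {
    [ -s "$1" ] || return 1
    grep -v '^[[:space:]]*$' "$1" | tail -n 1 | grep -q '^-- Dump completed'
}
```

A typical use would be `dump_is_complete all_mysql_backup.sql || echo "dump looks truncated"`.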

Cleaning up or backing up the original data

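Stop the database service before cleaning up the data, and remove innodb_force_recovery from /etc/my.cnf so that the server later starts in normal mode and accepts writes.
Then move the contents of the database's data directory aside as a backup, which takes the database back to its freshly installed state.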

mkdir data-bak
cd data
mv * ../data-bak/
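
The commands above appear to assume the current directory is /usr/local/mysql, and `mv *` skips hidden files. A slightly more defensive variant is sketched below; `stash_datadir` is a hypothetical helper, and treating a leftover `.pid` file as a sign that mysqld is still running is only a heuristic.

```shell
# Move everything inside DATADIR into a new timestamped backup directory next to it,
# then print the backup directory's path.
# Usage: stash_datadir /usr/local/mysql/data
stash_datadir() {
    datadir=${1%/}
    backup="${datadir}-bak-$(date +%Y%m%d%H%M%S)"
    # Refuse to run while a pid file is present: mysqld may still be running.
    if ls "$datadir"/*.pid >/dev/null 2>&1; then
        echo "pid file found in $datadir; stop mysqld first" >&2
        return 1
    fi
    mkdir "$backup" || return 1
    # Move the contents only, so the data directory keeps its owner and permissions.
    find "$datadir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup"/ \; || return 1
    echo "$backup"
}
```

Keeping the old files rather than deleting them leaves a way back if the dump later turns out to be incomplete.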

Restoring the data

Initializing the database

Because the data directory is now empty, MySQL has to be initialized again.

cd /usr/local/mysql
./scripts/mysql_install_db --user=mysql&

Restoring from the backup

Start the MySQL service again, then log in:

mysql -u root -p123456

Once logged in, run the following statement to restore the data:

source /app/all_mysql_backup.sql

After the restore, check the data.
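
One rough sanity check is to count the tables the dump defines and compare that with what the restored server reports. The sketch below covers the dump side only; `count_tables_in_dump` is a hypothetical helper, and it assumes the dump was written by mysqldump with each `CREATE TABLE` statement starting at the beginning of a line.

```shell
# Count the CREATE TABLE statements in a mysqldump file.
# Usage: count_tables_in_dump /app/all_mysql_backup.sql
count_tables_in_dump() {
    grep -c '^CREATE TABLE' "$1"
}
```

The number can then be compared with the table count on the server, for instance from information_schema.tables. The two figures need not match exactly, because the server also has tables that mysqldump does not export, so a large gap is the thing to look for. Running `mysqlcheck --all-databases` against the restored server is another way to confirm that every table opens cleanly.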
