MySQL主从复制错误解决之slave_pending_jobs_size_max

/ Database / 没有评论 / 6214浏览

错误日志

2018-03-29T15:35:48.608513Z 8473251 [Note] Multi-threaded slave: Coordinator has waited 31621 times hitting slave_pending_jobs_size_max; current event size = 7990.
2018-03-29T15:35:48.618715Z 8473251 [Note] Multi-threaded slave: Coordinator has waited 31631 times hitting slave_pending_jobs_size_max; current event size = 8002.
2018-03-29T15:35:48.625974Z 8473251 [Note] Multi-threaded slave: Coordinator has waited 31641 times hitting slave_pending_jobs_size_max; current event size = 8078.
2018-03-29T15:35:48.632782Z 8473251 [Note] Multi-threaded slave: Coordinator has waited 31651 times hitting slave_pending_jobs_size_max; current event size = 7990.
2018-03-29T15:35:49.921167Z 8473252 [ERROR] Slave SQL for channel '': Worker 1 failed executing transaction 'ANONYMOUS' at master log mysql-bin.000096, end_log_pos 912768886; Could not execute Update_rows event on table zyd1.zyd1_action_log; Can't find record in 'zyd1_action_log', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000096, end_log_pos 912768886, Error_code: 1032
2018-03-29T15:35:55.416105Z 8473251 [ERROR] Error writing relay log configuration.
2018-03-29T15:35:56.941455Z 8473251 [Warning] Slave SQL for channel '': ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2018-03-29T15:35:56.941479Z 8473251 [Note] Error reading relay log event for channel '': slave SQL thread was killed

机器翻译

2018-03-29T15:35:48.608513Z 8473251 [注]多线程从机:协调器等待31621次,达到slave_pending_jobs_size_max;当前事件大小= 7990。
2018-03-29T15:35:48.618715Z 8473251 [注]多线程从机:协调器等待31631次,达到slave_pending_jobs_size_max;当前事件大小= 8002。
2018-03-29T15:35:48.625974Z 8473251 [注]多线程从机:协调器等待命中slave_pending_jobs_size_max 31641次;当前事件大小= 8078。
2018-03-29T15:35:48.632782Z 8473251 [注]多线程从机:协调器等待命中slave_pending_jobs_size_max 31651次;当前事件大小= 7990。
2018-03-29T15:35:49.921167Z 8473252 [错误]通道“从属SQL”:工人1在主日志上执行事务'ANONYMOUS'失败mysql-bin.000096,end_log_pos 912768886;无法在表zyd1.zyd1_action_log上执行Update_rows事件; 'zyd1_action_log'中找不到记录,Error_code:1032;处理程序错误HA_ERR_KEY_NOT_FOUND;事件的主日志mysql-bin.000096,end_log_pos 912768886,Error_code:1032
2018-03-29T15:35:55.416105Z 8473251 [错误]写入中继日志配置时出错。
2018-03-29T15:35:56.941455Z 8473251 [警告]通道“从属SQL”:...从属协调器和工作线程停止,可能使数据处于不一致的状态。尽管对数据或信息表或DDL查询使用非事务性存储可能会导致问题,但重新启动应自动恢复一致性。在这种情况下,您必须检查您的数据(有关详细信息,请参阅文档)。 Error_code:1756
2018-03-29T15:35:56.941479Z 8473251 [注意]错误读取通道''的中继日志事件:从属SQL线程被终止

错误分析

协调器等待31621次,达到slave_pending_jobs_size_max 可能是达到这个参数峰值,

默认值为16777216(16MB),但是slave接收到的slave_pending_jobs_size_max可能超过了,我得环境是,大批量导入,导致主从挂掉

这个参数意味着什么

在多线程复制时,在队列中Pending的事件所占用的最大内存,默认为16M,如果内存富余,或者延迟较大时,可以适当调大;

注意这个值要比主库的max_allowed_packet大

官方解释

https://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html

查一下slave_pending_jobs_size_max;

mysql> show global variables like '%slave_pending_jobs_size_max%';  
+-----------------------------+----------+  
| Variable_name               | Value    |  
+-----------------------------+----------+  
| slave_pending_jobs_size_max | 16777216 |  
+-----------------------------+----------+  
1 row in set (0.00 sec)  

参数调大

# 主库锁库

stop slave;  
set global slave_pending_jobs_size_max=20000000;  
start slave;  

再次检验

mysql> show global variables like '%slave_pending_jobs_size_max%';
+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| slave_pending_jobs_size_max | 19999744 |
+-----------------------------+----------+
1 row in set (0.00 sec)

End