起因

收到报警,有节点docker卡死

[Mon Jun 25 19:41:36 2018] INFO: task kworker/u32:0:7799 blocked for more than 120 seconds.
[Mon Jun 25 19:41:36 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Jun 25 19:41:36 2018] kworker/u32:0   D ffff880342b7fb48     0  7799      2 0x00000080
[Mon Jun 25 19:41:36 2018] Workqueue: writeback bdi_writeback_workfn (flush-252:92)
[Mon Jun 25 19:41:36 2018]  ffff880342b7f828 0000000000000046 ffff880e4f26bec0 ffff880342b7ffd8
[Mon Jun 25 19:41:36 2018]  ffff880342b7ffd8 ffff880342b7ffd8 ffff880e4f26bec0 ffff880e4f26bec0
[Mon Jun 25 19:41:36 2018]  ffff8801f8da6ad0 fffffffeffffffff ffff8801f8da6ad8 ffff880342b7fb48
[Mon Jun 25 19:41:36 2018] Call Trace:
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168c7f9>] schedule+0x29/0x70
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168e1a5>] rwsem_down_read_failed+0xf5/0x170
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07d64e7>] ? xfs_map_blocks+0x87/0x210 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffff81327618>] call_rwsem_down_read_failed+0x18/0x30
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168b980>] down_read+0x20/0x30
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07ef2ac>] xfs_ilock+0xdc/0x120 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07d64e7>] xfs_map_blocks+0x87/0x210 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07d715b>] xfs_do_writepage+0x17b/0x550 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffff8118c5d1>] write_cache_pages+0x251/0x4d0
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07d6fe0>] ? xfs_aops_discard_page+0x150/0x150 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07d6df5>] xfs_vm_writepages+0xc5/0xe0 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffff8118d94e>] do_writepages+0x1e/0x40
[Mon Jun 25 19:41:36 2018]  [<ffffffff81229010>] __writeback_single_inode+0x40/0x210
[Mon Jun 25 19:41:36 2018]  [<ffffffff81229cfe>] writeback_sb_inodes+0x25e/0x420
[Mon Jun 25 19:41:36 2018]  [<ffffffff81229f5f>] __writeback_inodes_wb+0x9f/0xd0
[Mon Jun 25 19:41:36 2018]  [<ffffffff8122a7a3>] wb_writeback+0x263/0x2f0
[Mon Jun 25 19:41:36 2018]  [<ffffffff810abc96>] ? set_worker_desc+0x86/0xb0
[Mon Jun 25 19:41:36 2018]  [<ffffffff8122c5e5>] bdi_writeback_workfn+0x115/0x460
[Mon Jun 25 19:41:36 2018]  [<ffffffff810c52b4>] ? try_to_wake_up+0x174/0x340
[Mon Jun 25 19:41:36 2018]  [<ffffffff810a845b>] process_one_work+0x17b/0x470
[Mon Jun 25 19:41:36 2018]  [<ffffffff810a9296>] worker_thread+0x126/0x410
[Mon Jun 25 19:41:36 2018]  [<ffffffff810a9170>] ? rescuer_thread+0x460/0x460
[Mon Jun 25 19:41:36 2018]  [<ffffffff810b0a4f>] kthread+0xcf/0xe0
[Mon Jun 25 19:41:36 2018]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[Mon Jun 25 19:41:36 2018]  [<ffffffff81697758>] ret_from_fork+0x58/0x90
[Mon Jun 25 19:41:36 2018]  [<ffffffff810b0980>] ? kthread_create_on_node+0x140/0x140
[Mon Jun 25 19:41:36 2018] INFO: task tar:499 blocked for more than 120 seconds.
[Mon Jun 25 19:41:36 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Jun 25 19:41:36 2018] tar             D ffffffff8168a850     0   499    460 0x10000080
[Mon Jun 25 19:41:36 2018]  ffff880bca2d3170 0000000000000086 ffff880d32358000 ffff880bca2d3fd8
[Mon Jun 25 19:41:36 2018]  ffff880bca2d3fd8 ffff880bca2d3fd8 ffff880d32358000 ffff880fff2d6c40
[Mon Jun 25 19:41:36 2018]  0000000000000000 7fffffffffffffff ffff88103ff56ee8 ffffffff8168a850
[Mon Jun 25 19:41:36 2018] Call Trace:
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168c7f9>] schedule+0x29/0x70
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0
[Mon Jun 25 19:41:36 2018]  [<ffffffff81060c1f>] ? kvm_clock_get_cycles+0x1f/0x30
[Mon Jun 25 19:41:36 2018]  [<ffffffff810eb0dc>] ? ktime_get_ts64+0x4c/0xf0
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168a850>] ? bit_wait+0x50/0x50
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168bd9e>] io_schedule_timeout+0xae/0x130
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168be38>] io_schedule+0x18/0x20
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168a861>] bit_wait_io+0x11/0x50
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168a385>] __wait_on_bit+0x65/0x90
[Mon Jun 25 19:41:36 2018]  [<ffffffff81180221>] wait_on_page_bit+0x81/0xa0
[Mon Jun 25 19:41:36 2018]  [<ffffffff810b1be0>] ? wake_bit_function+0x40/0x40
[Mon Jun 25 19:41:36 2018]  [<ffffffff81195832>] shrink_page_list+0x6e2/0xb00
[Mon Jun 25 19:41:36 2018]  [<ffffffff81194b63>] ? isolate_lru_pages.isra.43+0xd3/0x190
[Mon Jun 25 19:41:36 2018]  [<ffffffff811962da>] shrink_inactive_list+0x1fa/0x630
[Mon Jun 25 19:41:36 2018]  [<ffffffff81196e75>] shrink_lruvec+0x385/0x770
[Mon Jun 25 19:41:36 2018]  [<ffffffff811972d6>] shrink_zone+0x76/0x1a0
[Mon Jun 25 19:41:36 2018]  [<ffffffff811977e0>] do_try_to_free_pages+0xf0/0x4e0
[Mon Jun 25 19:41:36 2018]  [<ffffffff81197e2a>] try_to_free_mem_cgroup_pages+0xda/0x170
[Mon Jun 25 19:41:36 2018]  [<ffffffff811f1e1e>] mem_cgroup_reclaim+0x4e/0x120
[Mon Jun 25 19:41:36 2018]  [<ffffffff811f230c>] __mem_cgroup_try_charge+0x41c/0x670
[Mon Jun 25 19:41:36 2018]  [<ffffffff811f4736>] __memcg_kmem_newpage_charge+0xf6/0x180
[Mon Jun 25 19:41:36 2018]  [<ffffffff8118b47d>] __alloc_pages_nodemask+0x22d/0x420
[Mon Jun 25 19:41:36 2018]  [<ffffffff811cf9ca>] ? alloc_pages_current+0xaa/0x170
[Mon Jun 25 19:41:36 2018]  [<ffffffff811cf9ca>] alloc_pages_current+0xaa/0x170
[Mon Jun 25 19:41:36 2018]  [<ffffffff811da64c>] new_slab+0x30c/0x320
[Mon Jun 25 19:41:36 2018]  [<ffffffff811dbecc>] ___slab_alloc+0x3ac/0x4f0
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07fdb27>] ? kmem_zone_alloc+0x97/0x130 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07fdb27>] ? kmem_zone_alloc+0x97/0x130 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07fdb27>] ? kmem_zone_alloc+0x97/0x130 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffff8168414c>] __slab_alloc+0x40/0x5c
[Mon Jun 25 19:41:36 2018]  [<ffffffff811de993>] kmem_cache_alloc+0x193/0x1e0
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07fdb27>] ? kmem_zone_alloc+0x97/0x130 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07fdb27>] kmem_zone_alloc+0x97/0x130 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07fca55>] xfs_trans_add_item+0x25/0x60 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa080ea71>] xfs_trans_ijoin+0x31/0x60 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07ecd20>] xfs_vn_update_time+0xe0/0x190 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffff8121a1f5>] update_time+0x25/0xd0
[Mon Jun 25 19:41:36 2018]  [<ffffffff810c8395>] ? sched_clock_cpu+0x85/0xc0
[Mon Jun 25 19:41:36 2018]  [<ffffffff8121a4a0>] file_update_time+0xa0/0xf0
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07e3875>] xfs_file_aio_write_checks+0x185/0x1f0 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07e3c63>] xfs_file_buffered_aio_write+0x93/0x260 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffff8108dc82>] ? current_fs_time+0x12/0x60
[Mon Jun 25 19:41:36 2018]  [<ffffffff8121a331>] ? touch_atime+0x91/0x160
[Mon Jun 25 19:41:36 2018]  [<ffffffff81207dae>] ? pipe_read+0x38e/0x4c0
[Mon Jun 25 19:41:36 2018]  [<ffffffffa07e3fbd>] xfs_file_aio_write+0x18d/0x1a0 [xfs]
[Mon Jun 25 19:41:36 2018]  [<ffffffff811fe18d>] do_sync_write+0x8d/0xd0
[Mon Jun 25 19:41:36 2018]  [<ffffffff811fe9fd>] vfs_write+0xbd/0x1e0
[Mon Jun 25 19:41:36 2018]  [<ffffffff8111f244>] ? __audit_syscall_entry+0xb4/0x110
[Mon Jun 25 19:41:36 2018]  [<ffffffff811ff51f>] SyS_write+0x7f/0xe0
[Mon Jun 25 19:41:36 2018]  [<ffffffff81697a12>] tracesys+0xdd/0xe2
[Mon Jun 25 19:43:36 2018] INFO: task dockerd:6540 blocked for more than 120 seconds.
[Mon Jun 25 19:43:36 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Jun 25 19:43:36 2018] dockerd         D ffffffff81230510     0  6540      1 0x10000080
[Mon Jun 25 19:43:36 2018]  ffff880348f23d50 0000000000000086 ffff88039e7f2f10 ffff880348f23fd8
[Mon Jun 25 19:43:36 2018]  ffff880348f23fd8 ffff880348f23fd8 ffff88039e7f2f10 ffff880348f23e80
[Mon Jun 25 19:43:36 2018]  ffff880348f23e88 7fffffffffffffff ffff88039e7f2f10 ffffffff81230510
[Mon Jun 25 19:43:36 2018] Call Trace:
[Mon Jun 25 19:43:36 2018]  [<ffffffff81230510>] ? generic_write_sync+0x60/0x60
[Mon Jun 25 19:43:36 2018]  [<ffffffff8168c7f9>] schedule+0x29/0x70
[Mon Jun 25 19:43:36 2018]  [<ffffffff8168a239>] schedule_timeout+0x239/0x2c0
[Mon Jun 25 19:43:36 2018]  [<ffffffff810a6270>] ? __queue_work+0x140/0x320
[Mon Jun 25 19:43:36 2018]  [<ffffffff811802fe>] ? __filemap_fdatawait_range+0xbe/0x190
[Mon Jun 25 19:43:36 2018]  [<ffffffff810a651a>] ? __queue_delayed_work+0xaa/0x1a0
[Mon Jun 25 19:43:36 2018]  [<ffffffff810a68ee>] ? try_to_grab_pending+0x11e/0x160
[Mon Jun 25 19:43:36 2018]  [<ffffffff81230510>] ? generic_write_sync+0x60/0x60
[Mon Jun 25 19:43:36 2018]  [<ffffffff8168cbd6>] wait_for_completion+0x116/0x170
[Mon Jun 25 19:43:36 2018]  [<ffffffff810c54e0>] ? wake_up_state+0x20/0x20
[Mon Jun 25 19:43:36 2018]  [<ffffffff812294a7>] sync_inodes_sb+0xb7/0x1e0
[Mon Jun 25 19:43:36 2018]  [<ffffffff81230510>] ? generic_write_sync+0x60/0x60
[Mon Jun 25 19:43:36 2018]  [<ffffffff81230529>] sync_inodes_one_sb+0x19/0x20
[Mon Jun 25 19:43:36 2018]  [<ffffffff81202582>] iterate_supers+0xb2/0x110
[Mon Jun 25 19:43:36 2018]  [<ffffffff812307e4>] sys_sync+0x44/0xb0
[Mon Jun 25 19:43:36 2018]  [<ffffffff81697a12>] tracesys+0xdd/0xe2

原因

默认情况下, Linux会最多使用40%的可用内存作为文件系统缓存。当超过这个阈值后,文件系统会把将缓存中的内存全部写入磁盘, 导致后续的IO请求都是同步的。
将缓存写入磁盘时,有一个默认120秒的超时时间。 出现上面的问题的原因是IO子系统的处理速度不够快,不能在120秒将缓存中的数据全部写入磁盘。
IO系统响应缓慢,导致越来越多的请求堆积,最终系统内存全部被占用,导致系统失去响应。 

从日志来看,发现tar,排查ps aux | grep tar发现 tar 进程变成了 D 进程,说明当时系统 IO 存在问题,等待时间过长.

后续改进

# vim /etc/sysctrl.conf
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10