It was freezing and crawling, with quite phenomenal load spikes.
The logs are flooded with I/O errors everywhere:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000001
ata1.00: failed command: READ DMA
ata1.00: cmd c8/00:04:00:00:00/00:00:00:00:00/e0 tag 0 dma 2048 in
res 51/04:04:00:00:00/00:00:00:00:00/e0 Emask 0x1 (device error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { ABRT }
ata1.00: configured for UDMA/133
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
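To get a quick overview of this kind of spam, the failed ATA commands can be counted per device. This is only a minimal sketch: the function name `ata_error_summary` is made up for this post, and it assumes log lines without a timestamp prefix, as in the excerpt above.

```shell
# Sketch: count "failed command" kernel messages per ATA device and command.
# Reads a log file given as $1, or stdin when no argument is given.
# Assumes lines look like "ata1.00: failed command: READ DMA" (no timestamp).
ata_error_summary() {
    awk '
        /failed command:/ {
            dev = $1
            sub(/:$/, "", dev)                   # "ata1.00:" -> "ata1.00"
            cmd = $0
            sub(/.*failed command: /, "", cmd)   # keep only the command name
            count[dev " " cmd]++
        }
        END { for (k in count) print count[k], k }
    ' "${1:--}" | sort -rn
}
```

Possible usage: `dmesg -t | ata_error_summary`, which prints lines such as `12 ata1.00 READ DMA`, most frequent first.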
Symptoms: load spikes that sometimes reach 500, with ksoftirqd hammering the CPU quite a bit.
Message from syslogd@nsxxxxx at Aug 18 11:19:52 ...
kernel:journal commit I/O error
Message from syslogd@nsxxxxx at Aug 18 11:19:52 ...
kernel:journal commit I/O error
The filesystem goes read-only.
On the RAID, only disk sdb is left.
Disk sda has gone on holiday (fair enough, it is August).
And to top it all off, disk sdb has problems too.
root@rescue:~# mdadm --misc --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Fri Jan 18 17:56:03 2013
Raid Level : raid1
Array Size : 1932012480 (1842.51 GiB 1978.38 GB)
Used Dev Size : 1932012480 (1842.51 GiB 1978.38 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Tue Aug 18 13:29:38 2015
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 4d0c4433:e3a58844:a4d2adc2:26fd5302 (local to host rescue.ovh.net)
Events : 0.21637216
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
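The two things to look at in this output are the State line and the slots marked "removed". A rough sketch to pull them out automatically (the function name `md_detail_check` is hypothetical, and it only relies on the field layout shown above):

```shell
# Sketch: summarize `mdadm --detail` output read from $1 or stdin.
# Prints the array state and the number of slots reported as "removed".
md_detail_check() {
    awk -F' : ' '
        /^ *State :/ { state = $2 }     # e.g. "active, degraded"
        / removed$/  { removed++ }      # one line per missing member
        END { printf "state=%s removed=%d\n", state, removed }
    ' "${1:--}"
}
```

Possible usage: `mdadm --misc --detail /dev/md2 | md_detail_check`. On the array above this would report a degraded state with one removed slot.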
Smartctl is a program that can run SMART tests and read the data reported by hard drives.
root@rescue:~# smartctl -l selftest /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.32-xxxx-std-ipv6-64-rescue] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Error SMART Values Read failed: scsi error aborted command
Smartctl: SMART Read Values failed.
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 60% 25776 3905985376
# 2 Short offline Completed without error 00% 3175 -
# 3 Short offline Completed without error 00% 3164 -
# 4 Short offline Completed without error 00% 3164 -
# 5 Short offline Completed without error 00% 36 -
# 6 Short offline Completed without error 00% 32 -
# 7 Short offline Completed without error 00% 32 -
# 8 Short offline Completed without error 00% 31 -
# 9 Short offline Completed without error 00% 20 -
#10 Short offline Completed without error 00% 19 -
#11 Short offline Completed without error 00% 17 -
#12 Short offline Completed without error 00% 16 -
#13 Short offline Completed without error 00% 11 -
#14 Short offline Completed without error 00% 0 -
#15 Short offline Completed without error 00% 0 -
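In the self-test log above, only test #1 matters: a read failure at 25776 power-on hours, with the LBA of the first error. A small sketch to extract the failing entries (the name `selftest_failures` is invented here, and it assumes the column layout shown above, where the last two fields are the lifetime hours and the LBA):

```shell
# Sketch: list failed entries from `smartctl -l selftest` output.
# Reads a file given as $1, or stdin when no argument is given.
selftest_failures() {
    awk '/^#/ && /failure/ {
        # last field = LBA of first error, previous field = power-on hours
        print "hours=" $(NF-1), "lba=" $NF
    }' "${1:--}"
}
```

Possible usage: `smartctl -l selftest /dev/sdb | selftest_failures`. A new short test can be started beforehand with `smartctl -t short /dev/sdb`.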
~~
We replace disk sda,
copy the partition table from sdb to sda,
and put the partitions back into the RAID, which will then resynchronize.
(if the disks use GPT, use sgdisk instead)
sfdisk -d /dev/sdb | sfdisk /dev/sda
mdadm /dev/md1 --manage --add /dev/sda1
same thing for md2:
mdadm /dev/md2 --manage --add /dev/sda2
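For the GPT case mentioned above, the sfdisk step would look roughly like this with sgdisk. This is a sketch only: the wrapper `clone_gpt` and the `RUN` variable are made up for this post, and by default it just prints the commands so nothing gets overwritten by accident. Watch the argument order: `-R` takes the destination disk, and the source disk comes last.

```shell
# Sketch: copy a GPT partition table from a healthy disk to a blank one.
# Prints the commands by default; set RUN=1 to really execute them.
clone_gpt() {
    src=$1   # healthy disk, e.g. /dev/sdb
    dst=$2   # replacement disk, e.g. /dev/sda (its table is overwritten)
    if [ "${RUN:-0}" = 1 ]; then run=; else run=echo; fi
    # -R replicates the table of the last argument (src) onto dst
    $run sgdisk -R "$dst" "$src"
    # -G gives the copy new random GUIDs so the two disks do not clash
    $run sgdisk -G "$dst"
}
```

Possible usage: `clone_gpt /dev/sdb /dev/sda` to review the commands, then `RUN=1 clone_gpt /dev/sdb /dev/sda` once the disk names have been double-checked.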
cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid1 sda1[2] sdb1[1]
20971456 blocks [2/1] [_U]
[==>..................] recovery = 13.7% (2879872/20971456) finish=1.8min speed=159992K/sec
md2 : active raid1 sdb2[1]
1932012480 blocks [2/1] [_U]
unused devices: <none>
root@rescue:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid1 sda1[2] sdb1[1]
20971456 blocks [2/1] [_U]
[===>.................] recovery = 15.5% (3256320/20971456) finish=1.8min speed=162816K/sec
md2 : active raid1 sdb2[1]
1932012480 blocks [2/1] [_U]
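Rather than rereading /proc/mdstat by eye, the status can be boiled down to one line per array. A minimal sketch, assuming the mdstat layout shown above (the function name `mdstat_summary` is hypothetical):

```shell
# Sketch: one line per md array: name, ok/degraded, recovery percentage.
# Reads /proc/mdstat by default, another file as $1, or stdin with "-".
mdstat_summary() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /blocks/ && match($0, /\[[U_]+\]/) {
            flags = substr($0, RSTART, RLENGTH)   # e.g. "[_U]"
            status[name] = (flags ~ /_/) ? "degraded" : "ok"
            order[++n] = name
        }
        /recovery =/ {
            for (i = 1; i <= NF; i++) if ($i ~ /%$/) pct[name] = $i
        }
        END {
            for (i = 1; i <= n; i++) {
                m = order[i]
                printf "%s %s %s\n", m, status[m], ((m in pct) ? pct[m] : "-")
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Possible usage: `watch -n 5 cat /proc/mdstat` for the raw view, or call `mdstat_summary` in a loop. An array keeps showing as degraded until its resync has finished and the flags read `[UU]`.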
There are plenty of tutorials on mdadm RAID.
For bad sectors, this page is well done: http://www.vincentliefooghe.net/content ... -un-disque