ext3 error

Roberto Acuña racuna.mahatta at gmail.com
Tue Jul 4 13:33:13 CLT 2006


I have been having this problem for a while now: every so often the root
partition (/dev/hda2) is suddenly remounted read-only, and I have to run
fsck on it at the next reboot. It is very strange, because this has never
happened to me on other PCs running Linux, and so far neither RTFM nor STFW
has helped. The only fix I have found is fsck, but what I want is for it
not to happen again. The logs are attached below.

It used to happen quite often. With 2.6.17 I thought the problem had gone
away, but after almost two weeks of normal use it came back.

dmesg log:
(what it says about the hard disk at boot)
Probing IDE interface ide0...
hda: HITACHI_DK23DA-20B, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...

...
(after the error is reported)

EXT3-fs error (device hda2): ext3_new_block: Allocating block in system zone - blocks from 3, length 1
Aborting journal on device hda2.
ext3_abort called.
EXT3-fs error (device hda2): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device hda2): ext3_free_blocks: Freeing blocks in system zones - Block = 3, count = 1
EXT3-fs error (device hda2) in ext3_free_blocks_sb: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
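
Since the kernel reports "Remounting filesystem read-only" only in the log, one way to notice the remount as soon as it happens is to check the mount options of / from time to time. The following is a minimal sketch, not a tested monitoring tool: it assumes the usual /proc/mounts layout (device, mount point, type, options, ...), and the function names and the sample text are made up for illustration.

```python
def mount_options(mounts_text, mount_point="/"):
    """Return the options of the last entry mounted at mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")  # a later entry overrides an earlier one
    return options


def is_read_only(mounts_text, mount_point="/"):
    """True if mount_point is present and carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# Illustrative sample (an assumption, not taken from the affected machine).
# On a live system the text would come from reading /proc/mounts instead.
sample = (
    "rootfs / rootfs rw 0 0\n"
    "/dev/hda2 / ext3 ro,data=ordered 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print("root read-only:", is_read_only(sample))
```

Run periodically (from cron, for example) against the real /proc/mounts, a check like this would give a timestamp close to the moment the journal aborts, which helps when matching it against dmesg.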

After that I ran smartctl -t long /dev/hda, which told me the test would
finish in 19 minutes. Once that time had passed I ran smartctl -a /dev/hda,
and this is the output:

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: HITACHI_DK23DA-20B
Serial Number: 13K175
Firmware Version: 00J2A0B6
User Capacity: 20,003,880,960 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 5
ATA Standard is: ATA/ATAPI-5 T13 1321D revision 3
Local Time is: Tue Jul 4 12:24:04 2006 CLT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (1090) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  19) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000d   100   100   050    Pre-fail  Offline      -       8589934595
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       4000
  3 Spin_Up_Time            0x0007   100   100   050    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1697
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x000f   100   100   050    Pre-fail  Always       -       342
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       1266
  9 Power_On_Minutes        0x0032   094   094   000    Old_age   Always       -       3110h+44m
 10 Spin_Retry_Count        0x0013   100   100   050    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1120
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       44
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       38
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       219215/219176
194 Temperature_Celsius     0x0022   054   034   000    Old_age   Always       -       63 (Lifetime Min/Max 4/73)
195 Hardware_ECC_Recovered  0x001a   100   035   000    Old_age   Always       -       88
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   099   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0012   100   100   000    Old_age   Always       -       0
230 Head_Amplitude          0x0032   096   096   000    Old_age   Always       -       146496
250 Read_Error_Retry_Rate   0x000a   100   040   000    Old_age   Always       -       571

SMART Error Log Version: 1
ATA Error Count: 41 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mmS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 41 occurred at disk power-on lifetime: 2668 hours (111 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 b1 27 1c e0 Error: UNC at LBA = 0x001c27b1 = 1845169

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 ff 20 a1 27 1c e0 00 00:18:45.660 READ MULTIPLE
c8 ff 20 a1 27 1c e0 00 00:18:43.800 READ DMA
ef 03 45 ff ff 00 e0 00 00:18:43.150 SET FEATURES [Set transfer mode]
ef 03 0c ff ff 00 e0 00 00:18:43.150 SET FEATURES [Set transfer mode]
c6 ff 10 ff ff 00 e0 00 00:18:43.150 SET MULTIPLE MODE

Error 40 occurred at disk power-on lifetime: 2668 hours (111 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 b1 27 1c e0 Error: UNC 16 sectors at LBA = 0x001c27b1 = 1845169

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 ff 20 a1 27 1c e0 00 00:18:43.800 READ DMA
ef 03 45 ff ff 00 e0 00 00:18:43.150 SET FEATURES [Set transfer mode]
ef 03 0c ff ff 00 e0 00 00:18:43.150 SET FEATURES [Set transfer mode]
c6 ff 10 ff ff 00 e0 00 00:18:43.150 SET MULTIPLE MODE
10 ff 50 3f 6b b1 e0 00 00:18:43.150 RECALIBRATE [OBS-4]

Error 39 occurred at disk power-on lifetime: 2668 hours (111 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 b1 27 1c e0 Error: UNC at LBA = 0x001c27b1 = 1845169

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 ff 20 a1 27 1c e0 00 00:18:41.250 READ MULTIPLE
c8 ff 20 a1 27 1c e0 00 00:18:39.390 READ DMA
ef 03 45 ff ff 00 e0 00 00:18:38.740 SET FEATURES [Set transfer mode]
ef 03 0c ff ff 00 e0 00 00:18:38.740 SET FEATURES [Set transfer mode]
c6 ff 10 ff ff 00 e0 00 00:18:38.740 SET MULTIPLE MODE

Error 38 occurred at disk power-on lifetime: 2668 hours (111 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 b1 27 1c e0 Error: UNC 16 sectors at LBA = 0x001c27b1 = 1845169

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 ff 20 a1 27 1c e0 00 00:18:39.390 READ DMA
ef 03 45 ff ff 00 e0 00 00:18:38.740 SET FEATURES [Set transfer mode]
ef 03 0c ff ff 00 e0 00 00:18:38.740 SET FEATURES [Set transfer mode]
c6 ff 10 ff ff 00 e0 00 00:18:38.740 SET MULTIPLE MODE
10 ff 50 3f 6b b1 e0 00 00:18:38.740 RECALIBRATE [OBS-4]

Error 37 occurred at disk power-on lifetime: 2668 hours (111 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 10 b1 27 1c e0 Error: UNC at LBA = 0x001c27b1 = 1845169

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 ff 20 a1 27 1c e0 00 00:18:36.860 READ MULTIPLE
c8 ff 20 a1 27 1c e0 00 00:18:35.000 READ DMA
ef 03 45 ff ff 00 e0 00 00:18:34.350 SET FEATURES [Set transfer mode]
ef 03 0c ff ff 00 e0 00 00:18:34.350 SET FEATURES [Set transfer mode]
c6 ff 10 ff ff 00 e0 00 00:18:34.350 SET MULTIPLE MODE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%             3110  -
# 2  Extended offline    Completed without error        00%             3050  -

Device does not support Selective Self Tests/Logging
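
All five logged errors show the same register values after the failed read (SN=b1, CL=27, CH=1c, DH=e0) and the same LBA, 0x001c27b1. As a cross-check, the sketch below rebuilds that LBA from the registers using the standard 28-bit LBA layout (low nibble of DH, then CH, CL, SN). The function name is made up for illustration, and the byte offset assumes 512-byte sectors; which partition that sector falls in cannot be determined from the output above, since the partition table is not shown.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA.

    Bits 27..24 come from the low nibble of Device/Head, bits 23..16 from
    Cylinder High, bits 15..8 from Cylinder Low, bits 7..0 from Sector Number.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the "After command completion" line of the
# SMART error log: ER ST SC SN CL CH DH = 40 51 10 b1 27 1c e0
lba = lba28_from_registers(sn=0xB1, cl=0x27, ch=0x1C, dh=0xE0)
print(hex(lba), lba)  # 0x1c27b1 1845169

# Assumption: 512-byte sectors, so this is the byte offset on the whole disk.
SECTOR_SIZE = 512
byte_offset = lba * SECTOR_SIZE
print("byte offset on /dev/hda:", byte_offset)
```

The result matches the value smartctl prints (1845169), so the repeated UNC errors all point at the same sector rather than at scattered locations.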


I hope to get some answers to my problem and, if it turns out to be a
hardware fault, to find out whether I can make a warranty claim.

Thanks in advance.

Regards,

Racuna


