Hi all! I have a serious problem with a server, and I'm hoping someone could help me understand what's wrong. Basically I have a server with a pool of 6 disks, and after a zpool scrub I got this message:

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x15>

The OpenSolaris version is 5.11 snv_101b (yes, I know, quite old). This server has been up and running for more than 4 months, with weekly zpool scrubs, and now I got this message. Here are some extra details about the system:

1 - I can still access the data in the pool, but I don't know whether all of it is accessible and/or whether some of it is corrupted.
2 - Nothing was changed in the hardware.
3 - All the disks are ST31000340NS-SN06, Seagate 1TB 7,200 rpm "enterprise class", firmware SN06.
4 - All the disks are connected to an LSI Logic SAS1068E attached to a JBOD chassis (Supermicro).
5 - The server is a Sun X2200 Dual-Core.
6 - Using lsiutil and querying "Display phy counters" I see:

    Expander (Handle 0009) Phy 21:  Link Up
      Invalid DWord Count                   1,171
      Running Disparity Error Count           937
      Loss of DWord Synch Count                 0
      Phy Reset Problem Count                   0

    Expander (Handle 0009) Phy 22:  Link Up
      Invalid DWord Count               2,110,435
      Running Disparity Error Count       855,781
      Loss of DWord Synch Count                 3
      Phy Reset Problem Count                   0

    Expander (Handle 0009) Phy 23:  Link Up
      Invalid DWord Count                 740,029
      Running Disparity Error Count       716,196
      Loss of DWord Synch Count                 1
      Phy Reset Problem Count                   0

    Expander (Handle 0009) Phy 24:  Link Up
      Invalid DWord Count                 705,870
      Running Disparity Error Count       692,280
      Loss of DWord Synch Count                 1
      Phy Reset Problem Count                   0

    Expander (Handle 0009) Phy 25:  Link Up
      Invalid DWord Count                 698,935
      Running Disparity Error Count       667,148
      Loss of DWord Synch Count                 1
      Phy Reset Problem Count                   0

7 - /var/log/messages shows:

    o SCSI transport failed: reason 'reset': retrying command
    o SCSI transport failed: reason 'reset': giving up

Maybe I'm wrong, but it seems like the disks started to report errors? The reason I don't know whether all the data is accessible is that the pool is quite big, as seen here:

    NAME      SIZE   USED   AVAIL   CAP  HEALTH  ALTROOT
    POOL01   2.72T  1.71T   1.01T   62%  ONLINE  -

It might be that I have been suffering from this problem for some time, but the LSI HBA had never reported any error, and I assumed that ZFS was built to deal with exactly this kind of problem: silent data corruption. I would like to understand whether the problems started because a high load on the LSI HBA led to timeouts and therefore disk errors, or whether the OpenSolaris driver for the LSI HBA was overloaded, resulting in both disk errors and HBA errors... Any clue as to what led to what?

Even more important: did I lose data, or is ZFS only reporting errors caused by the disk/driver errors, with the data already written being okay and only new data possibly affected? Is the zpool metadata recoverable? My biggest concern is to know whether my pool is corrupted, and if so, how I can fix the zpool metadata problem. The full zpool status -vx output is pasted at the end of this message.

Thanks for all your time,
Bruno
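For reference, these are the diagnostics I am planning to run next to correlate the ZFS metadata errors with the SCSI transport resets. This is only a sketch; I am assuming the standard fmdump / iostat / zpool tools behave the same way on snv_101b:

    # Per-disk error counters kept by the sd/scsi stack (soft, hard, transport)
    iostat -En

    # FMA error reports with timestamps, to see when the transport resets began
    fmdump -eV | more

    # Current per-vdev read/write/checksum error counts for the pool
    zpool status -v POOL01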
root at server01:/# zpool status -vx
  pool: POOL01
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        POOL01       ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t9d0   ONLINE       0     0     0
            c5t10d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t11d0  ONLINE       0     0     0
            c5t12d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c5t13d0  ONLINE       0     0     0
            c5t14d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x15>
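And once the SAS link problem is sorted out (cables reseated / HBA checked), this is the sequence I was thinking of trying to see whether the permanent metadata errors persist or can be cleared. Again only a sketch of the usual scrub-and-clear steps as I understand them, not something I have already verified on this pool:

    # Re-run a scrub after the link errors are addressed, then check the result
    zpool scrub POOL01
    zpool status -v POOL01

    # If the scrub completes and the permanent errors no longer appear,
    # reset the per-vdev error counters so the next scrub starts clean
    zpool clear POOL01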