On 5/8/2019 10:14, Michelle Sullivan wrote:
> Paul Mather wrote:
>> On May 8, 2019, at 9:59 AM, Michelle Sullivan <michelle at sorbs.net>
>> wrote:
>>
>>>> Did you have regular pool scrubs enabled?  It would have picked up
>>>> silent data corruption like this.  It does for me.
>>> Yes, every month (once a month because, (1) the data doesn't change
>>> much (new data is added, old is not touched), and (2) because to
>>> complete it took 2 weeks.)
>>
>> Do you also run sysutils/smartmontools to monitor S.M.A.R.T.
>> attributes?  Although imperfect, it can sometimes signal trouble
>> brewing with a drive (e.g., increasing Reallocated_Sector_Ct and
>> Current_Pending_Sector counts) that can lead to proactive remediation
>> before catastrophe strikes.
> Not automatically.
>>
>> Unless you have been gathering periodic drive metrics, you have no
>> way of knowing whether these hundreds of bad sectors have happened
>> suddenly or slowly over a period of time.
> No, it's something I have thought about but been unable to spend the
> time on.

There are two issues here that would concern me greatly and that IMHO
you should address.

I have a system here with about the same amount of net storage on it as
you did.  It runs scrubs regularly; none of them take more than 8 hours
on *any* of the pools.  The SSD-based pool is of course *much* faster,
but even the many-way RaidZ2 on spinning rust is an ~8 hour deal; it
kicks off automatically at 2:00 AM when the time comes and is complete
before noon.  I run them on 14-day intervals.

If you have pool(s) that are taking *two weeks* to run a scrub, IMHO
either something is badly wrong or you need to rethink the organization
of the pool structure -- that is, IMHO you likely have either a severe
performance problem with one or more members or an architectural
problem that you *really* need to determine and fix.  If a scrub takes
two weeks *then a resilver could conceivably take that long as well*,
and that's *extremely* bad, as the window for getting screwed is at its
worst while a resilver is running.

Second, smartmontools/smartd isn't the be-all, end-all, but it *does*
sometimes catch incipient problems with specific units before they turn
into all-out death, and IMHO in any installation of any material size
where one cares about the data (as opposed to "if it fails, just
restore it from backup") it should be running.  It's very easy to set
up and there are no real downsides to using it.  I have one disk that I
rotate in and out that was bought as a "refurb" and has 70 permanently
reallocated sectors on it.  It has never grown another one since I
acquired it, but every time it goes in the machine, within minutes I
get an alert on that.  If I were ever to get *71*, or a *different*
drive grew a new one, said drive would be replaced *instantly*.  Over
the years it has flagged two disks before they "hard failed", and both
were immediately taken out of service, replaced, and then destroyed and
thrown away.  Maybe that's me being paranoid, but IMHO it's the correct
approach to such notifications.
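For readers wanting to replicate that setup, here is a minimal sketch
of a smartd configuration along those lines (the device names, mail
address, and test schedule are examples, not anything from Karl's
actual config; the FreeBSD port installs its config as
/usr/local/etc/smartd.conf):

  # /usr/local/etc/smartd.conf -- minimal example
  # -a       monitor health status and all SMART attributes
  # -m root  mail root on failure or new errors
  # -s (...) short self-test daily at 02:00, long test Saturdays 03:00
  # -C 197+ / -U 198+  alert only when the pending / offline-
  #          uncorrectable sector counts *increase* (the "71st
  #          sector" case described above)
  /dev/da0 -a -m root -s (S/../.././02|L/../../6/03) -C 197+ -U 198+
  /dev/da1 -a -m root -s (S/../.././02|L/../../6/03) -C 197+ -U 198+

Then add smartd_enable="YES" to /etc/rc.conf so the daemon starts at
boot.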
BTW, that tool will *also* tell you if something else software-wise is
going on that you *might* think is drive-related.  For example,
recently here on the list I ran into a really oddball thing happening
with SAS expanders that showed up with 12-STABLE and was *not* present
in the same box with 11.1.  Smartmontools confirmed that while the
driver was reporting errors from the disks, *the disks themselves were
not in fact taking errors.*  Had I not had that information I might
well have traveled down a road that led to a catastrophic pool failure
by attempting to replace disks that weren't actually bad.  The SAS
expander wound up being taken out of service and replaced with an HBA
that has more ports -- and the issues disappeared.

Finally, while you *think* you only have a metadata problem, I'm with
the other people here in expressing disbelief that the damage is
limited to that.  There is enough redundancy in the metadata on ZFS
that if *all* copies are destroyed, or are inconsistent to the degree
that they're unusable, it's extremely likely that if you do get some
sort of "disaster recovery" tool working, you're going to find out that
what you thought was a metadata problem is really a "you're hosed; the
data is also gone" sort of problem.

-- 
Karl Denninger
karl at denninger.net <mailto:karl at denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
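A quick illustration of the kind of cross-check Karl describes (the
device name da0 is just an example): compare what the driver is
reporting against the drive's own error log.

  # What the OS/driver thinks is happening on da0
  dmesg | grep da0

  # What the drive itself has logged: overall health assessment plus
  # its internal SMART error log
  smartctl -H -l error /dev/da0

If the driver is reporting transport errors while smartctl shows a
passing health status and "No Errors Logged", suspect the path
(expander, cabling, backplane, HBA) rather than the disk itself.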
On Wed, May 8, 2019 at 9:31 AM Karl Denninger <karl at denninger.net>
wrote:
> I have a system here with about the same amount of net storage on it
> as you did.  It runs scrubs regularly; none of them take more than 8
> hours on *any* of the pools.  The SSD-based pool is of course *much*
> faster but even the many-way RaidZ2 on spinning rust is an ~8 hour
> deal; it kicks off automatically at 2:00 AM when the time comes but
> is complete before noon.  I run them on 14 day intervals.

Damn, I wish our scrubs took 8 hours.  :)

Storage pool 1: 90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  45 hours to scrub.

Storage pool 2: 90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  33 hours to scrub.

Storage pool 3: 24 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  134 hours to scrub.

Storage pool 4: 24 drives in 6-disk raidz2 vdevs (mix of 1 TB, 2 TB,
and 4 TB SATA).  Dedupe enabled.  256 hours to scrub.

Storage pool 5: 90 drives in 6-disk raidz2 vdevs (mix of 2 TB and 4 TB
SATA).  Dedupe enabled.  Takes about 6 weeks to resilver a drive, and
it's constantly resilvering drives these days as it's the oldest pool
and all the drives are dying.  :D

Pools 1, 3, and 4 are in DC1.  Pools 2 and 5 are in DC2 across town.
Pool 1 sends snapshots to pool 2.  Pools 3 and 4 send snapshots to
pool 5.  These pools are highly fragmented.  :)

> If you have pool(s) that are taking *two weeks* to run a scrub, IMHO
> either something is badly wrong or you need to rethink the
> organization of the pool structure -- that is, IMHO you likely have
> either a severe performance problem with one or more members or an
> architectural problem that you *really* need to determine and fix.
> If a scrub takes two weeks *then a resilver could conceivably take
> that long as well*, and that's *extremely* bad, as the window for
> getting screwed is at its worst while a resilver is running.

Thankfully, ours are strictly storage for backups of other systems, so
as long as the nightly backups complete successfully before 6 AM,
we're not worried about performance.  :)  And we do have plans to
replace pools 2 and 5 to remove dedupe from the equation.  There's not
a lot we can do about the fragmentation issue, as these servers all
run rsync backups from 200-odd other servers and remove the oldest
snapshot every night.

So, while a 2-week scrub may be horrible, it all depends on the use
case.  If these were direct storage systems for in-production servers,
then I'd be worried.  But as redundant backup systems (3 copies of
everything, in 3 separate locations around the city), I'm not too
worried.  Yet.  :D

-- 
Freddie Cash
fjwcash at gmail.com
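For anyone wanting to see where their own pools sit on this spectrum,
the numbers being discussed here are all visible from the command line
(the pool name "tank" is a placeholder):

  # Per-pool size, fill level, fragmentation, and dedup ratio
  zpool list -o name,size,allocated,fragmentation,capacity,dedupratio

  # Scrub/resilver progress and estimated completion time
  zpool status tank | grep -A 2 scan:

  # Dedup table histogram -- a large DDT is a big part of what makes
  # scrubs and resilvers on deduped pools so slow
  zpool status -D tank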