Frank Van Damme
2010-Dec-08 13:12 UTC
[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Hello list,

I'm having trouble with a server holding a lot of data. After a few months of uptime, it is currently rebooting from a lockup (reason unknown so far), but it is taking hours to boot up again. The boot process is stuck at the stage where it says:

mounting zfs filesystems (1/5)

The machine responds to pings and keystrokes, and I can see disk activity; the disk LEDs blink one after another.

The file system layout is: a 40 GB mirror for the syspool, and a raidz volume over 4 2TB disks which I use for taking backups (= the purpose of this machine). I have deduplication enabled on the "backups" pool (which turned out to be pretty slow for file deletes, since there are a lot of files on the "backups" pool and I haven't installed an L2ARC yet). Main memory is 6 GB; it's an HP server running Nexenta Core Platform (kernel version 134f).

I assume sooner or later the machine will boot up, but I'm in a bit of a panic about how to solve this permanently - after all, the last thing I want is not being able to restore data one day because it takes days to boot the machine.

Does anyone have an idea how much longer it may take, and whether the problem may have anything to do with dedup?

--
Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
Wolfram Tomalla
2010-Dec-08 14:40 UTC
[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Hi Frank,

you might be facing the problem of lots of snapshots of your filesystems. For each snapshot a device is created during import of the pool, which can easily lead to an extended startup time. On my system it took about 15 minutes for 3500 snapshots.
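If you want a rough idea of how many snapshots the pool will have to process at import, something like this should do once the pool is accessible (a sketch - "backups" is the pool name from your mail, adjust as needed):

    # count all snapshots on the pool, recursively
    zfs list -H -t snapshot -o name -r backups | wc -l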
Fred Liu
2010-Dec-08 14:45 UTC
[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
A failed ZIL (log) device will also cause this...

Fred
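To rule that out once the machine is up, zpool status should show any faulted log device (a sketch, again assuming the pool is named "backups"):

    # -x prints only pools with problems; in the full output,
    # check the "logs" section for a faulted slog device
    zpool status -x
    zpool status -v backups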
taemun
2010-Dec-08 21:28 UTC
[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Dedup? Taking a long time to boot after a hard reboot after a lockup?

I'll bet that it hard locked whilst deleting some files or a dataset that was dedup'd. After the delete is started, it spends *ages* cleaning up the DDT (the table containing a list of dedup'd blocks). If you hard lock in the middle of this clean-up, the DDT is left in a state that isn't valid to anything. The next mount attempt on that pool will do this operation for you, which will take an inordinate amount of time. My pool spent *eight days* (iirc) in limbo, waiting for the DDT cleanup to finish. Once it did, it wrote out a shedload of blocks and then everything was fine. This was for a zfs destroy of a 900GB, 64KiB-block dataset, over 2x 8-wide raidz vdevs.

Unfortunately, raidz is of course slower for random reads than a set of mirrors. The raidz/mirror hybrid allocator available in snv_148+ is somewhat of a workaround for this, although I've not seen comprehensive figures for the gain it gives - http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6977913
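Once the pool is imported again, you can get a feel for how big the DDT is (and therefore how expensive this bookkeeping gets) with zdb - a sketch, assuming the pool is called "backups":

    # print dedup table statistics: entry counts, on-disk and
    # in-core sizes, and the overall dedup histogram
    zdb -DD backups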
Frank Van Damme
2010-Dec-09 10:01 UTC
[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
2010/12/8 taemun <taemun at gmail.com>:
> I'll bet that it hard locked whilst deleting some files or a dataset
> that was dedup'd. After the delete is started, it spends *ages*
> cleaning up the DDT (the table containing a list of dedup'd blocks).
> [...] My pool spent eight days (iirc) in limbo, waiting for the DDT
> cleanup to finish.

Eight days is just... scary. Ok, so basically it seems you can't have all the advantages of zfs at once: no more fsck, but if you have a deduplicated pool the kernel will still consider it "unclean" after a crash or unclean shutdown? I am indeed nearly continuously deleting older files, because each day a mass of files gets written to the machine (and backups rotated). Is it in some way possible to do the cleanup in smaller increments, so that the amount of cleanup work to do when you (hard) reboot is smaller?

--
Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
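PS: to make the question concrete, this is roughly what I have in mind - deleting in small batches with pauses in between, so each transaction group carries fewer DDT updates. Purely illustrative; the path and batch size are made up, and I don't know yet whether it actually helps:

    # walk expired backup files and delete them in batches
    i=0
    find /backups/expired -type f | while read -r f; do
        rm -- "$f"
        i=$((i+1))
        # pause every 1000 deletes to give ZFS a chance to sync
        # the pending dedup bookkeeping before more piles up
        if [ $((i % 1000)) -eq 0 ]; then
            sleep 10
        fi
    done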
Frank Van Damme
2010-Dec-09 13:43 UTC
[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
2010/12/8 <gonczi at comcast.net>:
> To explain further the slow delete problem:
>
> It is absolutely critical for zfs to manage the incoming data rate.
> This is done reasonably well for write transactions.
>
> Delete transactions, prior to dedup, were very light-weight, nearly
> free, so these are not managed.
>
> Because of dedup, deletes become rather expensive, because they
> introduce a substantial seek penalty, mostly because of the need to
> update the dedup metadata (reference counts and such).
>
> The mechanism of the problem:
> 1) Too many delete transactions are accepted into the open
> transaction group.
>
> 2) When this txg comes up to be synced to disk, the sync takes a very
> long time (instead of a healthy 1-2 seconds: minutes, hours or days).

Ok, had to look that one up, but the fog starts clearing up. I reckon in zfs land a command like "sync" has no effect at all?

> 3) Because the open txg can not be closed while the sync of a
> previous txg is in progress, eventually we run out of buffer space in
> the open txg, and all input is severely throttled.
>
> 4) Because of (3) other bad things happen: the arc tries to shrink,
> memory gets short, making things worse.

Yes... I see... speaking of which: the arc size on my system would be 1685483656 bytes - that's 1.6 GB on a system with 6 GB, with 3942 MB allocated to the kernel (according to mdb's ::memstat). So can I assume that the better part of the rest is allocated in buffers that needlessly fill up over time? I'd much rather have the memory used for the ARC :)

> 5) Because deletes persist across reboots, you are unable to mount
> your pool.
>
> One solution is booting into maintenance mode and renaming the zfs
> cache file (look in /etc/zfs, I forget the name at the moment).
> You can then boot up and import your pool. The import will take a
> long time, but meanwhile you are up and can do other things.
> At that point you have the option of getting rid of the pool and
> starting over (possibly installing a better kernel first)...
> After the update and import, upgrade your pool to the current pool
> version and life will be much better.

By now, the system has booted up. It has taken quite a few hours though. This system is actually running Nexenta, but I'll see if I can upgrade the kernel.

> I hope this helps, good luck

It clarified a few things. Thank you very much. There are one or two things I still have to change on this system, it seems...

> In addition, there was a virtual memory related bug (allocating one
> of the zfs memory caches with the wrong object size) that would cause
> other components to hang, waiting for memory allocations.
>
> This was so bad in earlier kernels that systems would become
> unresponsive for a potentially very long time (a phenomenon known as
> "bricking").
>
> As I recall, a lot of fixes came in in the 140-series kernels to fix
> this. Anything 145 and above should be OK.

I'm on 134f. No wonder.

--
Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
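PS: the cache file gonczi means is /etc/zfs/zpool.cache. A rough sketch of the maintenance-mode procedure, assuming the data pool is named "backups" (untested on my side, so double-check before relying on it):

    # from maintenance mode: move the cache file aside, so the data
    # pool is not imported (and mounted) automatically during boot
    mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.moved

    # after a normal reboot, import the pool by hand; the import
    # still replays the pending dedup deletes, but the rest of the
    # system stays usable in the meantime
    zpool import backups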