Hi all,

We want to run e2fsck on our OSTs. Before we start, we need to estimate how long e2fsck will take on them. We would like to know which factors affect the running time, such as the size of the OST, the average file size, and so on. Can you give us some tips? I hope to receive your reply as soon as possible.

Thanks,
Sarea

2009-07-29
huangql
Andreas Dilger
2009-Jul-29 10:35 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
On Jul 29, 2009 15:16 +0800, huangql wrote:
> We want to run e2fsck on OSTs to finish them. However, as to the
> running time, we should estimate how long it takes to run e2fsck on
> OSTs. We hopefully to know what elements to affect the running time like
> the size of OST, the average size of files and so on. Can you give us
> the tips for it? And I hope to receive you email as soon as possible.

The variables in your question are numerous:
- size of the filesystem
- number of inodes
- number of allocated blocks
- distribution of the above on the disk
- speed of the disks
- speed of the CPU
- amount of RAM on server

Reasonable e2fsck times (without serious filesystem problems) might take between 5 minutes and 2 hours.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
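Several of the variables listed above (filesystem size, inode count, allocated blocks) can be read from the superblock summary that `dumpe2fs -h <device>` prints. The following is only a minimal sketch of collecting them in Python: the `SAMPLE` text is hypothetical and trimmed, the function names are made up for illustration, and disk, CPU and RAM figures still have to be gathered separately.

```python
import re

# Superblock fields printed by `dumpe2fs -h <device>` that correspond to the
# size and count variables in the list above.
FIELDS = ("Inode count", "Block count", "Free blocks", "Free inodes",
          "Block size", "Inode size")


def parse_superblock(text):
    """Return the numeric superblock fields found in dumpe2fs -h output."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.strip() in FIELDS:
            match = re.match(r"\s*(\d+)", rest)
            if match:
                values[key.strip()] = int(match.group(1))
    return values


def summarize(values):
    """Derive total size, allocated bytes and allocated inodes."""
    block_size = values["Block size"]
    used_blocks = values["Block count"] - values["Free blocks"]
    used_inodes = values["Inode count"] - values["Free inodes"]
    return {
        "size_bytes": values["Block count"] * block_size,
        "used_bytes": used_blocks * block_size,
        "used_inodes": used_inodes,
    }


# Hypothetical sample output, trimmed to the relevant lines.
SAMPLE = """\
Filesystem OS type:       Linux
Inode count:              1000
Block count:              4000
Free blocks:              1000
Free inodes:              400
Block size:               4096
Inode size:               256
"""

if __name__ == "__main__":
    print(summarize(parse_superblock(SAMPLE)))
```

On a real server the text would come from running `dumpe2fs -h` against the OST device; these figures only narrow the estimate, they do not turn it into a prediction.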
Hi Andreas,

Thank you very much. The variables for our filesystem are as follows:

- size of the filesystem: 400TB
- number of inodes: 427220992 (with the default inode size of 256)
- number of allocated blocks: 1708867323 (with the default block size of 4096)
- distribution of the above on the disk: 4 OSTs on one disk
- amount of RAM on server: 16GB
- speed of the CPU: Intel(R) Core 8 CPU
- speed of the disks: 7200

Can you give us more details based on these parameters? We would also appreciate suggestions on how to run e2fsck, or how to make it run faster. We are worried that it may destroy the filesystem.

Thank you in advance for your help!

Thanks,
Sarea

2009-07-30
huangql
Andreas Dilger
2009-Jul-30 03:43 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
On Jul 30, 2009 09:00 +0800, huangql wrote:
> Thanks so much to you. The variables of our filesystem as follow:
>
> size of the filesystem: 400TB

This cannot be correct, given you have 1.7B 4kB blocks. Note that e2fsck can run in parallel on all OSTs.

> number of inodes: 427220992 (set as the default inode size 256)
> number of allocated blocks:1708867323 (set as the default block size 4096)
> distribution of the above on the disk: 4 OSTs on one disk

Putting 4 OSTs on a single disk doesn't make sense. A single OST can be up to 8TB, and if you have multiple OSTs on the same disk(s) it will cause terrible performance problems due to seeking.

> amount of RAM on server: 16GB
> speed of the CPU: Intel(R) Core 8 CPU
> speed of the disks:7200
>
> Can you give us the more details according to the parameters? And give
> us some suggestions to do e2fsck or make it do faster. We are worried
> about it may destroy the filesystem.
>
> Thank you in advanced for your help!

Sorry for the misunderstanding, but a rough estimate of e2fsck time is the best that is possible. I would estimate (excluding errors in the filesystem) that a 7TB filesystem would take on the order of 2h or less.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
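The "1.7B 4kB blocks" remark in the reply above can be checked with a little arithmetic: the reported block count times the block size comes to roughly 7TB, which is presumably where the "7TB filesystem" figure comes from. The sketch below works that out, assuming the reported "400TB" means decimal terabytes and that the block and inode counts describe a single OST; both are assumptions, not something the thread states.

```python
BLOCK_SIZE = 4096             # bytes, as reported
BLOCK_COUNT = 1_708_867_323   # "number of allocated blocks" as reported
INODE_COUNT = 427_220_992     # "number of inodes" as reported
TOTAL_BYTES = 400 * 10**12    # ASSUMPTION: "400TB" read as decimal terabytes


def ost_bytes(blocks=BLOCK_COUNT, block_size=BLOCK_SIZE):
    """Size in bytes implied by the reported block count and block size."""
    return blocks * block_size


def implied_ost_count(total_bytes=TOTAL_BYTES):
    """How many targets of that size would add up to the reported total."""
    # Rounded, since the reported total is itself approximate.
    return round(total_bytes / ost_bytes())


if __name__ == "__main__":
    size = ost_bytes()
    print(f"one target: {size / 10**12:.2f} TB ({size / 2**40:.2f} TiB)")
    print(f"bytes per inode: {size / INODE_COUNT:.0f}")
    print(f"targets needed for the reported total: {implied_ost_count()}")
```

If e2fsck runs on all OSTs at the same time, the wall-clock time is governed by the slowest single OST rather than by the 400TB total, provided the OSTs do not compete for the same disks.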
Hi Andreas,

Thank you very much. I tried to run e2fsck on an OST a few weeks ago and it failed; as a result, some files were destroyed. I then looked for the reason it failed and found reports that e2fsck has some bugs. Because of this and the time pressure, our team is discussing whether to run e2fsck at all. So I really hope you can give us some tips and point out what we should pay attention to.

See inline...

Thanks,
Sarea

2009-07-30
huangql

From: Andreas Dilger
Sent: 2009-07-30 11:41:13
To: huangql
Cc: lustre-discuss
Subject: Re: [Lustre-discuss] How to estimate the time for e2fsck on OST

> On Jul 30, 2009 09:00 +0800, huangql wrote:
> > size of the filesystem: 400TB
>
> This cannot be correct, given you have 1.7B 4kB blocks. Note that
> e2fsck time is in parallel on all OSTs.

Sorry, I don't understand what you mean. Our filesystem is up to 400TB, and the inode and block parameters were left at their default values. You mentioned that a 7TB filesystem would take on the order of 2h or less. Is that the time with e2fsck running in parallel on all OSTs?
Peter Grandi
2009-Aug-04 22:47 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
[ ... ]

adilger> Putting 4 OSTs on a single disk doesn't make sense.
adilger> A single OST can be up to 8TB, and if you have multiple
adilger> OSTs on the same disk(s) it will cause terrible
adilger> performance problems due to seeking.

Uhm, not exactly, that's a quick but simplistic answer: things are more complicated than that.

The seeking depends strictly on access patterns and number of disks in most cases.

Suppose that you have a 1TB disk and divide it into one or two filesystems: for a given file set (assumption relaxed later) and access pattern the same bits of the disk will be accessed. The two filesystems end up being mostly super-cylinder-groups, that is, mostly disjoint free space allocation pools.

There are secondary effects as to the disjoint free space allocations (one filesystem means allocations can spread all over the disk, two filesystems will restrict allocation to two separate pools, which most likely will improve clustering). Then two separate filesystems are more resilient to serious mangling, and might fsck faster (because of the better clustering) if done sequentially.

But the assumption "given file set" does not hold if the two filesystems are part of the same Lustre filesystem *and* striping is happening. In that case two objects that are parts of the same Lustre file will usually end up on the two partitions, and Lustre will assume that they can be fetched in parallel when they cannot really, and this may reduce performance.

But the overall effect will not be big; it will mostly be the same as if the max object size had been doubled, because again performance depends mostly on file access patterns and number of drives. For small files though it will halve the number of disks on which it can stripe, but this can be countered by halving the max object size.

Consider this example: a max object size of 1MiB, a 100MiB file, 10 drives, and striping. With one filesystem per drive you can read 10MiB in parallel in 1MiB objects (stripe size 10MiB). With two filesystems per drive you can read 20MiB in parallel (stripe size 20MiB) in 2x1MiB objects that are serialized by the drive. If the max object size is changed to 512KiB in the two-filesystems-per-drive case, you can still read 10MiB in parallel in 2x512KiB objects (back to the 10MiB stripe size).

Now one might argue that in the 10x1MiB case the 1MiB is likely to be more contiguous than in the 10x2x512KiB case, where the two 512KiB objects are forced to be in different halves of the disk. But then let me point out that the 100MiB file striped across the 10 drives in 1MiB objects has got 10x1MiB objects per drive anyhow, and whether they are clustered or not is mostly up to luck.

So the issue really is whether 20x512KiB objects per drive are going to be less clustered than 10x1MiB objects, and my guess is that it does not matter a lot, and in some cases it might be of benefit.

Anyhow, there is a case where two OSTs per drive is most likely of benefit. That's the case where the two OSTs belong to two Lustre filesystems, one faster (outer track OSTs) and used more often, and one slower (inner track OSTs) and used less often. That means a crude form of hand-clustering.

Still though, performance likely depends more on the overall file access patterns and the number of disks than on whether they are split across two distinct allocation pools.

Note 1: a fair bit also depends on the in-cylinder-group allocation policy of 'ldiskfs' and how often the allocator will switch to a different cylinder group.

Note 2: maybe there is some special issue within Lustre that makes it rather less effective with two partitions per disk.

Note 3: in many if not most (just a guess) Lustre installations the "disk" is actually a SAN RAID pool, and each OST is a LUN of that SAN RAID pool, and that LUN is in effect a slice of a partition off each disk. Now this may not be at all what Lustre should be about :-).

Amazing barely related discovery BTW: while searching for info on the current cylinder group policies of file system designs in the 'ext' family, I found that there was an interesting filesystem called "ext4" in 1997, which has some elements reminiscent of Lustre (or the original UNIX filesystem design):

http://www.cs.cmu.edu/~mihaib/fs/fs.html
"A Dual-Disk File System: ext4 Mihai Budiu April 16, 1997"

So RedHat and Linus should change the name of the recently introduced one to 'ext5'.
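The striping arithmetic in the example above (10 drives, a 100MiB file, 1MiB or 512KiB max object size) can be reproduced with a few lines of Python. This is only a sketch of the numbers as the post uses them: the function names are made up for illustration, "object" and "stripe size" follow the post's wording, which may differ from Lustre's own terminology, and nothing here models seek behaviour.

```python
KIB = 1024
MIB = 1024 * KIB


def stripe_width(drives, filesystems_per_drive, max_object_size):
    """Bytes covered by one full stripe: one object on every filesystem."""
    return drives * filesystems_per_drive * max_object_size


def objects_per_drive(file_size, drives, max_object_size):
    """Objects of one fully striped file that end up on each drive."""
    total_objects = -(-file_size // max_object_size)  # ceiling division
    return -(-total_objects // drives)


if __name__ == "__main__":
    cases = [
        ("1 fs/drive, 1MiB objects", 1, 1 * MIB),
        ("2 fs/drive, 1MiB objects", 2, 1 * MIB),
        ("2 fs/drive, 512KiB objects", 2, 512 * KIB),
    ]
    for label, fs_per_drive, obj in cases:
        width = stripe_width(10, fs_per_drive, obj) // MIB
        per_drive = objects_per_drive(100 * MIB, 10, obj)
        print(f"{label}: stripe {width}MiB, {per_drive} objects per drive")
```

The three cases give stripe widths of 10MiB, 20MiB and 10MiB, and the per-drive object counts are 10 for 1MiB objects and 20 for 512KiB objects, matching the "20x512KiB versus 10x1MiB" comparison in the post.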
Andreas Dilger
2009-Aug-05 17:01 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
On Aug 04, 2009 22:47 +0000, Peter Grandi wrote:
> Andreas Dilger wrote:
> adilger> Putting 4 OSTs on a single disk doesn't make sense.
> adilger> A single OST can be up to 8TB, and if you have multiple
> adilger> OSTs on the same disk(s) it will cause terrible
> adilger> performance problems due to seeking.
>
> Uhm, not exactly, that's a quick but simplistic answer: things
> are more complicated than that.

[lengthy discussion removed]

> Note 3: in many if not most (just a guess) Lustre installations the
> "disk" is actually a SAN RAID pool, and each OST is a LUN of that
> SAN RAID pool, and that LUN is in effect a slice of a partition off
> each disk. Now this may not be at all what Lustre should be about :-).

This is what will happen with any RAID that I'm aware of, and is specifically what I was referring to when I said "disk" instead of "LUN".

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
What if there *are* serious file-system problems? We're half-way into CPU hour 7 of our e2fsck run on an 8TB OST with ~274 million inodes:

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
##### root  25   0 15.1g 1.1g  772 R  100 57.4 472:07.67  e2fsck

If it finishes within my lifetime I'll post how long it takes.

Thanks,
Adam

On Wed, 2009-07-29 at 04:35 -0600, Andreas Dilger wrote:
> Reasonable e2fsck times (without serious filesystem problems) might
> take between 5 minutes and 2 hours.
A few days ago it took me more than 10 hours to fsck a 4TB OST!!

2009/9/4 Adam <adam at sharcnet.ca>
> What if there *are* serious file-system problems? We're half-way into
> CPU hour 7 of our e2fsck run on a 8TB OST with ~274 million inodes:
Thanks for the advice everyone.

I updated to e2fsprogs-1.41.6.sun1 (much better for clearing that MMP block, and I'm sure there are other benefits) and re-ran e2fsck. With the e2fsprogs-1.40.11.sun1 version, running e2fsck under ltrace resulted in a seg-fault, but with the new version ltrace shows that e2fsck isn't doing anything other than consuming CPU.

I'm ready to repave the OST unless anyone has any ideas. I mean, if e2fsck fails... what's left to do?

Thanks,
Adam

On Fri, 2009-09-04 at 09:44 -0400, Adam wrote:
> What if there *are* serious file-system problems? We're half-way into
> CPU hour 7 of our e2fsck run on a 8TB OST with ~274 million inodes: