Hi all,

We want to run e2fsck on our OSTs to check them. Before we start, we
need to estimate how long e2fsck will take on each OST. We would like
to know which factors affect the running time, such as the size of the
OST, the average file size, and so on. Can you give us some tips? I hope
to receive your reply as soon as possible.

Thanks,
Sarea
2009-07-29
huangql
Andreas Dilger
2009-Jul-29 10:35 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
On Jul 29, 2009 15:16 +0800, huangql wrote:
> We want to run e2fsck on our OSTs to check them. Before we start, we
> need to estimate how long e2fsck will take on each OST. We would like
> to know which factors affect the running time, such as the size of the
> OST, the average file size, and so on. Can you give us some tips?

The variables in your question are numerous:
- size of the filesystem
- number of inodes
- number of allocated blocks
- distribution of the above on the disk
- speed of the disks
- speed of the CPU
- amount of RAM on server

Reasonable e2fsck times (without serious filesystem problems) might
take between 5 minutes and 2 hours.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Hi Andreas,

Thank you very much. The parameters of our filesystem are as follows:

size of the filesystem: 400TB
number of inodes: 427220992 (with the default inode size of 256)
number of allocated blocks: 1708867323 (with the default block size of 4096)
distribution of the above on the disk: 4 OSTs on one disk
amount of RAM on server: 16GB
speed of the CPU: Intel(R) Core 8 CPU
speed of the disks: 7200

Can you give us more details based on these parameters? Please also
give us some suggestions on how to run e2fsck or how to make it faster.
We are worried that it may destroy the filesystem.

Thank you in advance for your help!

Thanks,
Sarea
2009-07-30
huangql
Andreas Dilger
2009-Jul-30 03:43 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
On Jul 30, 2009 09:00 +0800, huangql wrote:
> Thank you very much. The parameters of our filesystem are as follows:
>
> size of the filesystem: 400TB

This cannot be correct, given you have 1.7B 4kB blocks. Note that
e2fsck can be run in parallel on all OSTs.

> number of inodes: 427220992 (with the default inode size of 256)
> number of allocated blocks: 1708867323 (with the default block size of 4096)
> distribution of the above on the disk: 4 OSTs on one disk

Putting 4 OSTs on a single disk doesn't make sense. A single OST can
be up to 8TB, and if you have multiple OSTs on the same disk(s) it will
cause terrible performance problems due to seeking.

> amount of RAM on server: 16GB
> speed of the CPU: Intel(R) Core 8 CPU
> speed of the disks: 7200
>
> Can you give us more details based on these parameters? Please also
> give us some suggestions on how to run e2fsck or how to make it faster.
> We are worried that it may destroy the filesystem.
>
> Thank you in advance for your help!

Sorry for the misunderstanding, but a rough estimate of e2fsck time is
the best that is possible. I would estimate (excluding errors in the
filesystem) that a 7TB filesystem would take on the order of 2h or less.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
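The size check behind "this cannot be correct" can be sketched as below. This is only a minimal illustration using the figures quoted in the thread; the function name is made up for the example, and the last line assumes (this is a guess, not something stated in the thread) that the 400TB figure describes the whole Lustre filesystem rather than one OST.

```python
# Figures quoted in the thread, taken here to describe a single OST.
BLOCK_COUNT = 1_708_867_323   # "number of allocated blocks"
BLOCK_SIZE = 4096             # bytes, the default block size
INODE_COUNT = 427_220_992     # "number of inodes"


def ost_size_bytes(block_count: int, block_size: int) -> int:
    """Size of the filesystem implied by its block count and block size."""
    return block_count * block_size


size = ost_size_bytes(BLOCK_COUNT, BLOCK_SIZE)
print(f"{size} bytes = {size / 1e12:.2f} TB = {size / 2**40:.2f} TiB")
print(f"about {size / INODE_COUNT / 1024:.1f} KiB of space per inode")

# Assumption (not from the thread): 400TB is the total over all OSTs.
print(f"400 TB would be about {400e12 / size:.0f} OSTs of this size")
```

The result is roughly 7TB, which is consistent with the "7TB filesystem" used for the 2h estimate and far from 400TB.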
Hi Andreas,

Thank you very much. I tried to run e2fsck on an OST a few weeks ago
and it failed; as a result, some files were destroyed. When I looked
into why it failed, I found someone saying that there are some bugs in
e2fsck. Because of this and the time pressure, our team is discussing
whether to run e2fsck at all. So I really hope you can give us some tips
and point out what we should pay attention to!

See inline...

Thanks,
Sarea
2009-07-30
huangql

On 2009-07-30 11:41:13, Andreas Dilger wrote:
> On Jul 30, 2009 09:00 +0800, huangql wrote:
> > size of the filesystem: 400TB
>
> This cannot be correct, given you have 1.7B 4kB blocks. Note that
> e2fsck can be run in parallel on all OSTs.

Sorry, I don't understand what you mean. Our filesystem is up to 400TB,
and the inode and block parameters were left at their default values.
As you mentioned, a 7TB filesystem would take on the order of 2h or
less. Does that time apply when e2fsck runs in parallel on all OSTs?
Peter Grandi
2009-Aug-04 22:47 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
[ ... ]
adilger> Putting 4 OSTs on a single disk doesn't make sense.
adilger> A single OST can be up to 8TB, and if you have multiple
adilger> OSTs on the same disk(s) it will cause terrible
adilger> performance problems due to seeking.
Uhm, not exactly, that's a quick but simplistic answer: things
are more complicated than that.
The seeking depends strictly on access patterns and number of disks
in most cases.
Suppose that you have a 1TB disk and divide it into one or two
filesystems: for a given file set (assumption relaxed later) and
access pattern the same bits of the disk will be accessed.
The two filesystems end up being mostly super-cylinder-groups, that
is, mostly disjoint free space allocation pools. There are secondary
effects as to the disjoint free space allocations (one filesystem
means allocations can spread all over the disk, two filesystems
will restrict allocation to two separate pools, which most likely
will improve clustering).
Then two separate filesystems are more resilient to serious
mangling, and might fsck faster (because of the better clustering)
if done sequentially.
But the assumption "given file set" does not hold if the two
filesystems are part of the same Lustre filesystem *and* striping
is happening. In that case two objects that are parts of the same
Lustre file will usually end up on the two partitions and Lustre
will assume that they can be fetched in parallel but cannot really,
and this may reduce performance.
But the overall effect will not be big; it will mostly be the
same as if the max object size had been doubled, because again
performance depends mostly on file access patterns and number of
drives. For small files though it will halve the number of disks on
which it can stripe, but this can be countered by halving the max
object size.
Consider this example, a max object size of 1MiB, and a 100MiB file
and 10 drives and striping.
With one filesystem per drive you can read 10MiB in parallel in 1MiB
objects (stripe size 10MiB). With two filesystems per drive you can
read 20MiB in parallel (stripe size 20MiB) in 2x1MiB objects that
are serialized by the drive.
If the max object size is changed to 512KiB in the two filesystems
per drive case, you can still read 10MiB in parallel in 2x512KiB
objects (back to the 10MiB stripe size).
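The arithmetic of this example can be sketched as follows. The function and parameter names are hypothetical, chosen only for illustration (they are not Lustre terms or settings), and the sketch assumes one object per filesystem is read at a time.

```python
KIB = 1024
MIB = 1024 * KIB


def parallel_read_bytes(drives: int, fs_per_drive: int, max_object_size: int) -> int:
    """Data covered by reading one object from every filesystem at once."""
    return drives * fs_per_drive * max_object_size


cases = [
    ("one filesystem per drive, 1MiB objects", 1, 1 * MIB),
    ("two filesystems per drive, 1MiB objects", 2, 1 * MIB),
    ("two filesystems per drive, 512KiB objects", 2, 512 * KIB),
]
for label, fs_per_drive, object_size in cases:
    stripe = parallel_read_bytes(10, fs_per_drive, object_size)
    print(f"{label}: {stripe // MIB}MiB")
```

The three cases give 10MiB, 20MiB and 10MiB, matching the stripe sizes in the text. The model does not capture the fact that two objects on the same drive are serialized by that drive.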
Now one might argue that in the 10x1MiB case the 1MiB is likely to
be more contiguous than in the 10x2x512KiB case, where the two
512KiB objects are forced to be in different halves of the disk,
but then let me point out that the 100MiB file striped across the
10 drives in 1MiB objects has got 10x1MiB objects per drive anyhow,
and whether they are clustered or not is mostly up to luck.
So the issue really is whether 20x512KiB objects per drive are
going to be less clustered than 10x1MiB objects, and my guess is
that it does not matter a lot, and in some cases it might be of
benefit.
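The per-drive object counts compared here follow from simple division. A minimal sketch, assuming the file's objects are spread evenly over the drives (the function name is invented for this example):

```python
KIB = 1024
MIB = 1024 * KIB


def objects_per_drive(file_size: int, object_size: int, drives: int) -> int:
    """Objects of one file landing on each drive, assuming an even spread."""
    total_objects = file_size // object_size
    return total_objects // drives


# 100MiB file over 10 drives, as in the example above.
print(objects_per_drive(100 * MIB, 1 * MIB, 10))    # 10 objects of 1MiB
print(objects_per_drive(100 * MIB, 512 * KIB, 10))  # 20 objects of 512KiB
```

Either way each drive holds 10MiB of the file; only the number and size of the pieces differ.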
Anyhow, there is a case where two OSTs per drive is most likely of
benefit. That's the case where two OSTs belong to two Lustre
filesystems, one faster (outer track OSTs) and used more often and
one slower (inner track OSTs) and used less often. That means a
crude form of hand-clustering.
Still though performance likely depends more on the overall file
access patterns and the number of disks than on whether they are
split across two distinct allocation pools.
Note 1: a fair bit also depends on the in-cylinder-group allocation
policy of 'ldiskfs' and how often the allocator will switch to a
different cylinder group.
Note 2: maybe there is some special issue within Lustre that makes
it rather less effective with the partitions per disk.
Note 3: in many if not most (just a guess) Lustre installations the
"disk" is actually a SAN RAID pool, and each OST is a LUN of that
SAN RAID pool, and that LUN is in effect a slice of a partition off
each disk. Now this may not be at all what Lustre should be
about :-).
Amazing barely related discovery BTW: while searching info on the
current cylinder group policies of file system designs in the 'ext'
family, I found that there was an interesting filesystem called
"ext4" in 1997, which has some elements reminiscent of Lustre (or
the original UNIX filesystem design):
http://www.cs.cmu.edu/~mihaib/fs/fs.html
"A Dual-Disk File System: ext4 Mihai Budiu April 16, 1997"
So RedHat and Linus should change the name of the recently
introduced one to 'ext5'.
Andreas Dilger
2009-Aug-05 17:01 UTC
[Lustre-discuss] How to estimate the time for e2fsck on OST
On Aug 04, 2009 22:47 +0000, Peter Grandi wrote:
> Andreas Dilger wrote:
> adilger> Putting 4 OSTs on a single disk doesn't make sense.
> adilger> A single OST can be up to 8TB, and if you have multiple
> adilger> OSTs on the same disk(s) it will cause terrible
> adilger> performance problems due to seeking.
>
> Uhm, not exactly, that's a quick but simplistic answer: things
> are more complicated than that.

[lengthy discussion removed]

> Note 3: in many if not most (just a guess) Lustre installations the
> "disk" is actually a SAN RAID pool, and each OST is a LUN of that
> SAN RAID pool, and that LUN is in effect a slice of a partition off
> each disk. Now this may not be at all what Lustre should be about :-).

This is what will happen with any RAID that I'm aware of, and is
specifically what I was referring to when I said "disk" instead of "LUN".

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
What if there *are* serious file-system problems? We're half-way into
CPU hour 7 of our e2fsck run on an 8TB OST with ~274 million inodes:

  PID USER  PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
##### root  25  0 15.1g 1.1g 772 R  100 57.4 472:07.67 e2fsck

If it finishes within my lifetime I'll post how long it takes.

Thanks,
Adam

On Wed, 2009-07-29 at 04:35 -0600, Andreas Dilger wrote:
> Reasonable e2fsck times (without serious filesystem problems) might
> take between 5 minutes and 2 hours.
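For readers comparing this against the 2h estimate, the TIME+ value from top can be converted to hours as sketched below. This assumes top's default TIME+ layout of minutes:seconds.hundredths; the helper name is made up for the example.

```python
def top_cpu_time_to_hours(time_plus: str) -> float:
    """Convert a top TIME+ value such as '472:07.67' to CPU hours.

    Assumes the default 'minutes:seconds.hundredths' layout.
    """
    minutes, seconds = time_plus.split(":")
    return (int(minutes) * 60 + float(seconds)) / 3600


print(f"{top_cpu_time_to_hours('472:07.67'):.2f} CPU hours")
```

Note that this is CPU time consumed by e2fsck, not wall-clock time, so the elapsed time of the run may be longer.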
A few days ago it took me more than 10 hours to fsck a 4TB OST!!

2009/9/4 Adam <adam at sharcnet.ca>
> What if there *are* serious file-system problems? We're half-way into
> CPU hour 7 of our e2fsck run on an 8TB OST with ~274 million inodes:
Thanks for the advice everyone.

I updated to e2fsprogs-1.41.6.sun1 (much better for clearing that MMP
block, and I'm sure there are other benefits) and re-ran e2fsck. With
the e2fsprogs-1.40.11.sun1 version, running e2fsck under ltrace resulted
in a seg-fault, but with the new version the ltrace output shows that
e2fsck isn't doing anything other than consuming CPU.

I'm ready to repave the OST unless anyone has any ideas. I mean, if
e2fsck fails... what's left to do?

Thanks,
Adam

On Fri, 2009-09-04 at 09:44 -0400, Adam wrote:
> What if there *are* serious file-system problems? We're half-way into
> CPU hour 7 of our e2fsck run on an 8TB OST with ~274 million inodes:
>
>   PID USER  PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
> ##### root  25  0 15.1g 1.1g 772 R  100 57.4 472:07.67 e2fsck
>
> If it finishes within my lifetime I'll post how long it takes.