Ms. Megan Larko
2010-Mar-02 20:45 UTC
[Lustre-discuss] Unbalanced OST--for discussion purposes
Hi,

I have a Lustre array (kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp, i.e. Lustre 1.6.4.3) which will soon be decommissioned in favor of newer hardware, so this question is mostly for my personal intellectual curiosity.

I logged directly into the OSS (OSS4) and just ran a df (along with a periodic check of the log files). I last looked about two weeks ago (I know it was after 17 Feb). Anyway, OST0007 is now more full than any of the other OSTs. The default Lustre stripe count (which I believe is set to 1) is in use. Can just one file shift the space used on one OST that significantly? What other reasonable explanation is there for a difference on one OST in comparison with the others? Could this cause a Lustre performance hit at this point?

[root at oss4 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             6.3T  3.6T  2.5T  60% /srv/lustre/OST/crew8-OST0000
/dev/sdb2             6.3T  4.1T  1.9T  69% /srv/lustre/OST/crew8-OST0001
/dev/sdc1             6.3T  3.3T  2.8T  55% /srv/lustre/OST/crew8-OST0002
/dev/sdc2             6.3T  3.3T  2.7T  56% /srv/lustre/OST/crew8-OST0003
/dev/sdd1             6.3T  3.5T  2.6T  58% /srv/lustre/OST/crew8-OST0004
/dev/sdd2             6.3T  4.1T  1.9T  69% /srv/lustre/OST/crew8-OST0005
/dev/sdi1             6.3T  3.9T  2.2T  65% /srv/lustre/OST/crew8-OST0006
/dev/sdi2             6.3T  5.0T 1015G  84% /srv/lustre/OST/crew8-OST0007  <----
/dev/sdj1             6.3T  3.4T  2.7T  56% /srv/lustre/OST/crew8-OST0008
/dev/sdj2             6.3T  3.3T  2.7T  56% /srv/lustre/OST/crew8-OST0009
/dev/sdk1             6.3T  3.4T  2.7T  56% /srv/lustre/OST/crew8-OST0010
/dev/sdk2             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0011

Still learning....
megan
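For what it's worth, the same per-OST usage is visible from any Lustre client with lfs df, without logging into each OSS in turn. A minimal sketch, assuming a hypothetical client mount point of /mnt/crew8:

$ lfs df -h /mnt/crew8    # reports size/used/available per OST (by UUID) plus a filesystem total

An over-full OST such as crew8-OST0007_UUID stands out there the same way it does in the df output above.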
Brian J. Murrell
2010-Mar-02 21:00 UTC
[Lustre-discuss] Unbalanced OST--for discussion purposes
On Tue, 2010-03-02 at 15:45 -0500, Ms. Megan Larko wrote:
> Hi,

Hi,

> I logged directly into the OSS (OSS4) and just ran a df (along with a
> periodic check of the log files). I last looked about two weeks ago
> (I know it was after 17 Feb).

Is the implication that at this point the OSTs were more or less well balanced?

> Anyway, the OST0007 is more full than
> any of the other OSTs. The default lustre stripe (I believe that is
> set to 1) is used. Can just one file shift the size used of one OST
> that significantly?

Sure. As an example, if one had a 1KiB file on that OST, called, let's say, "1K_file.dat", and one did:

$ dd if=/dev/zero of=1K_file.dat bs=1G count=1024

that would overwrite the 1KiB file on that OST with a 1TiB file. Recognizing of course that that would be 1TiB in a single object on an OST.

> What other reasonable explanation for a
> difference on one OST in comparison with the others?

Any kind of variation on the above.

> Could this cause
> a lustre performance hit at this point?

Not really.

b.
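Whether a particular file's object really lives on the suspect OST can be checked from a client with lfs getstripe. A sketch, reusing the hypothetical file name from the dd example above:

$ lfs getstripe 1K_file.dat    # the obdidx column gives the OST index holding each object

With a stripe count of 1 there is a single obdidx entry; a value of 7 would place the object on crew8-OST0007.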
Andreas Dilger
2010-Mar-03 04:06 UTC
[Lustre-discuss] Unbalanced OST--for discussion purposes
On 2010-03-02, at 13:45, Ms. Megan Larko wrote:
> I logged directly into the OSS (OSS4) and just ran a df (along with a
> periodic check of the log files). I last looked about two weeks ago
> (I know it was after 17 Feb). Anyway, the OST0007 is more full than
> any of the other OSTs. The default lustre stripe (I believe that is
> set to 1) is used. Can just one file shift the size used of one OST
> that significantly?

Sure, this is easy if the size of a single file is a large fraction of the OST size. This is one reason why we recommend people use larger OSTs (up to 16TB in 1.8.2 with RHEL5.4) instead of, e.g., the 1TB or less that is sometimes reported here.

> What other reasonable explanation for a difference on one OST in
> comparison with the others? Could this cause a lustre performance
> hit at this point?

It is possible, if the filesystem is getting very full and it causes more seeking to do IO. At the 84% you report below it is starting to get into that range. I wouldn't recommend running the filesystem beyond 90% full unless you are more concerned with space usage than performance.

You can find the file(s) that are abnormally large on that particular OST by running (preferably on a client mountpoint on the MDS):

lfs find --obd crew8-OST0007_UUID -size +10G /mnt/lustre

> [root at oss4 ~]# df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sdb1             6.3T  3.6T  2.5T  60% /srv/lustre/OST/crew8-OST0000
> /dev/sdb2             6.3T  4.1T  1.9T  69% /srv/lustre/OST/crew8-OST0001
> /dev/sdc1             6.3T  3.3T  2.8T  55% /srv/lustre/OST/crew8-OST0002
> /dev/sdc2             6.3T  3.3T  2.7T  56% /srv/lustre/OST/crew8-OST0003
> /dev/sdd1             6.3T  3.5T  2.6T  58% /srv/lustre/OST/crew8-OST0004
> /dev/sdd2             6.3T  4.1T  1.9T  69% /srv/lustre/OST/crew8-OST0005
> /dev/sdi1             6.3T  3.9T  2.2T  65% /srv/lustre/OST/crew8-OST0006
> /dev/sdi2             6.3T  5.0T 1015G  84% /srv/lustre/OST/crew8-OST0007  <----
> /dev/sdj1             6.3T  3.4T  2.7T  56% /srv/lustre/OST/crew8-OST0008
> /dev/sdj2             6.3T  3.3T  2.7T  56% /srv/lustre/OST/crew8-OST0009
> /dev/sdk1             6.3T  3.4T  2.7T  56% /srv/lustre/OST/crew8-OST0010
> /dev/sdk2             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0011

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
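Once such a file has been identified, a low-tech way to rebalance it on a 1.6/1.8 filesystem is to recreate it on an emptier OST and rename it into place. A sketch only: it assumes the file is not in active use, and the name bigfile.dat and target OST index 3 are hypothetical (newer releases also ship an lfs_migrate helper that automates this):

$ lfs setstripe -c 1 -i 3 bigfile.dat.new   # empty file with its single stripe on OST0003
$ cp -p bigfile.dat bigfile.dat.new         # copy the data into the new object
$ mv bigfile.dat.new bigfile.dat            # rename over the original; its old object is freed

The flag spellings here (-c stripe count, -i starting OST index) follow 1.8-era lfs; very old 1.6 releases took positional arguments instead.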
Ms. Megan Larko
2010-Mar-03 19:25 UTC
[Lustre-discuss] Unbalanced OST--for discussion purposes
Thanks to both Brian and Andreas for the timely responses.

Brian posed the question as to whether or not the OSTs were more or less balanced two weeks ago. The answer is that I believe they were. Usually all OSTs report a similar percentage of usage (within 1% to 3% of one another). I believe that is why this new report piqued my curiosity.

Regarding Andreas's remark about individual OST size: yes, I understand that having larger individual OSTs can keep any one OST from becoming so full that the others degrade in performance (per A. Dilger, not B. Murrell). For that reason I personally like the option available in newer Lustre releases (I think 1.8.x and higher) to allow up to 16TB in a single OST slice. I know the previous limit was 8TB per OST slice, as a precaution against data corruption. (I was able to build a larger OST slice with 1.6.7, but I was cautioned that some data might become unreachable and/or corrupted, as Lustre had not at that time been modified to accept the larger partition sizes which the underlying file systems, ext4 and xfs, would accept.) The OST formatted size of 6.3TB fit nicely into the JBOD scheme of evenly-sized partitions.

Thanks,
megan
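If an OST does run nearly full before it can be rebalanced, one documented stopgap is to deactivate that OST's OSC on the MDS, so the MDS stops allocating new objects to it while clients can still read and write existing objects. A sketch; the device number 11 is hypothetical and must be read from the lctl dl listing:

mds# lctl dl | grep osc            # find the device number of the crew8-OST0007 OSC
mds# lctl --device 11 deactivate   # stop new object allocation to that OST

Running "lctl --device 11 activate" re-enables allocation once space has been freed.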