Sugree Phatanapherom
2006-Oct-10 22:23 UTC
[Lustre-discuss] unexpected "no space left on device"
Hi,

I have just installed Lustre on CentOS 4.4 and copied some files into the filesystem. However, I got an unexpected "no space left on device" error, even though there seem to be plenty of free inodes (see below). At this point I could sometimes create 700MB files successfully, but not 2GB files (the write failed with that error at around 200MB). Below is what I found in /var/log/messages:

Oct 10 22:49:40 araya kernel: LustreError: 24869:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
Oct 10 22:49:40 araya kernel: LustreError: 24869:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
Oct 10 22:49:40 araya kernel: LustreError: 24869:0:(client.c:579:ptlrpc_check_status()) req@000001012bd3c400 x1217967/t0 o4->lustre-OST0003_UUID@10.255.255.254@tcp:28 lens 352/320 ref 2 fl Rpc:R/0/0 rc 0/-28
Oct 10 22:49:40 araya kernel: LustreError: 24869:0:(client.c:579:ptlrpc_check_status()) Skipped 2 previous similar messages

I also attach the output of "lfs df" and "lfs df -i" here:

[sugree_ph@araya data]$ lfs df
UUID                  1K-blocks       Used  Available  Use%  Mounted on
lustre-MDT0000_UUID    10498848    1043328    9455520     9  /mnt/lustre[MDT:0]
lustre-OST0000_UUID    59058092   54138348    4919744    91  /mnt/lustre[OST:0]
lustre-OST0001_UUID    59058092   58844504     213588    99  /mnt/lustre[OST:1]
lustre-OST0002_UUID    59058092   42914516   16143576    72  /mnt/lustre[OST:2]
lustre-OST0003_UUID    59058092   58881128     176964    99  /mnt/lustre[OST:3]

filesystem summary:   236232368  214778496   21453872    90  /mnt/lustre

[sugree_ph@araya data]$ lfs df -i
UUID                    Inodes   IUsed     IFree  IUse%  Mounted on
lustre-MDT0000_UUID    2517606    3726   2513880      0  /mnt/lustre[MDT:0]
lustre-OST0000_UUID    1980975    1007   1979968      0  /mnt/lustre[OST:0]
lustre-OST0001_UUID     804358     945    803413      0  /mnt/lustre[OST:1]
lustre-OST0002_UUID    3751936    1007   3750929      0  /mnt/lustre[OST:2]
lustre-OST0003_UUID     795260    1003    794257      0  /mnt/lustre[OST:3]

filesystem summary:    2517606    3726   2513880      0  /mnt/lustre

Any suggestion?
Sugree Phatanapherom
Thai National Grid Center
Software Industry Promotion Agency
Ministry of ICT, Thailand
sugree_ph@thaigrid.or.th
Somsak Sriprayoonsakul
2006-Oct-11 03:35 UTC
[Lustre-discuss] unexpected "no space left on device"
Please note that my message with the subject "Lustre file system size" concerns the same system as below. Please reply to this thread instead of mine; sorry for the confusion.

-----------------------------------------------------------------------------------
Somsak Sriprayoonsakul
Thai National Grid Center
Software Industry Promotion Agency
Ministry of ICT, Thailand
somsak_sr@thaigrid.or.th
-----------------------------------------------------------------------------------

Sugree Phatanapherom wrote:
> I have just installed Lustre on CentOS 4.4 and copied some files into
> the filesystem. However, I got an unexpected "no space left on device"
> error, even though there seem to be plenty of free inodes. At this
> point I could sometimes create 700MB files successfully, but not 2GB
> files (the write failed with that error at around 200MB). [...]
McCabe, Donagh
2006-Oct-11 10:21 UTC
[Lustre-discuss] Re: unexpected "no space left on device"
Sugree Phatanapherom wrote:
> [sugree_ph@araya data]$ lfs df
> UUID                  1K-blocks       Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID    10498848    1043328    9455520     9  /mnt/lustre[MDT:0]
> lustre-OST0000_UUID    59058092   54138348    4919744    91  /mnt/lustre[OST:0]
> lustre-OST0001_UUID    59058092   58844504     213588    99  /mnt/lustre[OST:1]
> lustre-OST0002_UUID    59058092   42914516   16143576    72  /mnt/lustre[OST:2]
> lustre-OST0003_UUID    59058092   58881128     176964    99  /mnt/lustre[OST:3]

Most of your OSTs are fairly full. The size of the largest file you can write is determined by the amount of space available on the OST where the file resides. If your file happens to reside on OST0003, you are in trouble -- it's 99% full.

With the default stripe count, a file is placed on a single OST, so once some OSTs fill up it becomes random whether a given file will fit or not. In practice, once a single OST is full, the whole filesystem is full.

You could try increasing the stripe count so that a large file is distributed over several OSTs. This may be useful when your OSTs are about 70-80% full, but is probably no use when some are 99% full.

Donagh McCabe
Hewlett-Packard
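Donagh's diagnosis can be checked mechanically against the numbers in this thread. Below is a minimal shell sketch, assuming the usual whitespace-separated `lfs df` column layout; the free-space figures are pasted from the output quoted above rather than read from a live system:

```shell
#!/bin/sh
# Find the fullest and emptiest OSTs from `lfs df`-style output, and the
# cap this puts on a single-stripe (default layout) file. The data below
# is copied from the thread; on a live system, pipe in `lfs df` itself.
lfs_df_output='lustre-OST0000_UUID 59058092 54138348  4919744 91 /mnt/lustre[OST:0]
lustre-OST0001_UUID 59058092 58844504   213588 99 /mnt/lustre[OST:1]
lustre-OST0002_UUID 59058092 42914516 16143576 72 /mnt/lustre[OST:2]
lustre-OST0003_UUID 59058092 58881128   176964 99 /mnt/lustre[OST:3]'

echo "$lfs_df_output" | awk '
    /OST/ {
        # $4 is the Available column (1K blocks)
        if (min == "" || $4+0 < min+0) { min = $4; fullest = $1 }
        if ($4+0 > max+0)              { max = $4; emptiest = $1 }
    }
    END {
        printf "fullest OST:  %s (%d KB free)\n", fullest, min
        printf "emptiest OST: %s (%d KB free)\n", emptiest, max
        printf "a 1-stripe file landing on %s caps out near %d MB\n", fullest, min / 1024
    }'
```

Run against these figures, the sketch flags lustre-OST0003_UUID with roughly 172MB free, which lines up with the original report of a 2GB write dying at around 200MB.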
Andreas Dilger
2006-Oct-12 14:30 UTC
[Lustre-discuss] unexpected "no space left on device"
On Oct 11, 2006 11:23 +0700, Sugree Phatanapherom wrote:
> I have just installed Lustre on CentOS 4.4 and copied some files into
> the filesystem. However, I got an unexpected "no space left on device"
> error, even though there seem to be plenty of free inodes. At this
> point I could sometimes create 700MB files successfully, but not 2GB
> files (the write failed with that error at around 200MB).

It depends on your file striping, and on which OST the file is created.

> [sugree_ph@araya data]$ lfs df
> UUID                  1K-blocks       Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID    10498848    1043328    9455520     9  /mnt/lustre[MDT:0]
> lustre-OST0000_UUID    59058092   54138348    4919744    91  /mnt/lustre[OST:0]
> lustre-OST0001_UUID    59058092   58844504     213588    99  /mnt/lustre[OST:1]
> lustre-OST0002_UUID    59058092   42914516   16143576    72  /mnt/lustre[OST:2]
> lustre-OST0003_UUID    59058092   58881128     176964    99  /mnt/lustre[OST:3]
>
> filesystem summary:   236232368  214778496   21453872    90  /mnt/lustre

While there is 21GB of free space, the majority is on OST2 and some on OST0. If the file is created on OST1 or OST3, then the file can't be larger than the free space there.

Is this Lustre 1.6? It should be much better about balancing space usage than 1.4. Are you creating very large individual files?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Sugree Phatanapherom
2006-Oct-12 21:13 UTC
[Lustre-discuss] unexpected "no space left on device"
Andreas Dilger wrote:
> It depends on your file striping, and on which OST the file is created.

What if I change the file striping now? Does that help?

> While there is 21GB of free space, the majority is on OST2 and some on
> OST0. If the file is created on OST1 or OST3, then the file can't be
> larger than the free space there.
>
> Is this Lustre 1.6? It should be much better about balancing space
> usage than 1.4. Are you creating very large individual files?

Yes, it is Lustre 1.6.0 beta 5; we installed it just last week. Most files are larger than 500MB, and some of them are larger than 2GB. Any suggestion for an appropriate trade-off between performance and capacity?

Sugree
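A back-of-envelope way to frame that trade-off: with stripe count c, a file's objects go to c OSTs and fill them evenly, so its size is capped at roughly c times the free space on the most-full of those c OSTs. The sketch below is plain shell arithmetic (not an lfs command) over the four per-OST free-space figures quoted earlier in the thread, and it optimistically assumes the allocator picks the c emptiest OSTs:

```shell
#!/bin/sh
# Best-case maximum file size for a given stripe count, using the per-OST
# free space (in KB) from this thread's `lfs df` output.
free_kb="4919744 213588 16143576 176964"

max_file_kb() {
    c=$1
    # With c stripes the file fills its c OSTs evenly, so it is capped by
    # the one with the least free space among the c chosen; assume (best
    # case) the allocator chose the c emptiest OSTs.
    limit=$(printf '%s\n' $free_kb | sort -rn | sed -n "${c}p")
    echo $(( c * limit ))
}

for c in 1 2 4; do
    printf 'stripe count %d: best-case max file ~%d MB\n' "$c" $(( $(max_file_kb "$c") / 1024 ))
done
```

Even striping across all four OSTs here caps a file near 0.7GB, so a 2GB file cannot fit until space is freed or rebalanced, consistent with Donagh's point that restriping helps at 70-80% full but not at 99%.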
Somsak Sriprayoonsakul
2006-Oct-12 21:14 UTC
[Lustre-discuss] unexpected "no space left on device"
An HTML attachment was scrubbed... URL: http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20061012/ba709fd0/attachment.html
Andreas Dilger
2006-Oct-13 02:22 UTC
[Lustre-discuss] unexpected "no space left on device"
On Oct 13, 2006 10:13 +0700, Sugree Phatanapherom wrote:
> Andreas Dilger wrote:
>> It depends on your file striping, and on which OST the file is created.
>
> What if I change the file striping now? Does that help?

If you normally have very large files, then having more stripes will help spread the usage of large files over more OSTs. I do find it unusual that the 1.6 OST selection code would let things get so imbalanced, though.

This can be set via "lfs setstripe" on a particular directory, or (I believe) filesystem-wide via tunefs.lustre on the MDS.

> Most files are larger than 500MB, and some of them are larger than 2GB.
>
> Any suggestion for an appropriate trade-off between performance and
> capacity?

One thing you can do now is find some large files on the full OSTs and move them over to the emptier ones, something like:

    # list files with objects on OST3, largest first ("size path" per line)
    lfs find --obd lustre-OST0003_UUID /mnt/lustre -print0 |
        xargs -0 ls -s | sort -nr > /tmp/files.ost3

This copies the 3 largest files from OST3 onto OST2 (note that each line of the list is "size path", so read two fields per line):

    head -3 /tmp/files.ost3 | while read SIZE F; do
        lfs setstripe $F.tmp 0 2 1   # default stripe size, start on OST index 2, 1 stripe
        cp $F $F.tmp
    done

If all goes well, then replace the originals with the copies:

    head -3 /tmp/files.ost3 | while read SIZE F; do mv $F.tmp $F; done

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
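The pipeline above can be rehearsed without a Lustre filesystem. The sketch below builds a few throwaway files in a temp directory and runs the same `ls -s | sort -nr` listing and two-field `read` pattern; the filenames are made up, and the Lustre-specific `lfs find`/`lfs setstripe` steps are replaced by a plain `cp` since no OST is involved:

```shell
#!/bin/sh
# Rehearse the "list by size, copy the biggest" pattern from the post,
# minus the Lustre-specific commands. Filenames are hypothetical.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/small.dat" bs=1024 count=4  2>/dev/null
dd if=/dev/zero of="$dir/mid.dat"   bs=1024 count=16 2>/dev/null
dd if=/dev/zero of="$dir/big.dat"   bs=1024 count=64 2>/dev/null

# "size path" per line, largest first -- same shape as /tmp/files.ost3
ls -s "$dir"/*.dat | sort -nr > "$dir/files.list"

# Copy the single largest file; note the two-field read (size, then path)
head -1 "$dir/files.list" | while read SIZE F; do
    cp "$F" "$F.tmp"
done

ls "$dir"
```

The key detail is `read SIZE F`: because `ls -s` prints the block count before the name, reading the whole line into one variable would hand `cp` a "size path" string instead of a path.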
Brian W. Johanson
2006-Oct-16 16:54 UTC
[Lustre-discuss] unexpected "no space left on device"
Is it possible to disable writes but still allow reads to continue from an OST that is full?

I was looking at "lctl readonly" (disable writes to the underlying device). Either I can't figure out the syntax, or it does not do what I would expect it to do.

brian

Andreas Dilger wrote:
> If you normally have very large files, then having more stripes will
> help spread the usage of large files over more OSTs. [...]
>
> One thing you can do now is find some large files on the full OSTs
> and move them over to the emptier ones. [...]
Andreas Dilger
2006-Oct-17 01:24 UTC
[Lustre-discuss] unexpected "no space left on device"
On Oct 16, 2006 18:54 -0400, Brian W. Johanson wrote:
> Is it possible to disable writes but still allow reads to continue from
> an OST that is full?

Yes. On the MDS, "lctl --device {OSC device number} deactivate" will deactivate that OST on the MDS (i.e. no new objects will be created there), but clients can still read, write, and unlink existing objects on that OST until the MDS is restarted or you run "lctl --device {OSC device number} recover".

> I was looking at "lctl readonly" (disable writes to the underlying
> device). Either I can't figure out the syntax, or it does not do what
> I would expect it to do.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
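The {OSC device number} placeholder comes from the device list printed by `lctl dl` on the MDS. Below is a sketch of pulling it out with awk; the sample listing is hand-written to illustrate the usual column layout (device number, state, type, name) and is not captured from a real system, so verify the format against your own `lctl dl` output before relying on it:

```shell
#!/bin/sh
# Extract the OSC device number for a given OST from `lctl dl`-style
# output, for use with "lctl --device N deactivate" on the MDS.
# The sample listing is illustrative only (columns: num state type name uuid refs).
lctl_dl='  0 UP mgc MGC10.255.255.254@tcp 2c258a8e 5
  1 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 3
  2 UP osc lustre-OST0003-osc lustre-MDT0000-mdtlov_UUID 5'

ost=lustre-OST0003
dev=$(echo "$lctl_dl" | awk -v o="$ost" '$3 == "osc" && index($4, o) == 1 { print $1 }')
echo "on the MDS, run: lctl --device $dev deactivate"
```

Once the full OST has been drained or space freed, the matching "lctl --device N recover" re-enables object creation there, as described above.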