Rajendra prasad
2011-Apr-29 15:04 UTC
[Lustre-discuss] ost_write operation failed with -28 in 1.8.5 lustre client
Hi All,

I am running Lustre servers on 1.8.5 (recently upgraded from 1.8.2). The clients are still on 1.8.2.

The clients are logging the error "ost_write operation failed with -28", and as a result I often get the message "No space left on device". According to the lfs df -h output, however, all of the OSTs are only 51-55% full.

lfs df -h
UUID                  bytes    Used  Available  Use%  Mounted on
lustre-MDT0000_UUID   52.3G    4.2G      48.1G    8%  /opt/lustre[MDT:0]
lustre-OST0000_UUID  442.9G  245.6G     197.3G   55%  /opt/lustre[OST:0]
lustre-OST0001_UUID  442.9G  238.7G     204.3G   53%  /opt/lustre[OST:1]
lustre-OST0002_UUID  442.9G  243.2G     199.7G   54%  /opt/lustre[OST:2]
lustre-OST0003_UUID  442.9G  236.5G     206.5G   53%  /opt/lustre[OST:3]
lustre-OST0004_UUID  442.9G  234.8G     208.1G   53%  /opt/lustre[OST:4]
lustre-OST0005_UUID  442.9G  239.7G     203.3G   54%  /opt/lustre[OST:5]
lustre-OST0006_UUID  442.9G  237.2G     205.7G   53%  /opt/lustre[OST:6]
lustre-OST0007_UUID  442.9G  227.9G     215.0G   51%  /opt/lustre[OST:7]
filesystem summary:    3.5T    1.9T       1.6T   53%  /opt/lustre

As per the Bugzilla entry below, I upgraded one of the Lustre clients to 1.8.5, but the issue persists on that client.

https://bugzilla.lustre.org/show_bug.cgi?id=22755

The Lustre clients run SUSE Linux 10.1. In order to install the 1.8.5 Lustre client packages, I also upgraded the SUSE kernel.

I have also checked and found that no quotas are enabled on the clients.

lfs quota -u 36401 /opt/lustre
Disk quotas for user 36401 (uid 36401):
 Filesystem     kbytes  quota  limit  grace    files  quota  limit  grace
/opt/lustre  127315748      0      0      -  1001083      0      0      -

Below are the Lustre client packages I have installed:

lustre-client-modules-1.8.5-2.6.16_60_0.69.1_lustre.1.8.5_smp
lustre-client-1.8.5-2.6.16_60_0.69.1_lustre.1.8.5_smp

SUSE kernel packages installed:

kernel-default-2.6.16.60-0.69.1
kernel-source-2.6.16.60-0.69.1
kernel-smp-2.6.16.60-0.69.1
kernel-syms-2.6.16.60-0.69.1

Error:

Apr 29 15:35:55 hostname kernel: LustreError: 11-0: an error occurred while communicating with 172.16.x.x at tcp. The ost_write operation failed with -28
Apr 29 15:35:55 hostname kernel: LustreError: Skipped 9657 previous similar messages
Apr 29 15:38:03 hostname kernel: LustreError: 11-0: an error occurred while communicating with 172.16.x.x at tcp. The ost_write operation failed with -28

Kindly suggest.

Regards,
Prasad
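The per-target usage check described in this message can also be done programmatically. The following is only a rough sketch: it assumes the column layout of the lfs df -h listing pasted above, and the names SAMPLE, parse_lfs_df and fullest are hypothetical helpers, not part of any Lustre tool.

```python
import re

# Three rows copied from the listing in this message (assumed layout:
# UUID, bytes, Used, Available, Use%, Mounted on).
SAMPLE = """\
lustre-MDT0000_UUID 52.3G 4.2G 48.1G 8% /opt/lustre[MDT:0]
lustre-OST0000_UUID 442.9G 245.6G 197.3G 55% /opt/lustre[OST:0]
lustre-OST0007_UUID 442.9G 227.9G 215.0G 51% /opt/lustre[OST:7]
"""

# One target per line; header and summary lines do not match and are skipped.
ROW = re.compile(r"^(\S+_UUID)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)%\s+(\S+)$")


def parse_lfs_df(text):
    """Return one dict per MDT/OST row found in the given text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            uuid, total, used, avail, pct, mount = m.groups()
            rows.append({"uuid": uuid, "total": total, "used": used,
                         "avail": avail, "use_pct": int(pct), "mount": mount})
    return rows


def fullest(rows, kind="OST"):
    """Return the row with the highest Use% among targets of the given kind."""
    targets = [r for r in rows if f"[{kind}:" in r["mount"]]
    return max(targets, key=lambda r: r["use_pct"])


if __name__ == "__main__":
    worst = fullest(parse_lfs_df(SAMPLE))
    print(worst["uuid"], worst["use_pct"])
```

A check like this only confirms what the listing already shows, namely that no OST is close to full by block count, so the cause of the -28 errors has to be looked for elsewhere.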
Ms. Megan Larko
2011-May-02 17:19 UTC
[Lustre-discuss] ost_write operation failed with -28 in 1.8.5 lustre client
Hello,

Just one very small suggestion: how are the inodes on your MDT? If a file system runs out of inodes, it appears to be full, because no additional inodes can be issued to link new data to a location or starting point. Running "df -i" on the MDT can answer this question.

Good luck,
megan
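For anyone who wants to script the inode check suggested here, the sketch below reads the same kind of counters with Python's os.statvfs. It is an illustration under assumptions: inode_usage is a hypothetical helper name, the path passed in must be the mount point to inspect, and the percentage may be computed slightly differently from the IUse% column that df -i prints.

```python
import os


def inode_usage(path):
    """Return inode totals for the file system that contains `path`."""
    st = os.statvfs(path)
    total = st.f_files          # total inodes on the file system
    free = st.f_ffree           # free inodes
    used = total - free
    # Some file systems report 0 total inodes; avoid dividing by zero.
    pct = 100.0 * used / total if total else 0.0
    return {"total": total, "used": used, "free": free, "use_pct": pct}


if __name__ == "__main__":
    # "/" is only a placeholder; substitute the mount point of interest.
    print(inode_usage("/"))
```

A use_pct close to 100 would point to inode exhaustion rather than block exhaustion.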
Andreas Dilger
2011-May-02 19:58 UTC
[Lustre-discuss] ost_write operation failed with -28 in 1.8.5 lustre client
On Apr 29, 2011, at 09:04, Rajendra prasad wrote:
> I am running lustre servers on 1.8.5 (recently upgraded from 1.8.2). Clients are still on 1.8.2.
>
> I am getting the error "ost_write operation failed with -28" in the clients. Due to this i am getting error message as "No space left on the device" oftenly. As per lfs df -h output all the OSTs are occupied around 55% only.
>
> lfs df -h
> UUID                  bytes    Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID   52.3G    4.2G      48.1G    8%  /opt/lustre[MDT:0]
> lustre-OST0000_UUID  442.9G  245.6G     197.3G   55%  /opt/lustre[OST:0]
> lustre-OST0001_UUID  442.9G  238.7G     204.3G   53%  /opt/lustre[OST:1]
> lustre-OST0002_UUID  442.9G  243.2G     199.7G   54%  /opt/lustre[OST:2]
> lustre-OST0003_UUID  442.9G  236.5G     206.5G   53%  /opt/lustre[OST:3]
> lustre-OST0004_UUID  442.9G  234.8G     208.1G   53%  /opt/lustre[OST:4]
> lustre-OST0005_UUID  442.9G  239.7G     203.3G   54%  /opt/lustre[OST:5]
> lustre-OST0006_UUID  442.9G  237.2G     205.7G   53%  /opt/lustre[OST:6]
> lustre-OST0007_UUID  442.9G  227.9G     215.0G   51%  /opt/lustre[OST:7]
> filesystem summary:    3.5T    1.9T       1.6T   53%  /opt/lustre
>
> As per the below bugzilla, i have upgraded one of the lustre client verstion to 1.8.5 but still the issue persist in that client.
>
> https://bugzilla.lustre.org/show_bug.cgi?id=22755
>
> Lustre clients are on Suse linux 10.1. In order to install lustre client packages of 1.8.5, i have upgraded the Suse kernel also.

How many clients do you have? I don't think this is an inode problem, since it wouldn't fail with ENOSPC during ost_write. There is also a problem with clients holding all of the space in grants (about 32MB/client/OST) as described in the above bug. However, unless you have upgraded ALL of the clients to 1.8.5, that problem will not be fixed.

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
Rajendra prasad
2011-May-04 03:16 UTC
[Lustre-discuss] ost_write operation failed with -28 in 1.8.5 lustre client
Hi,

Thank you for the info. There are around 600 clients in the setup.

Due to an issue, I restarted the OSS servers and remounted the OSTs yesterday. Since then I have not seen these errors. However, I will upgrade the Lustre client version to 1.8.5 on all of the clients.

Regards,
Prasad

On Tue, May 3, 2011 at 1:28 AM, Andreas Dilger <adilger at whamcloud.com> wrote:
> On Apr 29, 2011, at 09:04, Rajendra prasad wrote:
> > I am running lustre servers on 1.8.5 (recently upgraded from 1.8.2). Clients are still on 1.8.2.
> >
> > I am getting the error "ost_write operation failed with -28" in the clients. Due to this i am getting error message as "No space left on the device" oftenly. As per lfs df -h output all the OSTs are occupied around 55% only.
> >
> > lfs df -h
> > UUID                  bytes    Used  Available  Use%  Mounted on
> > lustre-MDT0000_UUID   52.3G    4.2G      48.1G    8%  /opt/lustre[MDT:0]
> > lustre-OST0000_UUID  442.9G  245.6G     197.3G   55%  /opt/lustre[OST:0]
> > lustre-OST0001_UUID  442.9G  238.7G     204.3G   53%  /opt/lustre[OST:1]
> > lustre-OST0002_UUID  442.9G  243.2G     199.7G   54%  /opt/lustre[OST:2]
> > lustre-OST0003_UUID  442.9G  236.5G     206.5G   53%  /opt/lustre[OST:3]
> > lustre-OST0004_UUID  442.9G  234.8G     208.1G   53%  /opt/lustre[OST:4]
> > lustre-OST0005_UUID  442.9G  239.7G     203.3G   54%  /opt/lustre[OST:5]
> > lustre-OST0006_UUID  442.9G  237.2G     205.7G   53%  /opt/lustre[OST:6]
> > lustre-OST0007_UUID  442.9G  227.9G     215.0G   51%  /opt/lustre[OST:7]
> > filesystem summary:    3.5T    1.9T       1.6T   53%  /opt/lustre
> >
> > As per the below bugzilla, i have upgraded one of the lustre client verstion to 1.8.5 but still the issue persist in that client.
> >
> > https://bugzilla.lustre.org/show_bug.cgi?id=22755
> >
> > Lustre clients are on Suse linux 10.1. In order to install lustre client packages of 1.8.5, i have upgraded the Suse kernel also.
>
> How many clients do you have? I don't think this is an inode problem, since it wouldn't fail with ENOSPC during ost_write. There is also a problem with clients holding all of the space in grants (about 32MB/client/OST) as described in the above bug. However, unless you have upgraded ALL of the clients to 1.8.5, that problem will not be fixed.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.
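To put the two figures from this exchange side by side (about 32MB of grant per client per OST, and around 600 clients), here is a back-of-the-envelope calculation. It is only a sketch: it assumes every client holds a full grant on every OST at the same time, treats MB as MiB, and takes the OST count of 8 from the lfs df listing. It does not model how the grant accounting in the referenced bug actually behaves.

```python
# Figures quoted in the thread; treating "MB" as MiB is an assumption.
GRANT_MB = 32      # "about 32MB/client/OST"
CLIENTS = 600      # "around 600 clients"
OSTS = 8           # OST0000 .. OST0007 in the lfs df listing


def nominal_grant_mb(clients: int, grant_mb: int = GRANT_MB) -> int:
    """Space one OST would have granted if every client held a full grant."""
    return clients * grant_mb


per_ost_mb = nominal_grant_mb(CLIENTS)      # 19200
per_ost_gb = per_ost_mb / 1024              # 18.75
all_osts_gb = per_ost_gb * OSTS             # 150.0

print(f"per OST:  {per_ost_mb} MB (~{per_ost_gb:.2f} GB)")
print(f"all OSTs: ~{all_osts_gb:.0f} GB")
```

Under these assumptions the nominal figure is roughly 19 GB per OST, which can be compared against the Available column in the listing; the real amount of space tied up in grants depends on client behaviour and on the bug described in the Bugzilla entry.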
David Noriega
2011-May-31 19:40 UTC
[Lustre-discuss] ost_write operation failed with -28 in 1.8.5 lustre client
We are running Lustre 1.8.4, and I can confirm that I see this message on one of our clients, the 'file server.' It serves the Lustre file system to machines outside our network via Samba and NFS. On other clients (nodes in our compute cluster) I have seen the same message a few times, though with "-19" or, in one case, "-107" as the error number. Just as reported here, a few of our users say they have gotten a message that the filesystem is full, even though it's not.

On Fri, Apr 29, 2011 at 10:04 AM, Rajendra prasad <rajendra.dn at gmail.com> wrote:
> Hi All,
>
> I am running lustre servers on 1.8.5 (recently upgraded from 1.8.2). Clients are still on 1.8.2.
>
> I am getting the error "ost_write operation failed with -28" in the clients. Due to this i am getting error message as "No space left on the device" oftenly. As per lfs df -h output all the OSTs are occupied around 55% only.
>
> lfs df -h
> UUID                  bytes    Used  Available  Use%  Mounted on
> lustre-MDT0000_UUID   52.3G    4.2G      48.1G    8%  /opt/lustre[MDT:0]
> lustre-OST0000_UUID  442.9G  245.6G     197.3G   55%  /opt/lustre[OST:0]
> lustre-OST0001_UUID  442.9G  238.7G     204.3G   53%  /opt/lustre[OST:1]
> lustre-OST0002_UUID  442.9G  243.2G     199.7G   54%  /opt/lustre[OST:2]
> lustre-OST0003_UUID  442.9G  236.5G     206.5G   53%  /opt/lustre[OST:3]
> lustre-OST0004_UUID  442.9G  234.8G     208.1G   53%  /opt/lustre[OST:4]
> lustre-OST0005_UUID  442.9G  239.7G     203.3G   54%  /opt/lustre[OST:5]
> lustre-OST0006_UUID  442.9G  237.2G     205.7G   53%  /opt/lustre[OST:6]
> lustre-OST0007_UUID  442.9G  227.9G     215.0G   51%  /opt/lustre[OST:7]
> filesystem summary:    3.5T    1.9T       1.6T   53%  /opt/lustre
>
> As per the below bugzilla, i have upgraded one of the lustre client verstion to 1.8.5 but still the issue persist in that client.
>
> https://bugzilla.lustre.org/show_bug.cgi?id=22755
>
> Lustre clients are on Suse linux 10.1. In order to install lustre client packages of 1.8.5, i have upgraded the Suse kernel also.
>
> I have also checked and found that no quota are enabled in the clients.
>
> lfs quota -u 36401 /opt/lustre
> Disk quotas for user 36401 (uid 36401):
>  Filesystem     kbytes  quota  limit  grace    files  quota  limit  grace
> /opt/lustre  127315748      0      0      -  1001083      0      0      -
>
> Below are the lustre client packages i have installed.
>
> lustre-client-modules-1.8.5-2.6.16_60_0.69.1_lustre.1.8.5_smp
> lustre-client-1.8.5-2.6.16_60_0.69.1_lustre.1.8.5_smp
>
> Suse kernel packages installed:
>
> kernel-default-2.6.16.60-0.69.1
> kernel-source-2.6.16.60-0.69.1
> kernel-smp-2.6.16.60-0.69.1
> kernel-syms-2.6.16.60-0.69.1
>
> Error:
>
> Apr 29 15:35:55 hostname kernel: LustreError: 11-0: an error occurred while communicating with 172.16.x.x at tcp. The ost_write operation failed with -28
> Apr 29 15:35:55 hostname kernel: LustreError: Skipped 9657 previous similar messages
> Apr 29 15:38:03 hostname kernel: LustreError: 11-0: an error occurred while communicating with 172.16.x.x at tcp. The ost_write operation failed with -28
>
> Kindly suggest.
>
> Regards,
> Prasad

--
Personally, I liked the university. They gave us money and facilities, we didn't have to produce anything! You've never been out of college! You don't know what it's like out there! I've worked in the private sector. They expect results. -Ray Ghostbusters
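The numbers in these LustreError lines are negated errno values, so the three codes mentioned in this thread (-28, -19 and -107) can be decoded with Python's standard errno table. This is a small sketch; describe is a hypothetical helper name, and the mapping for 107 in particular assumes a Linux host, since errno numbering differs between platforms.

```python
import errno
import os


def describe(rc):
    """Map a negative return code such as -28 to (symbolic name, message)."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On Linux these are ENOSPC, ENODEV and ENOTCONN respectively.
    for rc in (-28, -19, -107):
        name, message = describe(rc)
        print(f"{rc}: {name} ({message})")
```

The -28 case is the ENOSPC that Andreas refers to earlier in the thread; the other two are different conditions (no such device, endpoint not connected) and need not share its cause.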