David Ramsthaler (dramstha)
2007-Jan-27 00:41 UTC
[Lustre-discuss] Error PTL_RPC_MSG_ERR in ptlrpc_check_status()
Hi,
I am trying to run a performance test on Lustre with the Beta 5
software, and I am getting the following error message:
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
Any ideas what might be the problem?
More detail below.
Thanks,
-David
I have a simple 2-node setup running the Beta 5 software. One node is
exporting a 1 GB disk, on which I have created a single file as big
as I could make it before running out of disk space.

On the second node, I have used losetup to create a loop0 device on
top of that same shared file. Then I run the xdd device test program
to read and write sectors on that loop0 device. The test runs for a
minute or two without a problem, but then starts to degrade: sometimes
the reads return success but only part of the data, and sometimes they
return -1 with an errno that just tells me "I/O error".
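Roughly, the client-side setup looks like this (the mount point and
file name are illustrative, and the exact xdd invocation was not
posted, so dd stands in for it):

  # fill the Lustre mount with one large file until the disk is full
  dd if=/dev/zero of=/mnt/testfs/bigfile bs=1M
  # expose that file as a block device
  losetup /dev/loop0 /mnt/testfs/bigfile
  # xdd then reads and writes sectors of /dev/loop0; a dd stand-in:
  dd if=/dev/loop0 of=/dev/null bs=512 count=2048 skip=4096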
[root@cfs2 xdd]# cat /proc/version
Linux version 2.6.9-42.EL_lustre.1.5.95smp (ltest@client1) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 SMP Thu Sep 28 06:36:13 MDT 2006
[root@cfs2 xdd]#
[root@cfs2 xdd]# dmesg | grep ustre
Linux version 2.6.9-42.EL_lustre.1.5.95smp (ltest@client1) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 SMP Thu Sep 28 06:36:13 MDT 2006
inserting floppy driver for 2.6.9-42.EL_lustre.1.5.95smp
Lustre: 4421:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.5.95
Build Version: 1.5.95-19691231170000-PRISTINE-.testsuite.tmp.boulder.lbuild-boulder.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.EL_lustre.1.5.95smp
Lustre: Added LNI 172.19.140.24@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: mount data:
Lustre: profile: testfs-client
Lustre: device: 172.19.140.25@tcp:/testfs
Lustre: flags: 2
Lustre: 0 UP mgc MGC172.19.140.25@tcp 0f0ebc36-63d0-371d-a826-bcc12a6dc071 5
Lustre: 1 UP lov testfs-clilov-dabdd200 909df521-50e6-3941-af83-e0d66b79277d 3
Lustre: 2 UP mdc testfs-MDT0000-mdc-dabdd200 909df521-50e6-3941-af83-e0d66b79277d 4
Lustre: 3 UP osc testfs-OST0000-osc-dabdd200 909df521-50e6-3941-af83-e0d66b79277d 4
Lustre: mount 172.19.140.25@tcp:/testfs complete
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
[The same LustreError line repeats for the remainder of the run; the interleaved "Skipped N previous similar messages" counter climbs from 1 to a peak of 487721.]
[root@cfs2 xdd]#
Andreas Dilger
2007-Jan-28 20:17 UTC
[Lustre-discuss] Error PTL_RPC_MSG_ERR in ptlrpc_check_status()
On Jan 26, 2007 23:41 -0800, David Ramsthaler (dramstha) wrote:
> I am trying to run a performance test on Lustre with the Beta 5
> software, and I am getting the following error message:
>
> LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type
> == PTL_RPC_MSG_ERR, err == -28

-28 = -ENOSPC (per /usr/include/asm/errno.h)

> I have a simple 2-node setup running the Beta 5 software. One node is
> exporting a 1 GB disk, on which I have created a single file as big
> as I could make it before running out of disk space.
>
> On the second node, I have used losetup to create a loop0 device on
> top of that same shared file. Then I run the xdd device test program
> to read and write sectors on that loop0 device.

Why exactly are you creating a loopback device on top of the shared
file? That can only hurt performance.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
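For reference, the errno-to-name mapping is easy to double-check from
a shell (the header path below is the RHEL4-era one cited above; newer
kernels keep the define in asm-generic/errno-base.h):

  # confirm what errno 28 means
  grep -w ENOSPC /usr/include/asm/errno.h
  # -> #define ENOSPC 28 /* No space left on device */
  perl -e '$! = 28; print "$!\n"'
  # -> No space left on device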
David Ramsthaler (dramstha)
2007-Jan-29 16:31 UTC
[Lustre-discuss] Error PTL_RPC_MSG_ERR in ptlrpc_check_status()
After more testing, I was able to reproduce the problem. As far as I
can tell, it is a minor bug (or you could call it a quirk) in Lustre
with O_DIRECT writes when the file system is full.

With O_DIRECT, doing a write() after a seek() to a location within an
existing file gives this error; the identical write() without O_DIRECT
works just fine. The Lustre error (-28) indicates no space left on the
device (ENOSPC), but the file is already much bigger than the offset
the code is trying to write to, so there is enough space in the file
itself. The same test works on top of ext3. I expect the problem is
that Lustre somehow needs to take some disk space for a temporary
structure and cannot get it, because the disk is full.

The workaround is easy: leave some free space on the disk, or don't
use O_DIRECT.

Andreas asked:
> Why exactly are you creating a loopback device on top of the shared
> file? That can only hurt performance.

We have some applications that read and write directly to a block
device, which they do for performance. If we replace the underlying
file system with Lustre shared storage, we either need to tell their
owners to change the code to use a file, carve out a non-shared volume
for them, or provide a block device on top of a file. I am
investigating the performance hit of that last approach.

Thanks for your help,
-David
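A minimal sketch of the O_DIRECT case David describes, assuming a full
Lustre file system mounted at an illustrative /mnt/testfs and a dd
recent enough to support oflag=direct:

  # buffered overwrite inside the existing file: succeeds
  dd if=/dev/zero of=/mnt/testfs/bigfile bs=4k count=1 seek=100 conv=notrunc
  # identical overwrite with O_DIRECT: per the report, fails with
  # "No space left on device" (-28/ENOSPC) while the file system is full
  dd if=/dev/zero of=/mnt/testfs/bigfile bs=4k count=1 seek=100 conv=notrunc oflag=direct

conv=notrunc matters here: without it, dd would truncate the file
instead of overwriting in place, which would free space and defeat the
test.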