David Ramsthaler (dramstha)
2007-Jan-27 00:41 UTC
[Lustre-discuss] Error PTL_RPC_MSG_ERR in ptlrpc_check_status()
Hi,

I am trying to run a performance test on Lustre, running Beta 5 software. I am getting the following error message:

LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28

Any ideas what might be the problem? More detail below.

Thanks,
-David

I have a simple 2-node setup running Beta 5 software. One node is exporting a 1 GB disk. I have created a single file that is as big as I could make it before running out of disk space.

On the second node, I have used losetup to create a loop0 device on top of that same shared file. Then I run the xdd device test program to read from and write to sectors on that loop0 device. The test can run for a minute or two without a problem, but then the behavior starts to degrade: sometimes the reads return success but deliver only part of the data, and sometimes they return -1 with an errno that just tells me "I/O error".

[root@cfs2 xdd]# cat /proc/version
Linux version 2.6.9-42.EL_lustre.1.5.95smp (ltest@client1) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 SMP Thu Sep 28 06:36:13 MDT 2006
[root@cfs2 xdd]#
[root@cfs2 xdd]# dmesg | grep ustre
Linux version 2.6.9-42.EL_lustre.1.5.95smp (ltest@client1) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 SMP Thu Sep 28 06:36:13 MDT 2006
inserting floppy driver for 2.6.9-42.EL_lustre.1.5.95smp
Lustre: 4421:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Lustre: OBD class driver, info@clusterfs.com
        Lustre Version: 1.5.95
        Build Version: 1.5.95-19691231170000-PRISTINE-.testsuite.tmp.boulder.lbuild-boulder.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.EL_lustre.1.5.95smp
Lustre: Added LNI 172.19.140.24@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: mount data:
Lustre:   profile: testfs-client
Lustre:   device:  172.19.140.25@tcp:/testfs
Lustre:   flags:   2
Lustre: 0 UP mgc MGC172.19.140.25@tcp 0f0ebc36-63d0-371d-a826-bcc12a6dc071 5
Lustre: 1 UP lov testfs-clilov-dabdd200 909df521-50e6-3941-af83-e0d66b79277d 3
Lustre: 2 UP mdc testfs-MDT0000-mdc-dabdd200 909df521-50e6-3941-af83-e0d66b79277d 4
Lustre: 3 UP osc testfs-OST0000-osc-dabdd200 909df521-50e6-3941-af83-e0d66b79277d 4
Lustre: mount 172.19.140.25@tcp:/testfs complete
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 1 previous similar message
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 3 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 5 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 11 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 23 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 2489 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 6987 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 12697 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 26157 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 55239 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 119367 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 243661 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 487721 previous similar messages
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -28
LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) Skipped 316083 previous similar messages
[root@cfs2 xdd]#
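For anyone trying to reproduce this, the setup amounts to roughly the following; the mount point and file name here are hypothetical, and plain dd is shown as a stand-in for the xdd runs:

  # On the client, with the Lustre file system mounted at /mnt/testfs:
  dd if=/dev/zero of=/mnt/testfs/backing bs=1M       # fill until ENOSPC
  losetup /dev/loop0 /mnt/testfs/backing             # loop device over the file
  # Sector-level reads and writes against the loop device:
  dd if=/dev/loop0 of=/dev/null bs=512 count=2048
  dd if=/dev/zero of=/dev/loop0 bs=512 seek=128 count=8 conv=notrunc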
Andreas Dilger
2007-Jan-28 20:17 UTC
[Lustre-discuss] Error PTL_RPC_MSG_ERR in ptlrpc_check_status()
On Jan 26, 2007 23:41 -0800, David Ramsthaler (dramstha) wrote:
> I am trying to run a performance test on Lustre, running Beta 5
> software. I am getting the following error message:
>
> LustreError: 4504:0:(client.c:579:ptlrpc_check_status()) @@@ type
> == PTL_RPC_MSG_ERR, err == -28

-28 = -ENOSPC (per /usr/include/asm/errno.h)

> I have a simple 2-node setup running Beta 5 software. One node is
> exporting a 1 GB disk. I have created a single file that is as big
> as I could make it before running out of disk space.
>
> On the second node, I have used losetup to create a loop0 device on top
> of that same shared file. Then I run the xdd device test program to read
> from and write to sectors on that loop0 device.

Why exactly are you creating a loopback device on top of the shared file? That can only hurt performance.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
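A quick way to translate a numeric errno such as -28 is to look it up in the header Andreas cites (the exact header path varies by distribution):

  grep -w 28 /usr/include/asm/errno.h
  # should print something like:
  # #define ENOSPC 28 /* No space left on device */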
David Ramsthaler (dramstha)
2007-Jan-29 16:31 UTC
[Lustre-discuss] Error PTL_RPC_MSG_ERR in ptlrpc_check_status()
After more testing, I was able to reproduce the problem. As far as I can tell, it is a minor bug (or you could call it a quirk) in Lustre when writing with direct I/O (O_DIRECT) to a full file system.

With O_DIRECT, doing a write() after a seek() to a location within an existing file gives this error. The identical write() without O_DIRECT works just fine. The Lustre error (-28) indicates out of disk space (ENOSPC), but the file already extends well past the location the code is trying to write to, so the blocks being written are already allocated. The same test works on top of ext3. I expect that Lustre needs some disk space for a temporary structure, and cannot get it because the disk is full.

The workaround is easy: leave some space on the disk, or don't use O_DIRECT.

Andreas asked:
> Why exactly are you creating a loopback device on top of the shared
> file? That can only hurt performance.

We have some applications that read and write directly to a block device driver; they do this for performance. If we replace the underlying file system with Lustre shared storage, we either need to tell them to change their code to use a file, carve out a non-shared volume for them, or provide a block device on top of a file. I am investigating the performance hit of this last approach.

Thanks for your help,
-David
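A minimal shell sketch of the failure mode described above, assuming a full Lustre file system mounted at /mnt/testfs containing an existing file bigfile (both names hypothetical); dd's oflag=direct requests O_DIRECT:

  df -h /mnt/testfs     # confirm the file system is already full
  # A buffered rewrite of blocks that already exist inside the file succeeds:
  dd if=/dev/zero of=/mnt/testfs/bigfile bs=4k seek=100 count=1 conv=notrunc
  # The same rewrite through O_DIRECT fails with ENOSPC (err == -28),
  # even though no new blocks should be needed:
  dd if=/dev/zero of=/mnt/testfs/bigfile bs=4k seek=100 count=1 conv=notrunc oflag=direct

Leaving some free space, or dropping oflag=direct, avoids the error, matching the workaround above.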