Goswin von Brederlow
2006-Nov-24 08:12 UTC
[Lustre-discuss] LustreError: lov_update_create_set()
Hi,

I'm running Lustre 1.4.6 on a 2.6.15.7 vanilla kernel and am trying to decipher some Lustre error messages. The system has 4 servers with 2 2TB OSTs each and a default of 4 stripes for files. The Lustre filesystem is 83% full, so there is over 1TB of free space.

On the server I get:

[612824.958378] LustreError: 17495:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x6a79400 sub-object on OST idx 5/4: rc = -28
[612824.973243] LustreError: 17495:0:(lov_request.c:621:lov_update_create_set()) previously skipped 200 similar messages
[613607.743311] LustreError: 17510:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5d4cfef sub-object on OST idx 5/4: rc = -28
[613607.758653] LustreError: 17510:0:(lov_request.c:621:lov_update_create_set()) previously skipped 47 similar messages
[616495.778648] LustreError: 17503:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5d4d013 sub-object on OST idx 5/4: rc = -28
[616495.793510] LustreError: 17503:0:(lov_request.c:621:lov_update_create_set()) previously skipped 71 similar messages
[617068.096215] LustreError: 17509:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5d4d014 sub-object on OST idx 3/4: rc = -28
[617068.111559] LustreError: 17509:0:(lov_request.c:621:lov_update_create_set()) previously skipped 2 similar messages
[617271.813408] LustreError: 17485:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5ada805 sub-object on OST idx 3/4: rc = -28
[617271.828346] LustreError: 17485:0:(lov_request.c:621:lov_update_create_set()) previously skipped 139 similar messages
[617640.257686] LustreError: 17486:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5d4d05d sub-object on OST idx 3/4: rc = -28
[617640.272617] LustreError: 17486:0:(lov_request.c:621:lov_update_create_set()) previously skipped 7 similar messages
[618312.126431] LustreError: 17486:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5d4d15a sub-object on OST idx 5/4: rc = -28
[618312.141778] LustreError: 17486:0:(lov_request.c:621:lov_update_create_set()) previously skipped 504 similar messages
[618927.592338] LustreError: 17503:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5d4d1c2 sub-object on OST idx 3/4: rc = -28
[618927.607685] LustreError: 17503:0:(lov_request.c:621:lov_update_create_set()) previously skipped 211 similar messages
[619532.920758] LustreError: 17506:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x5d4d21d sub-object on OST idx 3/4: rc = -28
[619532.935787] LustreError: 17506:0:(lov_request.c:621:lov_update_create_set()) previously skipped 183 similar messages

And on the client:

[13038.954468] LustreError: 1737:0:(openiblnd_cb.c:1982:kibnal_active_conn_callback()) Connection ffff81002fdf44c0 -> 172.17.3.253@openib IDLE
[13038.971282] Lustre: 9:0:(openiblnd_cb.c:1975:kibnal_active_conn_callback()) Connection ffff8100530af680 -> 172.17.3.253@openib ESTABLISHED
[13039.815435] Lustre: 9:0:(openiblnd_cb.c:1975:kibnal_active_conn_callback()) Connection ffff810054239680 -> 172.17.3.21@openib ESTABLISHED
[13850.263066] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810048271800 x611/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[13850.297240] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810048271800 x612/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[13887.466952] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810044331600 x626/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[14246.923972] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff81005a57cc00 x767/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[14246.952805] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) previously skipped 1 similar messages
[14391.012533] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810070ee0a00 x821/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[14571.014542] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810044331600 x882/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[14744.830493] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff81007ac52e00 x953/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[14784.793279] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810048271400 x971/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[18776.892294] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810014b9e200 x3036/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[18875.752660] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff81001f263000 x3090/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[19569.616318] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff81007b3fc000 x3376/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[19705.117998] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff81007ac52800 x3421/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[19850.093564] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff810060405c00 x3474/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[21025.911361] LustreError: 10462:0:(client.c:577:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -5 req@ffff81000516a800 x3965/t0 o3->ost1-1_UUID@sn-03-1_UUID:28 lens 328/280 ref 2 fl Rpc:R/0/0 rc 0/-5
[23563.464898] LustreError: 15351:0:(lov_obd.c:1236:lov_punch()) error: punch objid 0x6ad2af3 subobj 0x8b111c on OST idx 0: rc = -30
[24644.095006] LustreError: 15983:0:(lov_obd.c:1236:lov_punch()) error: punch objid 0x6ad2afb subobj 0x8b1948 on OST idx 7: rc = -30

Any ideas what is going wrong, or tips on how to decipher what those errors mean?

MfG
Goswin
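To make the "how do I decipher these" part a little more mechanical, here is a short Python sketch. The helper names `summarize` and `describe_rc` are hypothetical. It pulls the OST index and return code out of `lov_update_create_set()` lines like the ones above and translates the negated rc with the standard `errno` module. It assumes the message format shown in this post and a Linux host, where 28 is ENOSPC, 5 is EIO and 30 is EROFS (the last two are the codes in the client's `ptlrpc_check_status()` and `lov_punch()` lines). The "previously skipped N similar messages" lines are rate-limiting summaries. They are deliberately not counted, so the totals are a lower bound.

```python
import errno
import os
import re
from collections import Counter

# Matches lov_update_create_set() failures such as
# "... error creating fid 0x6a79400 sub-object on OST idx 5/4: rc = -28"
CREATE_RE = re.compile(r"sub-object on OST idx (\d+)/\d+: rc = (-?\d+)")


def describe_rc(rc):
    """Turn a negative kernel/Lustre return code into 'ENAME (message)'."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


def summarize(lines):
    """Count create failures per (OST index, rc); skipped-message lines are ignored."""
    counts = Counter()
    for line in lines:
        m = CREATE_RE.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "LustreError: 17495:0:(lov_request.c:621:lov_update_create_set()) "
        "error creating fid 0x6a79400 sub-object on OST idx 5/4: rc = -28",
        "LustreError: 17509:0:(lov_request.c:621:lov_update_create_set()) "
        "error creating fid 0x5d4d014 sub-object on OST idx 3/4: rc = -28",
    ]
    for (ost, rc), n in sorted(summarize(sample).items()):
        print(f"OST idx {ost}: {n} x rc={rc} {describe_rc(rc)}")
```

Feeding it the full server log (for example the lines of `dmesg` output) shows at a glance which OST indices are hitting which error.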
Andreas Dilger
2006-Nov-24 12:00 UTC
[Lustre-discuss] LustreError: lov_update_create_set()
On Nov 24, 2006 16:12 +0100, Goswin von Brederlow wrote:
> I'm running Lustre 1.4.6 on a 2.6.15.7 vanilla kernel and am trying to
> decipher some Lustre error messages. The system has 4 servers with 2
> 2TB OSTs each and a default of 4 stripes for files. The Lustre filesystem
> is 83% full, so there is over 1TB of free space.

Try "lfs df" on a client (if it is in 1.4.6, I'm not sure), or alternatively "grep '[0-9]' /proc/fs/lustre/osc/*/kbytes*" to see free space per OST.

Also check "lfs df -i" or "grep '[0-9]' /proc/fs/lustre/osc/*/files*" to see free inodes per OST.

> On the server I get:
>
> [612824.958378] LustreError: 17495:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x6a79400 sub-object on OST idx 5/4: rc = -28

/usr/include/asm/errno.h says -28 is "No space left on device", and the message reports that OST idx 5 is the one out of space.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
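The grep commands above simply dump one number per /proc file. A minimal Python sketch of the same per-OST check follows, assuming the layout implied by those globs: `/proc/fs/lustre/osc/<osc-name>/<file>`, with each file holding a single integer. The exact file names behind `kbytes*` and `files*` depend on the Lustre version, so the sketch reads whatever the globs match rather than hard-coding names. The function name `read_osc_stats` is hypothetical.

```python
import glob
import os
import sys


def read_osc_stats(root="/proc/fs/lustre/osc", patterns=("kbytes*", "files*")):
    """Return {osc_name: {file_name: int_value}} for the matching proc files.

    Mirrors `grep '[0-9]' <root>/*/kbytes*` (and files*); each file is
    assumed to contain a single integer as its first token.
    """
    stats = {}
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, "*", pattern)):
            osc = os.path.basename(os.path.dirname(path))
            try:
                with open(path) as f:
                    value = int(f.read().split()[0])
            except (OSError, ValueError, IndexError):
                continue  # unreadable or non-numeric entry: skip it
            stats.setdefault(osc, {})[os.path.basename(path)] = value
    return stats


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/proc/fs/lustre/osc"
    for osc, values in sorted(read_osc_stats(root).items()):
        print(osc, " ".join(f"{k}={v}" for k, v in sorted(values.items())))
```

Any OST whose free-kbytes or free-files value is at or near zero is the one that will return -28 on object creation, regardless of the filesystem-wide total.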
Goswin von Brederlow
2006-Nov-27 05:43 UTC
[Lustre-discuss] LustreError: lov_update_create_set()
Andreas Dilger <adilger@clusterfs.com> writes:

> On Nov 24, 2006 16:12 +0100, Goswin von Brederlow wrote:
>> I'm running Lustre 1.4.6 on a 2.6.15.7 vanilla kernel and am trying to
>> decipher some Lustre error messages. The system has 4 servers with 2
>> 2TB OSTs each and a default of 4 stripes for files. The Lustre filesystem
>> is 83% full, so there is over 1TB of free space.
>
> Try "lfs df" on a client (if it is in 1.4.6, I'm not sure), or alternatively
> "grep '[0-9]' /proc/fs/lustre/osc/*/kbytes*" to see free space per OST.
>
> Also check "lfs df -i" or "grep '[0-9]' /proc/fs/lustre/osc/*/files*" to
> see free inodes per OST.

There is sufficient space now. The OSTs are slightly different in size, but none has less than 70G free now. Plenty of inodes too.

Our guess, after your info, is that at the time of the error a big job must have used up all the space on those OSTs, and when it failed it cleaned up, freeing the space again.

>> On the server I get:
>>
>> [612824.958378] LustreError: 17495:0:(lov_request.c:621:lov_update_create_set()) error creating fid 0x6a79400 sub-object on OST idx 5/4: rc = -28
>
> /usr/include/asm/errno.h says -28 is "No space left on device", and the
> message reports that OST idx 5 is the one out of space.

Thanks. So the error numbers do correlate. I wasn't sure about that, given the amount of free space left in total.

MfG
Goswin
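Why "over 1TB free in total" did not prevent ENOSPC: with RAID-0 striping, a file striped over 4 OSTs puts roughly a quarter of its data in each object. It can only grow until the fullest of those OSTs runs out. The following is a minimal model of that. It assumes evenly filled stripe objects and ignores stripe-size granularity and any server-side space reservations. The function name is hypothetical, and the free-space figures are made up for illustration, not taken from this system.

```python
def max_striped_file_kb(free_kb_per_ost):
    """Approximate upper bound on a new file striped over the given OSTs.

    Each stripe object receives about 1/stripe_count of the data, so the
    file stops growing once the OST with the least free space fills up.
    """
    if not free_kb_per_ost:
        raise ValueError("need at least one OST")
    return len(free_kb_per_ost) * min(free_kb_per_ost)


# Hypothetical free space (KB) on the 4 OSTs chosen for one file:
# plenty in aggregate, but one OST is completely full.
free = [300_000_000, 250_000_000, 0, 280_000_000]
print("aggregate free KB:", sum(free))
print("largest file that fits (KB):", max_striped_file_kb(free))
```

The aggregate figure is large while the usable bound is zero. That is the situation the rc = -28 messages on OST idx 3 and 5 point to.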