robert
2010-Nov-03 20:59 UTC
[Lustre-discuss] locking/perm. denied problem with samba export after update to lustre 2.0
hi all, we recently updated our servers (from centos 5.2, lustre 1.8.1.1) and clients (centos 5.4, from lustre 1.8.3) to centos 5.4 and lustre 2.0.0.1. since then we are experiencing massive problems with users accessing the lustre filesystem through a samba share. we did not have these problems before and as we kept the exporting client os and samba installation and just updated the lustre client on that machine, my first guess is that it might be the new lustre version that does not work together with samba too well. a few soft symptoms: - users browsing directories have to wait very long to see their files - random files are not accessible, under heavy lod it is about every 10th file - the problem increases with heavy load and is almost gone under minimal load - the problem is worse right after a reboot of the osts like on nov. 1st, pls see below - accessig files takes a lot of time (sometimes seconds), even on the linux clinet level - there seems to be a bandwith limit per smbd process, many processes still bring the system to the known transfer rate limits from before the update some log snippets... from the client that exports the lustre fs to samba: --------------------------------------------------- Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) lock at ffff8100243fded0[2 5 -4 1 0 00000006] R(1):[0, 18446744073709551615]@[0x190a601:0xa3d6525c:0x0] { Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) vvp at ffff8101a18e2330: Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) lov at ffff8101727857b0: 1 Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) 0 0: lock at ffff8101c5048228[0 5 0 0 0 00000000] R(1):[0, 18446744073709551615]@[0x100050000:0x659329:0x0] { Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) lovsub at ffff810057ddf7a0: [0 ffff8101727857b0 R(1):[0, 18446744073709551615]@[0x190a601:0xa3d6525c:0x0]] [0 ffff81019dc8a610 R(1):[0, 18446744073709551615]@[0x190a601:0xa3d6525c:0x0]] Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) osc at ffff81010f5e9ce8: ffff81019a661000 00101001 0xc3ee5f2d34f879ed 4 0000000000000000 size: 52428800 mtime: 1287786414 atime: 1288789757 ctime: 1287786414 blocks: 102400 Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8101c5048228 Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8100243fded0 Nov 3 14:14:39 lh1 kernel: LustreError: 2982:0:(cl_lock.c:1371:cl_unuse_locked()) unuse return -4 Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) lock at ffff8101c5048ed0[2 5 -4 1 0 00000006] R(1):[0, 18446744073709551615]@[0x1909f52:0xa3d61076:0x0] { Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) vvp at ffff81014921b218: Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) lov at ffff81004a5a6c90: 1 Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) 0 0: lock at ffff8101c50483d8[0 5 0 0 0 00000000] R(1):[0, 18446744073709551615]@[0x100080000:0x657d44:0x0] { Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) lovsub at ffff81013ad466a0: [0 ffff81004a5a6c90 R(1):[0, 18446744073709551615]@[0x1909f52:0xa3d61076:0x0]] [0 ffff8100441754d8 R(1):[0, 18446744073709551615]@[0x1909f52:0xa3d61076:0x0]] Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) osc at ffff81013f822b88: ffff810138e33240 00101001 0xc3ee5f2d34f87d35 4 0000000000000000 size: 17419974 mtime: 1287761855 atime: 1288789756 ctime: 1287761855 blocks: 34024 Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8101c50483d8 Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) } lock at ffff8101c5048ed0 Nov 3 14:14:58 lh1 kernel: LustreError: 31267:0:(cl_lock.c:1371:cl_unuse_locked()) unuse return -4 Nov 3 14:16:05 lh1 smbd[24032]: [2010/11/03 14:16:05.305407, 0] smbd/open.c:579(open_file) Nov 3 14:16:05 lh1 smbd[24032]: Error doing fstat on open file projects/laconia/scenes/072-040/setup-maya/data/072-040_ocean-fire_v014_simohl/fluidShape1Frame1104.mc (Interrupted system call) -------------------------------------------- there are some errors on all three oss, but not from a corresponding time. however, the locking problem started before nov 1st: -------------------------------------------- Nov 1 13:41:30 oss3 kernel: Lustre: lustre-OST0003: received MDS connection from 192.168.3.205 at o2ib Nov 1 13:41:30 oss3 kernel: Lustre: 3641:0:(lustre_log.h:472:llog_group_set_export()) lustre-OST0003: export for group 0 is changed: 0xffff81006d499800 -> 0xffff81004c5b2800 Nov 1 13:41:30 oss3 kernel: Lustre: 3641:0:(llog_net.c:168:llog_receptor_accept()) changing the import ffff8100550c0000 - ffff810076130000 Nov 1 13:41:30 oss3 kernel: Lustre: 3652:0:(filter.c:2508:filter_llog_connect()) lustre-OST0003: Recovery from log 0x207c003/0x1:1fa2b553 Nov 1 16:34:11 oss3 kernel: LustreError: 3706:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0006: slow i_mutex 54s Nov 1 16:34:11 oss3 kernel: Lustre: 3689:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0006: slow journal start 40s Nov 1 16:34:11 oss3 kernel: LustreError: 3677:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0006: slow direct_io 50s Nov 1 16:34:11 oss3 kernel: LustreError: 3707:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0006: slow i_mutex 52s Nov 1 16:34:11 oss3 kernel: LustreError: 3707:0:(filter_io_26.c:662:filter_commitrw_write()) Skipped 1 previous similar message Nov 1 16:34:12 oss3 kernel: LustreError: 3716:0:(filter_io.c:826:filter_preprw_write()) lustre-OST0006: slow start_page_write 51s Nov 1 17:05:44 oss3 kernel: Lustre: 3698:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0009: slow journal start 31s Nov 1 17:05:44 oss3 kernel: Lustre: 3698:0:(lustre_fsfilt.h:226:fsfilt_start_log()) Skipped 1 previous similar message Nov 1 17:05:44 oss3 kernel: Lustre: 3721:0:(lustre_fsfilt.h:258:fsfilt_brw_start_log()) lustre-OST0009: slow journal start 34s Nov 1 17:05:44 oss3 kernel: Lustre: 3721:0:(filter_io_26.c:675:filter_commitrw_write()) lustre-OST0009: slow brw_start 34s Nov 1 17:05:44 oss3 kernel: Lustre: 3684:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0009: slow direct_io 41s Nov 1 17:15:06 oss3 kernel: LustreError: 3675:0:(lustre_fsfilt.h:258:fsfilt_brw_start_log()) lustre-OST0006: slow journal start 81s Nov 1 17:15:06 oss3 kernel: LustreError: 3675:0:(filter_io_26.c:675:filter_commitrw_write()) lustre-OST0006: slow brw_start 81s Nov 1 17:15:06 oss3 kernel: LustreError: 3705:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0006: slow journal start 79s Nov 1 17:15:06 oss3 kernel: Lustre: 3846:0:(lustre_fsfilt.h:226:fsfilt_start_log()) lustre-OST0006: slow journal start 47s Nov 1 17:15:06 oss3 kernel: Lustre: 3846:0:(lustre_fsfilt.h:226:fsfilt_start_log()) Skipped 1 previous similar message Nov 1 17:15:06 oss3 kernel: Lustre: 3617:0:(filter.c:1434:filter_parent_lock()) lustre-OST0006: slow parent lock 36s Nov 1 17:15:06 oss3 kernel: LustreError: 3670:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0006: slow direct_io 106s Nov 1 17:15:06 oss3 kernel: Lustre: 3620:0:(filter.c:1434:filter_parent_lock()) lustre-OST0006: slow parent lock 32s Nov 1 17:53:53 oss3 kernel: Lustre: 3719:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0003: slow i_mutex 38s Nov 1 17:53:53 oss3 kernel: Lustre: 3696:0:(filter_io_26.c:725:filter_commitrw_write()) lustre-OST0003: slow direct_io 38s Nov 1 17:54:02 oss3 kernel: Lustre: 3685:0:(filter_io_26.c:662:filter_commitrw_write()) lustre-OST0003: slow i_mutex 46s Nov 1 17:54:02 oss3 kernel: Lustre: 3685:0:(filter_io_26.c:662:filter_commitrw_write()) Skipped 1 previous similar message -------------------------------------- any help is very much appreciated! thanks! robert