Hello guys,

I'm using the latest feature release (lustre-2.4.0-2.6.32_358.6.2.el6_lustre.g230b174.x86_64_gd3f91c4.x86_64.rpm) on CentOS 6.4. Lustre itself is working fine, but when I export it with Samba and try to connect with a Windows 7 client I get:

Jun 24 12:53:14 R-82L kernel: LustreError: 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
Jun 24 12:53:16 R-82L kernel: LustreError: 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
Jun 24 12:53:16 R-82L kernel: LustreError: 2326:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 7 previous similar messages
Jun 24 12:53:16 R-82L kernel: LustreError: 2326:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -13
Jun 24 12:53:16 R-82L kernel: LustreError: 2326:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 7 previous similar messages

On the Windows client I get a "You don't have permissions to access ....." error, even though the permissions are 777. I created a local share just to test the Samba server, and that works; the error only appears when accessing a Samba share backed by Lustre storage.

Any help will be greatly appreciated.

Cheers,
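For reference, the -13 in "ldlm_cli_enqueue: -13" is a negative errno value, EACCES ("Permission denied"), so the MDS itself is refusing the request rather than Samba. A quick, generic way to decode such codes on any Linux box (nothing Lustre-specific, just the standard errno table):

    python -c "import os, errno; print('%s: %s' % (errno.errorcode[13], os.strerror(13)))"
    # EACCES: Permission denied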
Does your uid in Windows match the uid on the Samba server? Does the Samba account exist? I ran into similar issues with NFS.
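A quick way to sanity-check that mapping on the Samba/Lustre gateway, assuming it is an AD member running winbind (the domain, user name and mount path below are only placeholders):

    # How winbind resolves the Windows account to a local uid/gid
    wbinfo -i 'MYDOMAIN\someuser'
    # The same mapping as seen through NSS on this host
    getent passwd 'MYDOMAIN\someuser'
    id 'MYDOMAIN\someuser'
    # Compare against the numeric ownership on the Lustre mount
    ls -ln /mnt/lustre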
Thank you all for your quick responses. Unfortunately my own stupidity got me this time: the MDS server was not a domain member. After joining it to the domain, everything works fine.

Cheers,
:(
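For anyone who hits the same thing: the MDS does its own uid/gid lookups (via its identity upcall and NSS), so it has to be able to resolve the same accounts as the Samba gateway. Joining it is the usual net ads procedure, sketched below; the realm, the admin account and the winbind/idmap settings in smb.conf and krb5.conf are site-specific and assumed to already be in place.

    # On the MDS, after configuring security = ads, realm and idmap in smb.conf
    net ads join -U Administrator
    net ads testjoin          # verify the machine account
    wbinfo -t                 # verify the trust secret
    getent passwd someuser    # confirm a domain user now resolves to a uid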
I don't want to start a new thread, but something fishy is going on. The client I wrote to you about exports Lustre via Samba for some Windows users. The client has a 10 Gigabit Ethernet NIC. Usually it works fine, but sometimes this happens:

Sep 4 16:07:22 cache kernel: LustreError: Skipped 1 previous similar message
Sep 4 16:07:22 cache kernel: LustreError: 167-0: lustre0-MDT0000-mdc-ffff88062edb0c00: This client was evicted by lustre0-MDT0000; in progress operations using this service will fail.
Sep 4 16:07:22 cache kernel: LustreError: 7411:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -5
Sep 4 16:07:22 cache kernel: LustreError: 7411:0:(file.c:159:ll_close_inode_openhandle()) inode 144115380745406215 mdc close failed: rc = -108
Sep 4 16:07:22 cache smbd[7580]: [2013/09/04 16:07:22.451276, 0] smbd/process.c:2440(keepalive_fn)
Sep 4 16:07:22 cache kernel: LustreError: 7411:0:(file.c:159:ll_close_inode_openhandle()) inode 144115307009623849 mdc close failed: rc = -108
Sep 4 16:07:22 cache kernel: LustreError: 7411:0:(file.c:159:ll_close_inode_openhandle()) Skipped 1 previous similar message
Sep 4 16:07:22 cache kernel: LustreError: 6553:0:(mdc_locks.c:840:mdc_enqueue()) ldlm_cli_enqueue: -108
Sep 4 16:07:22 cache kernel: LustreError: 6553:0:(mdc_locks.c:840:mdc_enqueue()) Skipped 21 previous similar messages
Sep 4 16:07:22 cache kernel: LustreError: 6517:0:(vvp_io.c:1228:vvp_io_init()) lustre0: refresh file layout [0x200002b8a:0x119dd:0x0] error -108.
Sep 4 16:07:22 cache kernel: LustreError: 6517:0:(vvp_io.c:1228:vvp_io_init()) lustre0: refresh file layout [0x200002b8a:0x119dd:0x0] error -108.
Sep 4 16:07:22 cache kernel: LustreError: 6519:0:(dir.c:389:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at 0: rc -108
Sep 4 16:07:22 cache kernel: LustreError: 6519:0:(dir.c:595:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at 0: rc -108
Sep 4 16:07:22 cache kernel: LustreError: 6553:0:(dir.c:389:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at 0: rc -108
Sep 4 16:07:22 cache kernel: LustreError: 6553:0:(dir.c:595:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at 0: rc -108
Sep 4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.469729, 0] smbd/dfree.c:137(sys_disk_free)
Sep 4 16:07:22 cache smbd[6517]: disk_free: sys_fsusage() failed. Error was : Cannot send after transport endpoint shutdown
Sep 4 16:07:22 cache kernel: LustreError: 6517:0:(lmv_obd.c:1289:lmv_statfs()) can't stat MDS #0 (lustre0-MDT0000-mdc-ffff88062edb0c00), error -108
Sep 4 16:07:22 cache kernel: LustreError: 6517:0:(llite_lib.c:1610:ll_statfs_internal()) md_statfs fails: rc = -108
Sep 4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.470531, 0] smbd/dfree.c:137(sys_disk_free)
Sep 4 16:07:22 cache smbd[6517]: disk_free: sys_fsusage() failed. Error was : Cannot send after transport endpoint shutdown
Sep 4 16:07:22 cache smbd[6517]: [2013/09/04 16:07:22.470922, 0] smbd/dfree.c:137(sys_disk_free)
Sep 4 16:07:22 cache smbd[6517]: disk_free: sys_fsusage() failed. Error was : Cannot send after transport endpoint shutdown
Sep 4 16:07:22 cache kernel: LustreError: 6517:0:(lmv_obd.c:1289:lmv_statfs()) can't stat MDS #0 (lustre0-MDT0000-mdc-ffff88062edb0c00), error -108
Sep 4 16:07:22 cache kernel: LustreError: 6553:0:(statahead.c:1397:is_first_dirent()) error reading dir [0x200000007:0x1:0x0] at 0: [rc -108] [parent 6553]
Sep 4 16:07:22 cache kernel: LustreError: 6553:0:(statahead.c:1397:is_first_dirent()) error reading dir [0x200000007:0x1:0x0] at 0: [rc -108] [parent 6553]
Sep 4 16:07:22 cache smbd[8958]: [2013/09/04 16:07:22.517112, 0] smbd/dfree.c:137(sys_disk_free)
Sep 4 16:07:22 cache smbd[8958]: disk_free: sys_fsusage() failed. Error was : Cannot send after transport endpoint shutdown
Sep 4 16:07:22 cache smbd[8958]: [2013/09/04 16:07:22.517496, 0] smbd/dfree.c:137(sys_disk_free)

Eventually it connects to the MDS again:

Sep 4 16:07:31 cache kernel: LustreError: 6582:0:(ldlm_resource.c:811:ldlm_resource_complain()) Resource: ffff880a125f7e40 (8589945618/117308/0/0) (rc: 1)
Sep 4 16:07:31 cache kernel: LustreError: 6582:0:(ldlm_resource.c:1423:ldlm_resource_dump()) --- Resource: ffff880a125f7e40 (8589945618/117308/0/0) (rc: 2)
Sep 4 16:07:31 cache kernel: Lustre: lustre0-MDT0000-mdc-ffff88062edb0c00: Connection restored to lustre0-MDT0000 (at 192.168.11.23@tcp)

But now the load average goes through the roof: 25 or even 50 on a 16-CPU machine. And shortly after that:

Sep 4 16:08:14 cache kernel: INFO: task ptlrpcd_6:2425 blocked for more than 120 seconds.
Sep 4 16:08:14 cache kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 4 16:08:14 cache kernel: ptlrpcd_6 D 000000000000000c 0 2425 2 0x00000080
Sep 4 16:08:14 cache kernel: ffff880c34ab7a10 0000000000000046 0000000000000000 ffffffffa0a01736
Sep 4 16:08:14 cache kernel: ffff880c34ab79d0 ffffffffa09fc199 ffff88062e274000 ffff880c34cf8000
Sep 4 16:08:14 cache kernel: ffff880c34aae638 ffff880c34ab7fd8 000000000000fb88 ffff880c34aae638
Sep 4 16:08:14 cache kernel: Call Trace:
Sep 4 16:08:14 cache kernel: [<ffffffffa0a01736>] ? ksocknal_queue_tx_locked+0x136/0x530 [ksocklnd]
Sep 4 16:08:14 cache kernel: [<ffffffffa09fc199>] ? ksocknal_find_conn_locked+0x159/0x290 [ksocklnd]
Sep 4 16:08:14 cache kernel: [<ffffffff8150f1ee>] __mutex_lock_slowpath+0x13e/0x180
Sep 4 16:08:14 cache kernel: [<ffffffff8150f08b>] mutex_lock+0x2b/0x50
Sep 4 16:08:14 cache kernel: [<ffffffffa0751ecf>] cl_lock_mutex_get+0x6f/0xd0 [obdclass]
Sep 4 16:08:14 cache kernel: [<ffffffffa0b5b469>] lovsub_parent_lock+0x49/0x120 [lov]
Sep 4 16:08:14 cache kernel: [<ffffffffa0b5c60f>] lovsub_lock_modify+0x7f/0x1e0 [lov]
Sep 4 16:08:14 cache kernel: [<ffffffffa07514d8>] cl_lock_modify+0x98/0x310 [obdclass]
Sep 4 16:08:14 cache kernel: [<ffffffffa0748dae>] ? cl_object_attr_unlock+0xe/0x20 [obdclass]
Sep 4 16:08:14 cache kernel: [<ffffffffa0ac1e52>] ? osc_lock_lvb_update+0x1a2/0x470 [osc]
Sep 4 16:08:14 cache kernel: [<ffffffffa0ac2302>] osc_lock_granted+0x1e2/0x2b0 [osc]
Sep 4 16:08:14 cache kernel: [<ffffffffa0ac30b0>] osc_lock_upcall+0x3f0/0x5e0 [osc]
Sep 4 16:08:14 cache kernel: [<ffffffffa0ac2cc0>] ? osc_lock_upcall+0x0/0x5e0 [osc]
Sep 4 16:08:14 cache kernel: [<ffffffffa0aa3876>] osc_enqueue_fini+0x106/0x240 [osc]
Sep 4 16:08:14 cache kernel: [<ffffffffa0aa82c2>] osc_enqueue_interpret+0xe2/0x1e0 [osc]
Sep 4 16:08:14 cache kernel: [<ffffffffa0884d2c>] ptlrpc_check_set+0x2ac/0x1b20 [ptlrpc]
Sep 4 16:08:14 cache kernel: [<ffffffffa08b1c7b>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
Sep 4 16:08:14 cache kernel: [<ffffffffa08b21a3>] ptlrpcd+0x233/0x390 [ptlrpc]
Sep 4 16:08:14 cache kernel: [<ffffffff81063310>] ? default_wake_function+0x0/0x20
Sep 4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390 [ptlrpc]
Sep 4 16:08:14 cache kernel: [<ffffffff8100c0ca>] child_rip+0xa/0x20
Sep 4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390 [ptlrpc]
Sep 4 16:08:14 cache kernel: [<ffffffffa08b1f70>] ? ptlrpcd+0x0/0x390 [ptlrpc]
Sep 4 16:08:14 cache kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Sometimes it is the smbd daemon that spits out a similar call trace. I'm using Lustre 2.4.0-2.6.32_358.6.2.el6.x86_64_gd3f91c4.x86_64. This is the only client I use for exporting Lustre via Samba; no other client has errors or issues like that. Sometimes I can see that this particular client disconnects from some of the OSTs too. I'll test the NIC to see if there are any hardware problems, but besides that, does anyone have any clues or hints to share with me?

Cheers,
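A side note on the error codes above: -108 is ESHUTDOWN ("Cannot send after transport endpoint shutdown"), which is what operations return on a client whose MDS import has just been evicted, and -5 is plain EIO. Since a flaky NIC is one suspect, the link-level counters and Lustre's own view of the imports are worth a look; eth0 is a placeholder interface name, adjust for the actual 10GbE port.

    # Link-level error and drop counters on the 10GbE interface
    ip -s link show dev eth0
    ethtool -S eth0 | egrep -i 'err|drop|crc'
    # Lustre device list and import status on this client
    lctl dl
    lctl get_param mdc.*.import osc.*.import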
What Samba distribution did you use? The Red Hat / CentOS Samba package is not working correctly.
--
Jongwoo Han
Principal consultant
jw.han@apexcns.com
Tel: +82-2-3413-1704
Mobile: +82-505-227-6108
Fax: +82-2-544-7962
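As an aside, the exact Samba build in use on the gateway is easy to confirm before deciding whether to rebuild; both commands are standard and nothing here is specific to this setup:

    rpm -qa 'samba*'
    smbd -V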
Hello,

I'm using version 3.6.9-151.el6. I've added

    locking = No
    posix locking = No

to smb.conf, and we'll see if there is any improvement. Any suggestion on a Samba version? Or, if you say the CentOS package is not working correctly, should I build Samba from source?
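For context, those two parameters belong in the share definition (or in [global] to apply to every share). A minimal share stanza for a Lustre mount would look something like the following; the share name and path are placeholders, and only the two locking lines come from the message above:

    [lustre]
        path = /mnt/lustre
        read only = no
        locking = no
        posix locking = no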