Hello,

we have a _test_ setup for a Lustre 1.6.5.1 installation with 2 RAID systems
(64-bit systems) providing 4 OSTs of 6 TB each, plus one combined MDS/MDT
server (32-bit system, for testing only).

OST mkfs.lustre command:
"mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b 4096' --mgsnode=mds1lustre@tcp0 /dev/sdb"
(Our files on the system are quite large, 100 MB+.)

Kernel: vanilla kernel 2.6.22.19, Lustre compiled from the sources on Gentoo 2008.0.

The client mount point is /misc/testfs via automount.
Access can also go through a link from /mnt/testfs -> /misc/testfs.

The following procedure hangs a client:
1) copy files to the Lustre file system
2) do a 'du -sh /mnt/testfs/willi' while copying
3) unmount an OST (here OST0003) while copying

The 'du' job hangs and the Lustre file system can no longer be accessed on this
client, even from other logins. The only way to restore normal operation, as far
as I can tell, is a hard reset of the machine; a reboot hangs because the file
system is still active. Other clients and their mount points are not affected as
long as they do not access the file system with 'du', 'ls', or similar.
I know that this is drastic, but it may happen in production with our users.

Deactivating/reactivating or remounting the OST does not have any effect on the
'du' job. The 'du' job (#29665, see process list below) and the corresponding
Lustre thread (#29694) cannot be killed manually.

This behaviour is reproducible. OST0003 is not reactivated on the client side
even though the MDS does reactivate it. It seems that this information does not
propagate to the client. See the last lines of dmesg below.

What is the proper way (besides avoiding the use of 'du') to reactivate the
client file system?

Thanks and Regards
Heiko


The process list on the CLIENT:
<snip>
root     29175  5026  0 08:36 ?      00:00:00 sshd: laura [priv]
laura    29177 29175  0 08:36 ?      00:00:01 sshd: laura@pts/0
laura    29178 29177  0 08:36 pts/0  00:00:00 -bash
laura    29665 29178  0 09:15 pts/0  00:00:03 du -sh /mnt/testfs/foo/fam/
schell   29694     2  0 09:15 ?      00:00:00 [ll_sa_29665]
root     29695  4846  0 09:15 ?      00:00:00 /usr/sbin/automount --timeout 60 --pid-file /var/run/autofs.misc.pid /misc yp auto.misc
<snap>

and CLIENT dmesg:
Lustre: 5361:0:(import.c:395:import_select_connection()) scia-OST0003-osc-ffff8100ea24a000: tried all connections, increasing latency to 6s
Lustre: 5361:0:(import.c:395:import_select_connection()) Skipped 10 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.16.97@tcp. The ost_connect operation failed with -19
LustreError: Skipped 20 previous similar messages
Lustre: 5361:0:(import.c:395:import_select_connection()) scia-OST0003-osc-ffff8100ea24a000: tried all connections, increasing latency to 51s
Lustre: 5361:0:(import.c:395:import_select_connection()) Skipped 20 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.16.97@tcp. The ost_connect operation failed with -19
LustreError: Skipped 24 previous similar messages
Lustre: 5361:0:(import.c:395:import_select_connection()) scia-OST0003-osc-ffff8100ea24a000: tried all connections, increasing latency to 51s
Lustre: 5361:0:(import.c:395:import_select_connection()) Skipped 24 previous similar messages
LustreError: 167-0: This client was evicted by scia-OST0003; in progress operations using this service will fail.
The MDS dmesg:
<snip>
Lustre: 6108:0:(import.c:395:import_select_connection()) scia-OST0003-osc: tried all connections, increasing latency to 51s
Lustre: 6108:0:(import.c:395:import_select_connection()) Skipped 10 previous similar messages
LustreError: 11-0: an error occurred while communicating with 192.168.16.97@tcp. The ost_connect operation failed with -19
LustreError: Skipped 10 previous similar messages
Lustre: 6108:0:(import.c:395:import_select_connection()) scia-OST0003-osc: tried all connections, increasing latency to 51s
Lustre: 6108:0:(import.c:395:import_select_connection()) Skipped 20 previous similar messages
Lustre: Permanently deactivating scia-OST0003
Lustre: Setting parameter scia-OST0003-osc.osc.active in log scia-client
Lustre: Skipped 3 previous similar messages
Lustre: setting import scia-OST0003_UUID INACTIVE by administrator request
Lustre: scia-OST0003-osc.osc: set parameter active=0
Lustre: Skipped 2 previous similar messages
Lustre: scia-MDT0000: haven't heard from client 9111f740-b7a7-e2ff-b672-288a66decfab (at 192.168.16.106@tcp) in 1269 seconds. I think it's dead, and I am evicting it.
Lustre: Permanently reactivating scia-OST0003
Lustre: Modifying parameter scia-OST0003-osc.osc.active in log scia-client
Lustre: Skipped 1 previous similar message
Lustre: 15406:0:(import.c:395:import_select_connection()) scia-OST0003-osc: tried all connections, increasing latency to 51s
Lustre: 15406:0:(import.c:395:import_select_connection()) Skipped 2 previous similar messages
LustreError: 167-0: This client was evicted by scia-OST0003; in progress operations using this service will fail.
Lustre: scia-OST0003-osc: Connection restored to service scia-OST0003 using nid 192.168.16.97@tcp.
Lustre: scia-OST0003-osc.osc: set parameter active=1
Lustre: MDS scia-MDT0000: scia-OST0003_UUID now active, resetting orphans
<snap>
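For context, the "Permanently deactivating" / "Permanently reactivating scia-OST0003" lines in the MDS dmesg above are what Lustre 1.6 prints when the OSC for an OST is toggled administratively. A minimal sketch of the commands typically used for this, assuming the fsname 'scia' and OST index 0003 from this setup (<devno> is a placeholder to be read from 'lctl dl'):

  # temporarily deactivate the OSC for OST0003 on the MDS (find its device number first)
  lctl dl | grep scia-OST0003-osc
  lctl --device <devno> deactivate

  # permanently deactivate / reactivate via the MGS configuration log
  lctl conf_param scia-OST0003.osc.active=0
  lctl conf_param scia-OST0003.osc.active=1
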
We are experiencing the same problem with 1.6.4.2. We thought it was the
statahead problem, but after turning off the statahead code we experienced the
same problem again. I had hoped going to 1.6.5 would resolve the issue.

If you open a bug, would you mind sending the bug number to the list? I would
like to get on the CC list.

> -----Original Message-----
> From: lustre-discuss-bounces@lists.lustre.org
> [mailto:lustre-discuss-bounces@lists.lustre.org] On Behalf Of Heiko Schroeter
> Sent: Thursday, July 10, 2008 2:25 AM
> To: lustre-discuss@clusterfs.com
> Subject: [Lustre-discuss] lustre client 1.6.5.1 hangs
>
> <snip: full original message quoted above>
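For readers who suspect statahead as well: on 1.6.x clients, statahead is normally turned off through the llite proc tunable. A minimal sketch, assuming the default /proc layout (the exact instance name under llite/ varies per mount, hence the wildcard):

  # disable the client statahead thread (0 = off) for all mounted Lustre file systems
  for f in /proc/fs/lustre/llite/*/statahead_max; do echo 0 > "$f"; done
  # verify the new setting
  cat /proc/fs/lustre/llite/*/statahead_max
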
On Thu, 2008-07-10 at 10:25 +0200, Heiko Schroeter wrote:

> Hello,

Hi.

> OST lustre mkfs:
> "mkfs.lustre --param="failover.mode=failout" --fsname
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Given this (above) parameter setting...

> scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b
> 4096' --mgsnode=mds1lustre@tcp0 /dev/sdb"
>
> The following procedure hangs a client:
> 1) copy files to the lustre system
> 2) do a 'du -sh /mnt/testfs/willi' while copying
> 3) unmount an OST (here OST0003) while copying

Do you expect that the copy and du (which are both running at the same time
while you unmount the OST, right?) should both get EIOs?

> Deactivating/Reactivating or remounting the OST does not have any effect on
> the 'du' job. The 'du' job (#29665 see process list below) and the
> corresponding lustre thread (#29694) cannot be killed manually.

That latter process (ll_sa_29665) is statahead at work.

> What is the proper way (besides avoiding the use of 'du') to reactivate the
> client file system ?

Well, in fact the du and the copy should both EIO when they get to trying to
write to the unmounted OST.

Can you get a stack trace (sysrq-t) on the client after you have unmounted the
OST and processes are hung/blocked?

b.
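A sketch of how the requested sysrq-t dump can be captured on the hung client, assuming the magic SysRq interface is compiled into the kernel (the output file name is only an example):

  # enable the magic SysRq interface, then dump the state of all tasks
  echo 1 > /proc/sys/kernel/sysrq
  echo t > /proc/sysrq-trigger
  # the per-task stack traces land in the kernel ring buffer / syslog
  dmesg > /tmp/stack_trace.txt
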
On Thursday, 10 July 2008 19:35:57, you wrote:

> Hi.
>
> > OST lustre mkfs:
> > "mkfs.lustre --param="failover.mode=failout" --fsname
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Given this (above) parameter setting...

Is 'failout' not OK? Actually we like to use it because we want to use the
Lustre system as a huge, expandable data archive. If one OST breaks down and
destroys the data on it, we can restore that data.

> > scia --ost --mkfsoptions='-i 2097152 -E stride=16 -b
> > 4096' --mgsnode=mds1lustre@tcp0 /dev/sdb"
> >
> > The following procedure hangs a client:
> > 1) copy files to the lustre system
> > 2) do a 'du -sh /mnt/testfs/willi' while copying
> > 3) unmount an OST (here OST0003) while copying
>
> Do you expect that the copy and du (which are both running at the same
> time while you unmount the OST, right?

Right.

> ) should both get EIOs?

Actually, I expect the client not to hang any job that accesses the file
system at that moment. If that requires an EIO and a kill of that process,
that is fine by me.

> > What is the proper way (besides avoiding the use of 'du') to reactivate
> > the client file system ?
>
> Well, in fact the du and the copy should both EIO when they get to
> trying to write to the unmounted OST.
>
> Can you get a stack trace (sysrq-t) on the client after you have
> unmounted the OST and processes are hung/blocked?

I will get this done today. If the output is very large, can I zip it and
attach it?

Thank you.
Heiko
On Thursday, 10 July 2008 19:35:57, Brian J. Murrell wrote:

> Well, in fact the du and the copy should both EIO when they get to
> trying to write to the unmounted OST.
>
> Can you get a stack trace (sysrq-t) on the client after you have
> unmounted the OST and processes are hung/blocked?

Here is the stack trace. I hope it is the one you requested.

Regards
Heiko

[Attachment: stack_trace.txt.gz, application/x-gzip, 30198 bytes:
http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080711/0592fb67/attachment-0003.bin]
On Fri, 2008-07-11 at 08:24 +0200, Heiko Schroeter wrote:

> Is 'failout' not OK?

That's up to you. Failout means that if an OST becomes unreachable (because it
has failed, been taken off the network, unmounted, turned off, etc.), then any
I/O to get objects from that OST will cause a client to get an EIO
(Input/Output error). Failover means that a client which tries to do I/O to a
failed OST will continue to retry (forever) until it gets an answer. Userspace
sees nothing strange, other than an I/O that takes, potentially, a very long
time to complete.

> Actually we like to use it because we want to use the
> Lustre system as a huge, expandable data archive.

I'm not sure what using failout has to do with that.

> If one OST breaks down and destroys the data on it, we can restore that data.

Again, failout/failover really has nothing to do with this. It has everything
to do with what a client does when it sees an OST fail.

> Actually, I expect the client not to hang any job that accesses the file
> system at that moment. If that requires an EIO and a kill of that process,
> that is fine by me.

Well, no kill should be necessary. An EIO should terminate an application,
unless it has a retry handler for EIOs written into it, which is not very
common. EIO usually should be interpreted as fatal.

b.
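For reference, the failout/failover choice described above is the failover.mode parameter that was set at format time in the original post. A minimal sketch of the two variants, reusing the device, fsname, and MGS NID from that post (the tunefs.lustre line assumes the target is unmounted and that "failover" is accepted as the default mode name; check the manual for your Lustre version):

  # failout: clients get EIO for objects on an unreachable OST (as in the original mkfs)
  mkfs.lustre --param="failover.mode=failout" --fsname scia --ost --mgsnode=mds1lustre@tcp0 /dev/sdb

  # switch an existing, unmounted OST back to the default failover behaviour,
  # where clients block and retry until the OST returns
  tunefs.lustre --param="failover.mode=failover" /dev/sdb
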
On Fri, 2008-07-11 at 10:14 +0200, Heiko Schroeter wrote:

> Here is the stack trace. I hope it is the one you requested.

Hrm. What is strange is that you have configured failout but are not getting
EIOs. Maybe you should file a bug in our bugzilla about this one.

b.