Hi,

with lustre-1.6.0.1 (and linux-2.6.20 :) ) and a lustre-1.4.10 server I get
the following timeout messages every 25s:

[32638.793767] Lustre: Added LNI 192.168.41.101@o2ib [8/64]
[32638.851883] Lustre: Added LNI 192.168.42.101@tcp [8/256]
[32638.851948] Lustre: Accept secure, port 988
[32639.661422] Lustre: Lustre Client File System; info@clusterfs.com
[32639.700411] LustreError: 11314:0:(mgc_request.c:63:mgc_logname2resid()) fsname too long: mds-beo/client-client
[32639.710514] LustreError: MGC192.168.41.106@o2ib: The configuration from log 'mds-beo/client-client' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
[32639.727611] LustreError: MGC192.168.41.106@o2ib: The configuration from log 'mds-beo/client-client' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[32639.751320] Lustre: This looks like an old mount command; I will try to contact MDT 'mds-beo' for profile 'client'
[32639.947580] Lustre: Client client has started
[32644.937607] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031579, 5s ago) req@ffff81013d49dc00 x1/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
[32669.883922] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031604, 5s ago) req@ffff81007d651200 x16/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
[32694.834240] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031629, 5s ago) req@ffff810037f37a00 x20/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
[32719.780559] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031654, 5s ago) req@ffff81000170ca00 x27/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
[...]

Apart from these messages (which will fill my too-small log directory rather
soon), it seems to work fine.

Any ideas?

Thanks,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

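For anyone who wants to check the "every 25s" observation against the log itself, here is a minimal Python sketch. It pulls the "sent at" timestamps out of the ptlrpc_expire_one_request() lines quoted above and prints the gaps between them. The function names (sent_times, intervals, errno_name) are made up for illustration, and reading -22 as a negated Linux errno value is an assumption based on the usual kernel convention, not something the log states.

```python
import errno
import re

# The four timeout lines, copied from the log in the message above.
LOG = """\
[32644.937607] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031579, 5s ago) req@ffff81013d49dc00 x1/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
[32669.883922] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031604, 5s ago) req@ffff81007d651200 x16/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
[32694.834240] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031629, 5s ago) req@ffff810037f37a00 x20/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
[32719.780559] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031654, 5s ago) req@ffff81000170ca00 x27/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
"""

# Matches the "timeout (sent at <epoch seconds>, <N>s ago)" part of a line.
SENT_RE = re.compile(r"timeout \(sent at (\d+), (\d+)s ago\)")


def sent_times(log_text):
    """Return the 'sent at' epoch timestamps found in the log text, in order."""
    return [int(m.group(1)) for m in SENT_RE.finditer(log_text)]


def intervals(times):
    """Return the differences between consecutive timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def errno_name(rc):
    """Map a negated errno-style return code (e.g. -22) to its symbolic name.

    Assumption: the code follows the Linux kernel convention of -errno.
    """
    return errno.errorcode.get(abs(rc), "UNKNOWN")


if __name__ == "__main__":
    print(intervals(sent_times(LOG)))  # [25, 25, 25]
    print(errno_name(-22))             # EINVAL
```

With the four lines above, the gaps come out as 25 seconds each, which matches the description, and -22 maps to EINVAL under the stated assumption.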
fixed with a patch to 1.6 in bug 11691, will be in 1.6.1

Bernd Schubert wrote:
> Hi,
>
> with lustre-1.6.0.1 (and linux-2.6.20 :) ) and a lustre-1.4.10 server I get
> the following timeout messages every 25s
>
>
> [32638.793767] Lustre: Added LNI 192.168.41.101@o2ib [8/64]
> [32638.851883] Lustre: Added LNI 192.168.42.101@tcp [8/256]
> [32638.851948] Lustre: Accept secure, port 988
> [32639.661422] Lustre: Lustre Client File System; info@clusterfs.com
> [32639.700411] LustreError: 11314:0:(mgc_request.c:63:mgc_logname2resid()) fsname too long: mds-beo/client-client
> [32639.710514] LustreError: MGC192.168.41.106@o2ib: The configuration from log 'mds-beo/client-client' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
> [32639.727611] LustreError: MGC192.168.41.106@o2ib: The configuration from log 'mds-beo/client-client' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> [32639.751320] Lustre: This looks like an old mount command; I will try to contact MDT 'mds-beo' for profile 'client'
> [32639.947580] Lustre: Client client has started
> [32644.937607] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031579, 5s ago) req@ffff81013d49dc00 x1/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
> [32669.883922] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031604, 5s ago) req@ffff81007d651200 x16/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
> [32694.834240] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031629, 5s ago) req@ffff810037f37a00 x20/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
> [32719.780559] LustreError: 11383:0:(client.c:950:ptlrpc_expire_one_request()) @@@ timeout (sent at 1180031654, 5s ago) req@ffff81000170ca00 x27/t0 o250->MGS@beo-106_UUID:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
> [...]
>
> Besides of these messages (which will my too small log dir rather soon)
> it seems to work fine.
>
> Any ideas?
>
>
> Thanks,
> Bernd

On Thursday 24 May 2007 21:10:12 Nathaniel Rutman wrote:
> fixed with a patch to 1.6 in bug 11691, will be in 1.6.1

Thanks, going to apply the patch.

--
Bernd Schubert
Q-Leap Networks GmbH

On May 24, 2007 21:25 +0200, Bernd Schubert wrote:
> On Thursday 24 May 2007 21:10:12 Nathaniel Rutman wrote:
> > fixed with a patch to 1.6 in bug 11691, will be in 1.6.1
>
> Thanks, going to apply the patch.

In the meantime you can just "lctl --device {mgc device} deactivate" on
the client.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

On Friday 25 May 2007 01:25:14 Andreas Dilger wrote:
> On May 24, 2007 21:25 +0200, Bernd Schubert wrote:
> > On Thursday 24 May 2007 21:10:12 Nathaniel Rutman wrote:
> > > fixed with a patch to 1.6 in bug 11691, will be in 1.6.1
> >
> > Thanks, going to apply the patch.
>
> In the meantime you can just "lctl --device {mgc device} deactivate" on
> the client.

Hmm, somehow that does not work:

root@beo-101:~# lctl --device MGC192.168.41.106@o2ib deactivate
error: deactivate: failed: Operation not supported

root@beo-101:~# lctl dl
  0 UP mgc MGC192.168.41.106@o2ib f73789d4-c9fa-7ed5-2e34-c82758219ffa 5
  2 UP lov lov-beo-ffff810067e05800 c46170c4-5a4b-78f9-03b1-348c2ce59579 4
[...]

Thanks,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

On Fri, May 25, 2007 at 11:13:20AM +0200, Bernd Schubert wrote:
> Hmm, somehow that does not work:
>
> root@beo-101:~# lctl --device MGC192.168.41.106@o2ib deactivate
> error: deactivate: failed: Operation not supported
>
> root@beo-101:~# lctl dl
>   0 UP mgc MGC192.168.41.106@o2ib f73789d4-c9fa-7ed5-2e34-c82758219ffa 5
>   2 UP lov lov-beo-ffff810067e05800 c46170c4-5a4b-78f9-03b1-348c2ce59579 4

In fact, it is 'lctl --device <device number> deactivate', so in your case:
# lctl --device 0 deactivate

Johann

On Friday 25 May 2007 11:21:11 Johann Lombardi wrote:
> On Fri, May 25, 2007 at 11:13:20AM +0200, Bernd Schubert wrote:
> > Hmm, somehow that does not work:
> >
> > root@beo-101:~# lctl --device MGC192.168.41.106@o2ib deactivate
> > error: deactivate: failed: Operation not supported
> >
> > root@beo-101:~# lctl dl
> >   0 UP mgc MGC192.168.41.106@o2ib f73789d4-c9fa-7ed5-2e34-c82758219ffa 5
> >   2 UP lov lov-beo-ffff810067e05800 c46170c4-5a4b-78f9-03b1-348c2ce59579 4
>
> In fact, it is 'lctl --device <device number> deactivate', so in your case:
> # lctl --device 0 deactivate

Thanks, but still does not work :(

root@beo-101:~# lctl --device 0 deactivate
error: deactivate: failed: Operation not supported

Bernd

--
Bernd Schubert
Q-Leap Networks GmbH

Bernd Schubert wrote:
> On Friday 25 May 2007 11:21:11 Johann Lombardi wrote:
>
>> On Fri, May 25, 2007 at 11:13:20AM +0200, Bernd Schubert wrote:
>>
>>> Hmm, somehow that does not work:
>>>
>>> root@beo-101:~# lctl --device MGC192.168.41.106@o2ib deactivate
>>> error: deactivate: failed: Operation not supported
>>>
>>> root@beo-101:~# lctl dl
>>>   0 UP mgc MGC192.168.41.106@o2ib f73789d4-c9fa-7ed5-2e34-c82758219ffa 5
>>>   2 UP lov lov-beo-ffff810067e05800 c46170c4-5a4b-78f9-03b1-348c2ce59579 4
>>>
>> In fact, it is 'lctl --device <device number> deactivate', so in your case:
>> # lctl --device 0 deactivate
>>
>
> Thanks, but still does not work :(
>
> root@beo-101:~# lctl --device 0 deactivate
> error: deactivate: failed: Operation not supported
>

As of 1.6, you can use either the device number or the device name with
--device. But the error message is correct: you can't deactivate the MGC
this way, because I never implemented an mgc_iocontrol. (Actually, it's
commented out in mgc_request.c, and we'd just need to copy the
IOC_OSC_SET_ACTIVE ioctl from mdc_iocontrol(), but at this point you may as
well just use the patch in bug 11691.)
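
Since --device accepts either the number or the name, it can help to map between the two from the "lctl dl" listing. Below is a rough Python sketch that parses the two device lines quoted in this thread. The Device type and the parse_device_list and find_by_type helpers are hypothetical names, and the column meanings (index, state, type, name, UUID, reference count) are an assumption read off the output above. The sketch only parses text and does not run lctl; as explained above, deactivating the MGC would fail regardless of which identifier is used.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    """One row of an 'lctl dl' listing (column meanings are assumed)."""
    index: int
    state: str
    type: str
    name: str
    uuid: str
    refcount: int


# The two device lines quoted in the thread.
DL_OUTPUT = """\
  0 UP mgc MGC192.168.41.106@o2ib f73789d4-c9fa-7ed5-2e34-c82758219ffa 5
  2 UP lov lov-beo-ffff810067e05800 c46170c4-5a4b-78f9-03b1-348c2ce59579 4
"""


def parse_device_list(text: str) -> List[Device]:
    """Parse device rows, skipping lines that do not have the expected shape."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Expect exactly six whitespace-separated columns with numeric ends.
        if len(fields) != 6 or not fields[0].isdigit() or not fields[5].isdigit():
            continue
        devices.append(Device(int(fields[0]), fields[1], fields[2],
                              fields[3], fields[4], int(fields[5])))
    return devices


def find_by_type(devices: List[Device], dev_type: str) -> List[Device]:
    """Return all devices whose type column equals dev_type (e.g. 'mgc')."""
    return [d for d in devices if d.type == dev_type]


if __name__ == "__main__":
    for dev in find_by_type(parse_device_list(DL_OUTPUT), "mgc"):
        # Either identifier should name the same device with --device in 1.6.
        print(dev.index, dev.name)
```

For the listing above this reports device 0 with the name MGC192.168.41.106@o2ib, the same device Bernd tried both ways.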