On Aug 26, 2005 11:40 +0200, Roland Fehrenbacher wrote:
> I permanently (every couple of minutes) get messages like the one below
> on my MDS, while running stress tests (bonnie++ on 4
> clients, and some unpacking, copying, diffing of large tar files on
> one other client). Again, I'm running Lustre 1.4.1 with kernel
> 2.6.12.5 + bugzilla patches with 2 OSTs over Gigabit Ethernet. Here is
> my configuration:

These errors are actually rather harmless. They indicate that a lock
callback operation on the MDS is taking too long for some reason.

> lmc -m config.xml --format --add mds --node ha-beo-2 --mds mds-beo \
>     --fstype ldiskfs --dev /dev/drbd/2

I suspect the fact that the MDS is running atop drbd may be a contributing
factor to the MDS slowness.

> Aug 26 11:32:17 ha-beo-2 kernel: Lustre: 0:0:(watchdog.c:122:lcw_cb()) Watchdog
> triggered for pid 25181: it was inactive for 1500us

Try changing ldlm/ldlm_lockd.c::ldlm_setup() to use "ldlm_timeout * 1000"
instead of "1500" where ldlm_cb_service is initialized via ptlrpc_init_svc().

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

On Aug 26, 2005 13:28 -0600, Andreas Dilger wrote:
> On Aug 26, 2005 11:40 +0200, Roland Fehrenbacher wrote:
> > I permanently (every couple of minutes) get messages like the one below
> > on my MDS, while running stress tests (bonnie++ on 4
> > clients, and some unpacking, copying, diffing of large tar files on
> > one other client). Again, I'm running Lustre 1.4.1 with kernel
> > 2.6.12.5 + bugzilla patches with 2 OSTs over Gigabit Ethernet. Here is
> > my configuration:
>
> These errors are actually rather harmless. They indicate that a lock
> callback operation on the MDS is taking too long for some reason.
>
> > lmc -m config.xml --format --add mds --node ha-beo-2 --mds mds-beo \
> >     --fstype ldiskfs --dev /dev/drbd/2
>
> I suspect the fact that the MDS is running atop drbd may be a contributing
> factor to the MDS slowness.
>
> > Aug 26 11:32:17 ha-beo-2 kernel: Lustre: 0:0:(watchdog.c:122:lcw_cb())
> > Watchdog triggered for pid 25181: it was inactive for 1500us
>
> Try changing ldlm/ldlm_lockd.c::ldlm_setup() to use "ldlm_timeout * 1000"
> instead of "1500" where ldlm_cb_service is initialized via ptlrpc_init_svc().

Actually, in hindsight this is bug 5515, already fixed in 1.4.2.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

On Fri, 2005-08-26 at 11:40 +0200, Roland Fehrenbacher wrote:
> Hi,
>
> I permanently (every couple of minutes) get messages like the one below
> on my MDS, while running stress tests (bonnie++ on 4
> clients, and some unpacking, copying, diffing of large tar files on
> one other client). Again, I'm running Lustre 1.4.1 with kernel
> 2.6.12.5 + bugzilla patches with 2 OSTs over Gigabit Ethernet. Here is
> my configuration:

I have no idea what the problem is, but isn't an outdated release on the
latest mainline kernel the least-supported of all possible unsupported
configurations?

-jwb

Hi,
I permanently (every couple of minutes) get messages like the one below
on my MDS, while running stress tests (bonnie++ on 4
clients, and some unpacking, copying, diffing of large tar files on
one other client). Again, I'm running Lustre 1.4.1 with kernel
2.6.12.5 + bugzilla patches with 2 OSTs over Gigabit Ethernet. Here is
my configuration:
-------------------------------------------------------------------------
lmc -m config.xml --add net --node ha-beo-2 --nid ha-beo-i-2 --nettype tcp
lmc -m config.xml --add net --node sn-03-1 --nid sn-03-1-i --nettype tcp
lmc -m config.xml --add net --node sn-03-2 --nid sn-03-2-i --nettype tcp
lmc -m config.xml --add net --node client --nid '*' --nettype tcp
# MDS
lmc -m config.xml --format --add mds --node ha-beo-2 --mds mds-beo \
    --fstype ldiskfs --dev /dev/drbd/2
# OSS
lmc -m config.xml --add lov --lov lov-beo --mds mds-beo --stripe_sz 1048576 \
  --stripe_cnt 0 --stripe_pattern 0
lmc -m config.xml --add ost --node sn-03-1 --lov lov-beo --ost sn-03-1 \
    --failover --fstype ldiskfs --dev /dev/vgraid50/ost
lmc -m config.xml --add ost --node sn-03-2 --lov lov-beo --ost sn-03-2 \
    --failover --fstype ldiskfs --dev /dev/vgraid50/ost
# Clients
lmc -m config.xml --add mtpt --node client --path /l/1 \
    --mds mds-beo --lov lov-beo
-------------------------------------------------------------------------
------------------------- error message ---------------------------------------
Aug 26 11:32:17 ha-beo-2 kernel: Lustre: 0:0:(watchdog.c:122:lcw_cb()) Watchdog triggered for pid 25181: it was inactive for 1500us
Aug 26 11:32:17 ha-beo-2 kernel: ldlm_cb_27    D 000206a1fe5da43f     0 25181   1         25182 25180 (L-TLB)
Aug 26 11:32:17 ha-beo-2 kernel: ffff810011eefc48 0000000000000046 ffff8100344a8599 000000738893b46b
Aug 26 11:32:17 ha-beo-2 kernel:        ffff810011eefc48 000000737cc1e224 0000000100000000 0000000000000934
Aug 26 11:32:17 ha-beo-2 kernel:        000206a1fe5da43f ffff81007c50b800
Aug 26 11:32:17 ha-beo-2 kernel: Call Trace:<ffffffff88279d93>{:libcfs:portals_debug_msg+883} <ffffffff803de0bd>{__down_write+141}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff8830bcea>{:obdclass:llog_cat_cancel_records+762}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff8854e420>{:ptlrpc:llog_origin_handle_cancel+3984}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff88503f78>{:ptlrpc:ldlm_callback_handler+3768}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff88539cdb>{:ptlrpc:ptlrpc_server_handle_request+4011}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff8853b871>{:ptlrpc:ptlrpc_main+2177} <ffffffff8012e0f0>{default_wake_function+0}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff8853afe0>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff8853afe0>{:ptlrpc:ptlrpc_retry_rqbds+0}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff8010e577>{child_rip+8} <ffffffff8853aff0>{:ptlrpc:ptlrpc_main+0}
Aug 26 11:32:17 ha-beo-2 kernel:        <ffffffff8010e56f>{child_rip+0}
-------------------------------------------------------------------------------
Any idea what is causing this, and how serious it is?
Thanks for your help,
Roland
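
To judge how often the watchdog message from the log excerpt above recurs, the syslog can be scanned for it. The following is only a minimal sketch, not part of Lustre: the function names `find_watchdog_events` and `events_per_pid` are hypothetical, and the pattern assumes the message wording matches the excerpt quoted in this thread. It tolerates the message being wrapped across lines, as it is in the original post.

```python
import re
from collections import Counter

# Pattern based only on the message text quoted in this thread.
# \s+ between words lets it match even when the log line was wrapped.
WATCHDOG_RE = re.compile(
    r"Watchdog\s+triggered\s+for\s+pid\s+(?P<pid>\d+):\s+"
    r"it\s+was\s+inactive\s+for\s+(?P<value>\d+)(?P<unit>\w+)"
)


def find_watchdog_events(log_text):
    """Return a list of (pid, value, unit) tuples, one per watchdog message."""
    return [
        (int(m.group("pid")), int(m.group("value")), m.group("unit"))
        for m in WATCHDOG_RE.finditer(log_text)
    ]


def events_per_pid(events):
    """Count how many watchdog messages were logged for each pid."""
    return Counter(pid for pid, _value, _unit in events)


# Sample taken from the error message in the original post (still wrapped).
sample = (
    "Aug 26 11:32:17 ha-beo-2 kernel: Lustre: 0:0:(watchdog.c:122:lcw_cb()) Watchdog\n"
    "triggered for pid 25181: it was inactive for 1500us\n"
)

events = find_watchdog_events(sample)
print(events)
print(events_per_pid(events))
```

In practice the text passed in would be the contents of the MDS syslog file. The per-pid counts show whether a single ldlm callback thread or several are tripping the watchdog.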