Displaying 20 results from an estimated 24 matches for "lustreerror".
2008 Feb 22
0
lustre error
...ustre: 7:0:(linux-debug.c:98:libcfs_run_upcall()) Skipped 2 previous similar messages
Feb 22 03:25:29 node4 kernel: Lustre: 7:0:(linux-debug.c:98:libcfs_run_upcall()) Invoked LNET upcall /usr/lib/lustre/lnet_upcall ROUTER_NOTIFY,192.168.0.11@tcp,down,1203647123
Feb 22 03:25:33 node4 kernel: LustreError: 4567:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 192.168.0.13
Feb 22 03:25:43 node4 kernel: LustreError: 4567:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request from 192.168.0.17
Feb 22 03:25:59 node4 kernel: LustreError: 4567:0:(acceptor....
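Most log lines on this page share one layout: "Lustre:" or "LustreError:", a number that appears to be the process ID, a second number, then "(file.c:line:function())" and the message. When sifting through a large /var/log/messages it can help to split those fields out. A minimal Python sketch, assuming that layout holds; parse_line and LustreMsg are hypothetical names made up for this example. Console-style lines such as "LustreError: 11-0: ..." carry no call-site prefix and come back as None.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the excerpts on this page, e.g.
#   "LustreError: 4567:0:(acceptor.c:442:lnet_acceptor()) Error -11 ..."
# parse_line() and LustreMsg are hypothetical helpers, not part of Lustre.
LOG_RE = re.compile(
    r"(?P<level>LustreError|Lustre):\s+"
    r"(?P<pid>\d+):(?P<aux>\d+):"
    r"\((?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s*"
    r"(?P<msg>.*)"
)


class LustreMsg(NamedTuple):
    level: str  # "Lustre" or "LustreError"
    pid: int    # first numeric field, apparently the process ID
    file: str   # source file that emitted the message
    line: int   # line number in that file
    func: str   # function name
    msg: str    # free-form message text


def parse_line(text: str) -> Optional[LustreMsg]:
    """Split one syslog line into fields; None if it has no (file:line:func()) prefix."""
    m = LOG_RE.search(text)
    if m is None:
        return None
    return LustreMsg(m["level"], int(m["pid"]), m["file"],
                     int(m["line"]), m["func"], m["msg"].strip())


if __name__ == "__main__":
    sample = ("Feb 22 03:25:33 node4 kernel: LustreError: 4567:0:"
              "(acceptor.c:442:lnet_acceptor()) Error -11 reading "
              "connection request from 192.168.0.13")
    print(parse_line(sample))
```

Grouping the parsed records by (file, func) is usually enough to see which subsystem (acceptor, import, ldlm) is producing the noise.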
2008 Jan 10
4
1.6.4.1 - active client evicted
...and did
umount -f, which also hung.
So, what should happen in situations like this? What can I do to debug
it? Has anyone else seen this?
130.239.78.238 is the client, 130.239.78.233 is the MGS/MDT.
Logs from the client:
----------------------------8<------------------------
Jan 10 12:40:38 LustreError: 11-0: an error occurred while communicating with 130.239.78.233@tcp. The ldlm_enqueue operation failed with -107
Jan 10 12:40:38 LustreError: Skipped 1 previous similar message
Jan 10 12:40:38 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection to service hpfs-MDT0000 via nid 130.239.78.233 a...
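Lustre, like the rest of the Linux kernel, reports failures as negative errno values: the -107 above, and -11, -110, -22 and -100 elsewhere on this page. A small Python sketch that turns them into symbolic names using the standard errno and os modules; describe_rc and explain are hypothetical helper names. The table comes from the host running the script, so run it on Linux to match the numbering the kernel modules used.

```python
import errno
import os
import re

# Matches "failed with -107", "Error -11", "rc = -110" style fragments.
# describe_rc() and explain() are hypothetical helpers for this sketch.
RC_RE = re.compile(r"(?:failed with|Error|rc =)\s*(-\d+)")


def describe_rc(rc: int) -> str:
    """Translate a (negative) kernel return code into 'NAME (message)'.

    Uses the errno table of the host running this script.
    """
    code = -rc if rc < 0 else rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


def explain(line: str) -> list[str]:
    """Describe every return code found in one log line."""
    return [f"{m.group(1)}: {describe_rc(int(m.group(1)))}"
            for m in RC_RE.finditer(line)]


if __name__ == "__main__":
    line = ("LustreError: 11-0: an error occurred while communicating with "
            "130.239.78.233@tcp. The ldlm_enqueue operation failed with -107")
    for item in explain(line):
        print(item)
```

The "11-0:" and "105-4:" tokens are console message IDs rather than return codes, which is why the pattern only looks for a code after "failed with", "Error" or "rc =".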
2007 Sep 28
0
llog_origin_handle_cancel and other LustreErrors
Hi again!
Same setup as before (Lustre 1.6.2 + 2.6.18 kernel).
This time things suddenly started to be very slow (as in periodically
stalling), and we found a bunch of llog_ LustreErrors on the MDS. Some
time later stuff had automagically recovered and is back to normal
speed.
Any idea on the meaning/cause of these errors?
What is the seriousness of "LustreError" errors in general? Does it
mean "this is bad, but normal", "this is bad, shouldn'...
2008 Mar 07
2
Multihomed question: want Lustre over IB and Ethernet
...your client modprobe.conf lnet option
> should be this:
>
>
> options lnet networks=o2ib(ib0)
>
> (not 'o2ib0').
It still seems to want the TCP connection:
Lustre: Added LNI 36.122.255.1@o2ib [8/64]
Lustre: Lustre Client File System; info@clusterfs.com
LustreError: 11043:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for 36.121.255.201@tcp
LustreError: 11043:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer 36.121.255.201@tcp!
LustreError: 11043:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
LustreError: 110...
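In the client log above, only an o2ib LNI was added, yet the client still looks for its server at a tcp NID. One quick check for this kind of mismatch is to compare the network part of each server NID against the networks named in the "options lnet networks=" line. A Python sketch of that check, assuming the comma-separated "net(interface)" syntax shown in these threads; the helper names are hypothetical, and treating a bare name such as "tcp" as network 0 ("tcp0") is an assumption to verify against your release.

```python
import re

# Hypothetical helpers for this sketch. Assumed option syntax, from this page:
#   options lnet networks=o2ib(ib0)
#   options lnet networks=tcp0(eth0) accept=all
NETWORKS_RE = re.compile(r"networks=(\S+)")
# One "name" or "name(iface[,iface...])" entry inside the networks= value.
ENTRY_RE = re.compile(r"([A-Za-z0-9]+)(?:\([^)]*\))?")


def normalize_net(net: str) -> str:
    """Canonical LNET network name.

    Assumption: a name with no trailing number is network 0, so "tcp" and
    "tcp0" compare equal (NIDs on this page print as "@tcp").
    """
    m = re.fullmatch(r"(.*?[a-z])(\d*)", net.strip().lower())
    if m is None:
        raise ValueError(f"unrecognised network name: {net!r}")
    name, num = m.groups()
    return f"{name}{int(num) if num else 0}"


def local_networks(modprobe_line: str) -> set[str]:
    """Networks named by an 'options lnet networks=...' line."""
    m = NETWORKS_RE.search(modprobe_line)
    if m is None:
        return set()
    value = m.group(1).strip("\"'")
    return {normalize_net(name) for name in ENTRY_RE.findall(value)}


def unreachable(nid: str, nets: set[str]) -> bool:
    """True if the NID's network (text after the last '@') is not configured locally."""
    _, _, net = nid.rpartition("@")
    return normalize_net(net) not in nets


if __name__ == "__main__":
    nets = local_networks("options lnet networks=o2ib(ib0)")
    for nid in ("36.122.255.1@o2ib", "36.121.255.201@tcp"):
        print(nid, "unreachable" if unreachable(nid, nets) else "ok")
```

With networks=o2ib(ib0) the sketch flags 36.121.255.201@tcp as unreachable, which matches the "No NID found" symptom; it does not say how the tcp NID got into the client's configuration.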
2010 Aug 11
0
OSS: IMP_CLOSED errors
Hello.
OS: CentOS 5.4
uname -a
Linux oss0 2.6.18-128.7.1.el5_lustre.1.8.1.1 #1 SMP Tue Oct 6 05:48:57 MDT 2009 x86_64 x86_64 x86_64 GNU/Linux
Lustre 1.8.1.1
OSS server.
A lot of errors in /var/log/messages:
Aug 10 14:46:34 oss0 kernel: LustreError: 2802:0:(client.c:775:ptlrpc_import_delay_req()) Skipped 1 previous similar message
Aug 10 15:07:01 oss0 kernel: LustreError: 2802:0:(client.c:775:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff810210710c00 x1343624153290636/t0 o401->@10.10.10.6@tcp:17/18 lens 4096/384 e 0 to 1 dl 0 r...
2007 Nov 07
1
ll_cfg_requeue process timeouts
Hi,
Our environment is: 2.6.9-55.0.9.EL_lustre.1.6.3smp
I am getting the following errors from two OSSs:
...
Nov 7 10:39:51 storage09.beowulf.cluster kernel: LustreError: 23045:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@00000100b410be00 x4190687/t0 o101->MGS@MGC10.143.245.201@tcp_0:26 lens 232/240 ref 1 fl Rpc:/0/0 rc 0/0
Nov 7 10:39:51 storage09.beowulf.cluster kernel: LustreError: 23045:0:(client.c:519:ptlrpc_import_delay_re...
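The "@@@" lines dump a request descriptor: "req@ADDRESS xN/tN oN->TARGET:PORTAL ...". Reading x as the transfer ID, t as the transaction number and o as the RPC opcode is my interpretation of the format, not something these threads state. A Python sketch that extracts those fields so requests can be grouped by opcode and target; parse_request and Request are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of the "@@@" request dumps quoted on this page, e.g.
#   req@00000100b410be00 x4190687/t0 o101->MGS@MGC10.143.245.201@tcp_0:26 lens ...
# parse_request() and Request are hypothetical helpers for this sketch.
REQ_RE = re.compile(
    r"req@(?P<addr>[0-9a-f]+)\s+x(?P<xid>\d+)/t(?P<transno>\d+)\s+"
    r"o(?P<opc>\d+)->(?P<target>[^\s:]*):(?P<portal>\d+)"
)


class Request(NamedTuple):
    xid: int      # number after "x"
    transno: int  # number after "/t"
    opcode: int   # number after "o"
    target: str   # text between "->" and the ":portal" suffix
    portal: int   # first number after the colon


def parse_request(line: str) -> Optional[Request]:
    """Extract the request fields from one log line, or None if absent."""
    m = REQ_RE.search(line)
    if m is None:
        return None
    return Request(int(m["xid"]), int(m["transno"]), int(m["opc"]),
                   m["target"], int(m["portal"]))


if __name__ == "__main__":
    line = ("LustreError: 23045:0:(client.c:519:ptlrpc_import_delay_req()) @@@ "
            "IMP_INVALID req@00000100b410be00 x4190687/t0 "
            "o101->MGS@MGC10.143.245.201@tcp_0:26 lens 232/240 ref 1")
    print(parse_request(line))
```

Counting parsed requests per (opcode, target) shows, for example, whether every IMP_INVALID message on an OSS points at the MGS, as both excerpts on this page do.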
2007 Oct 25
1
Error message
I'm seeing this error message on one of my OSSs but not on the other
three. Any idea what is causing it?
Oct 25 13:58:56 oss2 kernel: LustreError: 3228:0:(client.c:519:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@f6b13200 x18040/t0 o101->MGS@MGC192.168.0.200@tcp_0:26 lens 176/184 ref 1 fl Rpc:/0/0 rc 0/0
Oct 25 13:58:56 oss2 kernel: LustreError: 3228:0:(client.c:519:ptlrpc_import_delay_req()) Skipped 39 previous similar messages...
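Lustre rate-limits repeated console messages, so a line like "Skipped 39 previous similar messages" can hide most of the real error volume. A rough Python sketch that totals visible plus skipped occurrences per call site; tally is a hypothetical helper name. It assumes each "Skipped" line carries the same "(file:line:function())" prefix as the message it summarizes, as in the excerpt above; the prefix-less "LustreError: Skipped 1 previous similar message" form is ignored.

```python
import re
import sys
from collections import Counter

# Call-site prefix such as "(client.c:519:ptlrpc_import_delay_req())".
# tally() is a hypothetical helper for this sketch.
SITE_RE = re.compile(r"\(([\w.-]+:\d+:\w+)\(\)\)\s*(.*)")
SKIP_RE = re.compile(r"Skipped (\d+) previous similar messages?")


def tally(lines) -> Counter:
    """Count messages per call site, adding N for each 'Skipped N' line."""
    counts: Counter = Counter()
    for line in lines:
        m = SITE_RE.search(line)
        if m is None:
            continue  # no call-site prefix, e.g. "LustreError: 11-0: ..."
        site, rest = m.groups()
        skipped = SKIP_RE.match(rest)
        counts[site] += int(skipped.group(1)) if skipped else 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass a syslog file path, e.g. /var/log/messages
    with open(sys.argv[1], errors="replace") as fh:
        for site, n in tally(fh).most_common(10):
            print(f"{n:8d}  {site}")
```

Comparing these totals between the one noisy OSS and its quiet neighbours is a cheap first step before reading individual messages.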
2013 Apr 29
1
OSTs inactive on one client (only)
...T #6
So, the machine is seeing two of three OSTs on OSS #1 and one
of three OSTs on OSS #2. It is showing some OSTs on an OSS as
active and others as inactive. So this does not seem to be a
networking issue.
I am getting a set of errors on that client periodically:
Apr 29 16:21:18 abacus kernel: LustreError: 28707:0:(import.c:324:ptlrpc_invalidate_import()) lustre-OST0003_UUID: rc = -110 waiting for callback (3 != 0)
Apr 29 16:21:18 abacus kernel: LustreError: 28707:0:(import.c:324:ptlrpc_invalidate_import()) Skipped 18 previous similar messages
Apr 29 16:21:18 abacus kernel: LustreError: 28707:0...
2007 Oct 22
0
The mds_connect operation failed with -11
...stre --fsname=datafs00
--mdt --mgsnode=192.168.3.100 /dev/sda3 ; mount -t lustre ...
4 ost -----------> 192.168.3.102-104 with mkfs.lustre --fsname=datafs00
--ost --mgsnode=192.168.3.100@tcp0 /dev/sda3 ; mount -t lustre.....
foreach node
But when I try to mount from any node:
LOG IN NODE:
LustreError: 4743:0:(obd_mount.c:1927:lustre_fill_super()) Unable to mount (-22)
LustreError: 11-0: an error occurred while communicating with 192.168.3.101@tcp. The mds_connect operation failed with -11
LustreError: Skipped 1 previous similar message
LustreError: 11-0: an error occurred while communicatin...
2008 Mar 06
2
strange lustre errors
Hi,
On a few of the HPC cluster nodes, I am seeing a new Lustre
error that is pasted below. The volumes are working fine and there
is nothing on the oss and mds to report.
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) data3-OST0000_UUID@192.168.2.98@tcp changed handle from 0xfe51139158c64fae to 0xfe511392a35878b3; copying, but this may foreshadow disaster
LustreError: 5080:0:(import.c:607:ptlrpc_connect_interpret()) data4-OST0000_UUID@192.168.2.98@tcp c...
2010 Sep 16
2
Lustre module not getting loaded in MDS
...help.
Lnet is configured in /etc/modprobe.conf.local as below.
options lnet networks=tcp0(eth0) accept=all
To load the lustre module, I tried adding it to the MODULES_LOADED_ON_BOOT
variable using yast2 sysconfig, but it is still not getting
loaded.
Error from dmesg is as below.
LustreError: 1393:0:(socklnd.c:2543:ksocknal_enumerate_interfaces()) Can't enumerate interfaces: 0
LustreError: 105-4: Error -100 starting up LNI tcp
LustreError: 1393:0:(events.c:725:ptlrpc_init_portals()) network initialisation failed
Currently this setup is running in a VM.
Kindly suggest.
Regards,
P...
2008 Feb 04
32
Lustre clients getting evicted
on our cluster that has been running lustre for about 1 month. I have
1 MDT/MGS and 1 OSS with 2 OSTs.
Our cluster uses all GigE and has about 608 nodes and 1854 cores.
We have a lot of jobs that die and/or go into high I/O wait; strace
shows processes stuck in fstat().
The big problem, I think (and I would like some feedback on it), is that of
these 608 nodes, 209 of them have in dmesg
2007 Nov 07
9
How to change the server recovery timeout
Hi,
Our lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp
I would like to change the recovery timeout from the default value of 250 s to
something longer.
I tried the example from the manual:
set_timeout <secs> Sets the timeout (obd_timeout) for a server
to wait before failing recovery.
We performed that experiment on our test lustre installation with one
OST.
storage02 is our OSS
[root@
2013 Sep 15
0
Lustre 2.4 MDT: LustreError: Communicating with 0@lo: operation mds_connect failed with -11
I'm a Lustre newbie who just joined this list. I'd appreciate any help on
the following Lustre 2.4 issue I''m running into:
Every time I mount the MDT, the mount appears to succeed but
/var/log/messages contains the message: "LustreError: 11-0:
lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect
failed with -11". The MDT uses 4 local drives in a RAID10 configuration.
Each OSS has its own RAID10 of 36 drives. The OSSs mount
correctly without any errors.
I've seen this error...
2008 Jan 31
2
lustre+samba
Dear All,
I am trying to use our cluster through a Samba share. Everything works fine, but
I think we need -o flock at Lustre mount time.
Great, it works. But when I want to save a file on the share, I get
this in the logs:
Jan 31 10:45:24 opteron-ren-11 kernel: LustreError: 24836:0:(file.c:2309:ll_file_flock()) unknown fcntl lock type: 32
Jan 31 10:45:24 opteron-ren-11 kernel: LustreError: 24836:0:(file.c:2310:ll_file_flock()) LBUG
Jan 31 10:45:24 opteron-ren-11 kernel: Lustre: 24836:0:(linux-debug.c:171:libcfs_debug_dumpstack()) can't show stack: kernel doe...
2008 Feb 12
0
Lustre-discuss Digest, Vol 25, Issue 17
...I'm having a similar issue with Lustre 1.6.4.2 and InfiniBand. Under
load, the clients hang about every 10 minutes, which is really bad for
a production machine. The only way to fix the hang is to reboot the
server. My users are getting extremely impatient :-/
I see this on the clients:
LustreError: 2814:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1202756629, 301s ago) req@ffff8100af233600 x1796079/t0 o6->data-OST0000_UUID@192.168.64.71@o2ib:28 lens 336/336 ref 1 fl Rpc:/0/0 rc 0/-22
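The ptlrpc_expire_one_request() lines give the send time as a Unix timestamp plus the request's age, which makes it possible to line a timeout up against server-side logs. A small Python sketch, assuming the "(sent at T, Ns ago)" wording shown above; request_times is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone

# Wording assumed from the timeout lines on this page.
# request_times() is a hypothetical helper for this sketch.
SENT_RE = re.compile(r"sent at (\d+), (\d+)s ago")


def request_times(line: str):
    """Return (sent, logged) as UTC datetimes for a ptlrpc timeout line, or None.

    'logged' is sent + age, i.e. roughly when the timeout message was printed.
    """
    m = SENT_RE.search(line)
    if m is None:
        return None
    sent, age = int(m.group(1)), int(m.group(2))
    return (datetime.fromtimestamp(sent, tz=timezone.utc),
            datetime.fromtimestamp(sent + age, tz=timezone.utc))


if __name__ == "__main__":
    line = ("LustreError: 2814:0:(client.c:975:ptlrpc_expire_one_request()) "
            "@@@ timeout (sent at 1202756629, 301s ago)")
    sent, logged = request_times(line)
    print(sent.isoformat(), logged.isoformat())
```

For the client line above, this places the send at 2008-02-11 19:03:49 UTC and the timeout report about five minutes later; convert to local time before comparing with syslog timestamps on the OSS.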
Lustre: data-OST0000-osc-ffff810139ce4800: Connection to service data-...
2008 Apr 15
5
o2ib module prevents shutdown
...f this is the right forum: I'm encountering difficulties
with o2ib which prevents an LNET shutdown from proceeding:
Unloading OpenIB kernel modules:NET: Unregistered protocol family 27
Failed to unload rdma_cm
Failed to unload rdma_cm
Failed to unload ib_cm
Failed to unload ib_sa
LustreError: 131-3: Received notification of device removal
Please shutdown LNET to allow this to proceed
This happens on server and client nodes alike. We run RHEL5.1 and
OFED 1.2, kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp from CFS/Sun.
I narrowed it down to module ko2iblnd, which I attempt to remove...
2007 Dec 11
2
lustre + nfs + alphas
...verything is fine. The Lustre mount on the export server can take a real pounding (I've seen it push 300 MB/s), so I don't know why NFS is crashing it.
On the NFS export server I see these messages:
Lustre: 4224:0:(o2iblnd_cb.c:412:kiblnd_handle_rx()) PUT_NACK from 192.168.64.70@o2ib
LustreError: 4400:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1197415542, 100s ago) req@ffff810827bfbc00 x38827/t0 o36->data-MDT0000_UUID@192.168.64.70@o2ib:12 lens 14256/672 ref 1 fl Rpc:/0/0 rc 0/-22
Lustre: data-MDT0000-mdc-ffff81082d702000: Connection to service data-MDT0000...
2008 Mar 03
1
Quota setup fails because of OST ordering
Hi all,
after installing a Lustre test file system consisting of 34 OSTs, I
encountered a strange error when trying to set up quotas:
lfs quotacheck gave me an "Input/Output error", while in
/var/log/kern.log I found a Lustre error
LustreError: 20807:0:(quota_check.c:227:lov_quota_check()) lov idx 32 inactive
Indeed, in /proc/fs/lustre/lov/.../target_obd all 34 OSTs were listed
and numbered, but number 32 was missing, instead I had a number 34.
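The quota failure traced back to a gap in the OST index sequence: index 32 was missing and an extra index 34 appeared. A Python sketch for spotting such gaps, assuming target_obd lists one OST per line as "INDEX: UUID STATUS". That line format is an assumption, so adjust the regular expression to what your release prints; index_gaps is a hypothetical helper name.

```python
import re
import sys

# Assumed target_obd format (one OST per line): "INDEX: UUID STATUS".
# index_gaps() is a hypothetical helper written for this sketch.
TARGET_RE = re.compile(r"^\s*(\d+):\s+\S+")


def index_gaps(text: str) -> tuple[list[int], list[int]]:
    """Return (missing, beyond) OST indices for a target_obd listing.

    missing: indices below the number of listed OSTs that never appear.
    beyond:  listed indices >= that number, i.e. the numbering ran past the end.
    """
    indices = sorted(
        int(m.group(1))
        for m in (TARGET_RE.match(line) for line in text.splitlines())
        if m
    )
    count = len(indices)
    present = set(indices)
    missing = [i for i in range(count) if i not in present]
    beyond = [i for i in indices if i >= count]
    return missing, beyond


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of a target_obd file on the command line.
    with open(sys.argv[1]) as fh:
        missing, beyond = index_gaps(fh.read())
    print("missing:", missing, "beyond:", beyond)
```

For 34 OSTs numbered 0..31, 33 and 34, it reports missing [32] and beyond [34], the same pattern described above.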
Now, in this test cluster each OSS serves two OSTs. I remember I had
made a mistake with...
2008 Jan 15
19
How do you make an MGS/OSS listen on 2 NICs?
...192.168.135.81@tcp
# lctl ping 192.168.135.80@tcp
failed to ping 192.168.135.80@tcp: Input/output error
The following is in /var/log/messages
Jan 15 17:18:15 dint0001 kernel: LustreError: 120-3: Refusing connection from 192.168.135.80 for 192.168.135.80@tcp: No matching NI
Jan 15 17:18:15 dint0001 kernel: LustreError: 3251:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.135.80
Jan 15 17:18:15 dint0001 kernel:...