Verdi March
2007-Apr-20 05:05 UTC
[Lustre-discuss] Example "local" fails on node with two IP addresses
Hi,
I'm encountering a problem when starting the "local" example (one
MDS, LOV, OST, and client, all on node "sun-n1-console").
# lmc -m test.xml --batch test.txt
# cat test.txt
--add node --node sun-n1-console
--add net --node sun-n1-console --nettype lnet --nid sun-n1-console@tcp
--add mds --node sun-n1-console --mds mds1 --fstype ldiskfs --dev /tmp/mds1-sun-n1-console --size 400000
--add lov --lov lov1 --mds mds1 --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0
--add ost --node sun-n1-console --lov lov1 --ost ost1-sun-n1-console --fstype ldiskfs --dev /tmp/ost1-sun-n1-console --size 400000
--add mtpt --node sun-n1-console --path /mnt/lustre --mds mds1 --lov lov1
The node has two Ethernet interfaces, eth0 and eth1, on separate subnets.
I deploy all Lustre components on eth1 (IP: 192.168.123.45, hostname:
sun-n1-console).
# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
xxx.yyy.zzz.ab public-host
192.168.123.45 sun-n1-console
When eth0 is down, the "local" example deploys successfully.
Lustre fails to start only when eth0 is up (see attachment).
The error messages in /var/log/messages indicate that the MDS does
not respond (see below). I believe this is not caused by the firewall,
because I've switched it off:
# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
And here are the error messages:
# tail /var/log/messages
Apr 20 17:37:35 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@f7fe7e00
x22/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2
fl Rpc:/0/0 rc 0/0
Apr 20 17:37:35 sun-n1-console kernel: LustreError:
6840:0:(client.c:947:ptlrpc_expire_one_request()) @@@ timeout (sent at
1177061855, 0s ago) req@f7fe7e00 x22/t0
o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 1 fl
Rpc:/0/0 rc 0/0
Apr 20 17:37:35 sun-n1-console kernel: LustreError:
6840:0:(client.c:947:ptlrpc_expire_one_request()) Skipped 2 previous similar
messages
Apr 20 17:38:00 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@ed133e00
x23/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2
fl Rpc:/0/0 rc 0/0
Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.683:64): avc: denied {
rawip_recv } for pid=6537 comm="socknal_cd03" saddr=192.168.123.45
src=1023 daddr=192.168.123.45 dest=988 netif=lo
scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t
tclass=netif
Apr 20 17:38:25 sun-n1-console kernel: audit(1177061905.884:65): avc: denied {
rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:26 sun-n1-console kernel: audit(1177061906.286:66): avc: denied {
rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:27 sun-n1-console kernel: audit(1177061907.090:67): avc: denied {
rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:28 sun-n1-console kernel: audit(1177061908.698:68): avc: denied {
rawip_recv } for saddr=192.168.123.45 src=1023 daddr=192.168.123.45 dest=988
netif=lo scontext=system_u:object_r:unlabeled_t
tcontext=system_u:object_r:netif_lo_t tclass=netif
Apr 20 17:38:30 sun-n1-console kernel: LustreError:
6539:0:(acceptor.c:442:lnet_acceptor()) Error -11 reading connection request
from 192.168.123.45
Apr 20 17:38:30 sun-n1-console kernel: audit(1177061910.683:69): avc: denied {
rawip_send } for pid=6539 comm="acceptor_988" saddr=192.168.123.45
src=988 daddr=192.168.123.45 dest=1023 netif=lo
scontext=system_u:object_r:unlabeled_t tcontext=system_u:object_r:netif_lo_t
tclass=netif
Apr 20 17:38:30 sun-n1-console kernel: LustreError:
6537:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from
192.168.123.45
Apr 20 17:38:30 sun-n1-console kernel: LustreError: Connection to
192.168.123.45@tcp at host 192.168.123.45 on port 988 was reset: is it running a
compatible version of Lustre and is 192.168.123.45@tcp one of its NIDs?
Apr 20 17:38:50 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@ec698e00
x25/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2
fl Rpc:/0/0 rc 0/0
Apr 20 17:39:15 sun-n1-console kernel: LustreError:
6840:0:(events.c:53:request_out_callback()) @@@ type 4, status -5 req@e97c8c00
x26/t0 o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 2
fl Rpc:/0/0 rc 0/0
Any advice on how to make this simple example work?
Regards,
Verdi
-------------- next part --------------
[root@sun-n1-console tmp]# lconf --reformat --verbose hoho.xml
configuring for host: ['sun-n1-console']
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_sun-n1-console_lnet NET_sun-n1-console_lnet_UUID
loading module: libcfs srcdir None devdir libcfs
+ /sbin/modprobe libcfs
loading module: lnet srcdir None devdir lnet
+ /sbin/modprobe lnet
+ /sbin/modprobe lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
+ /sbin/modprobe ksocklnd
Service: ldlm ldlm ldlm_UUID
loading module: lvfs srcdir None devdir lvfs
+ /sbin/modprobe lvfs
loading module: obdclass srcdir None devdir obdclass
+ /sbin/modprobe obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
+ /sbin/modprobe ptlrpc
Service: osd OSD_ost1-sun-n1-console_sun-n1-console
-n1-console_sun-n1-console_UUID
loading module: ost srcdir None devdir ost
+ /sbin/modprobe ost
loading module: ldiskfs srcdir None devdir ldiskfs
+ /sbin/modprobe ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
+ /sbin/modprobe fsfilt_ldiskfs
loading module: obdfilter srcdir None devdir obdfilter
+ /sbin/modprobe obdfilter
Service: mdsdev MDD_mds1_sun-n1-console MDD_mds1_sun-n1-console_UUID
original inode_size 0
stripe_count 1 inode_size 512
loading module: mdc srcdir None devdir mdc
+ /sbin/modprobe mdc
loading module: osc srcdir None devdir osc
+ /sbin/modprobe osc
loading module: lov srcdir None devdir lov
+ /sbin/modprobe lov
loading module: mds srcdir None devdir mds
+ /sbin/modprobe mds
Service: mountpoint MNT_sun-n1-console MNT_sun-n1-console_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7cd952c>, 0, 1, 1)]
[u'ost1-sun-n1-console_UUID'] 1
loading module: llite srcdir None devdir llite
+ /sbin/modprobe llite
+ sysctl lnet/debug_path /tmp/lustre-log-sun-n1-console
+ /usr/sbin/lctl modules > /tmp/ogdb-sun-n1-console
Service: network NET_sun-n1-console_lnet NET_sun-n1-console_lnet_UUID
NETWORK: NET_sun-n1-console_lnet NET_sun-n1-console_lnet_UUID lnet
sun-n1-console@tcp
Service: ldlm ldlm ldlm_UUID
Service: osd OSD_ost1-sun-n1-console_sun-n1-console
-n1-console_sun-n1-console_UUID
OSD: ost1-sun-n1-console ost1-sun-n1-console_UUID obdfilter
/tmp/ost1-sun-n1-console 400000 ldiskfs no 0 256
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ dd if=/dev/zero bs=1k count=0 seek=400000 of=/tmp/ost1-sun-n1-console
+ mkfs.ext2 -j -b 4096 -F -I 256 /tmp/ost1-sun-n1-console 100000
+ tune2fs -O dir_index /tmp/ost1-sun-n1-console
+ losetup /dev/loop0
+ losetup /dev/loop0 /tmp/ost1-sun-n1-console
+ dumpe2fs -f -h /dev/loop0
no external journal found for /dev/loop0
OST mount options: errors=remount-ro
+ /usr/sbin/lctl
attach obdfilter ost1-sun-n1-console ost1-sun-n1-console_UUID
quit
+ /usr/sbin/lctl
cfg_device ost1-sun-n1-console
setup /dev/loop0 ldiskfs f errors=remount-ro
quit
+ /usr/sbin/lctl
attach ost OSS OSS_UUID
quit
+ /usr/sbin/lctl
cfg_device OSS
setup
quit
Service: mdsdev MDD_mds1_sun-n1-console MDD_mds1_sun-n1-console_UUID
original inode_size 0
stripe_count 1 inode_size 512
MDSDEV: mds1 mds1_UUID /tmp/mds1-sun-n1-console ldiskfs no
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ dd if=/dev/zero bs=1k count=0 seek=400000 of=/tmp/mds1-sun-n1-console
+ mkfs.ext2 -j -b 4096 -F -i 4096 -I 512 /tmp/mds1-sun-n1-console 100000
+ tune2fs -O dir_index /tmp/mds1-sun-n1-console
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop1 /tmp/mds1-sun-n1-console
+ /usr/sbin/lctl
attach mds mds1 mds1_UUID
quit
+ /usr/sbin/lctl
cfg_device mds1
setup /dev/loop1 ldiskfs
quit
recording clients for filesystem: FS_fsname_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7cd988c>, 0, 1, 1)]
[u'ost1-sun-n1-console_UUID'] 1
+ /usr/sbin/lctl
device $mds1
probe
clear_log mds1
quit
Recording log mds1 on mds1
dbg LOV prepare
dbg LOV prepare: [(<__main__.OSC instance at 0xb7cd988c>, 0, 1, 1)]
[u'ost1-sun-n1-console_UUID']
LOV: lov_mds1 4300b_lov_mds1_fe6fd41018 mds1_UUID 1 1048576 0 0
[u'ost1-sun-n1-console_UUID'] mds1
+ /usr/sbin/lctl
device $mds1
record mds1
attach lov lov_mds1 4300b_lov_mds1_fe6fd41018
lov_setup lov1_UUID 1 1048576 0 0
quit
OSC: OSC_sun-n1-console_ost1-sun-n1-console_mds1 4300b_lov_mds1_fe6fd41018
ost1-sun-n1-console_UUID
dbg CLIENT __prepare__: ost1-sun-n1-console_UUID [<__main__.Network instance
at 0xb7cd9c6c>]
+ /usr/sbin/lctl
device $mds1
record mds1
add_uuid sun-n1-console_UUID sun-n1-console@tcp
ost1-sun-n1-console_UUID active
+ /usr/sbin/lctl
device $mds1
record mds1
attach osc OSC_sun-n1-console_ost1-sun-n1-console_mds1
4300b_lov_mds1_fe6fd41018
quit
+ /usr/sbin/lctl
device $mds1
record mds1
cfg_device OSC_sun-n1-console_ost1-sun-n1-console_mds1
setup ost1-sun-n1-console_UUID sun-n1-console_UUID
quit
+ /usr/sbin/lctl
device $mds1
record mds1
cfg_device lov_mds1
lov_modify_tgts add lov_mds1 ost1-sun-n1-console_UUID 0 1
quit
+ /usr/sbin/lctl
device $mds1
record mds1
mount_option mds1 lov_mds1
quit
End recording log mds1 on mds1
Recording log sun-n1-console on mds1
+ /usr/sbin/lconf -v --record --nomod --old_conf --record_log sun-n1-console
--record_device mds1 --node sun-n1-console hoho.xml
record> configuring for host: ['sun-n1-console']
record> Checking XML modification time
record> + debugfs -c -R 'stat /LOGS'
/tmp/mds1-sun-n1-console 2>&1 | grep mtime
record> Can not get mtime info of MDS LOGS directory
record> + /usr/sbin/lctl
record> device $mds1
record> probe
record> clear_log sun-n1-console
record> quit
record> Recording log sun-n1-console on mds1
record> Service: network NET_sun-n1-console_lnet
NET_sun-n1-console_lnet_UUID
record> Service: ldlm ldlm ldlm_UUID
record> Service: osd OSD_ost1-sun-n1-console_sun-n1-console
-n1-console_sun-n1-console_UUID
record> Service: mdsdev MDD_mds1_sun-n1-console MDD_mds1_sun-n1-console_UUID
record> original inode_size 0
record> stripe_count 1 inode_size 512
record> Service: mountpoint MNT_sun-n1-console MNT_sun-n1-console_UUID
record> get_lov_tgts failed, using get_refs
record> dbg LOV __init__: [(<__main__.OSC instance at 0xb7cf64cc>, 0,
1, 1)] [u'ost1-sun-n1-console_UUID'] 1
record> dbg LOV prepare
record> dbg LOV prepare: [(<__main__.OSC instance at 0xb7cf64cc>, 0,
1, 1)] [u'ost1-sun-n1-console_UUID']
record> LOV: lov1 028ec_lov1_fa9d4fa5b7 mds1_UUID 1 1048576 0 0
[u'ost1-sun-n1-console_UUID'] mds1
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> attach lov lov1 028ec_lov1_fa9d4fa5b7
record> lov_setup lov1_UUID 1 1048576 0 0
record> quit
record> OSC: OSC_sun-n1-console_ost1-sun-n1-console_MNT_sun-n1-console
028ec_lov1_fa9d4fa5b7 ost1-sun-n1-console_UUID
record> dbg CLIENT __prepare__: ost1-sun-n1-console_UUID
[<__main__.Network instance at 0xb7cf66cc>]
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> add_uuid sun-n1-console_UUID sun-n1-console@tcp
record> ost1-sun-n1-console_UUID active
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> attach osc OSC_sun-n1-console_ost1-sun-n1-console_MNT_sun-n1-console
028ec_lov1_fa9d4fa5b7
record> quit
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> cfg_device OSC_sun-n1-console_ost1-sun-n1-console_MNT_sun-n1-console
record> setup ost1-sun-n1-console_UUID sun-n1-console_UUID
record> quit
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> cfg_device lov1
record> lov_modify_tgts add lov1 ost1-sun-n1-console_UUID 0 1
record> quit
record> MDC: MDC_sun-n1-console_mds1_MNT_sun-n1-console
0cf7b_MNT_sun-n1-console_dd8b963906 mds1_UUID
record> dbg CLIENT __prepare__: mds1_UUID [<__main__.Network instance at
0xb7cf6a4c>]
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> add_uuid sun-n1-console_UUID sun-n1-console@tcp
record> mds1_UUID active
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> attach mdc MDC_sun-n1-console_mds1_MNT_sun-n1-console
0cf7b_MNT_sun-n1-console_dd8b963906
record> quit
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> cfg_device MDC_sun-n1-console_mds1_MNT_sun-n1-console
record> setup mds1_UUID sun-n1-console_UUID
record> quit
record> MTPT: MNT_sun-n1-console MNT_sun-n1-console_UUID /mnt/lustre
mds1_UUID lov1_UUID
record> + /usr/sbin/lctl
record> device $mds1
record> record sun-n1-console
record>
record> mount_option sun-n1-console lov1
MDC_sun-n1-console_mds1_MNT_sun-n1-console
record> quit
record> End recording log sun-n1-console on mds1
+ /usr/sbin/lctl
ignore_errors
cfg_device $mds1
cleanup
detach
quit
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup -d /dev/loop1
changing mtime of LOGS to 1177060884
+ mktemp /tmp/lustre-cmd.XXXXXXXX
+ debugfs -w -R "mi /LOGS" </tmp/lustre-cmd.mEPL5082
/tmp/mds1-sun-n1-console
MDSDEV: mds1 mds1_UUID /tmp/mds1-sun-n1-console ldiskfs 400000 no
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop1 /tmp/mds1-sun-n1-console
+ /usr/sbin/lctl
attach mdt MDT MDT_UUID
quit
+ /usr/sbin/lctl
cfg_device MDT
setup
quit
+ dumpe2fs -f -h /dev/loop1
no external journal found for /dev/loop1
MDS mount options: errors=remount-ro
+ /usr/sbin/lctl
attach mds mds1 mds1_UUID
quit
+ /usr/sbin/lctl
cfg_device mds1
setup /dev/loop1 ldiskfs mds1 errors=remount-ro
quit
Alexey Lyashkov
2007-Apr-20 05:19 UTC
[Lustre-discuss] Example "local" fails on node with two IP addresses
Looks like you need to disable SELinux:
Apr 20 17:38:26 sun-n1-console kernel: audit(1177061906.286:66): avc:
denied { rawip_recv } for saddr=192.168.123.45 src=1023
daddr=192.168.123.45 dest=988 netif=lo
--
Alexey Lyashkov <shadow@clusterfs.com>
Beaver team
Nathaniel Rutman
2007-Apr-20 13:56 UTC
[Lustre-discuss] Example "local" fails on node with two IP addresses
Wouldn't it be awesome to write a script that would look for various common
configuration errors in the logs and print out a sensible message?
e.g. why_no_lustre.sh, which would look for:
  SELinux messages
  iptables
  disks set read-only
I'm sure there's more...

Alexey Lyashkov wrote:
> looks you need selinux disable.
> Apr 20 17:38:26 sun-n1-console kernel: audit(1177061906.286:66): avc:
> denied { rawip_recv } for saddr=192.168.123.45 src=1023
> daddr=192.168.123.45 dest=988 netif=lo
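The log-scanning idea above can be sketched roughly as follows. This is only an illustration in Python rather than the shell script proposed: the patterns are copied from log lines quoted in this thread, while the function name `diagnose`, the `CHECKS` table, and the hint texts are assumptions made for the example, not part of any Lustre tool.

```python
import re

# Patterns copied from log lines quoted in this thread. The hint texts are
# illustrative guesses by the editor, not official Lustre diagnostics.
CHECKS = [
    (
        re.compile(r"avc:\s+denied\s+\{\s*rawip_(?:recv|send)\s*\}"),
        "SELinux denied raw IP traffic; check the SELinux mode or policy",
    ),
    (
        re.compile(r"Refusing connection from \S+ for \S+:\s+No matching NI"),
        "LNet has no interface for the requested NID; "
        "check 'options lnet networks=...'",
    ),
    (
        re.compile(r"was reset: is it running a\s+compatible version of Lustre"),
        "connection to the acceptor port was reset; check the peer NID",
    ),
]


def diagnose(log_text):
    """Return a list of (hint, count) pairs for each known pattern found.

    The whole log is searched as one string, and the patterns use \\s+ between
    words, so lines that were wrapped (as in this thread) still match.
    """
    findings = []
    for pattern, hint in CHECKS:
        count = len(pattern.findall(log_text))
        if count:
            findings.append((hint, count))
    return findings
```

A caller could read /var/log/messages into a string, pass it to `diagnose`, and print each hint with its count. Checks for iptables rules or read-only disks would need their own patterns, which are not shown in this thread.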
Verdi March
2007-Apr-23 00:29 UTC
[Lustre-discuss] Example "local" fails on node with two IP addresses
Hi Alexey,
I'm still encountering a problem even after disabling SELinux.
# cat /proc/cmdline
ro root=LABEL=/ splash=0 rhgb selinux=0 quiet
# grep ^SELINUX /etc/selinux/config
SELINUX=disabled
SELINUXTYPE=targeted
Below is a snippet of /var/log/messages (a more complete log is attached):
=========
Apr 23 12:57:06 sun-n1-console kernel: Lustre: OBD class driver Build
Version:
1.4.10-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_10_RC2-2.6-rhel4-i686.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.10.EL_lustre.1.4.10smp,
info@clusterfs.com
Apr 23 12:57:07 sun-n1-console kernel: Lustre: Added LNI 129.158.130.75@tcp
[8/256]
Apr 23 12:57:07 sun-n1-console kernel: Lustre: Accept secure, port 988
Apr 23 12:57:12 sun-n1-console kernel: LustreError: Refusing connection from
192.168.123.45 for 192.168.123.45@tcp: No matching NI
Apr 23 12:57:12 sun-n1-console kernel: LustreError:
4416:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from
192.168.123.45
Apr 23 12:57:12 sun-n1-console kernel: LustreError: Connection to
192.168.123.45@tcp at host 192.168.123.45 on port 988 was reset: is it running a
compatible version of Lustre and is 192.168.123.45@tcp one of its NIDs?
Apr 23 12:57:12 sun-n1-console kernel: Lustre:
10:0:(linux-debug.c:98:libcfs_run_upcall()) Invoked LNET upcall
/usr/lib/lustre/lnet_upcall ROUTER_NOTIFY,192.168.123.45@tcp,down,1177304206
Apr 23 12:57:17 sun-n1-console kernel: LustreError:
4854:0:(client.c:947:ptlrpc_expire_one_request()) @@@ timeout (sent at
1177304232, 5s ago) req@ef64ec00 x1/t0
o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 1 fl
Rpc:/0/0 rc 0/0
Apr 23 12:57:31 sun-n1-console kernel: LustreError:
5170:0:(mds_lov.c:589:mds_lov_start_synchronize()) mds1: error starting
mds_lov_synchronize: -4
Apr 23 12:57:31 sun-n1-console kernel: LustreError:
5170:0:(quota_master.c:1103:mds_quota_recovery()) Cannot start quota recovery
thread: rc -4
Apr 23 12:57:37 sun-n1-console kernel: LustreError: Refusing connection from
192.168.123.45 for 192.168.123.45@tcp: No matching NI
Apr 23 12:57:37 sun-n1-console kernel: LustreError:
4417:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from
192.168.123.45
Apr 23 12:57:37 sun-n1-console kernel: LustreError: Connection to
192.168.123.45@tcp at host 192.168.123.45 on port 988 was reset: is it running a
compatible version of Lustre and is 192.168.123.45@tcp one of its NIDs?
Apr 23 12:57:42 sun-n1-console kernel: LustreError:
4854:0:(client.c:947:ptlrpc_expire_one_request()) @@@ timeout (sent at
1177304257, 5s ago) req@f5024a00 x3/t0
o8->ost1-sun-n1-console_UUID@sun-n1-console_UUID:6 lens 240/272 ref 1 fl
Rpc:/0/0 rc 0/0
=========
It looks to me like there's confusion over which network interface
to use (eth0 = 129.158.130.75, and eth1 = 192.168.123.45).
I intended to deploy the MDS on eth1; this is specified using the IP address
when creating the node:
--add net --node sun-n1-console --nettype lnet --nid 192.168.123.45@tcp
I've emptied /etc/resolv.conf to ensure that "sun-n1-console"
resolves to 192.168.123.45:
# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.123.45 sun-n1-console
129.158.130.75 public-host
# hostname -f ; hostname -i
sun-n1-console
192.168.123.45
And the output of ifconfig:
eth0 Link encap:Ethernet HWaddr 00:07:E9:06:AC:5C
inet addr:129.158.130.75 Bcast:129.158.130.255 Mask:255.255.255.0
eth1 Link encap:Ethernet HWaddr 00:07:E9:06:AC:5D
inet addr:192.168.123.45 Bcast:192.168.123.255 Mask:255.255.255.0
Is there anything else that I missed?
Regards,
Verdi
Oleg Drokin
2007-Apr-23 02:37 UTC
[Lustre-discuss] Example "local" fails on node with two IP addresses
Hello!

On Mon, Apr 23, 2007 at 08:28:59AM +0200, Verdi March wrote:
> Apr 23 12:57:07 sun-n1-console kernel: Lustre: Added LNI 129.158.130.75@tcp [8/256]

You should have included a full log like this from the very beginning.

> Apr 23 12:57:12 sun-n1-console kernel: LustreError: Refusing connection from 192.168.123.45 for 192.168.123.45@tcp: No matching NI
> Apr 23 12:57:12 sun-n1-console kernel: LustreError: 4416:0:(socklnd_cb.c:2160:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.123.45
> Apr 23 12:57:12 sun-n1-console kernel: LustreError: Connection to 192.168.123.45@tcp at host 192.168.123.45 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.123.45@tcp one of its NIDs?
> It looks to me that there's a confusion over which network interface
> to use (eth0 = 129.158.130.75, and eth1 = 192.168.123.45).

Right.

> I intended to deploy MDS on eth1; this is specified using IP address
> when creating a node:
> --add net --node sun-n1-console --nettype lnet --nid 192.168.123.45@tcp

This won't help.

> Are there anything else that I missed?

Yes, you need to pass the lnet module option 'networks' like this in your
/etc/modprobe.conf:
options lnet networks=tcp(eth1)
(naturally replacing eth1 with the interface that has the address you want to
listen on)

Bye,
Oleg
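To make the diagnosis above concrete, here is a small Python sketch. It pulls the NID that LNet actually bound from an "Added LNI" log line and builds the matching modprobe.conf line for the interface that holds the wanted address. The helper names `bound_nids` and `lnet_options_line` are hypothetical, and the interface table is typed in by hand from the ifconfig output quoted in this thread; only the `options lnet networks=tcp(eth1)` form comes from the reply itself.

```python
import re


def bound_nids(log_text):
    """Return the NIDs reported in 'Added LNI' lines, i.e. what LNet bound to."""
    return re.findall(r"Added LNI (\S+)", log_text)


def lnet_options_line(interfaces, wanted_ip, net="tcp"):
    """Build the modprobe.conf line for the interface that holds wanted_ip.

    interfaces maps an interface name to its IPv4 address (hand-copied from
    ifconfig in this sketch; nothing here queries the system).
    """
    for name, addr in interfaces.items():
        if addr == wanted_ip:
            return f"options lnet networks={net}({name})"
    raise ValueError(f"no interface has address {wanted_ip}")


# Values taken from the messages in this thread.
interfaces = {"eth0": "129.158.130.75", "eth1": "192.168.123.45"}
log = ("Apr 23 12:57:07 sun-n1-console kernel: Lustre: "
       "Added LNI 129.158.130.75@tcp [8/256]")

print(bound_nids(log))                                   # ['129.158.130.75@tcp']
print(lnet_options_line(interfaces, "192.168.123.45"))   # options lnet networks=tcp(eth1)
```

The first line of output shows LNet listening on the eth0 address, which is why connections to 192.168.123.45@tcp are refused with "No matching NI"; the second is the line to place in /etc/modprobe.conf.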
Verdi March
2007-Apr-23 04:45 UTC
[Lustre-discuss] Example "local" fails on node with two IP addresses
Hi Oleg,

Oleg Drokin wrote:
> Yes, you need to pass lnet module option 'networks' like this in your
> /etc/modprobe.conf:
> options lnet networks=tcp(eth1)
>
> (naturally replacing eth1 with interface that has the address you want to
> listen on)

Thanks. With this, I managed to get Lustre working even when SELinux is enabled.

Regards,
Verdi