I've compiled Lustre on Ubuntu Breezy using the vanilla 2.6.12 patches
for 1.4.5 found in the bug database. Everything seemed to go smoothly
until I tried to start Lustre on the MDS node: lconf consistently hangs
when attempting an add_uuid for the MDS node, and there's an lctl
process at 100% CPU. Has anyone else seen anything like that before?
Any ideas (apart from the obvious one of using a supported kernel)?
The machine uses a single channel-bonded IP.
Many Thanks,
Brent Nelson
Director of Computing
Dept. of Physics
University of Florida
---------------------------------------------------------------------
Sample output using just the basic local.sh/local.xml from the Howto:
lconf -v --reformat local.xml
configuring for host: ['ganymede', 'localhost']
add_local NET_localhost_tcp_UUID
find_local_routes: []
+ mknod /dev/portals c 10 240
+ mknod /dev/obd c 10 241
setting /proc/sys/net/core/rmem_max to at least 16777216
setting /proc/sys/net/core/wmem_max to at least 16777216
Service: network NET_localhost_tcp NET_localhost_tcp_UUID
loading module: libcfs srcdir None devdir libcfs
+ /sbin/modprobe libcfs
loading module: portals srcdir None devdir portals
+ /sbin/modprobe portals
loading module: ksocknal srcdir None devdir knals/socknal
+ /sbin/modprobe ksocknal
Service: ldlm ldlm ldlm_UUID
loading module: lvfs srcdir None devdir lvfs
+ /sbin/modprobe lvfs
loading module: obdclass srcdir None devdir obdclass
+ /sbin/modprobe obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
+ /sbin/modprobe ptlrpc
Service: osd OSD_ost1-test_localhost OSD_ost1-test_localhost_UUID
loading module: ost srcdir None devdir ost
+ /sbin/modprobe ost
loading module: ldiskfs srcdir None devdir ldiskfs
+ /sbin/modprobe ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
+ /sbin/modprobe fsfilt_ldiskfs
loading module: obdfilter srcdir None devdir obdfilter
+ /sbin/modprobe obdfilter
Service: osd OSD_ost2-test_localhost OSD_ost2-test_localhost_UUID
Service: mdsdev MDD_mds-test_localhost MDD_mds-test_localhost_UUID
stripe_count %d, inode_size %d 1 512
loading module: mdc srcdir None devdir mdc
+ /sbin/modprobe mdc
loading module: osc srcdir None devdir osc
+ /sbin/modprobe osc
loading module: lov srcdir None devdir lov
+ /sbin/modprobe lov
loading module: mds srcdir None devdir mds
+ /sbin/modprobe mds
Service: mountpoint MNT_localhost MNT_localhost_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7c19cac>, 0, 1, 1), (<__main__.OSC instance at 0xb7c2722c>, 1, 1, 1)] [u'ost1-test_UUID', u'ost2-test_UUID'] 0
loading module: llite srcdir None devdir llite
+ /sbin/modprobe llite
+ sysctl portals/debug_path /tmp/lustre-log-ganymede
+ /usr/sbin/lctl modules > /tmp/ogdb-ganymede
Service: network NET_localhost_tcp NET_localhost_tcp_UUID
NETWORK: NET_localhost_tcp NET_localhost_tcp_UUID tcp localhost 988
+ /usr/sbin/lctl
network tcp
mynid localhost
quit
Service: ldlm ldlm ldlm_UUID
Service: osd OSD_ost1-test_localhost OSD_ost1-test_localhost_UUID
OSD: ost1-test ost1-test_UUID obdfilter /tmp/ost1-test 100000 ldiskfs no 0 0
+ /usr/sbin/acceptor 988
+ losetup /dev/loop/0
+ dd if=/dev/zero bs=1k count=0 seek=100000 of=/tmp/ost1-test
+ mkfs.ext2 -j -b 4096 -F /tmp/ost1-test 25000
+ tune2fs -O dir_index /tmp/ost1-test
+ losetup /dev/loop0
+ losetup /dev/loop0 /tmp/ost1-test
OST mount options: errors=remount-ro
+ /usr/sbin/lctl
attach obdfilter ost1-test ost1-test_UUID
quit
+ /usr/sbin/lctl
cfg_device ost1-test
setup /dev/loop0 ldiskfs f errors=remount-ro
quit
+ /usr/sbin/lctl
attach ost OSS OSS_UUID
quit
+ /usr/sbin/lctl
cfg_device OSS
setup
quit
Service: osd OSD_ost2-test_localhost OSD_ost2-test_localhost_UUID
OSD: ost2-test ost2-test_UUID obdfilter /tmp/ost2-test 100000 ldiskfs no 0 0
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ dd if=/dev/zero bs=1k count=0 seek=100000 of=/tmp/ost2-test
+ mkfs.ext2 -j -b 4096 -F /tmp/ost2-test 25000
+ tune2fs -O dir_index /tmp/ost2-test
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop1 /tmp/ost2-test
OST mount options: errors=remount-ro
+ /usr/sbin/lctl
attach obdfilter ost2-test ost2-test_UUID
quit
+ /usr/sbin/lctl
cfg_device ost2-test
setup /dev/loop1 ldiskfs f errors=remount-ro
quit
Service: mdsdev MDD_mds-test_localhost MDD_mds-test_localhost_UUID
stripe_count %d, inode_size %d 1 512
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs yes
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop3
+ losetup /dev/loop4
+ losetup /dev/loop5
+ losetup /dev/loop6
+ losetup /dev/loop7
+ dd if=/dev/zero bs=1k count=0 seek=50000 of=/tmp/mds-test
+ mkfs.ext2 -j -b 4096 -F -i 4096 -I 512 /tmp/mds-test 12500
+ tune2fs -O dir_index /tmp/mds-test
+ losetup /dev/loop0
+ losetup /dev/loop1
+ losetup /dev/loop2
+ losetup /dev/loop2 /tmp/mds-test
+ /usr/sbin/lctl
attach mds mds-test mds-test_UUID
quit
+ /usr/sbin/lctl
cfg_device mds-test
setup /dev/loop2 ldiskfs
quit
recording clients for filesystem: FS_fsname_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7c19dcc>, 0, 1, 1), (<__main__.OSC instance at 0xb7c274ec>, 1, 1, 1)] [u'ost1-test_UUID', u'ost2-test_UUID'] 0
+ /usr/sbin/lctl
device $mds-test
probe
clear_log mds-test
quit
Recording log mds-test on mds-test
dbg LOV prepare
dbg LOV prepare: [(<__main__.OSC instance at 0xb7c19dcc>, 0, 1, 1), (<__main__.OSC instance at 0xb7c274ec>, 1, 1, 1)] [u'ost1-test_UUID', u'ost2-test_UUID']
LOV: lov_mds-test d281c_lov_mds-test_603ebead3b mds-test_UUID 0 1048576 0 0 [u'ost1-test_UUID', u'ost2-test_UUID'] mds-test
+ /usr/sbin/lctl
device $mds-test
record mds-test
attach lov lov_mds-test d281c_lov_mds-test_603ebead3b
lov_setup lov-test_UUID 0 1048576 0 0
quit
OSC: OSC_ganymede_ost1-test_mds-test d281c_lov_mds-test_603ebead3b ost1-test_UUID
+ /usr/sbin/lctl
device $mds-test
record mds-test
add_uuid NID_localhost_UUID localhost tcp
-------------------------------------------------------------------
Ctrl-C yields:
Traceback (most recent call last):
  File "/usr/sbin/lconf", line 3433, in ?
    main()
  File "/usr/sbin/lconf", line 3426, in main
    doHost(lustreDB, node_list)
  File "/usr/sbin/lconf", line 2866, in doHost
    for_each_profile(node_db, prof_list, doSetup)
  File "/usr/sbin/lconf", line 2682, in for_each_profile
    operation(services)
  File "/usr/sbin/lconf", line 2697, in doSetup
    n.prepare()
  File "/usr/sbin/lconf", line 1769, in prepare
    self.write_conf()
  File "/usr/sbin/lconf", line 1830, in write_conf
    client.prepare()
  File "/usr/sbin/lconf", line 2283, in prepare
    self.osc.prepare()
  File "/usr/sbin/lconf", line 1646, in prepare
    osc.prepare(ignore_connect_failure=0)
  File "/usr/sbin/lconf", line 2160, in prepare
    lctl.connect(srv)
  File "/usr/sbin/lconf", line 502, in connect
    self.add_uuid(srv.net_type, srv.nid_uuid, srv.nid)
  File "/usr/sbin/lconf", line 476, in add_uuid
    self.run(cmds)
  File "/usr/sbin/lconf", line 406, in run
    ready = select.select([outfd,errfd],[],[]) # Wait for input
KeyboardInterrupt
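
For anyone trying to reproduce this: the last frame shows lconf sitting in select.select() with no timeout, waiting for output from the lctl child, so a child that never writes or exits blocks lconf forever. Below is a minimal sketch of the same kind of wait with a deadline, so that a stuck child gets reported instead. It is not lconf's code: run_with_deadline and its parameters are hypothetical names of my own, it is written for current Python 3 rather than the Python on this box, and it assumes a POSIX system where select() works on pipes.

```python
import os
import select
import subprocess
import sys
import time


def run_with_deadline(argv, timeout_s):
    """Run argv and collect its output, giving up after timeout_s seconds.

    Hypothetical helper for illustration only.
    Returns (timed_out, stdout_bytes, stderr_bytes).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd = proc.stdout.fileno()
    err_fd = proc.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    pending = [out_fd, err_fd]          # pipes that have not reached EOF yet
    deadline = time.monotonic() + timeout_s
    timed_out = False
    try:
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Deadline passed with a pipe still open: treat as a hang.
                timed_out = True
                proc.kill()
                break
            # Unlike a select() with no timeout, this returns an empty
            # list once `remaining` seconds pass without any output.
            ready, _, _ = select.select(pending, [], [], remaining)
            for fd in ready:
                data = os.read(fd, 4096)
                if data:
                    chunks[fd].append(data)
                else:
                    pending.remove(fd)  # empty read means EOF on this pipe
        proc.wait()
    finally:
        proc.stdout.close()
        proc.stderr.close()
    return timed_out, b"".join(chunks[out_fd]), b"".join(chunks[err_fd])


if __name__ == "__main__":
    # Stand-in child that never produces output; a real test would run lctl.
    hung, out, err = run_with_deadline(
        [sys.executable, "-c", "import time; time.sleep(30)"], 1.0)
    print("timed out:", hung)
```

One caveat: a child that is spinning inside the kernel may not go away on kill, in which case the final wait would still block, so this only helps to detect the hang, not to recover from it.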
-------------------------------------------------------------------
dmesg output:
[ 384.467029] Lustre: 7742:0:(module.c:530:init_libcfs_module()) maximum lustre stack 8192
[ 384.588485] Lustre: Routing socket NAL loaded (Routing disabled, initial mem 0, incarnation 0x4084be4801267)
[ 384.797967] Lustre: OBD class driver Build Version: 1.4.5-19691231190000-PRISTINE-.usr.src.linux-source-2.6.12-2.6.12.051209, info@clusterfs.com
[ 385.029030] ptlrpc: no version for "inter_module_get" found: kernel tainted.
[ 385.361412] Lustre: Filtering OBD driver; info@clusterfs.com
[ 385.989375] Lustre: Lustre Lite Client File System; info@clusterfs.com
[ 386.217568] loop: loaded (max 8 devices)
[ 387.609714] kjournald starting. Commit interval 5 seconds
[ 387.609859] LDISKFS FS on loop0, internal journal
[ 387.609881] LDISKFS-fs: mounted filesystem with ordered data mode.
[ 387.610115] Lustre: 8006:0:(filter.c:383:filter_init_server_data()) ost1-test: initializing new last_rcvd
[ 387.627340] Lustre: OST ost1-test now serving /dev/loop0 with recovery enabled.
[ 388.467420] kjournald starting. Commit interval 5 seconds
[ 388.467798] LDISKFS FS on loop1, internal journal
[ 388.467831] LDISKFS-fs: mounted filesystem with ordered data mode.
[ 388.468554] Lustre: 8173:0:(filter.c:383:filter_init_server_data()) ost2-test: initializing new last_rcvd
[ 388.481778] Lustre: OST ost2-test now serving /dev/loop1 with recovery enabled.
[ 389.396602] kjournald starting. Commit interval 5 seconds
[ 389.397018] LDISKFS FS on loop2, internal journal
[ 389.397049] LDISKFS-fs: mounted filesystem with ordered data mode.
[ 389.398184] Lustre: 8209:0:(mds_fs.c:243:mds_init_server_data()) mds-test: initializing new last_rcvd
[ 389.400669] Lustre: MDT mds-test now serving /dev/loop2 with recovery enabled.