I downloaded the Lustre 1.0.4 source, patched a Debian 2.4.24 kernel with the
patches listed below, compiled the kernel, built the Lustre utilities and
modules, and wrote the configuration that follows. With a bit of playing around
it will eventually mount on all the clients and work for a while, but bonnie++
stalls immediately after it finishes the per-character (putc) write phase and
just sits there. After I kill bonnie++, which takes around a minute to die, the
filesystem no longer works correctly. I don't see anything labeled "error" in
the logs, though I do see a few lines like

tool6 kernel: Lustre: 339:(socknal_cb.c:1547:ksocknal_process_receive()) [ee59f800] EOF from 0xa14076d ip 10.20.7.109:32780

in /var/log/messages, but I am unsure what they refer to. Nothing complains, so
everything seems to be working, except at startup: I have to start all the
nodes at nearly the same time, and even then I may have to restart one or two
because they don't mount correctly. Again, no errors, but df shows an
input/output error when listing the Lustre mountpoint. Also, I am seeing
multiple OST files on the server that also holds the MDS file; is this usual?
I basically just want some confirmation that I set things up correctly and am
using the correct commands.
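
For reference, the bonnie++ run itself was nothing unusual; it was along these
lines (the size and user below are illustrative, not necessarily the exact
values I used):

    # Run bonnie++ against the Lustre mount; -s is the test file size in MB
    # (illustrative value), -u drops privileges when running as root.
    bonnie++ -d /lustre -s 1024 -u nobody

The stall always comes right as the "Writing with putc()" phase completes.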
The patches are listed below; some had to be modified to work with the Debian
kernel and were renamed accordingly (a sketch of how they were applied follows
the list):
dev_read_only_2.4.20-rh.patch
exports_2.4.20-rh-hp.patch
lustre_version.patch
vfs_intent-2.4.20-vanilla.patch
invalidate_show.patch
export-truncate.patch
iod-stock-24-exports.patch
ext3-htree-suse.patch
linux-2.4.24-xattr-0.8.54.patch
ext3-orphan_lock-suse.patch
ext3-noread-2.4.20.patch
ext3-delete_thread-suse.patch
extN-wantedi.patch
ext3-san-2.4.20.patch
ext3-map_inode_page.patch
ext3-error-export.patch
iopen-2.4.20.patch
tcp-zero-copy_2.4.20_chaos.patch
jbd-dont-account-blocks-twice.patch
jbd-commit-tricks.patch
ext3-no-write-super-chaos.patch
add_page_private.patch
socket-exports-2.4.24-deb.patch
nfs_export_kernel-2.4.24-deb.patch
ext3-raw-lookup.patch
ext3-ea-in-inode-2.4.22-rh.patch
listman-2.4.20.patch
ext3-trusted_ea-2.4.20.patch
kernel_text_address-2.4.22-vanilla.patch
gfp_memalloc-2.4.24.patch
ext3-xattr-ptr-arith-fix.patch
procfs-ndynamic-2.4.patch
ext3-truncate-buffer-head.patch
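
The Lustre source ships these under lustre/kernel_patches (it is set up for
quilt, I believe), but I applied them by hand in the order listed, roughly like
this (paths are illustrative; the series file is just the list above, one name
per line):

    # Apply each patch, in order, from the top of the kernel source tree.
    # Stop at the first failure so a bad hunk is not silently skipped.
    cd /usr/src/linux-2.4.24
    while read p; do
        patch -p1 < /usr/src/lustre-1.0.4/lustre/kernel_patches/patches/"$p" || break
    done < /root/patch-series.txt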
My lustre-conf.xml:
<?xml version='1.0' encoding='UTF-8'?>
<lustre version='2003070801'>
  <ldlm name='ldlm' uuid='ldlm_UUID'/>
  <node uuid='tool6_UUID' name='tool6'>
    <profile_ref uuidref='PROFILE_tool6_UUID'/>
    <network uuid='NET_tool6_tcp_UUID' nettype='tcp' name='NET_tool6_tcp'>
      <nid>10.20.7.106</nid>
      <clusterid>0</clusterid>
      <port>988</port>
    </network>
  </node>
  <profile uuid='PROFILE_tool6_UUID' name='PROFILE_tool6'>
    <ldlm_ref uuidref='ldlm_UUID'/>
    <network_ref uuidref='NET_tool6_tcp_UUID'/>
    <mdsdev_ref uuidref='MDD_mds1_tool6_UUID'/>
    <osd_ref uuidref='OSD_ost1_tool6_UUID'/>
    <mountpoint_ref uuidref='MNT_tool6_UUID'/>
  </profile>
  <node uuid='tool7_UUID' name='tool7'>
    <profile_ref uuidref='PROFILE_tool7_UUID'/>
    <network uuid='NET_tool7_tcp_UUID' nettype='tcp' name='NET_tool7_tcp'>
      <nid>10.20.7.107</nid>
      <clusterid>0</clusterid>
      <port>988</port>
    </network>
  </node>
  <profile uuid='PROFILE_tool7_UUID' name='PROFILE_tool7'>
    <ldlm_ref uuidref='ldlm_UUID'/>
    <network_ref uuidref='NET_tool7_tcp_UUID'/>
    <osd_ref uuidref='OSD_ost2_tool7_UUID'/>
    <mountpoint_ref uuidref='MNT_tool7_UUID'/>
  </profile>
  <node uuid='tool8_UUID' name='tool8'>
    <profile_ref uuidref='PROFILE_tool8_UUID'/>
    <network uuid='NET_tool8_tcp_UUID' nettype='tcp' name='NET_tool8_tcp'>
      <nid>10.20.7.108</nid>
      <clusterid>0</clusterid>
      <port>988</port>
    </network>
  </node>
  <profile uuid='PROFILE_tool8_UUID' name='PROFILE_tool8'>
    <ldlm_ref uuidref='ldlm_UUID'/>
    <network_ref uuidref='NET_tool8_tcp_UUID'/>
    <osd_ref uuidref='OSD_ost3_tool8_UUID'/>
    <mountpoint_ref uuidref='MNT_tool8_UUID'/>
  </profile>
  <node uuid='tool9_UUID' name='tool9'>
    <profile_ref uuidref='PROFILE_tool9_UUID'/>
    <network uuid='NET_tool9_tcp_UUID' nettype='tcp' name='NET_tool9_tcp'>
      <nid>10.20.7.109</nid>
      <clusterid>0</clusterid>
      <port>988</port>
    </network>
  </node>
  <profile uuid='PROFILE_tool9_UUID' name='PROFILE_tool9'>
    <ldlm_ref uuidref='ldlm_UUID'/>
    <network_ref uuidref='NET_tool9_tcp_UUID'/>
    <osd_ref uuidref='OSD_ost4_tool9_UUID'/>
    <mountpoint_ref uuidref='MNT_tool9_UUID'/>
  </profile>
  <node uuid='tool10_UUID' name='tool10'>
    <profile_ref uuidref='PROFILE_tool10_UUID'/>
    <network uuid='NET_tool10_tcp_UUID' nettype='tcp' name='NET_tool10_tcp'>
      <nid>10.20.7.110</nid>
      <clusterid>0</clusterid>
      <port>988</port>
    </network>
  </node>
  <profile uuid='PROFILE_tool10_UUID' name='PROFILE_tool10'>
    <ldlm_ref uuidref='ldlm_UUID'/>
    <network_ref uuidref='NET_tool10_tcp_UUID'/>
    <osd_ref uuidref='OSD_ost5_tool10_UUID'/>
    <mountpoint_ref uuidref='MNT_tool10_UUID'/>
  </profile>
  <mds uuid='mds1_UUID' name='mds1'>
    <active_ref uuidref='MDD_mds1_tool6_UUID'/>
    <lovconfig_ref uuidref='LVCFG_lov1_UUID'/>
    <filesystem_ref uuidref='FS_fsname_UUID'/>
  </mds>
  <mdsdev uuid='MDD_mds1_tool6_UUID' name='MDD_mds1_tool6'>
    <fstype>ext3</fstype>
    <devpath>/scratch/mds1</devpath>
    <autoformat>no</autoformat>
    <devsize>1000000</devsize>
    <journalsize>0</journalsize>
    <inodesize>0</inodesize>
    <node_ref uuidref='tool6_UUID'/>
    <target_ref uuidref='mds1_UUID'/>
  </mdsdev>
  <lov stripesize='65536' stripecount='0' stripepattern='0' name='lov1' uuid='lov1_UUID'>
    <mds_ref uuidref='mds1_UUID'/>
    <obd_ref uuidref='ost1_UUID'/>
    <obd_ref uuidref='ost2_UUID'/>
    <obd_ref uuidref='ost3_UUID'/>
    <obd_ref uuidref='ost4_UUID'/>
    <obd_ref uuidref='ost5_UUID'/>
  </lov>
  <lovconfig uuid='LVCFG_lov1_UUID' name='LVCFG_lov1'>
    <lov_ref uuidref='lov1_UUID'/>
  </lovconfig>
  <ost uuid='ost1_UUID' name='ost1'>
    <active_ref uuidref='OSD_ost1_tool6_UUID'/>
  </ost>
  <osd osdtype='obdfilter' uuid='OSD_ost1_tool6_UUID' name='OSD_ost1_tool6'>
    <target_ref uuidref='ost1_UUID'/>
    <node_ref uuidref='tool6_UUID'/>
    <fstype>ext3</fstype>
    <devpath>/scratch/ost1</devpath>
    <autoformat>no</autoformat>
    <devsize>2000000</devsize>
    <journalsize>0</journalsize>
    <inodesize>0</inodesize>
  </osd>
  <ost uuid='ost2_UUID' name='ost2'>
    <active_ref uuidref='OSD_ost2_tool7_UUID'/>
  </ost>
  <osd osdtype='obdfilter' uuid='OSD_ost2_tool7_UUID' name='OSD_ost2_tool7'>
    <target_ref uuidref='ost2_UUID'/>
    <node_ref uuidref='tool7_UUID'/>
    <fstype>ext3</fstype>
    <devpath>/scratch/ost2</devpath>
    <autoformat>no</autoformat>
    <devsize>2000000</devsize>
    <journalsize>0</journalsize>
    <inodesize>0</inodesize>
  </osd>
  <ost uuid='ost3_UUID' name='ost3'>
    <active_ref uuidref='OSD_ost3_tool8_UUID'/>
  </ost>
  <osd osdtype='obdfilter' uuid='OSD_ost3_tool8_UUID' name='OSD_ost3_tool8'>
    <target_ref uuidref='ost3_UUID'/>
    <node_ref uuidref='tool8_UUID'/>
    <fstype>ext3</fstype>
    <devpath>/scratch/ost3</devpath>
    <autoformat>no</autoformat>
    <devsize>2000000</devsize>
    <journalsize>0</journalsize>
    <inodesize>0</inodesize>
  </osd>
  <ost uuid='ost4_UUID' name='ost4'>
    <active_ref uuidref='OSD_ost4_tool9_UUID'/>
  </ost>
  <osd osdtype='obdfilter' uuid='OSD_ost4_tool9_UUID' name='OSD_ost4_tool9'>
    <target_ref uuidref='ost4_UUID'/>
    <node_ref uuidref='tool9_UUID'/>
    <fstype>ext3</fstype>
    <devpath>/scratch/ost4</devpath>
    <autoformat>no</autoformat>
    <devsize>2000000</devsize>
    <journalsize>0</journalsize>
    <inodesize>0</inodesize>
  </osd>
  <ost uuid='ost5_UUID' name='ost5'>
    <active_ref uuidref='OSD_ost5_tool10_UUID'/>
  </ost>
  <osd osdtype='obdfilter' uuid='OSD_ost5_tool10_UUID' name='OSD_ost5_tool10'>
    <target_ref uuidref='ost5_UUID'/>
    <node_ref uuidref='tool10_UUID'/>
    <fstype>ext3</fstype>
    <devpath>/scratch/ost5</devpath>
    <autoformat>no</autoformat>
    <devsize>2000000</devsize>
    <journalsize>0</journalsize>
    <inodesize>0</inodesize>
  </osd>
  <filesystem uuid='FS_fsname_UUID' name='FS_fsname'>
    <mds_ref uuidref='mds1_UUID'/>
    <obd_ref uuidref='lov1_UUID'/>
  </filesystem>
  <mountpoint uuid='MNT_tool6_UUID' name='MNT_tool6'>
    <filesystem_ref uuidref='FS_fsname_UUID'/>
    <path>/lustre</path>
  </mountpoint>
  <mountpoint uuid='MNT_tool7_UUID' name='MNT_tool7'>
    <filesystem_ref uuidref='FS_fsname_UUID'/>
    <path>/lustre</path>
  </mountpoint>
  <mountpoint uuid='MNT_tool8_UUID' name='MNT_tool8'>
    <filesystem_ref uuidref='FS_fsname_UUID'/>
    <path>/lustre</path>
  </mountpoint>
  <mountpoint uuid='MNT_tool9_UUID' name='MNT_tool9'>
    <filesystem_ref uuidref='FS_fsname_UUID'/>
    <path>/lustre</path>
  </mountpoint>
  <mountpoint uuid='MNT_tool10_UUID' name='MNT_tool10'>
    <filesystem_ref uuidref='FS_fsname_UUID'/>
    <path>/lustre</path>
  </mountpoint>
</lustre>
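
For completeness, the XML above was generated with lmc; reconstructed from the
XML itself, the generation script was roughly the following (flag names are
from memory of the 1.x lmc, so treat them as approximate):

    #!/bin/sh
    # Rebuild lustre-conf.xml: five nodes, one MDS on tool6, one OST per
    # node, a LOV over all five OSTs, and a /lustre mountpoint everywhere.
    config=/etc/lustre-conf.xml
    rm -f $config

    # One tcp network entry per node (tool6 = 10.20.7.106 ... tool10 = .110)
    for i in 6 7 8 9 10; do
        lmc -m $config --add net --node tool$i --nid 10.20.7.$((100 + i)) --nettype tcp
    done

    # MDS on tool6
    lmc -m $config --add mds --node tool6 --mds mds1 \
        --fstype ext3 --dev /scratch/mds1 --size 1000000

    # LOV striping across all OSTs
    lmc -m $config --add lov --lov lov1 --mds mds1 \
        --stripe_sz 65536 --stripe_cnt 0 --stripe_pattern 0

    # One OST per node, numbered ost1..ost5
    n=1
    for i in 6 7 8 9 10; do
        lmc -m $config --add ost --node tool$i --lov lov1 --ost ost$n \
            --fstype ext3 --dev /scratch/ost$n --size 2000000
        n=$((n + 1))
    done

    # Every node is also a client
    for i in 6 7 8 9 10; do
        lmc -m $config --add mtpt --node tool$i --path /lustre --mds mds1 --lov lov1
    done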
Commands used to start Lustre:

Run on every node:
lconf -v --reformat --node `hostname` /etc/lustre-conf.xml

If a node fails to start:
lconf -v --node `hostname` /etc/lustre-conf.xml

If it "successfully" starts but df shows an I/O error:
umount /lustre
lconf -d -v --node `hostname` /etc/lustre-conf.xml
lconf -v --node `hostname` /etc/lustre-conf.xml
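
Since the restarts are so common, what I do by hand on each node amounts to
the following script (just a sketch of my manual procedure, assuming lconf and
df return non-zero on failure; it is not anything from the Lustre
distribution):

    #!/bin/sh
    # Start Lustre on this node; if df cannot stat the mountpoint afterwards
    # (the input/output error case), tear down and retry once.
    config=/etc/lustre-conf.xml

    lconf -v --reformat --node `hostname` $config

    if ! df /lustre >/dev/null 2>&1; then
        umount /lustre 2>/dev/null
        lconf -d -v --node `hostname` $config
        lconf -v --node `hostname` $config
    fi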
I was going to attach the output of these commands, but it was a lot of text;
the non-verbose output looks like nothing is wrong, and I'm not sure whether
the verbose output looks correct or not. Please contact me if you would like
any additional output. Also, running with the --gdb option doesn't give me any
additional debug info; the file it produces says something about debugging not
being available for 2.5 kernels, or something like that. I forgot to save one
of the files.
Thanks in advance,
David Schwenker