Hi Daniel--

On Feb 14, 2006, at 10:49, Daniel Shearer wrote:
>
> Creating a generic 'client' profile didn't work, python errors, however
> creating a separate client profile for each client did fix it.

You might be running afoul of shell substitution, if you didn't protect the *. i.e., you should be running something like:

  lmc ... --nid '*'

instead of:

  lmc ... --nid *

If not that, post your Python errors, maybe someone can help.

-Phil
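[A quick illustration of the quoting difference Phil describes; the "lmc ..." part stands for whatever other options your configuration script passes and is only a placeholder:]

  lmc ... --nid *      # unquoted: the shell expands * to the filenames in the
                       # current directory, so lmc never sees a wildcard NID
  lmc ... --nid '*'    # quoted: the shell leaves the * alone and lmc receives
                       # the literal wildcard, matching any client NID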
Hi Daniel--

On Feb 13, 2006, at 12:27, Daniel Shearer wrote:
> There are some errors occurring whenever I try to start and stop
> lustre.
>
> When starting lustre it says there are kernel modules missing.
>
> bash$ lconf --node `hostname` /etc/lustre/sparrow_lustre.xml
> loading module: libcfs srcdir None devdir libcfs
> ! modprobe (error 1):
> > FATAL: Module libcfs not found.

Let's tackle this one first. You can pass -v to lconf to get more details, but I assume it will just tell you that it's running modprobe.

Do you have any idea why modprobe can't find libcfs? If you run "modprobe libcfs" rather than insmod, do you get the same error?

Thanks,
-Phil
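[A rough sequence for checking why modprobe cannot resolve the module, assuming the Lustre modules were installed under the running kernel's /lib/modules tree as in the start script later in the thread:]

  ls /lib/modules/`uname -r`/kernel/net/lustre/libcfs.ko   # is the file actually installed?
  depmod -a                                                # rebuild modules.dep so modprobe can find it by name
  modprobe libcfs                                          # should now load without the FATAL error
  lsmod | grep libcfs                                      # confirm the module is loaded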
It is probably shell substitution; I will give it a try next time the cluster is down.

--
Daniel Shearer

On Tue, 14 Feb 2006, Phil Schwan wrote:

> Hi Daniel--
>
> On Feb 14, 2006, at 10:49, Daniel Shearer wrote:
> >
> > Creating a generic 'client' profile didn't work, python errors,
> > however
> > creating a separate client profile for each client did fix it.
>
> You might be running afoul of shell substitution, if you didn't
> protect the *. i.e., you should be running something like:
>
>   lmc ... --nid '*'
>
> instead of:
>
>   lmc ... --nid *
>
> If not that, post your Python errors, maybe someone can help.
>
> -Phil
>
Daniel,

As Phil says in his message, verbose output from lconf would probably be a help.

Not sure about the modprobe issues. I presume you have run a depmod since installing?

Are you starting the server devices and mounting as a client all in one go with lconf? It looks like it, unless you are using a zeroconfig mount elsewhere that you haven't documented.

If so, I suggest that you define a client profile entry (a node named client with a nid of '*') and then try starting with the lconf command you have now, which should start the server devices; once that is done you can mount with the client profile (change `hostname` to client) to get the file system mounted. Or use zeroconf to mount.

Similarly, when shutting down, perform the unmount phase first, and once that is finished perform the server device shutdown.

Finally, on the node that is a client, MDS, and OST, try unmounting the client, then perform an MDS device shutdown first (add --group <mds_name>) and then the normal server device shutdown. This is because the MDS is also a client of the OSTs running on the same node. (A rough example of this ordering follows at the end of this message.)

Fergal.

--
Fergal.McCarthy@HP.com

(The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error you should delete it from your system immediately and advise the sender. To any recipient of this message within HP, unless otherwise stated, you should consider this message and attachments as "HP CONFIDENTIAL".)

-----Original Message-----
From: lustre-discuss-bounces@clusterfs.com [mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of Daniel Shearer
Sent: 13 February 2006 17:27
To: lustre-discuss@clusterfs.com
Subject: [Lustre-discuss] Lustre start and stop errors

There are some errors occurring whenever I try to start and stop lustre.

When starting lustre it says there are kernel modules missing.

bash$ lconf --node `hostname` /etc/lustre/sparrow_lustre.xml
loading module: libcfs srcdir None devdir libcfs
! modprobe (error 1):
> FATAL: Module libcfs not found.

There are 15 modules missing in total which have to be loaded manually (complete list at the end of this email).

When I try to stop lustre with the command:

bash$ lconf --cleanup --node `hostname` /etc/lustre/sparrow_lustre.xml --force

it runs through the shutdown process up until it unloads the kernel modules and then fails with the message:

unloading module: llite
unloading module: mdc
unloading module: lov
unloading module: osc
unloading module: ptlrpc
! unable to unload module: ptlrpc
ERROR: Module ptlrpc is in use by mds,obdfilter,ost
unloading module: obdclass
! unable to unload module: obdclass
ERROR: Module obdclass is in use by mds,obdfilter,fsfilt_ldiskfs,ost,ptlrpc
unloading module: lvfs
! unable to unload module: lvfs
ERROR: Module lvfs is in use by mds,obdfilter,fsfilt_ldiskfs,ost,ptlrpc,obdclass

Lustre is unmounted by this point but it continues to write messages to the logfile and it leaves processes running.

If I then try to start lustre again it says:

MDC: MDC_captain_sparrow-mds_MNT_captain a6a93_MNT_captain_cb5e7e3d0f sparrow-mds_UUID
MDC: MDC_captain_sparrow-mds_MNT_captain a6a93_MNT_captain_cb5e7e3d0f
! /usr/sbin/lctl (17): error: attach: LCFG_ATTACH File exists

This generally leaves me restarting each machine before remounting the filesystem.

The Cluster consists of 35 OST/clients, one OST/Client/MDS and one client.
Lustre version is: 1.4.5
Kernel is: 2.6.9-5.0.5.EL

Lustre is started with this script:

#!/bin/bash
/sbin/insmod /lib/modules/`uname -r`/kernel/net/lustre/libcfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/net/lustre/portals.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/net/lustre/ksocknal.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/lvfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/obdclass.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/ptlrpc.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/ost.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/ldiskfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/fsfilt_ldiskfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/obdfilter.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/mdc.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/osc.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/lov.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/mds.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/llite.ko
/usr/sbin/lconf --node `hostname` /etc/lustre/sparrow_lustre.xml

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
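[A rough sketch of the start/stop ordering Fergal describes above, assuming a generic client profile named "client" has been added to the config; <mds_name> is a placeholder for the actual MDS device name in sparrow_lustre.xml:]

  # Startup: server devices first, then the client mount.
  lconf --node `hostname` /etc/lustre/sparrow_lustre.xml      # start the MDS/OST devices on each server node
  lconf --node client /etc/lustre/sparrow_lustre.xml          # mount the filesystem via the client profile

  # Shutdown: unmount first, then stop the server devices.
  lconf --cleanup --node client /etc/lustre/sparrow_lustre.xml
  # On the combined client/MDS/OST node, stop the MDS device before the rest,
  # since the MDS is itself a client of the OSTs on the same node:
  lconf --cleanup --group <mds_name> --node `hostname` /etc/lustre/sparrow_lustre.xml
  lconf --cleanup --node `hostname` /etc/lustre/sparrow_lustre.xml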
On Mon, 13 Feb 2006, Mc Carthy, Fergal wrote:

> Not sure about the modprobe issues. I presume you have run a depmod
> since installing?

Apparently I had not; running that fixed this problem.

> Are you starting the server devices and mounting as a client all in one
> go with lconf? It looks like it, unless you are using a zeroconfig mount
> elsewhere that you haven't documented.
>
> If so I suggest that you define a client profile entry (a node named
> client with a nid of '*') and then try starting with the lconf
> command you have now, which should start the server devices, and then
> once that is done you can mount with the client profile (change
> `hostname` to client) to get the file system mounted. Or use zeroconf to
> mount.

Creating a generic 'client' profile didn't work (python errors), however creating a separate client profile for each client did fix it.

Thanks for your help

--
Daniel Shearer
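[For reference, a hypothetical lmc fragment for adding per-client node profiles of the kind Daniel describes. The option names follow typical 1.4-era lmc usage, but the client hostnames, LOV name (lov1), and mount point (/mnt/lustre) are placeholders and not taken from the actual sparrow config; only sparrow-mds appears in the logs above:]

  for c in client01 client02 client03; do
      # one node profile per client host, instead of a single wildcard "client" node
      lmc -m sparrow_lustre.xml --add node --node $c
      lmc -m sparrow_lustre.xml --add net  --node $c --nid $c --nettype tcp
      lmc -m sparrow_lustre.xml --add mtpt --node $c --path /mnt/lustre \
          --mds sparrow-mds --lov lov1
  done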
There are some errors occurring whenever I try to start and stop lustre.

When starting lustre it says there are kernel modules missing.

bash$ lconf --node `hostname` /etc/lustre/sparrow_lustre.xml
loading module: libcfs srcdir None devdir libcfs
! modprobe (error 1):
> FATAL: Module libcfs not found.

There are 15 modules missing in total which have to be loaded manually (complete list at the end of this email).

When I try to stop lustre with the command:

bash$ lconf --cleanup --node `hostname` /etc/lustre/sparrow_lustre.xml --force

it runs through the shutdown process up until it unloads the kernel modules and then fails with the message:

unloading module: llite
unloading module: mdc
unloading module: lov
unloading module: osc
unloading module: ptlrpc
! unable to unload module: ptlrpc
ERROR: Module ptlrpc is in use by mds,obdfilter,ost
unloading module: obdclass
! unable to unload module: obdclass
ERROR: Module obdclass is in use by mds,obdfilter,fsfilt_ldiskfs,ost,ptlrpc
unloading module: lvfs
! unable to unload module: lvfs
ERROR: Module lvfs is in use by mds,obdfilter,fsfilt_ldiskfs,ost,ptlrpc,obdclass

Lustre is unmounted by this point but it continues to write messages to the logfile and it leaves processes running.

If I then try to start lustre again it says:

MDC: MDC_captain_sparrow-mds_MNT_captain a6a93_MNT_captain_cb5e7e3d0f sparrow-mds_UUID
MDC: MDC_captain_sparrow-mds_MNT_captain a6a93_MNT_captain_cb5e7e3d0f
! /usr/sbin/lctl (17): error: attach: LCFG_ATTACH File exists

This generally leaves me restarting each machine before remounting the filesystem.

The Cluster consists of 35 OST/clients, one OST/Client/MDS and one client.

Lustre version is: 1.4.5
Kernel is: 2.6.9-5.0.5.EL

Lustre is started with this script:

#!/bin/bash
/sbin/insmod /lib/modules/`uname -r`/kernel/net/lustre/libcfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/net/lustre/portals.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/net/lustre/ksocknal.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/lvfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/obdclass.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/ptlrpc.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/ost.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/ldiskfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/fsfilt_ldiskfs.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/obdfilter.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/mdc.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/osc.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/lov.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/mds.ko
/sbin/insmod /lib/modules/`uname -r`/kernel/fs/lustre/llite.ko
/usr/sbin/lconf --node `hostname` /etc/lustre/sparrow_lustre.xml