Reto Gantenbein
2008-Aug-29 13:12 UTC
[Lustre-discuss] Mount error with message: "Err -22 on cfg command:"
Dear Lustre users

Some days ago we had a problem that four OSTs were disconnecting themselves. To recover, I deactivated them with 'lctl conf_param homefs-OST0002.osc.active=0', remounted them, waited until they were recovered and activated them again. Some hosts which kept the Lustre file system mounted during this time resumed working correctly on the paused devices.

But when I want to mount Lustre on a new client:

node01 ~ # mount -t lustre lustre01@tcp:lustre02@tcp:/homefs /home

it refuses with the following message:

LustreError: 3794:0:(obd_config.c:897:class_process_proc_param()) homefs-OST0002-osc-ffff81022f630000: unknown param activate=0
LustreError: 3794:0:(obd_config.c:1062:class_config_llog_handler()) Err -22 on cfg command:
Lustre: cmd=cf00f 0:homefs-OST0002-osc 1:osc.activate=0
LustreError: 15b-f: MGC10.1.140.1@tcp: The configuration from log 'homefs-client' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
LustreError: 15c-8: MGC10.1.140.1@tcp: The configuration from log 'homefs-client' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 3794:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -22
LustreError: 3794:0:(mdc_request.c:1273:mdc_precleanup()) client import never connected
LustreError: 3794:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
Lustre: client ffff81022f630000 umount complete
LustreError: 3794:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-22)

There are no wrong parameters, because the same command worked on all previous attempts. There is also no connection problem between the hosts:

lctl > peer_list
12345-10.1.140.1@tcp [1]node01->lustre01:988 #6
12345-10.1.140.2@tcp [1]node01->lustre02:988 #6

Why does this cfg command error arise? homefs-OST0002 is properly mounted on the Lustre server and is fully working with the other clients, as far as I can tell. Any hints about this, or anything I can do to troubleshoot the problem?

Kind regards,
Reto Gantenbein
Reto Gantenbein
2008-Aug-31 02:58 UTC
[Lustre-discuss] Mount error with message: "Err -22 on cfg command:"
Dear Lustre users and developers

I couldn't find a solution to work around this problem, so I was hoping that restarting the MGS/MDT would be a good try. But I was definitely wrong: when trying to remount the MGS/MDT device I got the same error:

Aug 31 03:27:59 lustre01 LDISKFS FS on sde, internal journal
Aug 31 03:27:59 lustre01 LDISKFS-fs: recovery complete.
Aug 31 03:27:59 lustre01 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 31 03:27:59 lustre01 kjournald starting. Commit interval 5 seconds
Aug 31 03:27:59 lustre01 LDISKFS FS on sde, internal journal
Aug 31 03:27:59 lustre01 LDISKFS-fs: mounted filesystem with ordered data mode.
Aug 31 03:27:59 lustre01 Lustre: MGS MGS started
Aug 31 03:27:59 lustre01 Lustre: Enabling user_xattr
Aug 31 03:27:59 lustre01 Lustre: 6934:0:(mds_fs.c:446:mds_init_server_data()) RECOVERY: service homefs-MDT0000, 26 recoverable clients, last_transno 5217310552
Aug 31 03:27:59 lustre01 Lustre: MDT homefs-MDT0000 now serving dev (homefs-MDT0000/983b4a03-68de-a879-44c3-b91decd23fba), but will be in recovery until 26 clients reconnect, or if no clients reconnect for 4:10; during that time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/homefs-MDT0000/recovery_status.
Aug 31 03:27:59 lustre01 Lustre: 6934:0:(lproc_mds.c:260:lprocfs_wr_group_upcall()) homefs-MDT0000: group upcall set to /usr/sbin/l_getgroups
Aug 31 03:27:59 lustre01 Lustre: homefs-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
Aug 31 03:27:59 lustre01 Lustre: 6934:0:(mds_lov.c:858:mds_notify()) MDS homefs-MDT0000: in recovery, not resetting orphans on homefs-OST0001_UUID
Aug 31 03:27:59 lustre01 Lustre: 6934:0:(mds_lov.c:858:mds_notify()) MDS homefs-MDT0000: in recovery, not resetting orphans on homefs-OST0004_UUID
Aug 31 03:27:59 lustre01 LustreError: 6842:0:(events.c:55:request_out_callback()) @@@ type 4, status -5  req@ffff81011b56a400 x11/t0 o8->homefs-OST0003_UUID@10.1.140.2@tcp:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/-22
Aug 31 03:27:59 lustre01 LustreError: 6842:0:(client.c:975:ptlrpc_expire_one_request()) @@@ network error (sent at 1220146079, 0s ago)  req@ffff81011b56a400 x11/t0 o8->homefs-OST0003_UUID@10.1.140.2@tcp:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
Aug 31 03:27:59 lustre01 LustreError: 6842:0:(events.c:55:request_out_callback()) @@@ type 4, status -5  req@ffff81011b5bfa00 x13/t0 o8->homefs-OST0006_UUID@10.1.140.2@tcp:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/-22
Aug 31 03:27:59 lustre01 LustreError: 6842:0:(client.c:975:ptlrpc_expire_one_request()) @@@ network error (sent at 1220146079, 0s ago)  req@ffff81011b5bfa00 x13/t0 o8->homefs-OST0006_UUID@10.1.140.2@tcp:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_config.c:897:class_process_proc_param()) homefs-OST0002-osc: unknown param activate=0
Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_config.c:1062:class_config_llog_handler()) Err -22 on cfg command:
Aug 31 03:27:59 lustre01 Lustre: cmd=cf00f 0:homefs-OST0002-osc 1:osc.activate=0
Aug 31 03:27:59 lustre01 LustreError: 15b-f: MGC10.1.140.2@tcp: The configuration from log 'homefs-MDT0000' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
Aug 31 03:27:59 lustre01 LustreError: 15c-8: MGC10.1.140.2@tcp: The configuration from log 'homefs-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_mount.c:1080:server_start_targets()) failed to start server homefs-MDT0000: -22
Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets: -22
Aug 31 03:27:59 lustre01 Lustre: Failing over homefs-MDT0000
Aug 31 03:27:59 lustre01 Lustre: *** setting obd homefs-MDT0000 device 'unknown-block(8,64)' read-only ***
Aug 31 03:27:59 lustre01 Turning device sde (0x800040) read-only
Aug 31 03:27:59 lustre01 Lustre: MGS has stopped.

Still the -22 error (unknown parameter -> homefs-OST0002-osc: unknown param activate=0) is haunting me. WTF?! Where does this come from? It doesn't make any sense to me. When trying to mount the MGS/MDT device a second time I get a kernel soft-lockup:

Aug 31 03:34:32 lustre01 LustreError: 7456:0:(mgs_handler.c:150:mgs_setup()) ASSERTION(!lvfs_check_rdonly(lvfs_sbdev(mnt->mnt_sb))) failed
Aug 31 03:34:32 lustre01 LustreError: 7456:0:(tracefile.c:431:libcfs_assertion_failed()) LBUG
Aug 31 03:34:32 lustre01 Lustre: 7456:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process 7456
Aug 31 03:34:32 lustre01 mount.lustre R running task 0 7456 7455 (NOTLB)
Aug 31 03:34:32 lustre01 ffff810077c9d598 000000000000000c 0000000000009c72 0000000000000004
Aug 31 03:34:32 lustre01 0000000000000004 0000000000000000 0000000000009c55 0000000000000004
Aug 31 03:34:32 lustre01 0000000000000018 ffff81011c54c180 0000000000000000 00000000ffffffff
Aug 31 03:34:32 lustre01 Call Trace:
Aug 31 03:34:32 lustre01 [<ffffffff80249faa>] module_text_address+0x3a/0x50
Aug 31 03:34:32 lustre01 [<ffffffff80240ada>] kernel_text_address+0x1a/0x30
Aug 31 03:34:32 lustre01 [<ffffffff80240ada>] kernel_text_address+0x1a/0x30
Aug 31 03:34:32 lustre01 [<ffffffff8020b3ba>] show_trace+0x20a/0x240
Aug 31 03:34:32 lustre01 [<ffffffff8020b4fb>] _show_stack+0xeb/0x100
Aug 31 03:34:32 lustre01 [<ffffffff880869fa>] :libcfs:lbug_with_loc+0x7a/0xc0
Aug 31 03:34:32 lustre01 [<ffffffff8808e724>] :libcfs:libcfs_assertion_failed+0x54/0x60
Aug 31 03:34:32 lustre01 [<ffffffff88307a71>] :mgs:cleanup_module+0xa71/0x2470
Aug 31 03:34:32 lustre01 [<ffffffff880f05cd>] :obdclass:class_new_export+0x52d/0x5b0
Aug 31 03:34:32 lustre01 [<ffffffff88105cdb>] :obdclass:class_setup+0x8bb/0xbe0
Aug 31 03:34:32 lustre01 [<ffffffff8810836a>] :obdclass:class_process_config+0x14ca/0x19f0
Aug 31 03:34:32 lustre01 [<ffffffff88112d94>] :obdclass:do_lcfg+0x9d4/0x15f0
Aug 31 03:34:32 lustre01 [<ffffffff8042b475>] scsi_disk_put+0x35/0x50
Aug 31 03:34:32 lustre01 [<ffffffff88114bd0>] :obdclass:lustre_common_put_super+0x1220/0x6890
Aug 31 03:34:32 lustre01 [<ffffffff88119a3f>] :obdclass:lustre_common_put_super+0x608f/0x6890
Aug 31 03:34:32 lustre01 [<ffffffff80293405>] __d_lookup+0x85/0x120
Aug 31 03:34:32 lustre01 [<ffffffff88086f48>] :libcfs:cfs_alloc+0x28/0x60
Aug 31 03:34:32 lustre01 [<ffffffff8810d8bf>] :obdclass:lustre_init_lsi+0x29f/0x660
Aug 31 03:34:32 lustre01 [<ffffffff8811a240>] :obdclass:lustre_fill_super+0x0/0x1ae0
Aug 31 03:34:32 lustre01 [<ffffffff8811bba3>] :obdclass:lustre_fill_super+0x1963/0x1ae0
Aug 31 03:34:32 lustre01 [<ffffffff802822d0>] set_anon_super+0x0/0xc0
Aug 31 03:34:32 lustre01 [<ffffffff8811a240>] :obdclass:lustre_fill_super+0x0/0x1ae0
Aug 31 03:34:32 lustre01 [<ffffffff80282583>] get_sb_nodev+0x63/0xe0
Aug 31 03:34:32 lustre01 [<ffffffff80281d62>] vfs_kern_mount+0x62/0xb0
Aug 31 03:34:32 lustre01 [<ffffffff80281e0a>] do_kern_mount+0x4a/0x80
Aug 31 03:34:32 lustre01 [<ffffffff8029955d>] do_mount+0x6cd/0x770
Aug 31 03:34:32 lustre01 [<ffffffff80260cb2>] __handle_mm_fault+0x5e2/0xa30
Aug 31 03:34:32 lustre01 [<ffffffff80384c21>] __up_read+0x21/0xb0
Aug 31 03:34:32 lustre01 [<ffffffff8021bae7>] do_page_fault+0x447/0x820
Aug 31 03:34:32 lustre01 [<ffffffff8025a006>] release_pages+0x186/0x1a0
Aug 31 03:34:32 lustre01 [<ffffffff8025da33>] zone_statistics+0x33/0x90
Aug 31 03:34:32 lustre01 [<ffffffff8025774b>] __get_free_pages+0x1b/0x40
Aug 31 03:34:32 lustre01 [<ffffffff8029969b>] sys_mount+0x9b/0x100
Aug 31 03:34:32 lustre01 [<ffffffff80209cf2>] system_call+0x7e/0x83

Is there some kind of log that is replayed when mounting the MGS/MDT? Can I clear it to be able to mount the device again? Or is this just an annoying bug? At the moment the entire file system is down. Is there a way to bring it back online, or do I have to reformat it?

Any help/hints/advice would be appreciated. I really cannot see where I made a mistake.

Kind regards,
Reto Gantenbein
Andreas Dilger
2008-Sep-01 05:44 UTC
[Lustre-discuss] Mount error with message: "Err -22 on cfg command:"
On Aug 29, 2008 15:12 +0200, Reto Gantenbein wrote:
> Some days ago we had a problem that four OSTs were disconnecting
> themselves. To recover, I deactivated them with 'lctl conf_param
> homefs-OST0002.osc.active=0'

Note that "lctl conf_param" is intended to permanently set a configuration parameter, not to temporarily disable an OSC. To disable the OSC temporarily you should have just done:

    lctl --device={device} deactivate
and
    lctl --device={device} recover

Now you have a parameter in the configuration log which disables this OSC as soon as any client mounts...

> remounted them and waited until they
> were recovered and activated them again. Some hosts which kept the
> Lustre file system mounted at this time, resumed to work correctly on
> the paused devices.
>
> But when I want to mount Lustre on a new client:
>
> node01 ~ # mount -t lustre lustre01@tcp:lustre02@tcp:/homefs /home
>
> it refuses with the following message:
>
> LustreError: 3794:0:(obd_config.c:897:class_process_proc_param())
> homefs-OST0002-osc-ffff81022f630000: unknown param activate=0

It seems you had a typo in your conf_param also... Handling (ignoring) of invalid config params is fixed with bug 14693 (fixed in 1.6.5). It doesn't fix the problem of the _valid_ command that deactivates this OSC.

I would suggest rewriting your configuration file with --writeconf, see "4.2.3.2 Running the Writeconf Command"...

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
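A minimal sketch of the temporary deactivate/recover workflow described above, assuming the OSC shows up as local device 7 in the 'lctl dl' output (the device index is only illustrative, not taken from this thread):

    # on the MDS (and on clients, if needed), find the local device index of the OSC
    lctl dl | grep homefs-OST0002-osc
    # temporarily mark the OSC inactive while the OST is being repaired
    lctl --device 7 deactivate
    # ... remount / repair the OST ...
    # re-enable the OSC once the OST is back
    lctl --device 7 recover

Because this only changes runtime state and writes nothing to the MGS configuration log, it is forgotten at the next mount, which is exactly what you want for a short outage.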
Reto Gantenbein
2008-Sep-02 03:49 UTC
[Lustre-discuss] Mount error with message: "Err -22 on cfg command:"
Hello Andreas

Thanks a lot for your advice. Especially the writeconf hint was very valuable. I knew I had read about it before, but in the heat of the moment I couldn't find it anymore. Its chapter has such a meaningful name: "Other Configuration Tasks".

Finally I could rescue the file system by unmounting all clients and all servers, then running tunefs.lustre --writeconf on all Lustre devices and restarting all clients (!), before mounting the MGS and OSTs again. I'm aware that I have also lost the client logs now, but before doing so I could mount the MGS and OSTs on the servers, but not the clients. I always got some strange connection errors, imho because the clients were still trying to write some changes back to the servers even though I had unmounted them with umount -f. This somehow prevented the clients from accessing the filesystem.

Now I'm doing a fsck and I hope to be online again very soon.

Kind regards,
Reto Gantenbein
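A rough sketch of the writeconf recovery sequence described above, with placeholder mount points and OST device names (only /dev/sde for the combined MGS/MDT is known from this thread):

    # 1. unmount every client
    umount -f /home
    # 2. unmount all OSTs, then the MGS/MDT, on the servers
    umount /mnt/lustre/ost0          # repeat for each OST
    umount /mnt/lustre/mdt
    # 3. mark every target so the configuration logs are regenerated
    tunefs.lustre --writeconf /dev/sde     # combined MGS/MDT in this thread
    tunefs.lustre --writeconf /dev/sdX     # repeat for each OST device
    # 4. remount in order: MGS/MDT first, then the OSTs, then the clients
    mount -t lustre /dev/sde /mnt/lustre/mdt
    mount -t lustre /dev/sdX /mnt/lustre/ost0
    mount -t lustre lustre01@tcp:lustre02@tcp:/homefs /home

The --writeconf flag causes the configuration logs to be regenerated when the targets are next mounted, which is what removes the stale osc.activate record, but it also discards any other permanent conf_param settings, so those would have to be reapplied afterwards.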
Brian J. Murrell
2008-Sep-03 13:49 UTC
[Lustre-discuss] [Lustre-devel] Mount error with message: "Err -22 on cfg command:"
On Sun, 2008-08-31 at 04:58 +0200, Reto Gantenbein wrote:

> Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_config.c:897:class_process_proc_param())
> homefs-OST0002-osc: unknown param activate=0
                      ^^^^^^^^^^^^^^^^^^^^^^^^

Any idea how this parameter, "activate", got set? Have you been messing with parameter settings?

Can you do a "tunefs.lustre --print <mdt_device>" and copy the result here?

> Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_config.c:1062:class_config_llog_handler()) Err -22 on cfg command:
> Aug 31 03:27:59 lustre01 Lustre: cmd=cf00f 0:homefs-OST0002-osc 1:osc.activate=0
> Aug 31 03:27:59 lustre01 LustreError: 15b-f: MGC10.1.140.2@tcp: The configuration from log 'homefs-MDT0000' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
> Aug 31 03:27:59 lustre01 LustreError: 15c-8: MGC10.1.140.2@tcp: The configuration from log 'homefs-MDT0000' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_mount.c:1080:server_start_targets()) failed to start server homefs-MDT0000: -22
> Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets: -22
> Aug 31 03:27:59 lustre01 Lustre: Failing over homefs-MDT0000
> Aug 31 03:27:59 lustre01 Lustre: *** setting obd homefs-MDT0000 device 'unknown-block(8,64)' read-only ***
> Aug 31 03:27:59 lustre01 Turning device sde (0x800040) read-only
> Aug 31 03:27:59 lustre01 Lustre: MGS has stopped.

b.
Reto Gantenbein
2008-Sep-11 16:05 UTC
[Lustre-discuss] [Lustre-devel] Mount error with message: "Err -22 on cfg command:"
Hi everybody

On Wed, 2008-09-03 at 09:49 -0400, Brian J. Murrell wrote:
> On Sun, 2008-08-31 at 04:58 +0200, Reto Gantenbein wrote:
>
> > Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_config.c:897:class_process_proc_param())
> > homefs-OST0002-osc: unknown param activate=0
>                       ^^^^^^^^^^^^^^^^^^^^^^^^
>
> Any idea how this parameter, "activate", got set? Have you been messing
> with parameter settings?

Yeah, I did set it manually before rebooting the OST (see my first mail). Because it was written everywhere that I could activate the OST again, I didn't think much about it. Also 'lctl conf_param homefs-OST0002.osc.active=1' didn't return any error.

> Can you do a "tunefs.lustre --print <mdt_device>" and copy the result
> here?

In the meantime the file system is fully working again, see my previous mail. Anyway, here is the config:

lustre01 ~ # tunefs.lustre --print /dev/sde
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

Read previous values:
Target:     homefs-MDT0000
Index:      0
Lustre FS:  homefs
Mount type: ldiskfs
Flags:      0x5 (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=10.1.140.2@tcp mgsnode=10.1.140.2@tcp mdt.group_upcall=/usr/sbin/l_getgroups

Permanent disk data:
Target:     homefs-MDT0000
Index:      0
Lustre FS:  homefs
Mount type: ldiskfs
Flags:      0x5 (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters: failover.node=10.1.140.2@tcp mgsnode=10.1.140.2@tcp mdt.group_upcall=/usr/sbin/l_getgroups

exiting before disk write.

Hope it helps.

Cheers,
Reto Gantenbein
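Note that the stray osc.activate record lives in the MGS configuration llogs rather than in the mountdata shown above. One way to actually look at those records is sketched below, assuming the combined MGS/MDT is on /dev/sde as in this thread, the target is not mounted, and the llog_reader utility from the Lustre tools is installed (this is only an illustrative sketch, not something done in this thread):

    # dump the client configuration llog from the (unmounted) MGS device
    debugfs -c -R 'dump CONFIGS/homefs-client /tmp/homefs-client' /dev/sde
    # decode the binary llog records, including any param entries
    llog_reader /tmp/homefs-client

The same can be done for the CONFIGS/homefs-MDT0000 log, which is the one that failed with -22 during the MDT mount.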
Andreas Dilger
2008-Sep-12 00:45 UTC
[Lustre-discuss] [Lustre-devel] Mount error with message: "Err -22 on cfg command:"
On Sep 11, 2008 18:05 +0200, Reto Gantenbein wrote:
> On Wed, 2008-09-03 at 09:49 -0400, Brian J. Murrell wrote:
> > On Sun, 2008-08-31 at 04:58 +0200, Reto Gantenbein wrote:
> >
> > > Aug 31 03:27:59 lustre01 LustreError: 6934:0:(obd_config.c:897:class_process_proc_param())
> > > homefs-OST0002-osc: unknown param activate=0
> >                       ^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > Any idea how this parameter, "activate", got set? Have you been messing
> > with parameter settings?
>
> Yeah, I did set it manually before rebooting the OST (see my first
> mail). Because it was written everywhere that I could activate the OST
> again, I didn't think much about it. Also 'lctl conf_param
> homefs-OST0002.osc.active=1' didn't return any error.

Note to everyone: "lctl conf_param" will set a parameter permanently in the filesystem configuration. This is almost certainly not what you want when an OST is only deactivated for a short time. You can instead use "lctl --device {dev} deactivate" on the MDS and clients to deactivate an OST temporarily, and "lctl --device {dev} recover" to restore the clients when the OST returns. Since this is a temporary setting, it will be lost if the client is rebooted, but that is likely what you want in this case.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.