Hi,

I'm just testing how well the upgrades work, and somehow I have a problem:

On Ost1:
========
[ 271.985901] LustreError: Trying to start OBD lustre-OST0001_UUID using the wrong disk ost1_UUID. Were the /dev/ assignments rearranged?
[ 271.998243] LustreError: 5058:0:(filter.c:1008:filter_prep()) cannot read last_rcvd: rc = -22
[ 272.006945] LustreError: 5058:0:(obd_config.c:299:class_setup()) setup lustre-OST0001 failed (-22)
[ 272.016070] LustreError: 5058:0:(obd_config.c:1028:class_config_llog_handler()) Err -22 on cfg command:
[ 272.025605] Lustre: cmd=cf003 0:lustre-OST0001 1:dev 2:type 3:f
[ 272.025628] LustreError: MGC192.168.42.101@tcp: The configuration from log 'lustre-OST0001' failed (-22). Make sure this client and the MGS are running compatible versions of Lustre.
[ 272.042080] LustreError: MGC192.168.42.101@tcp: The configuration from log 'lustre-OST0001' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.

Well, there's nothing more in the syslog than this.

root@sn-1:~# tunefs.lustre /dev/md3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre-OST0001
Index:      1
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x402
              (OST )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.168.42.101@tcp

On the MGS/MDT
==============
[ 343.129904] LustreError: lustre-OST0001 claims to have registered, but this MGS does not know about it. Assuming writeconf.
[ 343.141706] Lustre: MGS: Regenerating lustre-OST0001 log by user request.
[ 344.076044] LustreError: 8330:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 req@ffff81007d58bc00 x11/t0 o8->lustre-OST0002_UUID@192.168.41.202@o2ib:6 lens 240/272 ref 2 fl Rpc:/0/0 rc 0/-22

root@beo-101:~# tunefs.lustre /dev/md3
checking for existing Lustre data: found CONFIGS/mountdata
Reading CONFIGS/mountdata

   Read previous values:
Target:     lustre-MDT0000
Index:      0
UUID:       mds-beo_UUID
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x5
              (MDT MGS )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:

Any idea what's wrong?

Thanks,
Bernd

PS: Going back to lustre-1.4 works fine.

-- 
Bernd Schubert
Q-Leap Networks GmbH
Bernd Schubert wrote:
> Hi,
>
> I'm just testing how well the upgrades work, and somehow I have a problem:
>
> On Ost1:
> ========
> [ 271.985901] LustreError: Trying to start OBD lustre-OST0001_UUID using the
> wrong disk ost1_UUID. Were the /dev/ assignments rearranged?
>

Well, that's the problem. Note that the UUID is missing from the tunefs output on the OST, but not on the MDS.
This is a safety check to make sure you're using the right disk; it should have been caught when you did the initial tunefs upgrade to 1.6.
You can manually erase the last_rcvd file from the OST disk to get around this.
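The safety check described above compares the UUID the configuration expects (lustre-OST0001_UUID) with the UUID recorded on disk in last_rcvd (the 1.4-era ost1_UUID). A minimal Python sketch of that comparison follows. It assumes, from memory of lustre_disk.h and without verifying it against this release, that last_rcvd begins with the server UUID as a NUL-padded 40-byte field; read_server_uuid and check_disk are hypothetical names, and this is not the Lustre implementation.

```python
# Hypothetical sketch of the "wrong disk" safety check; not Lustre code.
# Assumption: last_rcvd begins with the server UUID stored as a
# NUL-padded 40-byte ASCII field (lsd_uuid in lustre_disk.h).
UUID_FIELD_LEN = 40


def read_server_uuid(last_rcvd: bytes) -> str:
    """Return the UUID stored at the start of a last_rcvd image."""
    raw = last_rcvd[:UUID_FIELD_LEN]
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")


def check_disk(expected_uuid: str, last_rcvd: bytes) -> None:
    """Raise if the UUID on disk differs from the one the config expects."""
    on_disk = read_server_uuid(last_rcvd)
    if on_disk != expected_uuid:
        raise RuntimeError(
            f"Trying to start OBD {expected_uuid} using the wrong disk "
            f"{on_disk}. Were the /dev/ assignments rearranged?"
        )


if __name__ == "__main__":
    # Simulated 1.4-era header carrying the UUID seen in the log above.
    header = b"ost1_UUID".ljust(UUID_FIELD_LEN, b"\0")
    try:
        check_disk("lustre-OST0001_UUID", header)
    except RuntimeError as err:
        print(err)
```

With that header the check fails the same way as in Bernd's log; deleting last_rcvd presumably works around it because the stale UUID the comparison relies on is gone.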
On Mon, Jun 25, 2007 at 11:15:09AM -0700, Nathaniel Rutman wrote:
> Well, that's the problem. Note the uuid is missing from the tunefs on
> the OST, but not the MDS.
> This is a safety check to make sure you're using the right disk; it should
> have been found when you did the initial tunefs upgrade to 1.6.
> You can erase the last_rcvd file manually out of the OST disk to get
> around this.

Thanks, that did the trick!

Thanks again,
Bernd
On Monday 25 June 2007 20:15:09 Nathaniel Rutman wrote:
> Well, that's the problem. Note the uuid is missing from the tunefs on
> the OST, but not the MDS.
> This is a safety check to make sure you're using the right disk; it should
> have been found when you did the initial tunefs upgrade to 1.6.
> You can erase the last_rcvd file manually out of the OST disk to get
> around this.

I think I figured out how this problem came up in the first place. When I first ran tunefs.lustre, it told me I had to specify the index, since it couldn't detect the index itself. Following the common OST numbering, I told tunefs.lustre that OST1 has index=1 and OST2 has index=2. It seems I should have specified index=0 and index=1, respectively.

After deleting the last_rcvd files I could mount on the servers and on the clients, but on the clients the files didn't appear properly, and dmesg told me OST0000 was missing for a file.

For the archives, in case someone runs into this in the future: to get around this I had to run "tunefs.lustre --writeconf" on all nodes and delete the last_rcvd files on the OST nodes. To make sure there was no corruption, I also ran e2fsck on all systems. Specifying --writeconf also makes it possible to correct an OST index that was already assigned.

I still do not understand why tunefs.lustre couldn't detect the indices itself. The filesystems were created with tools from lustre-1.4.9 and with kernel modules from lustre-1.4.10.

Cheers,
Bernd
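The off-by-one above comes from target names being built from a zero-based index: index 0 is lustre-OST0000, index 1 is lustre-OST0001, which is why OST0000 showed up as missing. A small Python sketch, assuming the name is the file system name, "-OST", and the index as four hex digits (our reading of mkfs_lustre.c, not verified for every release). ost_index_from_14_name is a hypothetical helper that only makes sense if the 1.4 OSTs were named ost1, ost2, ... in the order they should be indexed.

```python
import re


def target_name(fsname: str, index: int, kind: str = "OST") -> str:
    """Build a 1.6-style target name (assumed format: fsname-KIND%04x)."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index out of range")
    return f"{fsname}-{kind}{index:04x}"


def ost_index_from_14_name(name: str) -> int:
    """Hypothetical: map a 1-based 1.4 name such as 'ost1' to a 0-based index."""
    m = re.fullmatch(r"ost(\d+)", name)
    if m is None or int(m.group(1)) < 1:
        raise ValueError(f"unexpected 1.4 OST name: {name}")
    return int(m.group(1)) - 1


for old in ("ost1", "ost2"):
    idx = ost_index_from_14_name(old)
    print(old, "->", "--index=%d" % idx, target_name("lustre", idx))
# ost1 -> --index=0 lustre-OST0000
# ost2 -> --index=1 lustre-OST0001
```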
Bernd Schubert wrote:
> On Monday 25 June 2007 20:15:09 Nathaniel Rutman wrote:
>
> I still do not understand why tunefs.lustre couldn't detect the indices itself.
> The filesystems have been created with tools from lustre-1.4.9 and with
> kernel modules from lustre-1.4.10.
>

Did you get the message "OST with unknown index" at the first tunefs? The logic for all this is in mkfs_lustre.c, read_local_files().
It should be able to identify the OST index from last_rcvd files created in Lustre 1.4.6 and later. One reason it might not is if the last_rcvd file was originally created in an earlier version of Lustre and then upgraded to 1.4.6+. If you could send me the original last_rcvd, I'd be glad to take a look -- too bad I told you to erase it before :(
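The decision in read_local_files() can be pictured roughly like this: if the last_rcvd header advertises the OST feature, trust the stored index; otherwise report an unknown index and make the user pass --index. This is a simplified Python reconstruction, not the actual C code. The value of OBD_INCOMPAT_OST and the idea of a separately stored index are assumptions recalled from lustre_disk.h, and the "OST index N" message is made up for illustration.

```python
from typing import List, Optional

# Assumed value of the "this is an OST" incompat bit; check lustre_disk.h.
OBD_INCOMPAT_OST = 0x2


def detect_ost_index(feature_incompat: int, stored_index: int) -> Optional[int]:
    """Return the OST index recorded in last_rcvd, or None if it is unknown.

    Simplified reconstruction: a last_rcvd written by Lustre 1.4.6+ is
    assumed to carry OBD_INCOMPAT_OST plus the index; an older (or
    unreadable) header has no such flag, so the index cannot be trusted.
    """
    if feature_incompat & OBD_INCOMPAT_OST:
        return stored_index
    return None


def report(feature_compat: int, feature_incompat: int, stored_index: int) -> List[str]:
    """Produce messages in the style tunefs.lustre printed in this thread."""
    lines = [f"Feature compat={feature_compat}, incompat={feature_incompat}"]
    index = detect_ost_index(feature_incompat, stored_index)
    if index is None:
        lines.append("OST with unknown index")
    else:
        lines.append(f"OST index {index}")  # illustrative message only
    return lines


# A header that reads back as all zeros takes the "unknown index" path.
print("\n".join(report(0, 0, 0)))
```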
On Tuesday 26 June 2007 17:59:49 Nathaniel Rutman wrote:
> Did you get the message "ost with unknown index" at the first tunefs?
> The logic for all this is in mkfs_lustre.c read_local_files(). It
> should be able to identify the
> ost index from the last_rcvd files created in Lustre 1.4.6 on. A reason
> it might not is if the last_rcvd
> file was originally created in an earlier version of Lustre and upgraded
> to 1.4.6+. If you could send me the original
> last_rcvd, I'd be glad to take a look -- too bad I told you to erase it
> before :(

No problem, these are only test and development systems, so it's easy to reproduce from scratch.

root@sn-1:~# tunefs.lustre --ost --mgsnode=192.168.41.101@o2ib /dev/md3
checking for existing Lustre data: found last_rcvd
tunefs.lustre: Unable to read /tmp/diraaRDyB/mountdata (No such file or directory).
Trying last_rcvd
Reading last_rcvd
Feature compat=0, incompat=0
OST with unknown index

   Read previous values:
Target:
Index:      unassigned
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x212
              (OST needs_index upgrade1.4 )
Persistent mount opts:
Parameters:

tunefs.lustre FATAL: Can't find the target index, specify with --index

tunefs.lustre: exiting with 22 (Invalid argument)

I've attached the last_rcvd file from ost1.

Cheers,
Bernd

-------------- next part --------------
A non-text attachment was scrubbed...
Name: last_rcvd
Type: application/octet-stream
Size: 8448 bytes
Desc: not available
Url : http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20070627/942a8ffe/last_rcvd-0001.obj
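The Flags values printed by tunefs.lustre in this thread (0x5, 0x402, 0x212) are bit masks. The Python decoder below reproduces the labels shown in the outputs, assuming bit assignments recalled from lustre_disk.h (MDT 0x1, OST 0x2, MGS 0x4, needs_index 0x10, upgrade1.4 0x200); verify them against the headers of your release. Bits outside that table, such as the 0x400 in the OST's 0x402 (tunefs.lustre itself only printed "OST" there), are shown in hex rather than guessed.

```python
from typing import List

# Assumed bit assignments (verify against lustre_disk.h for your release).
LDD_FLAG_NAMES = [
    (0x001, "MDT"),
    (0x002, "OST"),
    (0x004, "MGS"),
    (0x010, "needs_index"),
    (0x200, "upgrade1.4"),
]


def decode_flags(flags: int) -> List[str]:
    """Return labels for known bits, plus hex for any leftover bits."""
    names = []
    remaining = flags
    for bit, name in LDD_FLAG_NAMES:
        if flags & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


for value in (0x5, 0x212, 0x402):
    print(hex(value), decode_flags(value))
# 0x5 ['MDT', 'MGS']
# 0x212 ['OST', 'needs_index', 'upgrade1.4']
# 0x402 ['OST', '0x400']
```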
On Tuesday 26 June 2007 17:59:49 you wrote:
> > I still do not understand why tunefs.lustre couldn't detect the indices
> > itself. The filesystems have been created with tools from lustre-1.4.9 and
> > with kernel modules from lustre-1.4.10.
>
> Did you get the message "ost with unknown index" at the first tunefs?
> The logic for all this is in mkfs_lustre.c read_local_files(). It
> should be able to identify the
> ost index from the last_rcvd files created in Lustre 1.4.6 on. A reason

I just looked into the sources myself and found the problem. Well, it's not really a problem with the code, but my own ignorance. I had only installed e2fsck.static-cfs to be able to run filesystem checks, but otherwise kept the Ubuntu default e2fsprogs.

Looking into read_local_files(), I see that debugfs is used to dump the last_rcvd file. With the debugfs from Ubuntu's e2fsprogs, the dumped last_rcvd file seems to consist of zeros and random data (first many, many zeros, and then something I can't identify).

After installing e2fsprogs-cfs I could properly update OST2 to lustre-1.6 and also connect it properly to the MDS.

And now a big warning: even though tunefs.lustre failed on OST1, it seems to have modified the filesystem. As with OST2, I could convert it to lustre-1.6 with the proper tools, but when I tried to mount it, the mount failed and dmesg showed these messages:

[ 2521.326416] LDISKFS-fs error (device md3): ldiskfs_add_entry: bad entry in directory #75022340: rec_len % 4 != 0 - offset=0, inode=537537071, rec_len=2346, name_len=102
[ 2521.341692] Aborting journal on device md3.
[ 2521.346044] Remounting filesystem read-only

I already got this when I converted last time, but thought the filesystem might have been damaged before.

May I suggest adding some tests to tunefs.lustre that check whether the proper e2fsprogs have been installed?

Thanks,
Bernd
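Bernd's finding suggests a cheap sanity check on the file debugfs dumps (for example with debugfs -c -R "dump /last_rcvd /tmp/last_rcvd" /dev/md3): a genuine last_rcvd should start with a readable server UUID, while the bad dump began with long runs of zeros. The Python sketch below is a heuristic under the assumption that the first 40 bytes hold a NUL-padded UUID; looks_like_valid_last_rcvd is a hypothetical helper, not something tunefs.lustre does.

```python
import string
import sys

UUID_FIELD_LEN = 40  # assumed size of the UUID field at offset 0
UUID_CHARS = set(string.ascii_letters + string.digits + "-_.:@")


def looks_like_valid_last_rcvd(data: bytes) -> bool:
    """Heuristic: the header should begin with a non-empty printable UUID."""
    if len(data) < UUID_FIELD_LEN:
        return False
    uuid = data[:UUID_FIELD_LEN].split(b"\0", 1)[0]
    if not uuid:
        return False  # leading NULs, as in the bad debugfs dump
    return all(chr(b) in UUID_CHARS for b in uuid)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        ok = looks_like_valid_last_rcvd(f.read())
    print("looks sane" if ok else "suspicious: check which debugfs was used")
```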
On Jun 27, 2007 16:43 +0200, Bernd Schubert wrote:
> And now a big warning: Even though tunefs.lustre failed on OST1, it seems to
> have modified the filesystem. As on OST2 I could convert it to lustre-1.6
> with the proper tools, but when I tried to mount it, it failed and in dmesg I
> got these messages
>
> [ 2521.326416] LDISKFS-fs error (device md3): ldiskfs_add_entry: bad entry in
> directory #75022340: rec_len % 4 != 0 - offset=0, inode=537537071,
> rec_len=2346, name_len=102
> [ 2521.341692] Aborting journal on device md3.
> [ 2521.346044] Remounting filesystem read-only
>
> I already got this when I converted last time, but thought the filesystem
> might have been damaged before.

This is likely because the Ubuntu e2fsprogs are patched to use the ext4 extent format changes (originally from CFS), but I suspect they are old and missing some important fixes.

> May I suggest adding to tunefs.lustre some tests if the proper e2fsprogs have
> been installed?

The e2fsprogs should THEMSELVES report problems like this (e.g. "trying to access a filesystem with unsupported features"). Could you please do the following:
- set up your system with a minimum-sized OST filesystem (2100 4kB blocks is the smallest ext3 filesystem, I think)
- save the pre-upgrade image (gzipped)
- run the upgrade process on your system (tunefs.lustre)

and attach both to a new bug. Even though Ubuntu is not a supported platform, I want to be sure that there is no omission in the CFS e2fsprogs code.

Cheers, Andreas
-- 
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
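The image Andreas asks for is small: 2100 blocks of 4 kB is 2100 × 4096 = 8,601,600 bytes, about 8.2 MiB. A minimal Python sketch of the "save the image (gzipped)" step, assuming the OST lives on a device or image file you can read in full; save_image and the output file names are only examples, and the whole source is copied, so point it at the small test device, not a production OST.

```python
import gzip
import os
import shutil
import sys

BLOCK_SIZE = 4096
MIN_BLOCKS = 2100
MIN_IMAGE_BYTES = BLOCK_SIZE * MIN_BLOCKS  # 8,601,600 bytes


def save_image(src: str, dest: str) -> int:
    """Write a gzip-compressed copy of the whole device/file src to dest.

    Returns the number of uncompressed bytes in src.
    """
    with open(src, "rb") as fin:
        size = fin.seek(0, os.SEEK_END)  # also reports block device size on Linux
        fin.seek(0)
        with gzip.open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout, 1 << 20)
    return size


if __name__ == "__main__" and len(sys.argv) == 3:
    n = save_image(sys.argv[1], sys.argv[2])
    print(f"saved {n} bytes ({n / MIN_IMAGE_BYTES:.2f}x the minimum size)")
```

Running it once before the upgrade (e.g. to ost-pre-upgrade.img.gz) and once after gives the two images to attach.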
On Jun 27, 2007 16:43 +0200, Bernd Schubert wrote:
> I just looked into the sources myself and found the problem. Well, not really
> a problem of the code, but my ignorance.
> I only installed e2fsck.static-cfs to be able to run filesystem checks, but
> otherwise kept the ubuntu-default e2fsprogs.
>
> Looking into read_local_files() I see that debugfs is used to dump the
> last_rcvd file. Using debugfs of ubuntu's e2fsprogs a last_rcvd file is created
> which seems to consist of zeros and random data (first many many zeros and
> then something I can't identify).

This is likely the fault of the Ubuntu e2fsprogs using the "for testing" ext4 patches, which claim to support extent-mapped files, but that support is incomplete.

> And now a big warning: Even though tunefs.lustre failed on OST1, it seems to
> have modified the filesystem. As on OST2 I could convert it to lustre-1.6
> with the proper tools, but when I tried to mount it, it failed and in dmesg I
> got these messages
>
> [ 2521.326416] LDISKFS-fs error (device md3): ldiskfs_add_entry: bad entry in
> directory #75022340: rec_len % 4 != 0 - offset=0, inode=537537071,
> rec_len=2346, name_len=102
> [ 2521.341692] Aborting journal on device md3.
> [ 2521.346044] Remounting filesystem read-only

The non-Ubuntu e2fsprogs would have failed outright with a message like "Filesystem has unsupported feature(s)".

> I already got this when I converted last time, but thought the filesystem
> might have been damaged before.
>
> May I suggest adding to tunefs.lustre some tests if the proper e2fsprogs have
> been installed?

This should already be handled appropriately by tune2fs, and any additional checks in tunefs.lustre would invariably be incorrect for some other system. I'd suggest upgrading entirely to e2fsprogs-1.39-cfs8 (you should be able to download the tarball and build .deb packages).

Cheers, Andreas
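The outright failure described above comes from the ext2/3/4 superblock feature masks: a tool compares the filesystem's incompat mask with the features it understands and refuses to continue if any unknown bit is set. A minimal Python sketch, assuming the standard ext2 on-disk layout (superblock at byte 1024, little-endian s_magic 0xEF53 at offset 0x38, s_feature_incompat at offset 0x60, extents bit 0x40). SUPPORTED_INCOMPAT is a made-up feature set for an older tool without extents support, not the real e2fsprogs mask.

```python
import struct

SUPERBLOCK_OFFSET = 1024
EXT2_MAGIC = 0xEF53
INCOMPAT_FILETYPE = 0x0002
INCOMPAT_RECOVER = 0x0004
INCOMPAT_EXTENTS = 0x0040

# Hypothetical feature set of an old tool that does not know about extents.
SUPPORTED_INCOMPAT = INCOMPAT_FILETYPE | INCOMPAT_RECOVER


def unsupported_incompat(image: bytes, supported: int = SUPPORTED_INCOMPAT) -> int:
    """Return the superblock's incompat bits that the tool does not support."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 1024:
        raise ValueError("image too small to hold a superblock")
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/3/4 filesystem")
    (incompat,) = struct.unpack_from("<I", sb, 0x60)
    return incompat & ~supported


def check(image: bytes) -> None:
    """Refuse to touch a filesystem with unknown incompat features."""
    extra = unsupported_incompat(image)
    if extra:
        raise RuntimeError(f"Filesystem has unsupported feature(s): 0x{extra:x}")


def fake_image(incompat: int) -> bytes:
    """Build a minimal in-memory image with just magic and incompat set."""
    buf = bytearray(2048)
    struct.pack_into("<H", buf, SUPERBLOCK_OFFSET + 0x38, EXT2_MAGIC)
    struct.pack_into("<I", buf, SUPERBLOCK_OFFSET + 0x60, incompat)
    return bytes(buf)


print(hex(unsupported_incompat(fake_image(INCOMPAT_FILETYPE | INCOMPAT_EXTENTS))))
# 0x40
```

A tool that claims to support a feature but handles it incompletely skips this refusal entirely, which matches the silent corruption seen with the patched Ubuntu e2fsprogs.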