Thomas Roth
2007-Jun-20 11:10 UTC
[Lustre-discuss] oops with Debian Sarge and Lustre 1.5.97
Hi all,

I have now tried to install Lustre following the report https://mail.clusterfs.com/pipermail/lustre-discuss/2006-October/002311.html

It all works reasonably well until I try to access the Lustre FS after mounting it on a client. A simple "ls" may hang immediately, or it may work, as may creating, reading, moving and deleting files. At some point, however, a command hangs and kern.log shows:

kernel: [95414.189214] kernel BUG at fs/dcache.c:862!
kernel: [95414.189229] invalid opcode: 0000 [#2]
kernel: [95414.189245] Modules linked in: osc mgc lustre lov lquota mdc ksocklnd ptlrpc obdclass lnet lvfs libcfs ide_generic ipv6 af_packet e1000 piix xfs evdev ide_cd ide_disk ata_piix libata tulip de4x5 hp100 dmfe floppy dm_mod ehci_hcd usbmouse usbhid usbkbd usbcore rtc sr_mod cdrom sd_mod sg 3w_9xxx scsi_mod
kernel: [95414.189375] CPU: 0
kernel: [95414.189376] EIP: 0060:[<c017498a>] Not tainted VLI
kernel: [95414.189377] EFLAGS: 00010246 (2.6.18 #1)
kernel: [95414.189428] EIP is at d_instantiate_unique+0x150/0x167
kernel: [95414.189447] eax: f43d4d14 ebx: f43d4d14 ecx: f43d4d14 edx: f3d481b8
kernel: [95414.189468] esi: f43d4d77 edi: f62f0af7 ebp: f6501748 esp: f6517af4
kernel: [95414.189487] ds: 007b es: 007b ss: 0068
kernel: [95414.189505] Process chown (pid: 31020, ti=f6516000 task=f63f2570 task.ti=f6516000)
kernel: [95414.189524] Stack: f43d4d14 f8f69180 f43d4d50 f62f0ad0 00000080 00010000 32b110c4 f62f0af0
kernel: [95414.189568] 00000007 00000000 f62f0a94 f6517f34 f6517c64 f9383de6 f62f0a94 f6501730
kernel: [95414.189609] 00000000 f6d60000 f9384011 f6501730 f62f0a94 00010000 f93a6a53 f939afb7
kernel: [95414.189652] Call Trace:
kernel: [95414.189680] [<f9383de6>] ll_find_alias+0x1f/0x4b [lustre]
kernel: [95414.189749] [<f9384011>] lookup_it_finish+0x1ff/0x5a3 [lustre]
...

Has anybody seen this bug before? The hung command, of course, cannot be killed.
The Lustre FS can be unmounted with "umount -lf" and mounted again; it then remains usable until the next command hits this bug.

Btw, the code to set up the Debian package that is contained in pkg-lustre on svn.debian.org still uses the Lustre 1.5.97 sources. However, the code for 1.6.0.1 has already been uploaded to that repository - any plans to follow up with the Debian code soon?

Many thanks,
Thomas
Bernd Schubert
2007-Jun-20 11:26 UTC
Hi Thomas,

> Btw, the code to set up the Debian package that is contained in
> pkg-lustre on svn.debian.org still uses the Lustre 1.5.97 sources.
> However, the code for 1.6.0.1 has already been uploaded to that
> repository - any plans to follow up with the Debian code soon?

I've CC'ed Goswin; I think he has new packages. In the meantime I recommend installing the utils and the kernel modules separately, i.e. keeping the Debian utils and updating only the kernel modules.

If you want to stay on linux-2.6.18, you have to wait for lustre-1.6.1 or manually apply the patches attached to bug #11039 (https://bugzilla.lustre.org/show_bug.cgi?id=11039).

If you can upgrade to 2.6.20, you can use our Q-Leap release. I'm not sure about its 2.6.18 support status, since we now only care about 2.6.20+.

http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/1.6/

Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
Goswin von Brederlow
2007-Jun-21 10:41 UTC
Thomas Roth <t.roth@gsi.de> writes:

> Btw, the code to set up the Debian package that is contained in
> pkg-lustre on svn.debian.org still uses the Lustre 1.5.97 sources.
> However, the code for 1.6.0.1 has already been uploaded to that
> repository - any plans to follow up with the Debian code soon?
>
> Many thanks,
>
> Thomas

I committed some updates, but a few kernel patch updates are out of sync with our local Lustre version, so you will probably get some crashes. I hope I can sync those tomorrow.

Regards,
Goswin
Thomas Roth
2007-Jul-04 06:21 UTC
Goswin von Brederlow wrote:
> Thomas Roth <t.roth@gsi.de> writes:
>
>> Btw, the code to set up the Debian package that is contained in
>> pkg-lustre on svn.debian.org still uses the Lustre 1.5.97 sources.
>> However, the code for 1.6.0.1 has already been uploaded to that
>> repository - any plans to follow up with the Debian code soon?
>>
>> Many thanks,
>>
>> Thomas
>
> I committed some updates, but a few kernel patch updates are out of sync
> with our local Lustre version, so you will probably get some crashes. I
> hope I can sync those tomorrow.
>
> Regards,
> Goswin

Well, I got some. After changing a few '1.5.97' strings to '1.6.0.1' and commenting out unavailable patches, compilation got as far as

dh_install -p lustre-utils

which failed with

cp: cannot stat `./lustre-1.6.0.1/lnet/utils/debugctl': No such file or directory

A bit strange, though, as this utility had been compiled, and without problems as far as the compiler messages suggest. But it is not there at the end to be put into the package. Some premature dh_clean somewhere?
Hm, at least I can check what has been built into the lustre-source and linux-patch packages.

Regards,
Thomas
Bernd Schubert
2007-Jul-04 06:30 UTC
Hi Thomas,

> A bit strange, though, as this utility had been compiled, and without
> problems as far as the compiler messages suggest. But it is not there at the
> end to be put into the package. Some premature dh_clean somewhere?
> Hm, at least I can check what has been built into the lustre-source and
> linux-patch packages.

We recently discovered disk corruption with our ldiskfs patch set, so you probably don't want to use it in its present form. A newer patch set also caused problems. So you have two choices: either remove the extents and mballoc patches, or wait.

Cheers,
Bernd