Jason Williams
2008-Oct-10 12:32 UTC
[Lustre-discuss] Kernel Oops on the stock RHEL 4 kernel?
Hi,

I have been playing around with Lustre 1.6.5.1 as part of some testing we are doing for an upcoming cluster. I installed it on 2 test machines (Dell 2950s with 8 GB of RAM, to be exact) and fired up a test file system. The test file system was very simple:

/dev/sdb - ~400GB for the MDT/MGS
/dev/sdc - ~4TB for the OST

Both the MDT/MGS and the OST mounted fine, as did the actual Lustre filesystem when I mounted it with:

mount -t lustre hostname@tcp0:/testfs /mnt/testfs

Then, just as a preliminary test, I ran:

dd if=/dev/zero of=/mnt/testfs/testfile bs=1M count=1024

It writes ~700MB, then I get a horrible kernel oops and the machine locks up.

The whole setup is pretty stock: it's running on RHEL4 with the pre-built Lustre RHEL4 kernel RPMs and the pre-built lustre-1.6.5.1 packages for RHEL4. So I would be rather surprised if this is a bug that has simply slipped under the radar. I would be less surprised if it is something I am doing wrong, but I followed the quick-start guide for a simple Lustre config and this is where I am stuck.

Attached is the kernel oops, if anyone is interested. I looked at the function it oopsed in and nothing jumped out as out of the ordinary at first glance, so I am stumped. The last line of the oops is somewhat cut off because it was captured via a serial-over-LAN console.

I did find a reference to a kernel oops in mballoc on Google, but the resulting Bugzilla bug is not publicly readable, so I am unsure whether this has anything to do with that bug.

Any thoughts, anyone?

--
Jason Williams
Linux Systems Administrator
Johns Hopkins University

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lustre-oops.txt
Url: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20081010/01ab67d8/attachment.txt
Jason Williams
2008-Oct-10 12:58 UTC
Guy Coates wrote:
> Jason Williams wrote:
>
>> Hi
>> I have been playing around with lustre 1.6.5.1 as part of some testing
>> that we are doing for an up and coming cluster. I installed it on 2
>> test machines, Dell 2950's with 8 GB of ram to be exact, and fired up a
>> test file system.
>> The test file system was very simple:
>>
>> /dev/sdb - ~400GB for the MDT/MGS
>> /dev/sdc - ~4TB for the OST
>
> Quick questions; did you mount the filesystem on the OST/MDS machine, or on a
> separate client? Mounting filesystems on OST/MDS nodes is not supported.
>
> Does it work with an OST disk < 2TB? support for disks > 2TB is patchy
> depending on your exact disk controller hardware.
>
> Cheers,
>
> Guy

Hi Guy,

Hmm, yes, the file system is mounted on the OST/MDS node. Lustre 1.4 did not seem to have any issue with that, so I wonder why it's not supported. And thanks for the heads-up on the > 2TB support. Looks like I get to go back to my boss and have a discussion about alternatives...

--
Jason
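Following up on Guy's question about OST disks larger than 2TB, one quick way to see which devices cross that boundary is to read their sizes from sysfs. The Python sketch below is an illustration, not a Lustre tool. It assumes the Linux sysfs convention that /sys/block/<dev>/size reports a device's size in 512-byte sectors, and it takes 2 TiB (2^32 sectors of 512 bytes, the limit of 32-bit sector addressing) as the boundary in question. The helper names device_bytes and exceeds_2tib are hypothetical.

```python
import sys
from pathlib import Path

SECTOR_BYTES = 512               # sysfs reports block device sizes in 512-byte units
TWO_TIB = 2**32 * SECTOR_BYTES   # 2 TiB: largest size addressable with 32-bit sector numbers


def device_bytes(dev: str, sysfs_root: Path = Path("/sys/block")) -> int:
    """Return the size in bytes of a whole block device such as 'sdc'."""
    sectors = int((sysfs_root / dev / "size").read_text().strip())
    return sectors * SECTOR_BYTES


def exceeds_2tib(size_bytes: int) -> bool:
    """True if a device is strictly larger than 2 TiB (hypothetical helper)."""
    return size_bytes > TWO_TIB


if __name__ == "__main__":
    # Only inspects devices named on the command line, e.g. "sdb sdc".
    for dev in sys.argv[1:]:
        size = device_bytes(dev)
        status = "over 2 TiB" if exceeds_2tib(size) else "within 2 TiB"
        print(f"{dev}: {size / 2**40:.2f} TiB ({status})")
```

Run as, for example, `python3 check_2tib.py sdb sdc` (the script name is made up). On the setup described above, the ~400GB /dev/sdb would be reported as within 2 TiB and the ~4TB /dev/sdc as over it.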
Brian J. Murrell
2008-Oct-10 18:38 UTC
On Fri, 2008-10-10 at 08:32 -0400, Jason Williams wrote:
> Hi

Hi.

> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at mballoc:1334

This looks like bug 16101, fixed in 1.6.6. There is a patch in that bug you can apply if you wish, or you can wait for 1.6.6. Before you ask, though, I don't know when 1.6.6 will be released.

b.
Jason Williams
2008-Oct-10 19:02 UTC
Brian J. Murrell wrote:
> On Fri, 2008-10-10 at 08:32 -0400, Jason Williams wrote:
>
>> Hi
>
> Hi.
>
>> ----------- [cut here ] --------- [please bite here ] ---------
>> Kernel BUG at mballoc:1334
>
> This looks like bug 16101 fixed in 1.6.6. There is a patch in that bug
> you can apply if you wish or you can wait for 1.6.6. Before you ask
> though, I don't know when 1.6.6 will be released.
>
> b.

Brian,

What about Guy's comments about running the OST on a machine that also has the filesystem mounted via the Lustre client? Is that still technically unsupported in 1.6.6?

--
Jason
Andreas Dilger
2008-Oct-12 05:24 UTC
On Oct 10, 2008 15:02 -0400, Jason Williams wrote:
> What about Guy's comments about running the OST on a machine that also
> has the filesystem mounted via lustre client? Is that still technically
> unsupported in 1.6.6?

Technically yes, but if you have enough RAM (i.e. your application is not consuming all of the client RAM) it _probably_ won't be an issue. There is a risk of deadlock due to the VM if you run client-on-server, but in most cases this doesn't get hit.

The bigger issue in most cases is that if one of your clients dies, then the data on that client (i.e. on the targets it also serves) will not be accessible.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
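To check whether a given node is in the client-on-server configuration discussed in this thread, one could look for both a Lustre target mount and a Lustre client mount in /proc/mounts. The Python sketch below is illustrative only. It assumes that in 1.6 both server targets and client mounts appear with filesystem type "lustre", and that a client's mount source has the form nid:/fsname (as in hostname@tcp0:/testfs in the first message) while a server target's source is a block device path. The helper names lustre_mounts and is_client_on_server are hypothetical.

```python
from typing import Iterable, List, Tuple


def lustre_mounts(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split /proc/mounts-style lines into (target mountpoints, client mountpoints).

    Assumption: client sources look like 'nid:/fsname' (e.g. 'mgs@tcp0:/testfs'),
    while server targets are mounted from block devices such as '/dev/sdc'.
    """
    targets: List[str] = []
    clients: List[str] = []
    for line in lines:
        fields = line.split()
        # /proc/mounts fields: source, mountpoint, fstype, options, dump, pass
        if len(fields) < 3 or fields[2] != "lustre":
            continue
        source, mountpoint = fields[0], fields[1]
        if ":/" in source:
            clients.append(mountpoint)
        else:
            targets.append(mountpoint)
    return targets, clients


def is_client_on_server(lines: Iterable[str]) -> bool:
    """True if this node both serves a Lustre target and mounts a Lustre client."""
    targets, clients = lustre_mounts(lines)
    return bool(targets) and bool(clients)
```

On a live node this could be called as `with open("/proc/mounts") as f: print(is_client_on_server(f))`. It only detects the configuration; as noted above, whether it causes trouble depends on memory pressure and on how much you care about a combined client/server node going down.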