ashok bharat bayana
2008-Mar-14 05:46 UTC
[Lustre-discuss] Help needed in Building lustre using pre-packaged releases
Hi,

Can anyone guide me through building Lustre from a pre-packaged Lustre release? I'm running Ubuntu 7.10 and want to build Lustre using the RHEL 2.6 RPMs available on my system. I'm following the HOWTO in the wiki, but it gives no detailed step-by-step procedure for building Lustre from a pre-packaged release, which is what I need.

Thanks and Regards,
Ashok Bharat
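For what it's worth, a rough sketch of one possible route follows. The RHEL binary RPMs are built against Red Hat kernels, so on Ubuntu 7.10 the safer path is usually to unpack the source release and build against the local kernel tree; the version string and kernel-source path below are placeholders, not tested instructions:

    # Illustrative only: building Lustre from the source release on Ubuntu.
    # Assumes a configured kernel source tree matching the running kernel.
    tar xzf lustre-<version>.tar.gz
    cd lustre-<version>
    ./configure --with-linux=/usr/src/linux   # point at your kernel source tree
    make
    sudo make install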
-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org on behalf of lustre-discuss-request at lists.lustre.org
Sent: Fri 3/14/2008 2:25 AM
To: lustre-discuss at lists.lustre.org
Subject: Lustre-discuss Digest, Vol 26, Issue 36

Send Lustre-discuss mailing list submissions to
	lustre-discuss at lists.lustre.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.lustre.org/mailman/listinfo/lustre-discuss
or, via email, send a message with subject or body 'help' to
	lustre-discuss-request at lists.lustre.org

You can reach the person managing the list at
	lustre-discuss-owner at lists.lustre.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Lustre-discuss digest..."

Today's Topics:

   1. Re: OSS not healthy (Andreas Dilger)
   2. Re: e2scan for backup (Andreas Dilger)
   3. Howto map block devices to Lustre devices? (Chris Worley)
   4. Re: e2fsck mdsdb: DB_NOTFOUND (Aaron Knister)
   5. Re: e2fsck mdsdb: DB_NOTFOUND (Karen M. Fernsler)
   6. Re: Howto map block devices to Lustre devices? (Klaus Steden)

----------------------------------------------------------------------

Message: 1
Date: Thu, 13 Mar 2008 11:11:19 -0700
From: Andreas Dilger <adilger at sun.com>
Subject: Re: [Lustre-discuss] OSS not healthy
To: "Brian J. Murrell" <Brian.Murrell at sun.com>
Cc: lustre-discuss at lists.lustre.org
Message-ID: <20080313181119.GB3217 at webber.adilger.int>
Content-Type: text/plain; charset=us-ascii

On Mar 13, 2008 13:44 +0100, Brian J. Murrell wrote:
> On Thu, 2008-03-13 at 12:34 +0100, Frank Mietke wrote:
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701448] attempt to access beyond end of device
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701454] sda: rw=1, want=11287722456, limit=7796867072
>
> This is pretty self-explanatory. Something tried to read beyond the end
> of the disk, so something has a misunderstanding of how big the disk is.
> Is it possible that the disk format process was misled about the disk
> size during initialization?

Unlikely.

> Andreas, does mkfs do any bounds checking to verify the sanity of the
> mkfs request? I.e. does it make sure that if/when you specify a number
> of blocks for a filesystem, that many blocks are available?

Yes, mke2fs will zero out the last ~128kB of the device to overwrite any MD RAID signatures, and it also verifies that the device is as big as requested. These kinds of errors are usually the result of corruption internal to the filesystem, where some garbage is interpreted as a block number beyond the end of the device.

> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701555] attempt to access beyond end of device
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701558] sda: rw=1, want=25366292592, limit=7796867072
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701562] Buffer I/O error on device sda, logical block 3170786573
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.701785] lost page write due to I/O error on sda
> > Mar 13 06:17:31 chic2e24 kernel: [3068633.702004] Aborting journal on device sda.
>
> This is all just fallout error messages from the attempted read beyond
> EOF.

Time to unmount the filesystem and run a full e2fsck: "e2fsck -fp /dev/sdaNNN"

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
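To make that concrete, the repair sequence would look roughly like the following; the device name and mount point are placeholders:

    # Illustrative sketch only; substitute your own device and mount point.
    umount /mnt/ost          # take the filesystem out of service first
    e2fsck -fp /dev/sdXN     # -f forces a full check, -p fixes safe problems automatically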
------------------------------

Message: 2
Date: Thu, 13 Mar 2008 11:22:48 -0700
From: Andreas Dilger <adilger at sun.com>
Subject: Re: [Lustre-discuss] e2scan for backup
To: Jakob Goldbach <jakob at goldbach.dk>
Cc: Lustre User Discussion Mailing List <lustre-discuss at lists.lustre.org>
Message-ID: <20080313182248.GD3217 at webber.adilger.int>
Content-Type: text/plain; charset=us-ascii

On Mar 13, 2008 12:59 +0100, Jakob Goldbach wrote:
> On Wed, 2008-03-12 at 23:12 +0100, Brian J. Murrell wrote:
> > On Wed, 2008-03-12 at 14:50 -0600, Lundgren, Andrew wrote:
> > > How do you do the snapshot?
> >
> > lvcreate -s
>
> No need to freeze the filesystem while creating the snapshot to ensure a
> consistent filesystem on the snapshot?

Yes, but this is handled internally by LVM and ext3 when the snapshot is created.

> (xfs has an xfs_freeze function that does just this)

In fact I was just discussing this with an XFS developer, and this is a source of problems for them: if you run xfs_freeze before taking the LVM snapshot, it will deadlock.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
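As an illustration of the "lvcreate -s" approach (volume names, snapshot size, and paths are placeholders, and the filesystem type may be ldiskfs rather than ext3 depending on the Lustre version):

    # Illustrative sketch: create a snapshot, back it up, discard it.
    lvcreate -s -L 10G -n mds-snap /dev/vg0/mds      # copy-on-write snapshot
    mount -t ext3 -o ro /dev/vg0/mds-snap /mnt/snap  # journal is replayed at mount time
    # Run your backup tool against /mnt/snap here; note that an MDS backup
    # also needs the extended attributes preserved.
    umount /mnt/snap
    lvremove -f /dev/vg0/mds-snap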
------------------------------

Message: 3
Date: Thu, 13 Mar 2008 13:50:51 -0600
From: "Chris Worley" <worleys at gmail.com>
Subject: [Lustre-discuss] Howto map block devices to Lustre devices?
To: lustre-discuss <lustre-discuss at lists.lustre.org>
Message-ID: <f3177b9e0803131250n23084fd7g184ef07403a298cd at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

I'm trying to deactivate some OSTs, but to find them I've been searching through /var/log/messages, as in:

# ssh io2 grep -e sde -e sdf -e sdj -e sdk -e sdd /var/log/messages"*" | grep Server
/var/log/messages:Mar 10 13:27:54 io2 kernel: Lustre: Server ddnlfs-OST0035 on device /dev/sdf has started
/var/log/messages.1:Mar  4 16:02:13 io2 kernel: Lustre: Server ddnlfs-OST0030 on device /dev/sdf has started
/var/log/messages.1:Mar  6 14:34:44 io2 kernel: Lustre: Server ddnlfs-OST002e on device /dev/sdd has started
/var/log/messages.1:Mar  6 14:34:55 io2 kernel: Lustre: Server ddnlfs-OST002f on device /dev/sde has started
/var/log/messages.1:Mar  6 14:35:16 io2 kernel: Lustre: Server ddnlfs-OST0030 on device /dev/sdf has started
/var/log/messages.1:Mar  6 15:20:48 io2 kernel: Lustre: Server ddnlfs-OST002f on device /dev/sde has started
/var/log/messages.1:Mar  6 16:08:38 io2 kernel: Lustre: Server ddnlfs-OST002e on device /dev/sdd has started
/var/log/messages.1:Mar  6 16:08:43 io2 kernel: Lustre: Server ddnlfs-OST0030 on device /dev/sdf has started
/var/log/messages.1:Mar  6 16:08:53 io2 kernel: Lustre: Server ddnlfs-OST0034 on device /dev/sdj has started

Note that there isn't an entry for sdk (probably rotated out), and sdf shows up under two different OST names.

Is there a better way to map a Lustre device name to a Linux block device?

I'm trying to cull out slow disks. I'm hoping that just by "deactivating" the device in lctl it will quit using it, and that that's the best way to get rid of a slow drive... correct?

Thanks,
Chris
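One way to get the mapping directly, assuming the targets carry filesystem labels as Klaus Steden suggests in Message 6 below; device names are placeholders:

    # Illustrative: the filesystem label on a Lustre target is typically
    # the target name itself (e.g. ddnlfs-OST0035).
    for dev in /dev/sdd /dev/sde /dev/sdf /dev/sdj /dev/sdk; do
        printf '%s: ' "$dev"; e2label "$dev"
    done

    # "lctl dl" lists the configured Lustre devices and their states.
    lctl dl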
------------------------------

Message: 4
Date: Thu, 13 Mar 2008 16:50:04 -0400
From: Aaron Knister <aaron at iges.org>
Subject: Re: [Lustre-discuss] e2fsck mdsdb: DB_NOTFOUND
To: Michelle Butler <mbutler at ncsa.uiuc.edu>
Cc: Andreas Dilger <adilger at sun.com>, lustre-discuss at clusterfs.com, abe-admin at ncsa.uiuc.edu, ckerner at ncsa.uiuc.edu, alex parga <aparga at ncsa.uiuc.edu>, set at ncsa.uiuc.edu
Message-ID: <85E6EB25-EC03-4D93-BD8B-B267F65A5400 at iges.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes

What version of lustre/kernel is running on the problematic server?

On Mar 13, 2008, at 11:02 AM, Michelle Butler wrote:
> We got past that point by running e2fsck on the individual partitions first.
>
> But we are still having problems, I'm sorry to say. We have an I/O
> server that is fine until we start Lustre; then it starts spewing
> Lustre call traces:
>
> Call Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
>        <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
>        <ffffffff8013327d>{default_wake_function+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffff80110ebb>{child_rip+8}
>        <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
>        <ffffffff80110eb3>{child_rip+0}
>
> ll_ost_io_232 S 000001037d6bbee8 0 26764 1 26765 26763 (L-TLB)
> 000001037d6bbe58 0000000000000046 0000000100000246 0000000000000003
> 0000000000000016 0000000000000001 00000104100bcb20 0000000300000246
> 00000103f5470030 000000000001d381
> Call Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
>        <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
>        <ffffffff8013327d>{default_wake_function+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffff80110ebb>{child_rip+8}
>        <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
>        <ffffffff80110eb3>{child_rip+0}
>
> ll_ost_io_233 S 00000103de847ee8 0 26765 1 26766 26764 (L-TLB)
> 00000103de847e58 0000000000000046 0000000100000246 0000000000000001
> 0000000000000016 0000000000000001 000001040f83c620 0000000100000246
> 00000103e627e030 000000000001d487
> Call Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
>        <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
>        <ffffffff8013327d>{default_wake_function+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffff80110ebb>{child_rip+8}
>        <ffffffffa03e0163>{:ptlrpc:ptlrpc_main+0}
>        <ffffffff80110eb3>{child_rip+0}
>
> ll_ost_io_234 S 00000100c4353ee8 0 26766 1 26767 26765 (L-TLB)
> 00000100c4353e58 0000000000000046 0000000100000246 0000000000000003
> 0000000000000016 0000000000000001 00000104100bcc60 0000000300000246
> 00000103de81b810 000000000001d945
> Call Trace:<ffffffffa02fa089>{:libcfs:lcw_update_time+22}
>        <ffffffffa03e06e3>{:ptlrpc:ptlrpc_main+1408}
>        <ffffffff8013327d>{default_wake_function+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retr???f?????????c?????????c??????
> Ks[F????????????
>        <ffffffff8013327d>{default_wake_function+0}
>        <ffffffffa03e0156>{:ptlrpc:ptlrpc_retry_rqbds+0}
>        <ffffffffa03e0156>{:ptl
>
> It then panics the kernel... ??
>
> Michelle Butler
>
> At 02:39 AM 3/13/2008, Andreas Dilger wrote:
>> On Mar 12, 2008 06:44 -0500, Karen M. Fernsler wrote:
>>> I'm running:
>>>
>>> e2fsck -y -v --mdsdb mdsdb --ostdb osth3_1 /dev/mapper/27l4
>>>
>>> and getting:
>>>
>>> Pass 6: Acquiring information for lfsck
>>> error getting mds_hdr (3685469441:8) in /post/cfg/mdsdb: DB_NOTFOUND: No matching key/data pair found
>>> e2fsck: aborted
>>>
>>> Any ideas how to get around this?
>>
>> Does "mdsdb" actually exist? It should be created first by running:
>>
>> e2fsck --mdsdb mdsdb /dev/{mdsdevicename}
>>
>> before running your above command on the OST.
>>
>> Please also try specifying absolute pathnames for the mdsdb and ostdb files.
>>
>> Cheers, Andreas

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron at iges.org
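Putting Andreas's advice together, the usual order of operations looks roughly like this; device names and paths are placeholders, with absolute paths used as he recommends:

    # Illustrative sketch of the lfsck database workflow (read-only flags shown).
    # 1. Build the MDS database first:
    e2fsck -n -v --mdsdb /tmp/mdsdb /dev/mdsdev
    # 2. Build a database for each OST, passing the MDS database in:
    e2fsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb1 /dev/ostdev
    # 3. Cross-check the MDS and OST databases from a client mount:
    lfsck -n -v --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb1 /mnt/lustre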
------------------------------

Message: 5
Date: Thu, 13 Mar 2008 15:51:22 -0500
From: "Karen M. Fernsler" <fernsler at ncsa.uiuc.edu>
Subject: Re: [Lustre-discuss] e2fsck mdsdb: DB_NOTFOUND
To: Aaron Knister <aaron at iges.org>
Cc: Andreas Dilger <adilger at sun.com>, lustre-discuss at clusterfs.com, Michelle Butler <mbutler at ncsa.uiuc.edu>, abe-admin at ncsa.uiuc.edu, ckerner at ncsa.uiuc.edu, alex parga <aparga at ncsa.uiuc.edu>, set at ncsa.uiuc.edu
Message-ID: <20080313205122.GA17635 at ncsa.uiuc.edu>
Content-Type: text/plain; charset=iso-8859-1

2.6.9-42.0.10.EL_lustre-1.4.10.1smp, i.e. a 2.6.9-42.0.10.EL kernel with Lustre 1.4.10.1.

This had been working fine for almost a year. We did try to export this filesystem to another cluster over NFS before we started seeing problems, but I don't know how related that is, if at all. We are now trying to dissect the problem by inspecting the logs of the switches these nodes are connected to.

thanks,
-k

On Thu, Mar 13, 2008 at 04:50:04PM -0400, Aaron Knister wrote:
> What version of lustre/kernel is running on the problematic server?

--
Karen Fernsler
Systems Engineer
National Center for Supercomputing Applications
ph: (217) 265-5249
email: fernsler at ncsa.uiuc.edu

------------------------------

Message: 6
Date: Thu, 13 Mar 2008 13:55:45 -0700
From: Klaus Steden <klaus.steden at thomson.net>
Subject: Re: [Lustre-discuss] Howto map block devices to Lustre devices?
To: Chris Worley <worleys at gmail.com>, lustre-discuss <lustre-discuss at lists.lustre.org>
Message-ID: <C3FEE2E1.59E7%klaus.steden at thomson.net>
Content-Type: text/plain; charset="US-ASCII"

Hi Chris,

Don't your Lustre volumes have a label on them? On the one cluster I've got, the physical storage is shared with a number of other systems, so the device information can change over time; because of that I use device labels in my /etc/fstab and friends, something like 'lustre-OST0000', 'lustre-OST0001'... although when the devices are actually mounted, they show up with their /dev node names.

Look through /proc/fs/lustre for Lustre volume names (they show up when they're mounted), and you can winnow your list down by mounting by name, checking the device ID, and removing it that way. If you have a lot of devices on the same bus, it will likely take a bit for the right one to be found, but it's there.

hth,
Klaus

On 3/13/08 12:50 PM, "Chris Worley" <worleys at gmail.com> did etch on stone tablets:
> Is there a better way to map a Lustre device name to a Linux block device?
>
> I'm trying to cull out slow disks. I'm hoping that just by
> "deactivating" the device in lctl it will quit using it, and that
> that's the best way to get rid of a slow drive... correct?
>
> Thanks,
> Chris
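A sketch of the label-based setup Klaus describes; the label and mount point are examples, and the mntdev entries assume a release whose obdfilter proc tree exposes them:

    # Illustrative /etc/fstab entry mounting an OST by label:
    # LABEL=lustre-OST0000  /mnt/ost0000  lustre  defaults  0 0

    # With targets mounted, map each OST name back to its block device:
    for f in /proc/fs/lustre/obdfilter/*/mntdev; do
        echo "$f -> $(cat "$f")"
    done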
------------------------------

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

End of Lustre-discuss Digest, Vol 26, Issue 36
**********************************************