Hi,

I am a neophyte trying to benchmark new hardware for a Lustre setup.
I got the Lustre IOkit noarch RPM and installed it onto the OSS.  I
edited the file /usr/bin/sgpdd-survey (attached) to match the
description of our OSS (RAM is 16 GB, sample new hardware detected at
/dev/sdg).

My initial run of sgpdd-survey produced the following errors:

[root@oss4 ~]# sgpdd-survey
Can't find SG device for /dev/sdg1, testing for partition
/usr/bin/sgpdd-survey: line 105: CAPACITY trying 0x200: syntax error in
expression (error token is "trying 0x200")
/usr/bin/sgpdd-survey: line 106: [: ==: unary operator expected
Tue Jul 22 14:46:48 EDT 2008 sgpdd-survey on /dev/sdg1 from oss4.crew.local
/usr/bin/sgpdd-survey: line 134: rsz*1024/bs: division by 0 (error token is "s")

System info:

[root@oss4 ~]# uname -a
Linux oss4.crew.local 2.6.18-53.1.13.el5_lustre.1.6.4.3smp #1 SMP Sun Feb 17 08:38:44 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@oss4 ~]# cat /etc/redhat-release
CentOS release 5 (Final)

The physical device is there.  I did format it into two 7 TB partitions
just to give it an early test.  Do I have to un-format it for
sgpdd-survey?

[root@oss4 ~]# parted /dev/sdg
GNU Parted 1.8.1
Using /dev/sdg
Welcome to GNU Parted!  Type 'help' to view a list of commands.
(parted) print

Model: LSI MegaRAID 8888ELP (scsi)
Disk /dev/sdg: 14.0TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  7000GB  7000GB  ext3         sdg1
 2      7000GB  14.0TB  6986GB  ext3         sdg2

(parted) q
Information: Don't forget to update /etc/fstab, if necessary.

Any advice for testing a new system with sgpdd-survey?

Advice appreciated!
megan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: OSS4.sgpdd-survey.sh
Type: application/x-sh
Size: 6081 bytes
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080722/942035ed/attachment-0001.sh
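For readers following along: the edits megan describes go in the tunable block near the top of /usr/bin/sgpdd-survey.  The sketch below shows roughly what that block looks like in lustre-iokit of this vintage; only scsidevs and the rsz/crg/thr ranges are confirmed later in this thread, so the other variable names and the size value are assumptions to be checked against the installed script.

  # Sketch of the tunable section of /usr/bin/sgpdd-survey (illustrative only;
  # verify the names against your installed copy of lustre-iokit).
  scsidevs="/dev/sdg"      # block device(s) to survey, space-separated
  size=8192                # data to transfer per device, in MB (example value)
  rszlo=1024 rszhi=1024    # record size in kB; 1024 kB matches the 1 MB Lustre RPC
  crglo=1    crghi=64      # number of concurrent regions, low/high
  thrlo=1    thrhi=64      # number of threads, low/high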
I think it wants a raw device, not a partition.  Also, I noticed you're
using two 7 TB partitions on a single 14 TB LUN.  You're going to really
hurt your performance by doing it that way.  Just my $.02.

On Tue, Jul 22, 2008 at 3:03 PM, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
> Hi,
>
> I am a neophyte trying to benchmark new hardware for a Lustre setup.
> I got the Lustre IOkit noarch RPM and installed it onto the OSS.  I
> edited the file /usr/bin/sgpdd-survey (attached) to match the
> description of our OSS (RAM is 16 GB, sample new hardware detected at
> /dev/sdg).
> [...]
> Any advice for testing a new system with sgpdd-survey?
>
> Advice appreciated!
> megan
On Tue, 2008-07-22 at 15:03 -0400, Ms. Megan Larko wrote:
> Hi,

Hi,

> [root@oss4 ~]# sgpdd-survey
> Can't find SG device for /dev/sdg1, testing for partition

Did you load the sg module?

b.
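Checking for the sg driver and its device mappings is straightforward; these are stock commands (sg_map comes from the sg3_utils package) and are shown here only as a quick checklist:

  # Is the SCSI generic (sg) driver loaded?  Load it if not.
  lsmod | grep -w sg || modprobe sg

  # Which /dev/sgN node maps to which /dev/sdX block device?
  sg_map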
> On Tue, Jul 22, 2008 at 3:03 PM, Ms. Megan Larko <dobsonunit at gmail.com> wrote:

I received info for a fix.  Comments in-line.  (megan)

>> My initial running of sgpdd-survey produced the following errors:
>> [root@oss4 ~]# sgpdd-survey
>> Can't find SG device for /dev/sdg1, testing for partition
>> /usr/bin/sgpdd-survey: line 105: CAPACITY trying 0x200: syntax error in
>> expression (error token is "trying 0x200")
>> /usr/bin/sgpdd-survey: line 106: [: ==: unary operator expected
>> Tue Jul 22 14:46:48 EDT 2008 sgpdd-survey on /dev/sdg1 from oss4.crew.local
>> /usr/bin/sgpdd-survey: line 134: rsz*1024/bs: division by 0 (error token is "s")

The fix to run /usr/bin/sgpdd-survey was to add an argument to the two
instances of sg_readcap that appear in the script.

In /usr/bin/sgpdd-survey:

ORIG Ex: bs=$((`sg_readcap -b ${devs[0]} | awk '{print $2}'`))
NEW Ex:  bs=$((`sg_readcap -b -16 ${devs[0]} | awk '{print $2}'`))

This can be tested on the command line:

[root@oss4 ~]# sg_readcap -b -16 /dev/sg5

I have not yet gotten any further with my benchmark testing.  (Other
fires to be extinguished today.)

Thanks to Cliff White and Atul Vidwansa for the solution information.

Later,
megan
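The same change can be applied mechanically; the sed one-liner below is a sketch (back the script up first).  The -16 option makes sg_readcap issue READ CAPACITY(16) instead of READ CAPACITY(10), whose 32-bit block count tops out at 2 TB with 512-byte sectors, which is why the original calls choke on LUNs this large:

  cp /usr/bin/sgpdd-survey /usr/bin/sgpdd-survey.orig
  sed -i 's/sg_readcap -b /sg_readcap -b -16 /g' /usr/bin/sgpdd-survey

  # Sanity check against one of the sg devices:
  sg_readcap -b -16 /dev/sg5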
On Jul 24, 2008  11:56 -0400, Aaron Knister wrote:
> I think it wants a raw device, not a partition.  Also, I noticed you're
> using two 7 TB partitions on a single 14 TB LUN.  You're going to really
> hurt your performance by doing it that way.  Just my $.02.

In fact, Lustre filesystems larger than 8 TB are known to have problems
and should not be used.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Jul 24, 2008  14:52 -0400, Ms. Megan Larko wrote:
> Okay.  So I have a JBOD with 16 1 TB Hitachi Ultrastar SATA drives in
> it connected to the OSS via an LSI 8888ELP controller card (with
> battery back-up).  The JBOD is passed to the OSS as raw space.  My
> partitions cannot exceed 8 TB, but splitting into two 7 TB will hurt
> performance....

The ideal layout would be to have two RAID-5 arrays with 8 data + 1 parity
disks using 64kB or 128kB RAID chunk size, but you are short two disks...

You may also consider using RAID-6 with 6 data + 2 parity disks.  Having
the RAID stripe width be 15+1 is probably quite bad because a single
1MB RPC will always need to recalculate the parity for that IO.

> What do I do for Lustre set-up?  I thought the fewer partitions the
> better because one has less "overhead" space.  Do I put them out as
> 16 single 1 TB partitions?  That seems like extra work for a file
> system to track.

Please see section 10.1 in the Lustre manual for more tips:
http://manual.lustre.org/manual/LustreManual16_HTML/RAID.html

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
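The reasoning behind those numbers can be checked with simple arithmetic: Lustre issues 1 MB RPCs, so you want (number of data disks) x (chunk size) to come out to exactly 1 MB, letting each RPC fill a whole RAID stripe and the controller compute parity once instead of doing a read-modify-write.  A trivial check, using the figures suggested above:

  data_disks=8; chunk_kb=128
  echo "full stripe = $((data_disks * chunk_kb)) kB (want 1024 kB = one RPC)"
  # 8 data disks * 128 kB chunk = 1024 kB: a 1 MB RPC covers one full stripe.
  # With 15 data disks, no power-of-two chunk size gives 1024 kB, so every
  # 1 MB RPC forces a partial-stripe write and a parity recalculation.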
Hello,

I'm back to working on benchmarking the hardware prior to the Lustre
filesystem installation.

I have used the LSI 8888ELP WebBIOS utility to destroy the previous
RAID configuration on one of our 16-bay JBOD units.  A new set of RAID
arrays was set up for benchmarking purposes: one RAID-6 with its parity
disks, one RAID-5 with its parity disk, and one single spindle passed
directly to the OSS.  The boot of CentOS 5
(2.6.18-53.1.13.el5_lustre.1.6.4.3smp) detects these new block devices
as /dev/sdh, /dev/sdi and /dev/sdj respectively.  The "sg_map" command
detects the raw units.  The lustre-iokit is still unable to find them.
The module "sg" is indeed loaded, as is megaraid_sas for the LSI
8888ELP card.  Can lustre-iokit actually benchmark anything on an LSI
8888ELP card?  Should I use another tool like IOzone?

Thanks,
megan

From dmesg:

sd 2:2:1:0: Attached scsi generic sg12 type 0
  Vendor: LSI       Model: MegaRAID 8888ELP  Rev: 1.20
  Type:   Direct-Access                      ANSI SCSI revision: 05
sdg : very big device. try to use READ CAPACITY(16).
SCSI device sdg: 11707023360 512-byte hdwr sectors (5993996 MB)
sdg: Write Protect is off
sdg: Mode Sense: 1f 00 00 08
SCSI device sdg: drive cache: write back, no read (daft)
 sdg: unknown partition table
sd 2:2:2:0: Attached scsi disk sdg
sd 2:2:2:0: Attached scsi generic sg13 type 0
  Vendor: LSI       Model: MegaRAID 8888ELP  Rev: 1.20
  Type:   Direct-Access                      ANSI SCSI revision: 05
sdh : very big device. try to use READ CAPACITY(16).
SCSI device sdh: 11707023360 512-byte hdwr sectors (5993996 MB)
sdh: Write Protect is off
sdh: Mode Sense: 1f 00 00 08
SCSI device sdh: drive cache: write back, no read (daft)
 sdh: unknown partition table
sd 2:2:3:0: Attached scsi disk sdh
sd 2:2:3:0: Attached scsi generic sg14 type 0
  Vendor: LSI       Model: MegaRAID 8888ELP  Rev: 1.20
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdi: 1951170560 512-byte hdwr sectors (998999 MB)
sdi: Write Protect is off
sdi: Mode Sense: 1f 00 00 08
SCSI device sdi: drive cache: write back, no read (daft)
SCSI device sdi: 1951170560 512-byte hdwr sectors (998999 MB)
sdi: Write Protect is off
sdi: Mode Sense: 1f 00 00 08
SCSI device sdi: drive cache: write back, no read (daft)
 sdi: unknown partition table
sd 2:2:4:0: Attached scsi disk sdi
sd 2:2:4:0: Attached scsi generic sg15 type 0

[root@oss4 ~]# sg_map
/dev/sg0  /dev/sda
/dev/sg1  /dev/scd0
/dev/sg2
/dev/sg3
/dev/sg4
/dev/sg5  /dev/sdb
/dev/sg6  /dev/sdc
/dev/sg7  /dev/sdd
/dev/sg8
/dev/sg9
/dev/sg10
/dev/sg11  /dev/sde
/dev/sg12  /dev/sdf
/dev/sg13  /dev/sdg
/dev/sg14  /dev/sdh
/dev/sg15  /dev/sdi
/dev/sg16  /dev/sdj
/dev/sg17  /dev/sdk

Lustre-IOkit sgpdd-survey:
scsidevs=/dev/sg16

Result of sgpdd-survey:
[root@oss4 log]# sgpdd-survey
Can't find SG device for /dev/sg16, testing for partition
Can't find SG device /dev/sg1.
Do you have the sg module configured for your kernel?
[root@oss4 log]# lsmod | grep sg
sg                     70056  0
scsi_mod              187192  11 ib_iser,libiscsi,scsi_transport_iscsi,ib_srp,sr_mod,libata,megaraid_sas,sg,3w_9xxx,usb_storage,sd_mod

This is still using the "-16" option to each of the two sg_readcap calls:
sg_readcap -b -16

********************************************************************
On Thu, Jul 24, 2008 at 3:45 PM, Andreas Dilger <adilger at sun.com> wrote:
> On Jul 24, 2008 14:52 -0400, Ms. Megan Larko wrote:
>> Okay.  So I have a JBOD with 16 1 TB Hitachi Ultrastar SATA drives in
>> it connected to the OSS via an LSI 8888ELP controller card (with
>> battery back-up).  The JBOD is passed to the OSS as raw space.
> [...]
> Please see section 10.1 in the Lustre manual for more tips:
> http://manual.lustre.org/manual/LustreManual16_HTML/RAID.html
>
> Cheers, Andreas
Hi,

Additional info.

If I use "scsidevs=/dev/sdj" in /usr/bin/sgpdd-survey in place of the
/dev/sg16, I receive the following result:

Tue Jul 29 15:40:47 EDT 2008 sgpdd-survey on /dev/sdj from oss4.crew.local
total_size 17487872K rsz 1024 crg  1 thr  4 write 1 failed read 1 failed
total_size 17487872K rsz 1024 crg  1 thr  8 write 1 failed read 1 failed
total_size 17487872K rsz 1024 crg  1 thr 16 write 1 failed read 1 failed
total_size 17487872K rsz 1024 crg  2 thr  4 write 2 failed read 2 failed
total_size 17487872K rsz 1024 crg  2 thr  8 write 2 failed read 2 failed
total_size 17487872K rsz 1024 crg  2 thr 16 write 2 failed read 2 failed
total_size 17487872K rsz 1024 crg  2 thr 32 write 2 failed read 2 failed
total_size 17487872K rsz 1024 crg  4 thr  4 write 4 failed read 4 failed
total_size 17487872K rsz 1024 crg  4 thr  8 write 4 failed read 4 failed
total_size 17487872K rsz 1024 crg  4 thr 16 write 4 failed read 4 failed
total_size 17487872K rsz 1024 crg  4 thr 32 write 4 failed read 4 failed
total_size 17487872K rsz 1024 crg  4 thr 64 write 4 failed read 4 failed
total_size 17487872K rsz 1024 crg  8 thr  8 write 8 failed read 8 failed
total_size 17487872K rsz 1024 crg  8 thr 16 write 8 failed read 8 failed
total_size 17487872K rsz 1024 crg  8 thr 32 write 8 failed read 8 failed
total_size 17487872K rsz 1024 crg  8 thr 64 write 8 failed read 8 failed
total_size 17487872K rsz 1024 crg 16 thr 16 write 16 failed read 16 failed
total_size 17487872K rsz 1024 crg 16 thr 32 write 16 failed read 16 failed
total_size 17487872K rsz 1024 crg 16 thr 64 write 16 failed read 16 failed
total_size 17487872K rsz 1024 crg 32 thr 32 write 32 failed read 32 failed
total_size 17487872K rsz 1024 crg 32 thr 64 write 32 failed read 32 failed
total_size 17487872K rsz 1024 crg 64 thr 64 write 64 failed read 64 failed

All writes and reads fail, but it indicates that it found the device....

megan

On Tue, Jul 29, 2008 at 3:38 PM, Ms. Megan Larko <dobsonunit at gmail.com> wrote:
> Hello,
>
> I'm back to working on benchmarking the hardware prior to the Lustre
> filesystem installation.
> [...]
> Can lustre-iokit actually benchmark anything on an LSI 8888ELP card?
> Should I use another tool like IOzone?
>
> Thanks,
> megan
On Tue, 2008-07-29 at 15:38 -0400, Ms. Megan Larko wrote:
> Hello,

Hi,

> Lustre-IOkit sgpdd-survey:
> scsidevs=/dev/sg16

Per your followup message, you specify the block devices you want to
test, not their SG-mapped devices.  sgpdd-survey will figure out what
devices they are mapped to automatically.

b.
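Concretely, applied to the block devices from the earlier dmesg/sg_map listing, the relevant line in /usr/bin/sgpdd-survey would look something like the sketch below (the device names are the ones reported in this thread; a space-separated list lets the survey cover several LUNs in one run):

  # Name the block devices; sgpdd-survey finds the matching /dev/sgN itself.
  scsidevs="/dev/sdh /dev/sdi /dev/sdj"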
On Tue, 2008-07-29 at 15:43 -0400, Ms. Megan Larko wrote:
> Hi,
>
> Additional info.
>
> If I use "scsidevs=/dev/sdj" in /usr/bin/sgpdd-survey in place of the
> /dev/sg16

Yes, this is the correct syntax.

> I receive the following result:
> Tue Jul 29 15:40:47 EDT 2008 sgpdd-survey on /dev/sdj from oss4.crew.local
> total_size 17487872K rsz 1024 crg  1 thr  4 write 1 failed read 1 failed
> [...]
> total_size 17487872K rsz 1024 crg 64 thr 64 write 64 failed read 64 failed
>
> All writes and reads fail, but it indicates that it found the device....

Indeed.  So the question is, why are the reads and writes failing.

Do you have any files in /tmp named:

/tmp/sgpdd_survey_$(date)_$(uname -n).detail

If so, can you paste one here?

Alternatively you can try using sgp_dd to read a device.  The following
should work:

# sgp_dd if=/dev/sg16 of=/dev/null count=10 bs=512 time=1

and paste the result here.

b.
From megan:  Comments in-line.

On Wed, Jul 30 2008 7:12 am, "Brian J. Murrell" wrote:
> On Tue, 2008-07-29 at 15:43 -0400, Ms. Megan Larko wrote:
>> If I use "scsidevs=/dev/sdj" in /usr/bin/sgpdd-survey in place of the
>> /dev/sg16
>
> Yes, this is the correct syntax.
>
>> All writes and reads fail, but it indicates that it found the device....
>
> Indeed.  So the question is, why are the reads and writes failing.
>
> Do you have any files in /tmp named:
>
> /tmp/sgpdd_survey_$(date)_$(uname -n).detail
>
> If so, can you paste one here?

megan:  I am attaching the file from
/tmp/sgpdd_survey_2008-07-29 at 15:40_oss4.crew.local.detail
The complaint seems to be that the memory cannot be accessed.

> Alternatively you can try using sgp_dd to read a device.  The following
> should work:
>
> # sgp_dd if=/dev/sg16 of=/dev/null count=10 bs=512 time=1
>
> and paste the result here.

megan:  Pasting result--

[root@oss4 ~]# sgp_dd of=/dev/sg16 if=/dev/null count=10 bs=512 time=1
time to transfer data was 0.000121 secs
remaining block count=10
0+0 records in
0+0 records out

Note that a "cat /proc/meminfo" shows 16 GB RAM on the machine oss4.

[root@oss4 ~]# cat /proc/meminfo
MemTotal:     16439328 kB
MemFree:      16101332 kB
Buffers:         32260 kB
Cached:         205820 kB
---snip---

BTW, I am running iozone v. 3.283 on the OS drive, a RAID-6 JBOD disk
formatted ext3, and one of our existing Lustre disks, and the Lustre
system is doing well under iozone.

Thanks,
megan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sgpdd_survey_2008-07-29 at 15:40_oss4.crew.local.detail
Type: application/octet-stream
Size: 33332 bytes
URL: http://lists.lustre.org/pipermail/lustre-discuss/attachments/20080731/a61eb44d/attachment-0001.obj
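One detail worth flagging in the pasted command: it has of=/dev/sg16 and if=/dev/null, i.e. it copies from /dev/null to the device rather than reading from it, which is why it returns immediately with "0+0 records in / 0+0 records out".  The read test being asked for would be:

  # Read ten 512-byte blocks from the sg device and discard them.
  sgp_dd if=/dev/sg16 of=/dev/null count=10 bs=512 time=1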
On Thu, 2008-07-31 at 14:12 -0400, Ms. Megan Larko wrote:
> megan:  I am attaching the file from
> /tmp/sgpdd_survey_2008-07-29 at 15:40_oss4.crew.local.detail
> The complaint seems to be that the memory cannot be accessed.

Allocated, not accessed:

==============> total_size 17487872K rsz 1024 crg 1 thr 4
=====> write
sg starting out command at "sgp_dd.c":843: Cannot allocate memory
=====> read
sg starting in command at "sgp_dd.c":784: Cannot allocate memory
==============> total_size 17487872K rsz 1024 crg 1 thr 8
=====> write
sg starting out command at "sgp_dd.c":843: Cannot allocate memory
=====> read
sg starting in command at "sgp_dd.c":784: Cannot allocate memory
==============> total_size 17487872K rsz 1024 crg 1 thr 16
=====> write
sg starting out command at "sgp_dd.c":843: Cannot allocate memory
=====> read
sg starting in command at "sgp_dd.c":784: Cannot allocate memory
==============> total_size 17487872K rsz 1024 crg 2 thr 4
=====> write
sg starting out command at "sgp_dd.c":843: Cannot allocate memory
sg starting out command at "sgp_dd.c":843: Cannot allocate memory
=====> read
sg starting in command at "sgp_dd.c":784: Cannot allocate memory
sg starting in command at "sgp_dd.c":784: Cannot allocate memory

So for whatever reason sgp_dd can't allocate memory.

> megan:  Pasting result--
> [root@oss4 ~]# sgp_dd of=/dev/sg16 if=/dev/null count=10 bs=512 time=1
> time to transfer data was 0.000121 secs
> remaining block count=10
> 0+0 records in
> 0+0 records out

Hrm.  The result doesn't look right.

> Note that a "cat /proc/meminfo" shows 16 GB RAM on the machine oss4.
> [root@oss4 ~]# cat /proc/meminfo
> MemTotal:     16439328 kB
> MemFree:      16101332 kB
> Buffers:         32260 kB
> Cached:         205820 kB

I don't really know why you'd be getting those errors then.  Buggy
version of sgp_dd maybe?  Buggy something else?

> BTW, I am running iozone v. 3.283 on the OS drive, a RAID-6 JBOD disk
> formatted ext3, and one of our existing Lustre disks, and the Lustre
> system is doing well under iozone.

Good.

b.
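On the "Cannot allocate memory" failures: one inexpensive thing to check with sgp_dd-based tools is the sg driver's per-open reserved buffer, which defaults to 32 kB and is much smaller than the 1024 kB record size the survey uses.  Whether that is the cause here is only a guess, but the knobs below are standard sg driver parameters and are cheap to try:

  # Current default reserved buffer (bytes) for newly opened sg devices:
  cat /proc/scsi/sg/def_reserved_size

  # Raise it to 1 MB to match rsz=1024 kB, either on the fly...
  echo 1048576 > /proc/scsi/sg/def_reserved_size
  # ...or by reloading the module with the parameter set:
  modprobe -r sg && modprobe sg def_reserved_size=1048576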