1 - 2948-SFP Plus Baseline 3Com Switch
1 - MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
1 - MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
2 - OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
1 - MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
1 - CLIENT bond0(eth0,eth1)
1 - CLIENT eth0
1 - CLIENT eth0

So far I have failed at creating the external journals for the MDT, MGS and the two OSSes. How do I add the external journal to /etc/fstab? Specifically, given the output of e2label /dev/sdb, what options go in fstab?

[root@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 17
  1 UP mgc MGC192.168.0.7@tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
  2 UP lov ioio-clilov-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 4
  3 UP mdc ioio-MDT0000-mdc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
  4 UP osc ioio-OST0000-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
  5 UP osc ioio-OST0001-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5

[root@lustreone ~]# lfs df -h
UUID                     bytes      Used  Available  Use%  Mounted on
ioio-MDT0000_UUID       815.0G    534.0M     767.9G    0%  /mnt/ioio[MDT:0]
ioio-OST0000_UUID         3.6T     28.4G       3.4T    0%  /mnt/ioio[OST:0]
ioio-OST0001_UUID         3.6T     18.0G       3.4T    0%  /mnt/ioio[OST:1]

filesystem summary:       7.2T     46.4G       6.8T    0%  /mnt/ioio

[root@lustreone ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:db
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:6c
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:94
Aggregator ID: 3

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:93
Aggregator ID: 4

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:95
Aggregator ID: 5

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:96
Aggregator ID: 6

[root@lustreone ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb[0] sdc[1]
      976762496 blocks [2/2] [UU]

unused devices: <none>

[root@lustreone ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults         1 1
tmpfs                   /dev/shm                tmpfs   defaults         0 0
devpts                  /dev/pts                devpts  gid=5,mode=620   0 0
sysfs                   /sys                    sysfs   defaults         0 0
proc                    /proc                   proc    defaults         0 0
LABEL=MGS               /mnt/mgs                lustre  defaults,_netdev 0 0
192.168.0.7@tcp0:/ioio  /mnt/ioio               lustre  defaults,_netdev,noauto 0 0

[root@lustreone ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
          inet addr:192.168.0.7  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
          RX packets:5457486 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4665580 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:12376680079 (11.5 GiB)  TX bytes:34438742885 (32.0 GiB)

eth0      Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
          inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:3808615 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4664270 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12290700380 (11.4 GiB)  TX bytes:34438581771 (32.0 GiB)
          Base address:0xec00 Memory:febe0000-fec00000

From what I have read, not having an external journal configured for the OSTs is a sure recipe for slowness, which I would rather avoid considering the goal is around 350 MiB/s or more, which should be obtainable.

Here is how I formatted the raid6 device on both OSSes, which are identical:

[root@lustrefour ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1      121601   976760001   83  Linux

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn't contain a valid partition table

Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdg doesn't contain a valid partition table

Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdh doesn't contain a valid partition table

Disk /dev/md0: 4000.8 GB, 4000819183616 bytes
2 heads, 4 sectors/track, 976762496 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table
[root@lustrefour ~]#

[root@lustrefour ~]# mdadm --create --assume-clean /dev/md0 --level=6 --chunk=128 --raid-devices=6 /dev/sd[cdefgh]
[root@lustrefour ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc[0] sdh[5] sdg[4] sdf[3] sde[2] sdd[1]
      3907049984 blocks level 6, 128k chunk, algorithm 2 [6/6] [UUUUUU]
                in: 16674 reads, 16217479 writes; out: 3022788 reads, 32865192 writes
                7712698 in raid5d, 8264 out of stripes, 25661224 handle called
                reads: 0 for rmw, 1710975 for rcw. zcopy writes: 4864584, copied writes: 16115932
                0 delayed, 0 bit delayed, 0 active, queues: 0 in, 0 out
                0 expanding overlap

unused devices: <none>

Followed with:

[root@lustrefour ~]# mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.7@tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0
[root@lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1

But that was hard to reassemble on reboot, or at least it was before I used e2label and labelled things right. Question: how do I reference the external journal in fstab, if at all? Right now I am only running:

[root@lustrefour ~]# mkfs.lustre --fsname=ioio --ost --mgsnode=192.168.0.7@tcp0 --reformat /dev/md0

So just raid6, no external journal.

[root@lustrefour ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults         1 1
tmpfs                   /dev/shm                tmpfs   defaults         0 0
devpts                  /dev/pts                devpts  gid=5,mode=620   0 0
sysfs                   /sys                    sysfs   defaults         0 0
proc                    /proc                   proc    defaults         0 0
LABEL=ioio-OST0001      /mnt/ost00              lustre  defaults,_netdev 0 0
192.168.0.7@tcp0:/ioio  /mnt/ioio               lustre  defaults,_netdev,noauto 0 0
[root@lustrefour ~]#

[root@lustreone bin]# ./ost-survey -s 4096 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7@tcp
Number of Active OST devices : 2
Worst  Read OST indx: 0 speed: 38.789337
Best   Read OST indx: 1 speed: 40.017201
Read Average: 39.403269 +/- 0.613932 MB/s
Worst  Write OST indx: 0 speed: 49.227064
Best   Write OST indx: 1 speed: 78.673564
Write Average: 63.950314 +/- 14.723250 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0       38.789       49.227      105.596      83.206
1       40.017       78.674      102.356      52.063

[root@lustreone bin]# ./ost-survey -s 1024 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7@tcp
Number of Active OST devices : 2
Worst  Read OST indx: 0 speed: 38.559620
Best   Read OST indx: 1 speed: 40.053787
Read Average: 39.306704 +/- 0.747083 MB/s
Worst  Write OST indx: 0 speed: 71.623744
Best   Write OST indx: 1 speed: 82.764897
Write Average: 77.194320 +/- 5.570577 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0       38.560       71.624       26.556      14.297
1       40.054       82.765       25.566      12.372

[root@lustreone bin]# dd of=/mnt/ioio/bigfileMGS if=/dev/zero bs=1048576
3536+0 records in
3536+0 records out
3707764736 bytes (3.7 GB) copied, 38.4775 seconds, 96.4 MB/s

lustreone/two/three/four all have the same modprobe.conf:

[root@lustrefour ~]# cat /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter pata_marvell
alias scsi_hostadapter1 ata_piix
options lnet networks=tcp
alias eth2 sky2
alias eth3 sky2
alias eth4 sky2
alias eth5 sky2
alias bond0 bonding
options bonding miimon=100 mode=4
[root@lustrefour ~]#

When I do the same from all the clients, I can watch /usr/bin/gnome-system-monitor, and the send and receive rates from the various nodes reach a 209 MiB/s plateau. Uggh.
So if one OST gets 200 MiB/s and another OST gets 200 MiB/s, does that make 400 MiB/s, or is that not how to calculate throughput? I will eventually plug the right sequence into iozone to measure it.

From my perspective it looks like ioio.ca/ioio.jpg ioio.ca/lustreone.png ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png

--- On Sat, 1/24/09, Arden Wiebe <albert682 at yahoo.com> wrote:

> Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
> [original message quoted in full; trimmed]
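One crude way to check whether two 200 MiB/s OST streams really add up, before getting iozone set up, is to run one dd stream per client at the same time and sum the rates each dd reports. A rough sketch, assuming hypothetical client hostnames client1/client2, the shared /mnt/ioio mount, and passwordless ssh:

    # start one write stream from each client in parallel; with the default
    # stripe-count-1 round-robin placement the files should land on different OSTs
    for host in client1 client2; do
        ssh $host "dd if=/dev/zero of=/mnt/ioio/ddtest.\$host bs=1048576 count=4096" &
    done
    wait   # each dd prints its own MB/s; the aggregate is roughly their sum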
In general, when writing messages to this list, you need to be more concise about what you are asking. I see so much information here, I'm not sure what is relevant to your few interspersed questions and what is not. I will try to answer your specific question...

Also, in the future, please use a simple plain-text format and just copy and paste for plain-text content. All of the "quoted-printable" mime-types are confusing my MUA.

On Sat, 2009-01-24 at 18:04 -0800, Arden Wiebe wrote:
>
> I fail so far creating external journal for MDT, MGS and OSSx2. How
> to add the external journal to /etc/fstab specifically the output of
> e2label /dev/sdb followed by what options for fstab?
>

You need to look at the mkfs.ext3 manpage on how to create an external journal (i.e. -O journal_dev external-journal) and attach an external journal to an ext3 filesystem (i.e. -J device=external-journal), then apply those mkfs.ext3 options to your Lustre device with mkfs.lustre's --mkfsoptions option. All of this is covered in the operations manual in section 10.3 "Creating an External Journal".

b.
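Concretely, Brian's recipe for one of the OSSes would look something like the sketch below. The label is illustrative, the journal device must be initialized before the filesystem that will use it (the reverse of the order shown earlier in the thread), and the -b 4096 block size must match the filesystem's:

    mke2fs -b 4096 -O journal_dev -L ioio-ost-jrnl /dev/sdb1     # create the journal device first
    mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.7@tcp0 \
        --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0  # then attach it to the OST

As for fstab: as far as I know the external journal itself never gets an fstab entry. The reference to it is stored in the filesystem's superblock at format time, so fstab only carries the OST mount line (e.g. LABEL=ioio-OST0001 /mnt/ost00 lustre defaults,_netdev 0 0). The mke2fs manpage also documents specifying the journal as -J device=LABEL=<label> (or UUID=) so the association survives device renumbering, if the installed e2fsprogs supports it.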
--- On Mon, 1/26/09, Brian J. Murrell <Brian.Murrell at Sun.COM> wrote:

> In general, when writing messages to this list, you need to be more
> concise about what you are asking. I see so much information here, I'm
> not sure what is relevant to your few interspersed questions and what
> is not. I will try to answer your specific question...

My apologies for posting my study hacks to the list. Thanks, Brian, for at least trying to answer questions that I have to learn the answers to for myself before I know the correct questions to ask.

> Also, in the future, please use a simple plain-text format and just
> copy and paste for plain-text content. All of the "quoted-printable"
> mime-types are confusing my MUA.

No doubt. Sorry, I'm not good with MTAs or MUAs in general, but I'll switch to plain text in the future.

> You need to look at the mkfs.ext3 manpage on how to create an external
> journal (i.e. -O journal_dev external-journal) and attach an external
> journal to an ext3 filesystem (i.e. -J device=external-journal), then
> apply those mkfs.ext3 options to your Lustre device with mkfs.lustre's
> --mkfsoptions option. All of this is covered in the operations manual
> in section 10.3 "Creating an External Journal".

Been there, done that... well, sort of. I managed to have every Lustre filesystem with external journals, some even on different controllers. The underlying root/boot presentation separates the raid from the MBR and the root and boot partitions, which are un-raided and could eventually be moved to a USB memory stick to afford a hot-spare implementation from the freed-up /dev/sda. The eventual goal for the root file system is a network/cluster configuration tool so that root/boot partitions can be delivered over the cluster to new and old nodes. Until then the DVD .iso method works fine and can rehabilitate a failed boot drive in a standard CentOS 5.2 install time.

The manual, and the list in numerous places, say to use no partitions. There are no partitions in this configuration save for a 1TB / partition on /dev/sda1 of all main nodes, plus the external journals (/dev/sdf1 on the MDT and MGS, /dev/sdb1 on the two OSTs) that all occupy ",50,L" of the entire 1TB drive, no doubt for the ~400 MB journal.

The fix at the time was to learn the proper syntax for creating a raid10 device. So instead of physically making two raid1 arrays and one raid0 array to get a raid 1+0 configuration, I had to learn the right way to make a raid10 directly; ya believe it (see the sketch below). e2label had been reporting MGS for two drive volumes and fstab was all borked.

To top it all off, I was dealing with a network anomaly that still persists on my MGS node, whereby I can't run that node at MTU 9000 while the rest of the nodes can. I even pulled the box off the shelf, checked for hardware faults, reseated the cards, removed all network interfaces and started over. It still persists, no doubt due to mixing MTU 1500 and MTU 9000 on the same subnet.
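(For reference, the one-step raid10 creation mentioned above would be something like this sketch; the member devices are illustrative:)

    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[cdef]   # native mdadm raid10, no nested raid1+raid0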
Not sure if this is a proper list deliverable, but I have produced a series of pictures that, to my understanding, show a small Lustre ethernet cluster running on commodity hardware doing 400 MiB/s on one OST, but also one that needs to handle smaller files better.

http://www.ioio.ca/Lustre-tcp-bonding/images.html and http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Typical usage so far shows that copying /var/lib/mysql is still a time-consuming process given 4.9G of data. Web-based files in flight are also of typically small file size. Further objectives for the cluster are not implemented at this time but would include more of the same and then some.

Further suggestions regarding implementation of network-specific cluster enhancements, partitioning, formatting, benchmarking or modes appreciated. My apologies for the --verbose thread, which I hope is better formatted to fit your screen, and also for my lack of specific questions, due to not having enough experience to know the correct ones to ask at times.

a.
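One tweak commonly suggested for small-file workloads like this (an assumption on my part, not advice given in the thread): set a stripe count of one on directories holding small files, so each file lives on a single OST and avoids per-file striping overhead. With a recent lfs that is roughly:

    lfs setstripe -c 1 /mnt/ioio/www   # hypothetical directory; -c 1 = one stripe (one OST) per file
                                       # older 1.6 releases use positional arguments; see the lfs man page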
Hi Arden,

Are you obtaining more than 100 MB/sec from one client to one OST? Given that you are using 802.3ad link aggregation, it will determine the physical NIC by the other party's MAC address. So having multiple OSTs and multiple clients will improve the chances of using more than one NIC of the bonding.

What is the maximum performance you obtain on the client with two 1GbE?

jeff

________________________________
From: lustre-discuss-bounces at lists.lustre.org On Behalf Of Arden Wiebe
Sent: Sunday, January 25, 2009 12:08 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0

> [original message quoted in full; trimmed]
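Jeff's point can be made concrete from the kernel bonding documentation: with the layer2 transmit hash policy shown in Arden's /proc/net/bonding/bond0 output, the outgoing slave is chosen as

    slave_index = (source MAC XOR destination MAC) mod number_of_slaves

so all traffic between one client MAC and one server MAC always leaves on the same physical NIC, and a single client/server pair can never exceed one link's bandwidth, no matter how many slaves are in the bond.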
Arden, we also use dual channel gigE (bond0) and in my tests found that this works best:

options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

This allows us to get roughly 250 MB/s transfers. Here is the iozone command I used:

iozone -t1 -i0 -il -r4m -s2g

You will not get any more performance unless you move to Infiniband or another interconnect.

Jeffrey Alan Bennett wrote:
> Hi Arden,
>
> Are you obtaining more than 100 MB/sec from one client to one OST?
> [rest of quoted message trimmed]

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672
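Applied to the modprobe.conf Arden posted earlier, only the bonding options line changes (a sketch; mode=802.3ad is the named equivalent of mode=4):

    alias bond0 bonding
    options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

With layer3+4 the hash includes IP addresses and ports, so separate TCP connections between the same pair of hosts can spread across different slaves, which layer2 hashing never does.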
I ran this on my 6-GigE bond0 MGS client. I had to go back and cd into the mounted Lustre filesystem first.

[root@lustreone ~]# cd /mnt/ioio
[root@lustreone ioio]# iozone -t1 -i0 -il -r4m -s2g

        Record Size 4096 KB
        File size set to 2097152 KB
        Command line used: iozone -t1 -i0 -il -r4m -s2g
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 1 process
        Each process writes a 2097152 Kbyte file in 4096 Kbyte records

        Children see throughput for  1 initial writers = 106916.81 KB/sec
        Parent sees throughput for  1 initial writers  = 105244.22 KB/sec
        Min throughput per process                     = 106916.81 KB/sec
        Max throughput per process                     = 106916.81 KB/sec
        Avg throughput per process                     = 106916.81 KB/sec
        Min xfer                                       = 2097152.00 KB

        Children see throughput for  1 rewriters       = 106882.15 KB/sec
        Parent sees throughput for  1 rewriters        = 105215.34 KB/sec
        Min throughput per process                     = 106882.15 KB/sec
        Max throughput per process                     = 106882.15 KB/sec
        Avg throughput per process                     = 106882.15 KB/sec
        Min xfer                                       = 2097152.00 KB

I ran this to match the physical RAM in the MGS client:

[root@lustreone ioio]# iozone -t1 -i0 -il -r4m -s8g
        Run began: Wed Jan 28 17:33:53 2009

        Record Size 4096 KB
        File size set to 8388608 KB
        Command line used: iozone -t1 -i0 -il -r4m -s8g
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 1 process
        Each process writes a 8388608 Kbyte file in 4096 Kbyte records

        Children see throughput for  1 initial writers = 100817.04 KB/sec
        Parent sees throughput for  1 initial writers  = 100420.04 KB/sec
        Min throughput per process                     = 100817.04 KB/sec
        Max throughput per process                     = 100817.04 KB/sec
        Avg throughput per process                     = 100817.04 KB/sec
        Min xfer                                       = 8388608.00 KB

        Children see throughput for  1 rewriters       = 100884.15 KB/sec
        Parent sees throughput for  1 rewriters        = 100487.30 KB/sec
        Min throughput per process                     = 100884.15 KB/sec
        Max throughput per process                     = 100884.15 KB/sec
        Avg throughput per process                     = 100884.15 KB/sec
        Min xfer                                       = 8388608.00 KB

Then I ran this to match my processors, increasing -t1 to -t4; a subsequent test with -t6 proved redundant.

[root@lustreone ioio]# iozone -t4 -i0 -il -r4m -s8g
        Run began: Wed Jan 28 17:37:33 2009

        Record Size 4096 KB
        File size set to 8388608 KB
        Command line used: iozone -t4 -i0 -il -r4m -s8g
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 4 processes
        Each process writes a 8388608 Kbyte file in 4096 Kbyte records

        Children see throughput for  4 initial writers = 206173.77 KB/sec
        Parent sees throughput for  4 initial writers  = 191062.04 KB/sec
        Min throughput per process                     =  48302.41 KB/sec
        Max throughput per process                     =  54266.61 KB/sec
        Avg throughput per process                     =  51543.44 KB/sec
        Min xfer                                       = 7467008.00 KB

        Children see throughput for  4 rewriters       = 206216.61 KB/sec
        Parent sees throughput for  4 rewriters        = 205358.90 KB/sec
        Min throughput per process                     =  50336.13 KB/sec
        Max throughput per process                     =  53059.13 KB/sec
        Avg throughput per process                     =  51554.15 KB/sec
        Min xfer                                       = 7958528.00 KB

Screenshots at http://ioio.ca/iozone/MGSClient/images.html clearly show a large jump in smooth, stable network activity from the 200 MiB/s range to the 400 MiB/s range.

If one were to have more processors, would that increase maximum throughput? Does the number of GigE interfaces scale with the number of processors? Given 6-GigE bond0, can I test in any other way to raise the 412 MiB/s plateau? How do I best interpret the above results?

--- On Wed, 1/28/09, Jeremy Mann <jeremy at biochem.uthscsa.edu> wrote:

> Arden, we also use dual channel gigE (bond0) and in my tests found
> that this works best:
> [rest of quoted message trimmed]
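For interpreting the numbers above: iozone reports Kbytes/sec, so dividing by 1024 gives MiB/s. Roughly:

    -t1 -s8g:  100817.04 KB/s / 1024 =  ~98 MiB/s  (single stream, about one GigE link)
    -t4 -s8g:  206173.77 KB/s / 1024 = ~201 MiB/s  (aggregate across the 4 writers)

So the aggregate iozone itself measured is about 201 MiB/s; the ~412 MiB/s in the network monitor may be counting something broader (e.g. both directions, or all nodes' interfaces at once), so the iozone figure is probably the one to trust for filesystem throughput.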
--- On Wed, 1/28/09, Jeremy Mann <jeremy at biochem.uthscsa.edu> wrote:

From: Jeremy Mann <jeremy at biochem.uthscsa.edu>
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
To: "Arden Wiebe" <albert682 at yahoo.com>
Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Date: Wednesday, January 28, 2009, 1:56 PM

Arden, we also use dual channel gigE (bond0) and in my tests found that
this works best:

options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

This allows us to get roughly 250 MB/s transfers. Here is the iozone
command I used:

iozone -t1 -i0 -il -r4m -s2g

You will not get any more performance unless you move to Infiniband or
another interconnect.
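Worth noting: the modprobe.conf shown further down in the thread uses mode=4 with no explicit xmit_hash_policy, which leaves the kernel default of layer2 (MAC-only hashing). A sketch of Jeremy's suggestion spelled out in /etc/modprobe.conf:

        alias bond0 bonding
        # layer3+4 hashes on IP addresses and ports as well as MACs, so
        # several TCP flows between the same two hosts can land on
        # different slaves; the default layer2 policy pins each host
        # pair to a single slave.
        options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4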
Jeffrey Alan Bennett wrote:
> Hi Arden,
>
> Are you obtaining more than 100 MB/sec from one client to one OST? Given
> that you are using 802.3ad link aggregation, it will determine the
> physical NIC by the other party's MAC address. So having multiple OSTs
> and multiple clients will improve the chances of using more than one NIC
> of the bonding.
>
> What is the maximum performance you obtain on the client with two 1GbE?
>
> jeff
>
> ________________________________
> From: lustre-discuss-bounces at lists.lustre.org
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Arden Wiebe
> Sent: Sunday, January 25, 2009 12:08 AM
> To: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
>
> So if one OST gets 200MiB/s and another OST gets 200MiB/s, does that make
> 400MiB/s, or is this not how to calculate throughput? I will eventually
> plug the right sequence into iozone to measure it.
>
> From my perspective it looks like ioio.ca/ioio.jpg ioio.ca/lustreone.png
> ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png
>
> --- On Sat, 1/24/09, Arden Wiebe <albert682 at yahoo.com> wrote:
>
> From: Arden Wiebe <albert682 at yahoo.com>
> Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
> To: lustre-discuss at lists.lustre.org
> Date: Saturday, January 24, 2009, 6:04 PM
>
> [the original post is quoted in full here; the hardware layout, device,
> bonding, ifconfig, mdstat, fstab and fdisk listings duplicate the start
> of the thread and are trimmed]
> Followed with:
>
> [root at lustrefour ~]# mkfs.lustre --ost --fsname=ioio
> --mgsnode=192.168.0.7 at tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat
> /dev/md0
>
> [root at lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1
>
> But that was hard to reassemble on reboot, or at least it was before I
> used e2label and labelled things right. Question: how do I label the
> external journal in fstab, if at all? Right now I am only running
>
> [root at lustrefour ~]# mkfs.lustre --fsname=ioio --ost
> --mgsnode=192.168.0.7 at tcp0 --reformat /dev/md0
>
> so just raid6, no external journal.
>
> [root at lustrefour ~]# cat /etc/fstab
> LABEL=/                 /                       ext3    defaults        1 1
> tmpfs                   /dev/shm                tmpfs   defaults        0 0
> devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
> sysfs                   /sys                    sysfs   defaults        0 0
> proc                    /proc                   proc    defaults        0 0
> LABEL=ioio-OST0001      /mnt/ost00              lustre  defaults,_netdev 0 0
> 192.168.0.7 at tcp0:/ioio  /mnt/ioio               lustre  defaults,_netdev,noauto 0 0
>
> [root at lustreone bin]# ./ost-survey -s 4096 /mnt/ioio
> ./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7 at tcp
> Number of Active OST devices : 2
> Worst  Read OST indx: 0 speed: 38.789337
> Best   Read OST indx: 1 speed: 40.017201
> Read Average: 39.403269 +/- 0.613932 MB/s
> Worst  Write OST indx: 0 speed: 49.227064
> Best   Write OST indx: 1 speed: 78.673564
> Write Average: 63.950314 +/- 14.723250 MB/s
> Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
> ----------------------------------------------------
> 0     38.789      49.227       105.596    83.206
> 1     40.017      78.674       102.356    52.063
>
> [root at lustreone bin]# ./ost-survey -s 1024 /mnt/ioio
> ./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7 at tcp
> Number of Active OST devices : 2
> Worst  Read OST indx: 0 speed: 38.559620
> Best   Read OST indx: 1 speed: 40.053787
> Read Average: 39.306704 +/- 0.747083 MB/s
> Worst  Write OST indx: 0 speed: 71.623744
> Best   Write OST indx: 1 speed: 82.764897
> Write Average: 77.194320 +/- 5.570577 MB/s
> Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
> ----------------------------------------------------
> 0     38.560      71.624       26.556     14.297
> 1     40.054      82.765       25.566     12.372
>
> [root at lustreone bin]# dd of=/mnt/ioio/bigfileMGS if=/dev/zero bs=1048576
> 3536+0 records in
> 3536+0 records out
> 3707764736 bytes (3.7 GB) copied, 38.4775 seconds, 96.4 MB/s
>
> lustreone, lustretwo, lustrethree and lustrefour all have the same
> modprobe.conf:
>
> [root at lustrefour ~]# cat /etc/modprobe.conf
> alias eth0 e1000
> alias eth1 e1000
> alias scsi_hostadapter pata_marvell
> alias scsi_hostadapter1 ata_piix
> options lnet networks=tcp
> alias eth2 sky2
> alias eth3 sky2
> alias eth4 sky2
> alias eth5 sky2
> alias bond0 bonding
> options bonding miimon=100 mode=4
> [root at lustrefour ~]#
>
> When I do the same from all clients I can watch
> /usr/bin/gnome-system-monitor, and the send and receive traffic from the
> various nodes reaches a 209 MiB/s plateau.
> Uggh

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu

Phone: (210) 567-2672
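On the external-journal question quoted above: to my knowledge the journal device never gets its own fstab entry; mke2fs records it in the filesystem's superblock, and the kernel opens it automatically when the target is mounted. A minimal sketch using the thread's own device names (note the journal has to exist before mkfs.lustre references it, the reverse of the order shown above):

        # 1. create the external journal, matching the OST's 4 KB block size
        [root at lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1

        # 2. format the OST against it
        [root at lustrefour ~]# mkfs.lustre --ost --fsname=ioio \
            --mgsnode=192.168.0.7 at tcp0 \
            --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0

        # 3. fstab lists only the OST; the journal follows it automatically
        LABEL=ioio-OST0001  /mnt/ost00  lustre  defaults,_netdev 0 0

The catch is that the superblock records the journal by device number, so if /dev/sdb gets renumbered across reboots the mount will fail until the journal is re-pointed (tune2fs -J device=... should do it). That, rather than any missing fstab option, is the likely source of the reassembly trouble described above.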
Arden Wiebe wrote:
> Screenshots at http://ioio.ca/iozone/MGSClient/images.html clearly show
> a large jump in smooth, stable network activity from the 200MiB/s range
> to the 400MiB/s range.
>
> If one were to have more processors, would that increase maximum
> throughput? Does the number of GigE interfaces scale with the number of
> processors? Given a 6GigE bond0, can I test in any other way to increase
> the 412MiB/s plateau? How do I best interpret the above results?

In your case (as was in ours) I would assume the limiting factor is the
speed of the drives on the OSTs.

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu

Phone: (210) 567-2672
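If OST drive speed is the suspect, one quick way to bound it is a direct-I/O read straight off the raid6 array on each OSS (a hypothetical invocation; run it against the raw md device before formatting, or with the OST unmounted):

        # sequential read from the array, bypassing the page cache
        [root at lustrethree ~]# dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct

If that number is close to the ~100 MB/s seen per client stream, the disks (or the raid6 parity path) are the wall; if it is much higher, the network path deserves another look. For a live OST, obdfilter-survey from the lustre-iokit exercises the same disk path without unmounting anything.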