1 - 2948-SFP Plus Baseline 3Com Switch
1 - MGS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
1 - MDT bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid1
2 - OSS bond0(eth0,eth1,eth2,eth3,eth4,eth5) raid6
1 - MGS-CLIENT bond0(eth0,eth1,eth2,eth3,eth4,eth5)
1 - CLIENT bond0(eth0,eth1)
1 - CLIENT eth0
1 - CLIENT eth0

So far I have failed at creating the external journals for the MDT, MGS and the two OSSes. How do I add the external journal to /etc/fstab? Specifically, given the output of e2label /dev/sdb, what options go in fstab?

[root@lustreone ~]# cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 17
  1 UP mgc MGC192.168.0.7@tcp 876c20af-aaec-1da0-5486-1fc61ec8cd15 5
  2 UP lov ioio-clilov-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 4
  3 UP mdc ioio-MDT0000-mdc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
  4 UP osc ioio-OST0000-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5
  5 UP osc ioio-OST0001-osc-ffff810209363c00 7307490a-4a12-4e8c-56ea-448e030a82e4 5

[root@lustreone ~]# lfs df -h
UUID                     bytes      Used  Available  Use%  Mounted on
ioio-MDT0000_UUID       815.0G    534.0M     767.9G    0%  /mnt/ioio[MDT:0]
ioio-OST0000_UUID         3.6T     28.4G       3.4T    0%  /mnt/ioio[OST:0]
ioio-OST0001_UUID         3.6T     18.0G       3.4T    0%  /mnt/ioio[OST:1]

filesystem summary:       7.2T     46.4G       6.8T    0%  /mnt/ioio

[root@lustreone ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eth0
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:db
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Link Failure Count: 1
Permanent HW addr: 00:1b:21:28:77:6c
Aggregator ID: 2

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:94
Aggregator ID: 3

Slave Interface: eth2
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:93
Aggregator ID: 4

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:95
Aggregator ID: 5

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:22:15:06:3a:96
Aggregator ID: 6

[root@lustreone ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb[0] sdc[1]
      976762496 blocks [2/2] [UU]

unused devices: <none>

[root@lustreone ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults         1 1
tmpfs                   /dev/shm                tmpfs   defaults         0 0
devpts                  /dev/pts                devpts  gid=5,mode=620   0 0
sysfs                   /sys                    sysfs   defaults         0 0
proc                    /proc                   proc    defaults         0 0
LABEL=MGS               /mnt/mgs                lustre  defaults,_netdev 0 0
192.168.0.7@tcp0:/ioio  /mnt/ioio               lustre  defaults,_netdev,noauto 0 0

[root@lustreone ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
          inet addr:192.168.0.7  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
          RX packets:5457486 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4665580 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:12376680079 (11.5 GiB)  TX bytes:34438742885 (32.0 GiB)

eth0      Link encap:Ethernet  HWaddr 00:1B:21:28:77:DB
          inet6 addr: fe80::21b:21ff:fe28:77db/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:3808615 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4664270 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12290700380 (11.4 GiB)  TX bytes:34438581771 (32.0 GiB)
          Base address:0xec00 Memory:febe0000-fec00000

From what I have read, not having an external journal configured for the OSTs is a sure recipe for slowness, which I would rather avoid considering the goal is around 350 MiB/s or more, which should be obtainable.

Here is how I formatted the raid6 device on both OSSes, which are identical:

[root@lustrefour ~]# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1      121601   976760001   83  Linux

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn't contain a valid partition table

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sde doesn't contain a valid partition table

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdf doesn't contain a valid partition table

Disk /dev/sdg: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdg doesn't contain a valid partition table

Disk /dev/sdh: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdh doesn't contain a valid partition table

Disk /dev/md0: 4000.8 GB, 4000819183616 bytes
2 heads, 4 sectors/track, 976762496 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table
[root@lustrefour ~]#

[root@lustrefour ~]# mdadm --create --assume-clean /dev/md0 --level=6 --chunk=128 --raid-devices=6 /dev/sd[cdefgh]
[root@lustrefour ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdc[0] sdh[5] sdg[4] sdf[3] sde[2] sdd[1]
      3907049984 blocks level 6, 128k chunk, algorithm 2 [6/6] [UUUUUU]
                in: 16674 reads, 16217479 writes; out: 3022788 reads, 32865192 writes
                7712698 in raid5d, 8264 out of stripes, 25661224 handle called
                reads: 0 for rmw, 1710975 for rcw. zcopy writes: 4864584, copied writes: 16115932
                0 delayed, 0 bit delayed, 0 active, queues: 0 in, 0 out
                0 expanding overlap

unused devices: <none>

Followed with:

[root@lustrefour ~]# mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.7@tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0
[root@lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1

But that was hard to reassemble on reboot, or at least it was before I used e2label and labelled things right. Question: how do I reference the external journal in fstab, if at all? Right now I am only running:

[root@lustrefour ~]# mkfs.lustre --fsname=ioio --ost --mgsnode=192.168.0.7@tcp0 --reformat /dev/md0

So just raid6, no external journal.

[root@lustrefour ~]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults         1 1
tmpfs                   /dev/shm                tmpfs   defaults         0 0
devpts                  /dev/pts                devpts  gid=5,mode=620   0 0
sysfs                   /sys                    sysfs   defaults         0 0
proc                    /proc                   proc    defaults         0 0
LABEL=ioio-OST0001      /mnt/ost00              lustre  defaults,_netdev 0 0
192.168.0.7@tcp0:/ioio  /mnt/ioio               lustre  defaults,_netdev,noauto 0 0
[root@lustrefour ~]#

[root@lustreone bin]# ./ost-survey -s 4096 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7@tcp
Number of Active OST devices : 2
Worst  Read OST indx: 0 speed: 38.789337
Best   Read OST indx: 1 speed: 40.017201
Read Average: 39.403269 +/- 0.613932 MB/s
Worst  Write OST indx: 0 speed: 49.227064
Best   Write OST indx: 1 speed: 78.673564
Write Average: 63.950314 +/- 14.723250 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0       38.789       49.227      105.596      83.206
1       40.017       78.674      102.356      52.063

[root@lustreone bin]# ./ost-survey -s 1024 /mnt/ioio
./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7@tcp
Number of Active OST devices : 2
Worst  Read OST indx: 0 speed: 38.559620
Best   Read OST indx: 1 speed: 40.053787
Read Average: 39.306704 +/- 0.747083 MB/s
Worst  Write OST indx: 0 speed: 71.623744
Best   Write OST indx: 1 speed: 82.764897
Write Average: 77.194320 +/- 5.570577 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0       38.560       71.624       26.556      14.297
1       40.054       82.765       25.566      12.372

[root@lustreone bin]# dd of=/mnt/ioio/bigfileMGS if=/dev/zero bs=1048576
3536+0 records in
3536+0 records out
3707764736 bytes (3.7 GB) copied, 38.4775 seconds, 96.4 MB/s

lustreone/two/three/four all have the same modprobe.conf:

[root@lustrefour ~]# cat /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter pata_marvell
alias scsi_hostadapter1 ata_piix
options lnet networks=tcp
alias eth2 sky2
alias eth3 sky2
alias eth4 sky2
alias eth5 sky2
alias bond0 bonding
options bonding miimon=100 mode=4
[root@lustrefour ~]#

When I do the same from all the clients, I can watch /usr/bin/gnome-system-monitor, and the send and receive rates from the various nodes reach a 209 MiB/s plateau. Uggh.
So if one OST gets 200 MiB/s and another OST gets 200 MiB/s, does that make 400 MiB/s, or is that not how to calculate throughput? I will eventually plug the right sequence into iozone to measure it.

From my perspective it looks like ioio.ca/ioio.jpg ioio.ca/lustreone.png ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png

--- On Sat, 1/24/09, Arden Wiebe <albert682 at yahoo.com> wrote:

> Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
> [original message quoted in full; trimmed]
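One crude way to check whether two 200 MiB/s OST streams really add up, before getting iozone set up, is to run one dd stream per client at the same time and sum the rates each dd reports. A rough sketch, assuming hypothetical client hostnames client1/client2, the shared /mnt/ioio mount, and passwordless ssh:

    # start one write stream from each client in parallel; with the default
    # stripe-count-1 round-robin placement the files should land on different OSTs
    for host in client1 client2; do
        ssh $host "dd if=/dev/zero of=/mnt/ioio/ddtest.\$host bs=1048576 count=4096" &
    done
    wait   # each dd prints its own MB/s; the aggregate is roughly their sum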
In general, when writing messages to this list, you need to be more concise about what you are asking. I see so much information here, I'm not sure what is relevant to your few interspersed questions and what is not. I will try to answer your specific question...

Also, in the future, please use a simple plain-text format and just copy and paste for plain-text content. All of the "quoted-printable" mime-types are confusing my MUA.

On Sat, 2009-01-24 at 18:04 -0800, Arden Wiebe wrote:
>
> I fail so far creating external journal for MDT, MGS and OSSx2. How
> to add the external journal to /etc/fstab specifically the output of
> e2label /dev/sdb followed by what options for fstab?
>

You need to look at the mkfs.ext3 manpage on how to create an external journal (i.e. -O journal_dev external-journal) and attach an external journal to an ext3 filesystem (i.e. -J device=external-journal), then apply those mkfs.ext3 options to your Lustre device with mkfs.lustre's --mkfsoptions option. All of this is covered in the operations manual in section 10.3 "Creating an External Journal".

b.
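Concretely, Brian's recipe for one of the OSSes would look something like the sketch below. The label is illustrative, the journal device must be initialized before the filesystem that will use it (the reverse of the order shown earlier in the thread), and the -b 4096 block size must match the filesystem's:

    mke2fs -b 4096 -O journal_dev -L ioio-ost-jrnl /dev/sdb1     # create the journal device first
    mkfs.lustre --ost --fsname=ioio --mgsnode=192.168.0.7@tcp0 \
        --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0  # then attach it to the OST

As for fstab: as far as I know the external journal itself never gets an fstab entry. The reference to it is stored in the filesystem's superblock at format time, so fstab only carries the OST mount line (e.g. LABEL=ioio-OST0001 /mnt/ost00 lustre defaults,_netdev 0 0). The mke2fs manpage also documents specifying the journal as -J device=LABEL=<label> (or UUID=) so the association survives device renumbering, if the installed e2fsprogs supports it.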
--- On Mon, 1/26/09, Brian J. Murrell <Brian.Murrell at Sun.COM> wrote:

> In general, when writing messages to this list, you need to be more
> concise about what you are asking. I see so much information here, I'm
> not sure what is relevant to your few interspersed questions and what
> is not. I will try to answer your specific question...

My apologies for posting my study hacks to the list. Thanks, Brian, for at least trying to answer questions that I have to learn the answers to for myself before I know the correct questions to ask.

> Also, in the future, please use a simple plain-text format and just
> copy and paste for plain-text content. All of the "quoted-printable"
> mime-types are confusing my MUA.

No doubt. Sorry, I'm not good with MTAs or MUAs in general, but I'll switch to plain text in the future.

> You need to look at the mkfs.ext3 manpage on how to create an external
> journal (i.e. -O journal_dev external-journal) and attach an external
> journal to an ext3 filesystem (i.e. -J device=external-journal), then
> apply those mkfs.ext3 options to your Lustre device with mkfs.lustre's
> --mkfsoptions option. All of this is covered in the operations manual
> in section 10.3 "Creating an External Journal".

Been there, done that... well, sort of. I managed to have every Lustre filesystem with external journals, some even on different controllers. The underlying root/boot presentation separates the raid from the MBR and the root and boot partitions, which are un-raided and could eventually be moved to a USB memory stick to afford a hot-spare implementation from the freed-up /dev/sda. The eventual goal for the root file system is a network/cluster configuration tool so that root/boot partitions can be delivered over the cluster to new and old nodes. Until then the DVD .iso method works fine and can rehabilitate a failed boot drive in a standard CentOS 5.2 install time.

The manual, and the list in numerous places, say to use no partitions. There are no partitions in this configuration save for a 1TB / partition on /dev/sda1 of all main nodes, plus the external journals (/dev/sdf1 on the MDT and MGS, /dev/sdb1 on the two OSTs) that all occupy ",50,L" of the entire 1TB drive, no doubt for the ~400 MB journal.

The fix at the time was to learn the proper syntax for creating a raid10 device. So instead of physically making two raid1 arrays and one raid0 array to get a raid 1+0 configuration, I had to learn the right way to make a raid10 directly; ya believe it (see the sketch below). e2label had been reporting MGS for two drive volumes and fstab was all borked.

To top it all off, I was dealing with a network anomaly that still persists on my MGS node, whereby I can't run that node at MTU 9000 while the rest of the nodes can. I even pulled the box off the shelf, checked for hardware faults, reseated the cards, removed all network interfaces and started over. It still persists, no doubt due to mixing MTU 1500 and MTU 9000 on the same subnet.
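(For reference, the one-step raid10 creation mentioned above would be something like this sketch; the member devices are illustrative:)

    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[cdef]   # native mdadm raid10, no nested raid1+raid0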
Not sure if this is a proper list deliverable, but I have produced a series of pictures that, to my understanding, show a small Lustre ethernet cluster running on commodity hardware doing 400 MiB/s on one OST, but also one that needs to handle smaller files better.

http://www.ioio.ca/Lustre-tcp-bonding/images.html and http://www.ioio.ca/Lustre-tcp-bonding/Lustre-notes/images.html

Typical usage so far shows that copying /var/lib/mysql is still a time-consuming process given 4.9G of data. Web-based files in flight are also of typically small file size. Further objectives for the cluster are not implemented at this time but would include more of the same and then some.

Further suggestions regarding implementation of network-specific cluster enhancements, partitioning, formatting, benchmarking or modes appreciated. My apologies for the --verbose thread, which I hope is better formatted to fit your screen, and also for my lack of specific questions, due to not having enough experience to know the correct ones to ask at times.

a.
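One tweak commonly suggested for small-file workloads like this (an assumption on my part, not advice given in the thread): set a stripe count of one on directories holding small files, so each file lives on a single OST and avoids per-file striping overhead. With a recent lfs that is roughly:

    lfs setstripe -c 1 /mnt/ioio/www   # hypothetical directory; -c 1 = one stripe (one OST) per file
                                       # older 1.6 releases use positional arguments; see the lfs man page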
Hi Arden,

Are you obtaining more than 100 MB/sec from one client to one OST? Given that you are using 802.3ad link aggregation, it will determine the physical NIC by the other party's MAC address. So having multiple OSTs and multiple clients will improve the chances of using more than one NIC of the bonding.

What is the maximum performance you obtain on the client with two 1GbE?

jeff

________________________________
From: lustre-discuss-bounces at lists.lustre.org On Behalf Of Arden Wiebe
Sent: Sunday, January 25, 2009 12:08 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0

> [original message quoted in full; trimmed]
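Jeff's point can be made concrete from the kernel bonding documentation: with the layer2 transmit hash policy shown in Arden's /proc/net/bonding/bond0 output, the outgoing slave is chosen as

    slave_index = (source MAC XOR destination MAC) mod number_of_slaves

so all traffic between one client MAC and one server MAC always leaves on the same physical NIC, and a single client/server pair can never exceed one link's bandwidth, no matter how many slaves are in the bond.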
Arden, we also use dual channel gigE (bond0) and in my tests found that this works best:

options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

This allows us to get roughly 250 MB/s transfers. Here is the iozone command I used:

iozone -t1 -i0 -il -r4m -s2g

You will not get any more performance unless you move to Infiniband or another interconnect.

Jeffrey Alan Bennett wrote:
> Hi Arden,
>
> Are you obtaining more than 100 MB/sec from one client to one OST?
> [rest of quoted message trimmed]

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672
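Applied to the modprobe.conf Arden posted earlier, only the bonding options line changes (a sketch; mode=802.3ad is the named equivalent of mode=4):

    alias bond0 bonding
    options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

With layer3+4 the hash includes IP addresses and ports, so separate TCP connections between the same pair of hosts can spread across different slaves, which layer2 hashing never does.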
I ran this on my 6-GigE bond0 MGS client. I had to go back and cd into the mounted Lustre filesystem first.

[root@lustreone ~]# cd /mnt/ioio
[root@lustreone ioio]# iozone -t1 -i0 -il -r4m -s2g

        Record Size 4096 KB
        File size set to 2097152 KB
        Command line used: iozone -t1 -i0 -il -r4m -s2g
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 1 process
        Each process writes a 2097152 Kbyte file in 4096 Kbyte records

        Children see throughput for  1 initial writers = 106916.81 KB/sec
        Parent sees throughput for  1 initial writers  = 105244.22 KB/sec
        Min throughput per process                     = 106916.81 KB/sec
        Max throughput per process                     = 106916.81 KB/sec
        Avg throughput per process                     = 106916.81 KB/sec
        Min xfer                                       = 2097152.00 KB

        Children see throughput for  1 rewriters       = 106882.15 KB/sec
        Parent sees throughput for  1 rewriters        = 105215.34 KB/sec
        Min throughput per process                     = 106882.15 KB/sec
        Max throughput per process                     = 106882.15 KB/sec
        Avg throughput per process                     = 106882.15 KB/sec
        Min xfer                                       = 2097152.00 KB

I ran this to match the physical RAM in the MGS client:

[root@lustreone ioio]# iozone -t1 -i0 -il -r4m -s8g
        Run began: Wed Jan 28 17:33:53 2009

        Record Size 4096 KB
        File size set to 8388608 KB
        Command line used: iozone -t1 -i0 -il -r4m -s8g
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 1 process
        Each process writes a 8388608 Kbyte file in 4096 Kbyte records

        Children see throughput for  1 initial writers = 100817.04 KB/sec
        Parent sees throughput for  1 initial writers  = 100420.04 KB/sec
        Min throughput per process                     = 100817.04 KB/sec
        Max throughput per process                     = 100817.04 KB/sec
        Avg throughput per process                     = 100817.04 KB/sec
        Min xfer                                       = 8388608.00 KB

        Children see throughput for  1 rewriters       = 100884.15 KB/sec
        Parent sees throughput for  1 rewriters        = 100487.30 KB/sec
        Min throughput per process                     = 100884.15 KB/sec
        Max throughput per process                     = 100884.15 KB/sec
        Avg throughput per process                     = 100884.15 KB/sec
        Min xfer                                       = 8388608.00 KB

Then I ran this to match my processors, increasing -t1 to -t4; a subsequent test with -t6 proved redundant.

[root@lustreone ioio]# iozone -t4 -i0 -il -r4m -s8g
        Run began: Wed Jan 28 17:37:33 2009

        Record Size 4096 KB
        File size set to 8388608 KB
        Command line used: iozone -t4 -i0 -il -r4m -s8g
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 4 processes
        Each process writes a 8388608 Kbyte file in 4096 Kbyte records

        Children see throughput for  4 initial writers = 206173.77 KB/sec
        Parent sees throughput for  4 initial writers  = 191062.04 KB/sec
        Min throughput per process                     =  48302.41 KB/sec
        Max throughput per process                     =  54266.61 KB/sec
        Avg throughput per process                     =  51543.44 KB/sec
        Min xfer                                       = 7467008.00 KB

        Children see throughput for  4 rewriters       = 206216.61 KB/sec
        Parent sees throughput for  4 rewriters        = 205358.90 KB/sec
        Min throughput per process                     =  50336.13 KB/sec
        Max throughput per process                     =  53059.13 KB/sec
        Avg throughput per process                     =  51554.15 KB/sec
        Min xfer                                       = 7958528.00 KB

Screenshots at http://ioio.ca/iozone/MGSClient/images.html clearly show a large jump in smooth, stable network activity from the 200 MiB/s range to the 400 MiB/s range.

If one were to have more processors, would that increase maximum throughput? Does the number of GigE interfaces scale with the number of processors? Given 6-GigE bond0, can I test in any other way to raise the 412 MiB/s plateau? How do I best interpret the above results?

--- On Wed, 1/28/09, Jeremy Mann <jeremy at biochem.uthscsa.edu> wrote:

> Arden, we also use dual channel gigE (bond0) and in my tests found
> that this works best:
> [rest of quoted message trimmed]
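For interpreting the numbers above: iozone reports Kbytes/sec, so dividing by 1024 gives MiB/s. Roughly:

    -t1 -s8g:  100817.04 KB/s / 1024 =  ~98 MiB/s  (single stream, about one GigE link)
    -t4 -s8g:  206173.77 KB/s / 1024 = ~201 MiB/s  (aggregate across the 4 writers)

So the aggregate iozone itself measured is about 201 MiB/s; the ~412 MiB/s in the network monitor may be counting something broader (e.g. both directions, or all nodes' interfaces at once), so the iozone figure is probably the one to trust for filesystem throughput.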
--- On Wed, 1/28/09, Jeremy Mann <jeremy at biochem.uthscsa.edu> wrote:

From: Jeremy Mann <jeremy at biochem.uthscsa.edu>
Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
To: "Arden Wiebe" <albert682 at yahoo.com>
Cc: "lustre-discuss at lists.lustre.org" <lustre-discuss at lists.lustre.org>
Date: Wednesday, January 28, 2009, 1:56 PM

Arden, we also use dual channel gigE (bond0) and in my tests found that
this works best:

options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4

This allows us to get roughly 250 MB/s transfers. Here is the iozone
command I used:

iozone -t1 -i0 -il -r4m -s2g

You will not get any more performance unless you move to Infiniband or
another interconnect.
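Worth noting: the modprobe.conf shown further down in the thread uses mode=4 with no explicit xmit_hash_policy, which leaves the kernel default of layer2 (MAC-only hashing). A sketch of Jeremy's suggestion spelled out in /etc/modprobe.conf:

        alias bond0 bonding
        # layer3+4 hashes on IP addresses and ports as well as MACs, so
        # several TCP flows between the same two hosts can land on
        # different slaves; the default layer2 policy pins each host
        # pair to a single slave.
        options bonding miimon=100 mode=802.3ad xmit_hash_policy=layer3+4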
Jeffrey Alan Bennett wrote:
> Hi Arden,
>
> Are you obtaining more than 100 MB/sec from one client to one OST? Given
> that you are using 802.3ad link aggregation, it will determine the
> physical NIC by the other party's MAC address. So having multiple OSTs
> and multiple clients will improve the chances of using more than one NIC
> of the bonding.
>
> What is the maximum performance you obtain on the client with two 1GbE?
>
> jeff
>
> ________________________________
> From: lustre-discuss-bounces at lists.lustre.org
> [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Arden Wiebe
> Sent: Sunday, January 25, 2009 12:08 AM
> To: lustre-discuss at lists.lustre.org
> Subject: Re: [Lustre-discuss] Plateau around 200MiB/s bond0
>
> So if one OST gets 200MiB/s and another OST gets 200MiB/s, does that make
> 400MiB/s, or is this not how to calculate throughput? I will eventually
> plug the right sequence into iozone to measure it.
>
> From my perspective it looks like ioio.ca/ioio.jpg ioio.ca/lustreone.png
> ioio.ca/lustretwo.png ioio.ca/lustrethree.png ioio.ca/lustrefour.png
>
> --- On Sat, 1/24/09, Arden Wiebe <albert682 at yahoo.com> wrote:
>
> From: Arden Wiebe <albert682 at yahoo.com>
> Subject: [Lustre-discuss] Plateau around 200MiB/s bond0
> To: lustre-discuss at lists.lustre.org
> Date: Saturday, January 24, 2009, 6:04 PM
>
> [the original post is quoted in full here; the hardware layout, device,
> bonding, ifconfig, mdstat, fstab and fdisk listings duplicate the start
> of the thread and are trimmed]
> Followed with:
>
> [root at lustrefour ~]# mkfs.lustre --ost --fsname=ioio
> --mgsnode=192.168.0.7 at tcp0 --mkfsoptions="-J device=/dev/sdb1" --reformat
> /dev/md0
>
> [root at lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1
>
> But that was hard to reassemble on reboot, or at least it was before I
> used e2label and labelled things right. Question: how do I label the
> external journal in fstab, if at all? Right now I am only running
>
> [root at lustrefour ~]# mkfs.lustre --fsname=ioio --ost
> --mgsnode=192.168.0.7 at tcp0 --reformat /dev/md0
>
> so just raid6, no external journal.
>
> [root at lustrefour ~]# cat /etc/fstab
> LABEL=/                 /                       ext3    defaults        1 1
> tmpfs                   /dev/shm                tmpfs   defaults        0 0
> devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
> sysfs                   /sys                    sysfs   defaults        0 0
> proc                    /proc                   proc    defaults        0 0
> LABEL=ioio-OST0001      /mnt/ost00              lustre  defaults,_netdev 0 0
> 192.168.0.7 at tcp0:/ioio  /mnt/ioio               lustre  defaults,_netdev,noauto 0 0
>
> [root at lustreone bin]# ./ost-survey -s 4096 /mnt/ioio
> ./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7 at tcp
> Number of Active OST devices : 2
> Worst  Read OST indx: 0 speed: 38.789337
> Best   Read OST indx: 1 speed: 40.017201
> Read Average: 39.403269 +/- 0.613932 MB/s
> Worst  Write OST indx: 0 speed: 49.227064
> Best   Write OST indx: 1 speed: 78.673564
> Write Average: 63.950314 +/- 14.723250 MB/s
> Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
> ----------------------------------------------------
> 0     38.789      49.227       105.596    83.206
> 1     40.017      78.674       102.356    52.063
>
> [root at lustreone bin]# ./ost-survey -s 1024 /mnt/ioio
> ./ost-survey: 01/24/09 OST speed survey on /mnt/ioio from 192.168.0.7 at tcp
> Number of Active OST devices : 2
> Worst  Read OST indx: 0 speed: 38.559620
> Best   Read OST indx: 1 speed: 40.053787
> Read Average: 39.306704 +/- 0.747083 MB/s
> Worst  Write OST indx: 0 speed: 71.623744
> Best   Write OST indx: 1 speed: 82.764897
> Write Average: 77.194320 +/- 5.570577 MB/s
> Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
> ----------------------------------------------------
> 0     38.560      71.624       26.556     14.297
> 1     40.054      82.765       25.566     12.372
>
> [root at lustreone bin]# dd of=/mnt/ioio/bigfileMGS if=/dev/zero bs=1048576
> 3536+0 records in
> 3536+0 records out
> 3707764736 bytes (3.7 GB) copied, 38.4775 seconds, 96.4 MB/s
>
> lustreone, lustretwo, lustrethree and lustrefour all have the same
> modprobe.conf:
>
> [root at lustrefour ~]# cat /etc/modprobe.conf
> alias eth0 e1000
> alias eth1 e1000
> alias scsi_hostadapter pata_marvell
> alias scsi_hostadapter1 ata_piix
> options lnet networks=tcp
> alias eth2 sky2
> alias eth3 sky2
> alias eth4 sky2
> alias eth5 sky2
> alias bond0 bonding
> options bonding miimon=100 mode=4
> [root at lustrefour ~]#
>
> When I do the same from all clients I can watch
> /usr/bin/gnome-system-monitor, and the send and receive traffic from the
> various nodes reaches a 209 MiB/s plateau.
> Uggh

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu

Phone: (210) 567-2672
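On the external-journal question quoted above: to my knowledge the journal device never gets its own fstab entry; mke2fs records it in the filesystem's superblock, and the kernel opens it automatically when the target is mounted. A minimal sketch using the thread's own device names (note the journal has to exist before mkfs.lustre references it, the reverse of the order shown above):

        # 1. create the external journal, matching the OST's 4 KB block size
        [root at lustrefour ~]# mke2fs -b 4096 -O journal_dev /dev/sdb1

        # 2. format the OST against it
        [root at lustrefour ~]# mkfs.lustre --ost --fsname=ioio \
            --mgsnode=192.168.0.7 at tcp0 \
            --mkfsoptions="-J device=/dev/sdb1" --reformat /dev/md0

        # 3. fstab lists only the OST; the journal follows it automatically
        LABEL=ioio-OST0001  /mnt/ost00  lustre  defaults,_netdev 0 0

The catch is that the superblock records the journal by device number, so if /dev/sdb gets renumbered across reboots the mount will fail until the journal is re-pointed (tune2fs -J device=... should do it). That, rather than any missing fstab option, is the likely source of the reassembly trouble described above.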
Arden Wiebe wrote:
> Screenshots at http://ioio.ca/iozone/MGSClient/images.html clearly show
> a large jump in smooth, stable network activity from the 200MiB/s range
> to the 400MiB/s range.
>
> If one were to have more processors, would that increase maximum
> throughput? Does the number of GigE interfaces scale with the number of
> processors? Given a 6GigE bond0, can I test in any other way to increase
> the 412MiB/s plateau? How do I best interpret the above results?

In your case (as was in ours) I would assume the limiting factor is the
speed of the drives on the OSTs.

-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu

Phone: (210) 567-2672
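If OST drive speed is the suspect, one quick way to bound it is a direct-I/O read straight off the raid6 array on each OSS (a hypothetical invocation; run it against the raw md device before formatting, or with the OST unmounted):

        # sequential read from the array, bypassing the page cache
        [root at lustrethree ~]# dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct

If that number is close to the ~100 MB/s seen per client stream, the disks (or the raid6 parity path) are the wall; if it is much higher, the network path deserves another look. For a live OST, obdfilter-survey from the lustre-iokit exercises the same disk path without unmounting anything.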