lior amar
2011-Jul-06 20:04 UTC
[Lustre-discuss] Fwd: Lustre performance issue (obdfilter_survey)
Hi,

I am installing a Lustre system and wanted to measure OSS performance. I used obdfilter_survey and got very low performance at low thread counts when using the case=network option.

System configuration:
* Lustre 1.8.6-wc (compiled from the Whamcloud git)
* CentOS 5.6
* InfiniBand (Mellanox cards), OpenIB from CentOS 5.6
* OSS - 2 quad-core E5620 CPUs
* OSS - 48GB memory
* LSI 2965 RAID card with 18 disks in RAID 6 (16 data + 2). Raw performance is good, both when testing the block device and over a file system with Bonnie++
* OSS uses ext4; mkfs parameters were set to reflect the stripe size (-E stride=...)

The performance tests I ran:

1) obdfilter_survey case=disk -
   OSS performance is OK (similar to raw disk performance).
   With 1 thread and one object I get 966 MB/s.

2) obdfilter_survey case=network -
   OSS performance is bad at low thread counts and improves as the number of threads increases.
   With 1 thread and one object I get 88 MB/s.

3) obdfilter_survey case=netdisk -- same as the network case.

4) ost_survey also shows low performance:
   Read = 156 MB/s, Write = ~350 MB/s.

5) lnet_selftest gives much higher numbers.
   Numbers obtained with concurrency = 1:

   [LNet Rates of servers]
   [R] Avg: 3556 RPC/s Min: 3556 RPC/s Max: 3556 RPC/s
   [W] Avg: 4742 RPC/s Min: 4742 RPC/s Max: 4742 RPC/s
   [LNet Bandwidth of servers]
   [R] Avg: 1185.72 MB/s Min: 1185.72 MB/s Max: 1185.72 MB/s
   [W] Avg: 1185.72 MB/s Min: 1185.72 MB/s Max: 1185.72 MB/s

Any ideas why a single thread over the network obtains 88 MB/s while the same test run locally obtained 966 MB/s?

What else should I test/read/try?

10x

Below are the actual numbers:

===== obdfilter_survey case=disk =====
Wed Jul 6 13:24:57 IDT 2011 Obdfilter-survey for case=disk from oss1
ost 1 sz 16777216K rsz 1024K obj  1 thr  1 write  966.90 [ 644.40,1030.02] rewrite 1286.23 [1300.78,1315.77] read  8474.33 SHORT
ost 1 sz 16777216K rsz 1024K obj  1 thr  2 write 1577.95 [1533.57,1681.43] rewrite 1548.29 [1244.83,1718.42] read 11003.26 SHORT
ost 1 sz 16777216K rsz 1024K obj  1 thr  4 write 1465.68 [1354.73,1600.50] rewrite 1484.98 [1271.54,1584.52] read 16464.13 SHORT
ost 1 sz 16777216K rsz 1024K obj  1 thr  8 write 1267.39 [ 797.25,1476.48] rewrite 1350.28 [1283.80,1387.70] read 15353.69 SHORT
ost 1 sz 16777216K rsz 1024K obj  1 thr 16 write 1295.35 [1266.82,1408.70] rewrite 1332.59 [1315.61,1429.66] read 15001.67 SHORT
ost 1 sz 16777216K rsz 1024K obj  2 thr  2 write 1467.80 [1472.62,1691.42] rewrite 1218.88 [ 821.23,1338.74] read 13538.41 SHORT
ost 1 sz 16777216K rsz 1024K obj  2 thr  4 write 1561.09 [1521.57,1682.75] rewrite 1183.31 [ 959.10,1372.52] read 15955.31 SHORT
ost 1 sz 16777216K rsz 1024K obj  2 thr  8 write 1498.74 [1543.58,1704.41] rewrite 1116.19 [1001.06,1163.91] read 15523.22 SHORT
ost 1 sz 16777216K rsz 1024K obj  2 thr 16 write 1462.54 [ 985.08,1615.48] rewrite 1244.29 [1100.97,1444.80] read 15174.56 SHORT
ost 1 sz 16777216K rsz 1024K obj  4 thr  4 write 1483.42 [1497.88,1648.45] rewrite 1042.92 [ 801.25,1192.69] read 15997.30 SHORT
ost 1 sz 16777216K rsz 1024K obj  4 thr  8 write 1494.63 [1458.85,1624.13] rewrite 1041.81 [ 806.25,1183.89] read 15450.18 SHORT
ost 1 sz 16777216K rsz 1024K obj  4 thr 16 write 1469.96 [1450.65,1647.45] rewrite 1027.06 [ 645.50,1215.86] read 15543.46 SHORT
ost 1 sz 16777216K rsz 1024K obj  8 thr  8 write 1417.93 [1250.85,1520.58] rewrite 1007.45 [ 905.15,1130.82] read 15789.66 SHORT
ost 1 sz 16777216K rsz 1024K obj  8 thr 16 write 1324.28 [ 951.87,1518.26] rewrite  986.48 [ 855.21,1079.99] read 15510.70 SHORT
ost 1 sz 16777216K rsz 1024K obj 16 thr 16 write 1237.22 [ 989.07,1345.17] rewrite  915.56 [ 749.08,1033.03] read 15415.75 SHORT
======================================

===== obdfilter_survey case=network =====
Wed Jul 6 16:29:38 IDT 2011 Obdfilter-survey for case=network from oss6
ost 1 sz 16777216K rsz 1024K obj  1 thr  1 write   87.99 [  86.92,  88.92] rewrite   87.98 [  86.83,  88.92] read   88.09 [  86.92,  88.92]
ost 1 sz 16777216K rsz 1024K obj  1 thr  2 write  175.76 [ 173.84, 176.83] rewrite  175.75 [ 174.84, 176.83] read  172.76 [ 171.67, 174.84]
ost 1 sz 16777216K rsz 1024K obj  1 thr  4 write  343.13 [ 327.69, 347.67] rewrite  344.64 [ 342.34, 347.67] read  331.20 [ 327.69, 337.77]
ost 1 sz 16777216K rsz 1024K obj  1 thr  8 write  638.44 [ 638.10, 653.39] rewrite  639.07 [ 627.75, 654.74] read  605.36 [ 598.84, 626.71]
ost 1 sz 16777216K rsz 1024K obj  1 thr 16 write 1257.67 [1216.88,1424.42] rewrite 1231.61 [1200.67,1316.77] read 1122.70 [1095.04,1187.64]
ost 1 sz 16777216K rsz 1024K obj  2 thr  2 write  175.69 [ 174.49, 176.83] rewrite  175.82 [ 174.79, 176.83] read  172.06 [ 169.67, 173.84]
ost 1 sz 16777216K rsz 1024K obj  2 thr  4 write  345.38 [ 343.68, 348.67] rewrite  344.40 [ 342.66, 348.32] read  331.19 [ 328.62, 337.68]
ost 1 sz 16777216K rsz 1024K obj  2 thr  8 write  638.29 [ 625.16, 676.37] rewrite  632.57 [ 619.43, 672.38] read  604.72 [ 601.69, 625.41]
ost 1 sz 16777216K rsz 1024K obj  2 thr 16 write 1247.19 [1212.38,1377.73] rewrite 1265.31 [1220.56,1396.71] read 1127.87 [1099.97,1187.67]
ost 1 sz 16777216K rsz 1024K obj  4 thr  4 write  343.96 [ 341.68, 347.67] rewrite  337.98 [ 324.70, 348.67] read  332.27 [ 327.69, 337.68]
ost 1 sz 16777216K rsz 1024K obj  4 thr  8 write  637.15 [ 626.89, 673.38] rewrite  636.47 [ 624.42, 675.37] read  605.98 [ 604.43, 620.64]
ost 1 sz 16777216K rsz 1024K obj  4 thr 16 write 1260.31 [1198.30,1419.70] rewrite 1289.95 [1235.05,1486.35] read 1119.08 [1081.16,1159.77]
ost 1 sz 16777216K rsz 1024K obj  8 thr  8 write  636.82 [ 628.41, 678.37] rewrite  634.36 [ 622.41, 671.38] read  607.59 [ 601.23, 627.79]
ost 1 sz 16777216K rsz 1024K obj  8 thr 16 write 1257.81 [1207.65,1405.00] rewrite 1267.45 [1233.43,1372.72] read 1125.58 [1114.65,1163.67]
ost 1 sz 16777216K rsz 1024K obj 16 thr 16 write 1247.34 [1215.70,1418.69] rewrite 1249.45 [1194.92,1372.73] read 1118.77 [1082.07,1171.94]
==========================================

===== OST Survey =====
ost-survey -s 10000

Worst Read OST indx: 0 speed: 156.223264
Best Read OST indx: 4 speed: 172.706590
Read Average: 163.681117 +/- 5.299526 MB/s
Worst Write OST indx: 4 speed: 307.893338
Best Write OST indx: 2 speed: 370.923486
Write Average: 346.664793 +/- 20.849197 MB/s

Ost#   Read(MB/s)   Write(MB/s)   Read-time   Write-time
----------------------------------------------------
0      156.223      354.215       64.011      28.231
1      164.394      349.652       60.830      28.600
2      162.195      370.923       61.654      26.960
3      162.887      350.640       61.392      28.519
4      172.707      307.893       57.902      32.479
=======================

10x

--lior
--
----------------------oo--o(:-:)o--oo----------------
Lior Amar, Ph.D.
Cluster Logic Ltd --> The Art of HPC
www.clusterlogic.net
----------------------------------------------------------
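(For readers who want to reproduce these cases: a minimal sketch of typical obdfilter-survey invocations on Lustre 1.8, based on the lustre-iokit documentation. The OST name, server NID, and sizes are placeholder assumptions, not the exact command lines used for the numbers above; check the README shipped with your lustre-iokit version.)

    # case=disk: run on the OSS itself; exercises the obdfilter/disk
    # path only, no network. "lustre-OST0000" is a placeholder OST name.
    nobjhi=16 thrhi=16 size=16384 targets="lustre-OST0000" case=disk \
        sh obdfilter-survey

    # case=network: run on a client against the OSS's LNET NID;
    # exercises the network path only, no disk. The NID is a placeholder,
    # and the echo server must be set up on the OSS per the iokit README.
    nobjhi=16 thrhi=16 size=16384 targets="192.168.1.10@o2ib" case=network \
        sh obdfilter-survey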
Cliff White
2011-Jul-06 20:37 UTC
[Lustre-discuss] Fwd: Lustre performance issue (obdfilter_survey)
The case=network part of obdfilter_survey has really been replaced by lnet_selftest. I don't think it's been maintained in a while.

It would be best to repeat the network-only test with lnet_selftest; this is likely an issue with the script.

cliffw

On Wed, Jul 6, 2011 at 1:04 PM, lior amar <liororama at gmail.com> wrote:
> Hi,
>
> I am installing a Lustre system and wanted to measure OSS performance.
> I used obdfilter_survey and got very low performance at low thread
> counts when using the case=network option.
>
> [... system configuration and full benchmark output trimmed; see the
> original message above ...]

--
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
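(A minimal sketch of the kind of lnet_selftest run Cliff is recommending, following the pattern in the Lustre manual. The NIDs, batch name, and 30-second duration are placeholder assumptions to adapt to your own setup.)

    #!/bin/bash
    # Run on a node with the lnet_selftest module loaded on all
    # participating nodes: modprobe lnet_selftest
    export LST_SESSION=$$                      # lst finds the session via this
    lst new_session rw_test
    lst add_group servers 192.168.1.10@o2ib    # OSS NID (placeholder)
    lst add_group clients 192.168.1.20@o2ib    # client NID (placeholder)
    lst add_batch bulk_rw
    # --concurrency controls the number of in-flight RPCs per test instance
    lst add_test --batch bulk_rw --concurrency 1 --from clients --to servers \
        brw write size=1M
    lst run bulk_rw
    lst stat servers &                         # prints LNet rates/bandwidth
    STAT_PID=$!
    sleep 30                                   # sample for ~30 seconds
    kill $STAT_PID
    lst stop bulk_rw
    lst end_session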
Chris Horn
2011-Jul-06 21:30 UTC
[Lustre-discuss] Fwd: Lustre performance issue (obdfilter_survey)
FYI, there is some work being done to clean up obdfilter-survey. See https://bugzilla.lustre.org/show_bug.cgi?id=24490

If there is a script issue, you might try the patch from that bug and see whether the problem still reproduces.

Chris Horn

On Jul 6, 2011, at 3:37 PM, Cliff White wrote:
> The case=network part of obdfilter_survey has really been replaced by
> lnet_selftest. I don't think it's been maintained in a while.
>
> It would be best to repeat the network-only test with lnet_selftest;
> this is likely an issue with the script.
>
> [... remainder of quoted thread trimmed; see the messages above ...]
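(If you want to try that patch, a generic sketch of applying it over the installed script; the script location and the saved patch filename are assumptions, and the patch itself must first be downloaded manually from the bugzilla attachment.)

    # Locate the installed obdfilter-survey script (path varies by install)
    cd "$(dirname "$(which obdfilter-survey)")"
    cp obdfilter-survey obdfilter-survey.orig   # keep a backup
    # Apply the patch saved from bug 24490 (filename is hypothetical)
    patch obdfilter-survey < /tmp/obdfilter-survey.patch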
lior amar
2011-Jul-07 05:12 UTC
[Lustre-discuss] Fwd: Lustre performance issue (obdfilter_survey)
Hi,

First, thanks for your quick reply.

On Wed, Jul 6, 2011 at 11:37 PM, Cliff White <cliffw at whamcloud.com> wrote:
> The case=network part of obdfilter_survey has really been replaced by
> lnet_selftest. I don't think it's been maintained in a while.
>
> It would be best to repeat the network-only test with lnet_selftest;
> this is likely an issue with the script.
> cliffw

I used lnet_selftest and got reasonable results:

------------------
Numbers obtained with concurrency = 1

[LNet Rates of servers]
[R] Avg: 3556 RPC/s Min: 3556 RPC/s Max: 3556 RPC/s
[W] Avg: 4742 RPC/s Min: 4742 RPC/s Max: 4742 RPC/s
[LNet Bandwidth of servers]
[R] Avg: 1185.72 MB/s Min: 1185.72 MB/s Max: 1185.72 MB/s
[W] Avg: 1185.72 MB/s Min: 1185.72 MB/s Max: 1185.72 MB/s
-------------------

The question is: what is the meaning of the concurrency=1 flag? Does it mean a single thread at the client, or a single thread per core?

My problem is with case=netdisk, which gives me low performance for a single thread (the dd case), as does ost_survey. Is case=netdisk a valid test? I am trying to isolate the problem, and case=netdisk allows me to avoid accessing the MDS (right?).

Any ideas?

----------------------oo--o(:-:)o--oo----------------
Lior Amar, Ph.D.
Cluster Logic Ltd --> The Art of HPC
www.clusterlogic.net
----------------------------------------------------------
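(For reference, a sketch of the netdisk invocation being discussed. Run from a client, it drives network plus obdfilter plus disk through the echo client without mounting the filesystem, so the MDS is not in the data path. The NID and OST name are placeholders, and the exact targets syntax varies by version; check the lustre-iokit README shipped with your release.)

    # case=netdisk: run on a Lustre client; exercises network + obdfilter +
    # disk together, bypassing the MDS. NID and OST name are placeholders.
    nobjhi=2 thrhi=16 size=16384 \
        targets="192.168.1.10@o2ib:lustre-OST0000" case=netdisk \
        sh obdfilter-survey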