I'm attempting to deploy a new lustre filesystem using lustre 1.8.5, but
this is my first stab at incorporating an IB network. I've deployed
several over tcp using 1.8.4 without issue, so I'm not sure if there is
an IB configuration issue or a 1.8.5 issue here. Any assistance would be
appreciated.

This new cluster has two parallel networks:
  gige: 10.27.5.0/23
  ib  : 10.27.8.0/23

On the lfs servers and clients, lnet is configured as:

  options lnet networks=o2ib0(ib0),tcp0(ib0)

The IB network is routable to 10/8, and clients mount other lustre
filesystems using 1.8.4 over tcp.

On the MDS (with an intended failover to a secondary) the mgs,mdt
filesystem is created with:

  mkfs.lustre --fsname lfs --mdt --mgs \
    --mkfsoptions='-i 1024 -I 512' \
    --failnode=10.27.9.133@o2ib0 --failnode=10.27.9.132@o2ib0 \
    --mountfsoptions=iopen_nopriv,user_xattr,errors=remount-ro,acl \
    /dev/sda

However, this mount then fails with:

  mount.lustre: mount /dev/sda at /data/mds failed: Cannot assign
  requested address

lctl list_nids shows the proper NIDs:

  10.27.9.133@o2ib
  10.27.9.133@tcp

Dmesg shows a parsing error with the o2ib0 NID:

  LustreError: 159-d: Can't parse NID 'failover.node=10.27.9.133@o2ib0'
  Lustre: Denying initial registration attempt from nid 10.27.9.133@o2ib,
  specified as failover
  LustreError: 9571:0:(obd_mount.c:1097:server_start_targets()) Required
  registration failed for lfs-MDT0000: -99

Am I specifying the failover incorrectly? What should it be when using
o2ib as the primary interconnect? If I remove the failover parameters
using tunefs.lustre, the mount succeeds, but clients cannot connect to
the MDT.

--
Gary Molenkamp                  SHARCNET
Systems Administrator           University of Western Ontario
Compute/Calcul Canada           http://www.computecanada.org
gary at sharcnet.ca             http://www.sharcnet.ca
(519) 661-2111 x88429           (519) 661-4000
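The dmesg line suggests the whole "failover.node=..." string, parameter
name included, is being handed to the NID parser. One quick way to see
exactly what was written to the target is tunefs.lustre's dry-run mode,
which prints the stored parameters without changing anything; a minimal
example (device name as in the post above):

  # read back the parameters recorded on the MDT (no changes are made)
  tunefs.lustre --dryrun /dev/sda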
On 12/13/2010 11:54 AM, Gary Molenkamp wrote:
> I'm attempting to deploy a new lustre filesystem using lustre 1.8.5, but
> this is my first stab at incorporating an IB network.
[snip]
> This new cluster has two parallel networks:
>   gige: 10.27.5.0/23
>   ib  : 10.27.8.0/23
>
> On the lfs servers and clients, lnet is configured as:
> options lnet networks=o2ib0(ib0),tcp0(ib0)
                        ^^^^^
Why are you assigning two different network types to the same physical
device?
[snip]
Colin Faber wrote:
> On 12/13/2010 11:54 AM, Gary Molenkamp wrote:
[snip]
>> On the lfs servers and clients, lnet is configured as:
>> options lnet networks=o2ib0(ib0),tcp0(ib0)
>                        ^^^^^
> Why are you assigning two different network types to the same physical
> device?

My assumption was that this indicated to lnet when IPoIB was to be used
vs native IB, but by your question, I assume that is not the case.  :)

I retested with just

  options lnet networks=o2ib0(ib0)

and the resulting error conditions below still hold true.

>> However, this mount then fails with:
>>
>> mount.lustre: mount /dev/sda at /data/mds failed: Cannot assign
>> requested address
>>
>> Dmesg shows a parsing error with the o2ib0 NID:
>>
>> LustreError: 159-d: Can't parse NID 'failover.node=10.27.9.133@o2ib0'
>> Lustre: Denying initial registration attempt from nid 10.27.9.133@o2ib,
>> specified as failover
>> LustreError: 9571:0:(obd_mount.c:1097:server_start_targets()) Required
>> registration failed for lfs-MDT0000: -99
>>
>> Am I specifying the failover incorrectly? What should it be when using
>> o2ib as the primary interconnect? If I remove the failover parameters
>> using tunefs.lustre, the mount succeeds, but clients cannot connect to
>> the MDT.

--
Gary Molenkamp                  SHARCNET
Systems Administrator           University of Western Ontario
Compute/Calcul Canada           http://www.computecanada.org
gary at sharcnet.ca             http://www.sharcnet.ca
(519) 661-2111 x88429           (519) 661-4000
On 14/12/2010 05:54, Gary Molenkamp wrote:
> On the MDS (with an intended failover to a secondary) the mgs,mdt
> filesystem is created with:
>
> mkfs.lustre --fsname lfs --mdt --mgs \
>   --mkfsoptions='-i 1024 -I 512' \
>   --failnode=10.27.9.133@o2ib0 --failnode=10.27.9.132@o2ib0 \
>   --mountfsoptions=iopen_nopriv,user_xattr,errors=remount-ro,acl \
>   /dev/sda
>
> However, this mount then fails with:
>
> mount.lustre: mount /dev/sda at /data/mds failed: Cannot assign
> requested address

Shouldn't there only be one "--failnode" flag? IIRC, failnode should
only reference the secondary / standby server, not the primary (i.e. the
node where the mkfs command is being executed).

Malcolm.
On Mon, 13 Dec 2010, Colin Faber wrote:
> On 12/13/2010 11:54 AM, Gary Molenkamp wrote:
[snip]
>> On the lfs servers and clients, lnet is configured as:
>> options lnet networks=o2ib0(ib0),tcp0(ib0)
>                        ^^^^^
> Why are you assigning two different network types to the same physical
> device?

Hello Colin,

Thanks for the reply. In answer to your question:

The same physical device has access to two different lustre filesystems
using different protocols.

One lustre filesystem is locally available via the native IB interface,
o2ib0(ib0).

The other lustre filesystem is remotely available (via an IB to 10Gb
switch/gateway in the local IB fabric) on the same local IB device, but
only via the tcp/ip (IPoIB) protocol, tcp0(ib0).

(not sure how good this ASCII diagram will look)

              ----------------------
              | local lustre setup |
              ----------------------
                      | ib0
 --------       -------------
 |client|-------| ib fabric |
 --------       -------------
                      |
                -----------------
                | ib to 10Gb gw |
                -----------------
                      | eth0
              -----------------------
              | remote lustre setup |
              -----------------------

Is this possible?

-k

[snip]
Thanx Malcolm (and Colin) for the pointers. I have the local filesystem
up and running over the o2ib(ib0) network, using a single failnode and
only the o2ib(ib0) network. As Kaizaad mentioned though, we are in a
situation where we need access to tcp over the ib device, or abandon
o2ib and run the local lfs as ipoib as well.

Malcolm Cowe wrote:
> On 14/12/2010 05:54, Gary Molenkamp wrote:
[snip]
> Shouldn't there only be one "--failnode" flag? IIRC, failnode should
> only reference the secondary / standby server, not the primary (i.e. the
> node where the mkfs command is being executed).
>
> Malcolm.

--
Gary Molenkamp                  SHARCNET
Systems Administrator           University of Western Ontario
Compute/Calcul Canada           http://www.computecanada.org
gary at sharcnet.ca             http://www.sharcnet.ca
(519) 661-2111 x88429           (519) 661-4000
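For reference, a minimal sketch of the single-failnode invocation implied
by the exchange above. The device, fsname, and mkfs/mount options are
copied from the original post; this is only an illustration of Malcolm's
suggestion as Gary applied it, not a verified recipe:

  # Run on the primary MDS (10.27.9.133); --failnode names only the
  # standby MDS (10.27.9.132).
  mkfs.lustre --fsname lfs --mdt --mgs \
    --mkfsoptions='-i 1024 -I 512' \
    --failnode=10.27.9.132@o2ib0 \
    --mountfsoptions=iopen_nopriv,user_xattr,errors=remount-ro,acl \
    /dev/sda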
Heald, Nathan T.
2010-Dec-14 17:20 UTC
[Lustre-discuss] Errors in output from sgpdd-survey (sgp_dd.c Cannot allocate memory)
Hi everyone,

I have been running sgpdd-survey on some DDN 9550s and am getting some
errors. I'm using what I believe to be the latest version of the I/O Kit
(lustre-iokit-1.2-200709210921). I've got 4 OSSes attached and run
sgpdd-survey against all the disks from each host, one host at a time.
Each host is getting these errors, but not identically. I've found
several threads on the mailing list with people reporting this same
error, but there are no resolutions posted. One post suggested a
modification to the flags for "sg_readcap" in the script could resolve
these errors, but making the changes did not seem to fix the issue. It
looks like sgp_dd is having intermittent problems:

16384+0 records out
sg starting in command at "sgp_dd.c":827: Cannot allocate memory
sg starting in command at "sgp_dd.c":827: Cannot allocate memory
sg starting in command at "sgp_dd.c":827: Cannot allocate memory
sg starting in command at "sgp_dd.c":827: Cannot allocate memory
sg starting in command at "sgp_dd.c":827: Cannot allocate memory
sg starting in command at "sgp_dd.c":827: Cannot allocate memory

Output from sgpdd-survey:

Wed Dec 1 10:55:55 EST 2010 sgpdd-survey on /dev/sdp /dev/sdo /dev/sdn
/dev/sdw /dev/sdv /dev/sdu /dev/sdt /dev/sds /dev/sdy /dev/sdr /dev/sdx
/dev/sdq from oss1
...
total_size 100663296K rsz 1024 crg 384 thr   768 write 388.20 MB/s 384 x 1.01 = 388.18 MB/s  read 387.16 MB/s 384 x 1.01 = 388.18 MB/s
total_size 100663296K rsz 1024 crg 384 thr  1536 write 1 failed    read 385.72 MB/s 384 x 1.01 = 388.18 MB/s
total_size 100663296K rsz 1024 crg 384 thr  3072 write 140 failed  read 121 failed
total_size 100663296K rsz 1024 crg 384 thr  6144 ENOMEM
total_size 100663296K rsz 1024 crg 768 thr   768 write 1 failed    read 387.28 MB/s 768 x 0.51 = 388.18 MB/s
total_size 100663296K rsz 1024 crg 768 thr  1536 write 388.23 MB/s 768 x 0.51 = 388.18 MB/s  read 386.76 MB/s 768 x 0.51 = 388.18 MB/s
total_size 100663296K rsz 1024 crg 768 thr  3072 write 42 failed   read 31 failed
total_size 100663296K rsz 1024 crg 768 thr  6144 ENOMEM
total_size 100663296K rsz 1024 crg 768 thr 12288 ENOMEM
...

Any suggestions are welcome.

Thanks,
-Nathan
Kevin Van Maren
2010-Dec-14 17:38 UTC
[Lustre-discuss] Errors in output from sgpdd-survey (sgp_dd.c Cannot allocate memory)
Yep, this is a common problem. I've never bothered to figure out why
memory can't be allocated, although as you note the issue is in sgp_dd,
not in the iokit scripts. It could be a resource limit of some sort
(pinned pages?). If you have time to dig into it, I'm sure many people
would appreciate it.

One thing to note is that Lustre limits itself to 512 total threads per
server, so there are never more than that many outstanding IOs when
running Lustre, although additional client requests can be queued and
processed, which is why higher crg/thread values are interesting. If you
limit the sgpdd-survey total thread count, you should not have these
failures (note that 1536 threads has one failing write process while
3072 has 140; perhaps you could have sgp_dd retry the allocation).

Kevin

Heald, Nathan T. wrote:
> Hi everyone,
> I have been running sgpdd-survey on some DDN 9550s and am getting some
> errors.
[snip]
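For what it's worth, a rough sketch of capping the survey so the total
thread count stays at or below that 512-thread ceiling. The tunables
shown (scsidevs, crghi, thrhi) are the environment variables I recall
from the top of the sgpdd-survey script; check the header of the copy in
your iokit before relying on the exact names:

  # hypothetical invocation -- caps total regions at 256 and total
  # threads at 512 across the listed devices
  crghi=256 thrhi=512 \
  scsidevs="/dev/sdp /dev/sdo /dev/sdn /dev/sdq" \
  ./sgpdd-survey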
Jim Shankland
2010-Dec-14 18:20 UTC
[Lustre-discuss] Errors in output from sgpdd-survey (sgp_dd.c Cannot allocate memory)
Heald, Nathan T. wrote:
> Hi everyone,
> I have been running sgpdd-survey on some DDN 9550s and am getting some
> errors.
[snip]
> 16384+0 records out
> sg starting in command at "sgp_dd.c":827: Cannot allocate memory
[snip]
> Output from sgpdd-survey:
>
> Wed Dec 1 10:55:55 EST 2010 sgpdd-survey on /dev/sdp /dev/sdo /dev/sdn
> /dev/sdw /dev/sdv /dev/sdu /dev/sdt /dev/sds /dev/sdy /dev/sdr /dev/sdx
> /dev/sdq from oss1
> ...
> total_size 100663296K rsz 1024 crg 384 thr  768 write 388.20 MB/s 384 x 1.01 = 388.18 MB/s  read 387.16 MB/s 384 x 1.01 = 388.18 MB/s
> total_size 100663296K rsz 1024 crg 384 thr 1536 write 1 failed    read 385.72 MB/s 384 x 1.01 = 388.18 MB/s
> total_size 100663296K rsz 1024 crg 384 thr 3072 write 140 failed  read 121 failed
> total_size 100663296K rsz 1024 crg 384 thr 6144 ENOMEM

You just don't have enough RAM to do these particular runs. If you look
at the line ending in ENOMEM above: sgpdd-survey is proposing to launch
384 separate sgp_dd processes for each of 12 different devices, with
each process launching 16 threads (6144 / 384), and each thread
allocating at least one 1 MiB write buffer. That adds up to 72 GiB of
RAM for write buffers. The ENOMEM line means that the sgpdd-survey
script looked at the amount of physical RAM you have and estimated it
wasn't enough to do this run.

You could try running sgpdd-survey against each block device one at a
time, which will reduce the needed RAM by a factor of 12 (in your case),
but of course isn't quite equivalent.

sg_readcap is used to determine the physical sector size and capacity
(sector count) of each block device. I wouldn't think changing the flags
on it would help anything.

Jim Shankland
Whamcloud, Inc.
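To make the arithmetic above concrete, a quick back-of-the-envelope
check in plain shell (the 384 processes, 12 devices, 16 threads, and
1 MiB per buffer all come from Jim's description):

  # 384 processes/device * 12 devices * 16 threads/process * 1 MiB/thread
  echo $(( 384 * 12 * 16 ))         # 73728 MiB of write buffers
  echo $(( 384 * 12 * 16 / 1024 ))  # 72 GiB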
Kevin Van Maren
2010-Dec-14 20:27 UTC
[Lustre-discuss] Errors in output from sgpdd-survey (sgp_dd.c Cannot allocate memory)
Jim Shankland wrote:
>> ...
>> total_size 100663296K rsz 1024 crg 384 thr  768 write 388.20 MB/s 384 x 1.01 = 388.18 MB/s  read 387.16 MB/s 384 x 1.01 = 388.18 MB/s
>> total_size 100663296K rsz 1024 crg 384 thr 1536 write 1 failed    read 385.72 MB/s 384 x 1.01 = 388.18 MB/s
>> total_size 100663296K rsz 1024 crg 384 thr 3072 write 140 failed  read 121 failed
>> total_size 100663296K rsz 1024 crg 384 thr 6144 ENOMEM
>
> You just don't have enough RAM to do these particular runs. If you look
> at the line ending in ENOMEM above: sgpdd-survey is proposing to launch
> 384 separate sgp_dd processes for each of 12 different devices, with
> each process launching 16 threads (6144 / 384), and each thread
> allocating at least one 1 MiB write buffer. That adds up to 72 GiB of
> RAM for write buffers. The ENOMEM line means that the sgpdd-survey
> script looked at the amount of physical RAM you have and estimated it
> wasn't enough to do this run.

It's not just the ENOMEM at 6144 total threads that is the problem, it
is the "write X failed", etc., at the _lower_ thread counts.

From memory, the "crg" and "thr" numbers are already multiplied by 12
(the number of devices being tested), so "thr" should reflect the total
number of buffers required. For this test, it looks like crg=32 and
SG_MAX_QUEUE is the default 16. So the memory consumption _should not_
be an issue, but sgp_dd is still having problems allocating buffers.

Again, I've seen this even when I clearly had free memory on the node,
so I think there is something else at work here.

Kevin
>> Why are you assigning two different network types to the same physical
>> device?
>
> Hello Colin,
>
> Thanks for the reply. In answer to your question:
>
> The same physical device has access to two different lustre filesystems
> using different protocols.
>
> One lustre filesystem is locally available via the native IB interface,
> o2ib0(ib0).
>
> The other lustre filesystem is remotely available (via an IB to 10Gb
> switch/gateway in the local IB fabric) on the same local IB device, but
> only via the tcp/ip (IPoIB) protocol, tcp0(ib0).
>
> [ASCII diagram snipped]
>
> Is this possible?
>
> -k

I did manage to get this to work properly under the following conditions:

  remote lustre setup uses   tcp(eth0)
  local lustre setup uses    o2ib(ib0)
  on the ib client, lnet is  o2ib(ib0),tcp(ib0)

With this configuration, all lustre servers are active and reachable. If
the client ordering is reversed, then the OSSs on the local lustre
always report as temporarily unreachable.

--
Gary Molenkamp                  SHARCNET
Systems Administrator           University of Western Ontario
Compute/Calcul Canada           http://www.computecanada.org
gary at sharcnet.ca             http://www.sharcnet.ca
(519) 661-2111 x88429           (519) 661-4000
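For illustration, a minimal sketch of the lnet module options implied by
the working ordering above. The interface names come from the thread;
the exact network numbering and config-file location are assumptions and
should be adapted to the actual nodes:

  # remote lustre servers (reached through the IB-to-10GbE gateway)
  options lnet networks=tcp0(eth0)

  # local lustre servers (native IB)
  options lnet networks=o2ib0(ib0)

  # IB clients: o2ib listed first, then tcp, both on the ib0 interface
  options lnet networks=o2ib0(ib0),tcp0(ib0)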
Christopher J. Walker
2011-Feb-21 23:52 UTC
[Lustre-discuss] Errors in output from sgpdd-survey (sgp_dd.c Cannot allocate memory)
On 14/12/10 20:27, Kevin Van Maren wrote:
[snip]
> It's not just the ENOMEM at 6144 total threads that is the problem, it
> is the "write X failed", etc., at the _lower_ thread counts.
>
> Again, I've seen this even when I clearly had free memory on the node,
> so I think there is something else at work here.

I've run into this problem (on a Scientific Linux 5.5 machine).

If I use /dev/sg1, I get the following:

[root@sn86 lustre]# sgp_dd if=/dev/zero of=/dev/sg1 seek=1024 thr=1 count=1677721 bs=512 bpt=2048 time=1
sg starting out command at "sgp_dd.c":872: Cannot allocate memory

whereas if I use /dev/sdb, I get:

[root@sn86 lustre]# sgp_dd if=/dev/zero of=/dev/sdb seek=1024 thr=1 count=1677721 bs=512 bpt=2048 time=1
time to transfer data was 0.485030 secs, 1771.01 MB/sec

They correspond to the same disk:

[root@sn86 lustre]# sg_map | grep sdb
/dev/sg1  /dev/sdb

Have I just defeated the point of using sgp_dd? Is the fact that this is
really a SATA disk (behind a Dell H700 controller) the problem?

Chris
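One avenue that may be worth checking (speculative; nothing in the
thread confirms it is the cause here): with bs=512 and bpt=2048 each sg
command needs a 1 MiB transfer buffer from the sg driver, and the sg
driver's default reserved buffer is much smaller, so the /dev/sg1 path
allocates buffers differently from the /dev/sdb block-device path. If
your kernel exposes the sg module's def_reserved_size parameter,
something like the following could be tried before re-running the
/dev/sg1 test (the sysfs path and its writability vary by kernel, so
treat this as a sketch):

  # transfer size per sg command with bs=512 bpt=2048
  echo $(( 512 * 2048 ))                         # 1048576 bytes (1 MiB)

  # inspect the sg driver's default reserved buffer size (commonly 32768)
  cat /sys/module/sg/parameters/def_reserved_size

  # try raising it to 1 MiB, then re-run sgp_dd against /dev/sg1
  echo 1048576 > /sys/module/sg/parameters/def_reserved_size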