Dragan Cvetkovic
2005-Jun-29 23:00 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Wed, 29 Jun 2005, Rich Teer wrote:> On Wed, 29 Jun 2005, Dragan Cvetkovic wrote: > >> after starting truss -afe -o /dev/null -p 2079, it managed to copy some >> 10MB of data in one minute or so, but after killing truss, it takes 5 >> minutes to copy 2MB of data. >> >> What is going on here? How can I find that out? > > <Bryan Cantrill> > This sounds like a job DTrace! [FX: sounds of feverish typing] > </Bryan Cantrill> > > :-) >Yes, I know I should use them, and I wish I know how to do it. Therefore dtrace-discuss cc-ed :-) To recapitulate and simplify a bit: I want to copy via NFS a large file (Solaris Express b17 first iso, some 300MB). My NFS server is a x86 machine running onnv16, my NFS client is a SPARC machine (SunBlade 100) running Solaris 10 GA and up to date with patches. They are all on the local network and e.g. ftp shows the transfer speed of some 8-9MB/s. However, if I use cp to do the same, it goes extremelly slowly: 1 or 2MB per minute or even less! When I truss nfsd on the server, I see that it does a lot of sleeping, when I trace cp on the client, it does a lot of write() + mmap() of 8MB chunks and some idling in between. I have tried experimenting with switching off NFS4 and some other stuff (as I am not completely sure that our NFS4 setup is completely OK), but didn''t help. Where is the cp source, btw? Another example of slow transfer: copying these 4 isos to our install server (using setup_install_server + add_to_install_server) took the whole day (some 5 to 6 hours!)! How do I start debugging this problem? Hints are more than welcome. Thanks and bye, Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn''t. L. E. J. Brouwer
Alex Kiernan
2005-Jun-29 23:17 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On 6/30/05, Dragan Cvetkovic <opensolaris.src at gmail.com> wrote:> On Wed, 29 Jun 2005, Rich Teer wrote: > > > On Wed, 29 Jun 2005, Dragan Cvetkovic wrote: > > > >> after starting truss -afe -o /dev/null -p 2079, it managed to copy some > >> 10MB of data in one minute or so, but after killing truss, it takes 5 > >> minutes to copy 2MB of data. > >> > >> What is going on here? How can I find that out? > > > > <Bryan Cantrill> > > This sounds like a job DTrace! [FX: sounds of feverish typing] > > </Bryan Cantrill> > > > > :-) > > > > Yes, I know I should use them, and I wish I know how to do it. Therefore > dtrace-discuss cc-ed :-) > > To recapitulate and simplify a bit: I want to copy via NFS a large file > (Solaris Express b17 first iso, some 300MB). My NFS server is a x86 machine > running onnv16, my NFS client is a SPARC machine (SunBlade 100) running > Solaris 10 GA and up to date with patches. They are all on the local > network and e.g. ftp shows the transfer speed of some 8-9MB/s. However, if > I use cp to do the same, it goes extremelly slowly: 1 or 2MB per minute or > even less! > > When I truss nfsd on the server, I see that it does a lot of sleeping, when > I trace cp on the client, it does a lot of write() + mmap() of 8MB chunks > and some idling in between. I have tried experimenting with switching off > NFS4 and some other stuff (as I am not completely sure that our NFS4 setup > is completely OK), but didn''t help. > > Where is the cp source, btw? > > Another example of slow transfer: copying these 4 isos to our install > server (using setup_install_server + add_to_install_server) took the whole > day (some 5 to 6 hours!)! >Those symptoms immediately make me think duplex mismatch. -- Alex Kiernan
Dragan Cvetkovic
2005-Jun-29 23:35 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Thu, 30 Jun 2005, Alex Kiernan wrote:> On 6/30/05, Dragan Cvetkovic <opensolaris.src at gmail.com> wrote: >> >> To recapitulate and simplify a bit: I want to copy via NFS a large file >> (Solaris Express b17 first iso, some 300MB). My NFS server is a x86 machine >> running onnv16, my NFS client is a SPARC machine (SunBlade 100) running >> Solaris 10 GA and up to date with patches. They are all on the local >> network and e.g. ftp shows the transfer speed of some 8-9MB/s. However, if >> I use cp to do the same, it goes extremelly slowly: 1 or 2MB per minute or >> even less! >>> Those symptoms immediately make me think duplex mismatch.I thought so as well, but the ftp transfer confuses me (sca is the NFS client, lokrum is the NFS server): lokrum% ftp sca Connected to sca. 220 sca FTP server ready. Name (sca:dragan): 331 Password required for dragan. Password: 230 User dragan logged in. Remote system type is UNIX. Using binary mode to transfer files. ftp> cd /var/tmp 250 CWD command successful. ftp> bin 200 Type set to I. ftp> put sol-nv-b17-x86-v3-iso.zip 200 PORT command successful. 150 Opening BINARY mode data connection for sol-nv-b17-x86-v3-iso.zip. 226 Transfer complete. local: sol-nv-b17-x86-v3-iso.zip remote: sol-nv-b17-x86-v3-iso.zip 615586788 bytes sent in 60 seconds (10051.06 Kbytes/s) ftp> quit So, it''s almost 10MB/s for the ftp between the same two machines. How can I check "duplex mismatch"? Bye, Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn''t. L. E. J. Brouwer
Alex Kiernan
2005-Jun-29 23:40 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On 6/30/05, Dragan Cvetkovic <opensolaris.src at gmail.com> wrote:> On Thu, 30 Jun 2005, Alex Kiernan wrote: > > > On 6/30/05, Dragan Cvetkovic <opensolaris.src at gmail.com> wrote: > >> > >> To recapitulate and simplify a bit: I want to copy via NFS a large file > >> (Solaris Express b17 first iso, some 300MB). My NFS server is a x86 machine > >> running onnv16, my NFS client is a SPARC machine (SunBlade 100) running > >> Solaris 10 GA and up to date with patches. They are all on the local > >> network and e.g. ftp shows the transfer speed of some 8-9MB/s. However, if > >> I use cp to do the same, it goes extremelly slowly: 1 or 2MB per minute or > >> even less! > >> > > > Those symptoms immediately make me think duplex mismatch. > > I thought so as well, but the ftp transfer confuses me (sca is the NFS > client, lokrum is the NFS server): > > lokrum% ftp sca > Connected to sca. > 220 sca FTP server ready. > Name (sca:dragan): > 331 Password required for dragan. > Password: > 230 User dragan logged in. > Remote system type is UNIX. > Using binary mode to transfer files. > ftp> cd /var/tmp > 250 CWD command successful. > ftp> bin > 200 Type set to I. > ftp> put sol-nv-b17-x86-v3-iso.zip > 200 PORT command successful. > 150 Opening BINARY mode data connection for sol-nv-b17-x86-v3-iso.zip. > 226 Transfer complete. > local: sol-nv-b17-x86-v3-iso.zip remote: sol-nv-b17-x86-v3-iso.zip > 615586788 bytes sent in 60 seconds (10051.06 Kbytes/s) > ftp> quit > > So, it''s almost 10MB/s for the ftp between the same two machines. How can > I check "duplex mismatch"? > > Bye, Dragan >kstat -n <interface-name> at each end would where I''d start, but I think whether you get meaningful duplex from it depends on what your NICs actually are.>From memory late collisions are another good indicator.-- Alex Kiernan
Dragan Cvetkovic
2005-Jun-29 23:54 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Thu, 30 Jun 2005, Alex Kiernan wrote:> On 6/30/05, Dragan Cvetkovic <opensolaris.src at gmail.com> wrote: >> On Thu, 30 Jun 2005, Alex Kiernan wrote: >> >>> Those symptoms immediately make me think duplex mismatch. >> >> I thought so as well, but the ftp transfer confuses me (sca is the NFS >> client, lokrum is the NFS server):[snip]>> 615586788 bytes sent in 60 seconds (10051.06 Kbytes/s)>> So, it''s almost 10MB/s for the ftp between the same two machines. How can >> I check "duplex mismatch"? >> > > kstat -n <interface-name> at each end would where I''d start, but I > think whether you get meaningful duplex from it depends on what your > NICs actually are. > > From memory late collisions are another good indicator.OK, on the NFS server I see the following: $ kstat -n elxl0 module: elxl instance: 0 name: elxl0 class: net [snip] carrier_errors 456984 [snip] $ kstat -n elxl0 | grep coll collisions 0 ex_collisions 0 first_collisions 0 multi_collisions 0 tx_late_collisions 0 on the client $ kstat -n eri0 module: eri instance: 0 name: eri0 class: net [snip] ifspeed 100000000 [snip] $ kstat -n eri0 | grep coll collisions 0 excessive_coll 0 first_coll 0 late_coll 0 so there are no collisions, but quite a few carrier_error on the server. Hm. Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn''t. L. E. J. Brouwer
Gavin Maltby
2005-Jun-30 00:12 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
Hi Dragan, On 06/30/05 00:00, Dragan Cvetkovic wrote: [cut]> Where is the cp source, btw?usr/src/cmd/mv - cp and mv have common source.> Another example of slow transfer: copying these 4 isos to our install > server (using setup_install_server + add_to_install_server) took the whole > day (some 5 to 6 hours!)! > > How do I start debugging this problem? Hints are more than welcome.There was a fix around a year ago to cp performance - 5015406. Basically all the fancy mmap and VM advice that our copy was doing had so much VM overhead that a brute-force "read buffer, write buffer" approach was faster! So that should have eliminated the traditional cp performance problems. Before diving into cp/mv source try some copies using dd in both direction - nfs client to server and server to client - using a few block sizes. That could eliminate or confirm cp as a suspect - my guess is it will eliminate it. Are you copying from a client local f/s to server via nfs, from server via nfs to local client f/s, or one nfs mount to another? By way of a baseline, how long does a mkfile 300m take when run on the nfs server in the shared directory? Cheers Gavin
Dragan Cvetkovic
2005-Jun-30 00:48 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Thu, 30 Jun 2005, Gavin Maltby wrote:> Hi Dragan,Hi Gavin,> > On 06/30/05 00:00, Dragan Cvetkovic wrote: > [cut] >> Where is the cp source, btw? > > usr/src/cmd/mv - cp and mv have common source.Ah, that explains it.> >> Another example of slow transfer: copying these 4 isos to our install >> server (using setup_install_server + add_to_install_server) took the whole >> day (some 5 to 6 hours!)! >> >> How do I start debugging this problem? Hints are more than welcome. > > There was a fix around a year ago to cp performance - 5015406. > Basically all the fancy mmap and VM advice that our copy was doing > had so much VM overhead that a brute-force "read buffer, write buffer" > approach was faster! So that should have eliminated the traditional cp > performance problems. Before diving into cp/mv source try some > copies using dd in both direction - nfs client to server and server > to client - using a few block sizes. That could eliminate or > confirm cp as a suspect - my guess is it will eliminate it. >OK. See below.> Are you copying from a client local f/s to server via nfs, > from server via nfs to local client f/s, or one nfs mount > to another? By way of a baseline, how long does > a mkfile 300m take when run on the nfs server in the > shared directory?I am copying to a clients f/s from server''s fs via nfs. I was doing both using /net/server/shared/file/system and mounting it via NFS on a client. Here is the mkfile experiment: server# share -F nfs -o anon=0 /var/tmp/se server# pwd /var/tmp/se server# time mkfile 300m foo real 0m8.613s user 0m0.072s sys 0m5.080s On the client: client# pwd /net/server/var/tmp/se client# time mkfile 300m bar real 0m35.535s user 0m0.140s sys 0m4.053s Here is the dd time client# pwd /net/server/var/tmp/se client# time dd if=sol-nv-b17-x86-v1-iso.zip of=/var/tmp/sol-nv-b17-x86-v1-iso.zip (pressed ^c, it was taking too long) ^C104736+0 records in 104736+0 records out real 3m23.931s user 0m0.997s sys 0m4.701s i.e. in 3m 24sec it only copied 51MB of data (256KB/s). and with larger block size: client# time dd if=sol-nv-b17-x86-v1-iso.zip of=/var/tmp/sol-nv-b17-x86-v1-iso.zip bs=32k ^C 59+0 records in 59+0 records out real 3m43.381s user 0m0.005s sys 0m0.117s so, it''s even worse (1.8MB in 3m 43sec!). It seems that the transfer start quickly, but after some time it gets slower and slower. Bye, Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn''t. L. E. J. Brouwer
Spencer Shepler
2005-Jun-30 01:02 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Wed, Dragan Cvetkovic wrote:> On Thu, 30 Jun 2005, Gavin Maltby wrote: > >Hi Dragan, > > Hi Gavin, > > > >On 06/30/05 00:00, Dragan Cvetkovic wrote: > >[cut] > >>Where is the cp source, btw? > > > >usr/src/cmd/mv - cp and mv have common source. > > Ah, that explains it. > > > > >>Another example of slow transfer: copying these 4 isos to our install > >>server (using setup_install_server + add_to_install_server) took the whole > >>day (some 5 to 6 hours!)! > >> > >>How do I start debugging this problem? Hints are more than welcome. > > > >There was a fix around a year ago to cp performance - 5015406. > >Basically all the fancy mmap and VM advice that our copy was doing > >had so much VM overhead that a brute-force "read buffer, write buffer" > >approach was faster! So that should have eliminated the traditional cp > >performance problems. Before diving into cp/mv source try some > >copies using dd in both direction - nfs client to server and server > >to client - using a few block sizes. That could eliminate or > >confirm cp as a suspect - my guess is it will eliminate it. > > > > OK. See below. > > > >Are you copying from a client local f/s to server via nfs, > >from server via nfs to local client f/s, or one nfs mount > >to another? By way of a baseline, how long does > >a mkfile 300m take when run on the nfs server in the > >shared directory? > > I am copying to a clients f/s from server''s fs via nfs. I was doing both > using /net/server/shared/file/system and mounting it via NFS on a client. > > Here is the mkfile experiment: > > server# share -F nfs -o anon=0 /var/tmp/se > server# pwd > /var/tmp/se > server# time mkfile 300m foo > > real 0m8.613s > user 0m0.072s > sys 0m5.080s > > On the client: > > client# pwd > /net/server/var/tmp/se > client# time mkfile 300m bar > > real 0m35.535s > user 0m0.140s > sys 0m4.053s > > Here is the dd time > > client# pwd > /net/server/var/tmp/se > client# time dd if=sol-nv-b17-x86-v1-iso.zip > of=/var/tmp/sol-nv-b17-x86-v1-iso.zip > > (pressed ^c, it was taking too long) > > ^C104736+0 records in > 104736+0 records out > > real 3m23.931s > user 0m0.997s > sys 0m4.701s > > i.e. in 3m 24sec it only copied 51MB of data (256KB/s). > > and with larger block size: > > client# time dd if=sol-nv-b17-x86-v1-iso.zip > of=/var/tmp/sol-nv-b17-x86-v1-iso.zip bs=32k > > ^C > 59+0 records in > 59+0 records out > > real 3m43.381s > user 0m0.005s > sys 0m0.117s > > so, it''s even worse (1.8MB in 3m 43sec!). > > It seems that the transfer start quickly, but after some time it gets > slower and slower.(note: in the future the nfs-discuss at opensolaris.org list may be a better target for these discussions). Without the aid of a snoop trace of this interaction, I am going to guess that there may be something going on with the x86 server. The NFS client, especially in Solaris 10, is aggressive about read-aheads and the read sizes it generates are larger than ftp would generate. So the classic problem is usually one where the client/server is unable to handle the back-to-back network traffic. To confirm something like this, a snoop trace at both the client and server would be needed; if such a snoop trace is generated, a pointer to it would be best or you can send it to me directly. Spencer
Dragan Cvetkovic
2005-Jun-30 01:08 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Wed, 29 Jun 2005, Spencer Shepler wrote:>> >> It seems that the transfer start quickly, but after some time it gets >> slower and slower. > > (note: in the future the nfs-discuss at opensolaris.org list may be a > better target for these discussions).Ah, didn''t know that it exists. OK, my next problem will go there, I am not subscribed there yet.> Without the aid of a snoop trace of this interaction, I am going to > guess that there may be something going on with the x86 server. > The NFS client, especially in Solaris 10, is aggressive about read-aheads > and the read sizes it generates are larger than ftp would generate. > So the classic problem is usually one where the client/server is unable > to handle the back-to-back network traffic. > > To confirm something like this, a snoop trace at both the client and server > would be needed; if such a snoop trace is generated, a pointer to it would > be best or you can send it to me directly.Will do. What do you want me to snoop? cp or dd? what block size? If copying 300MB takes 2 hours to finish, do you want the whole snoop or shall I limit it? Bye, Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn''t. L. E. J. Brouwer
Spencer Shepler
2005-Jun-30 02:13 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Wed, Dragan Cvetkovic wrote:> On Wed, 29 Jun 2005, Spencer Shepler wrote: > > >> > >>It seems that the transfer start quickly, but after some time it gets > >>slower and slower. > > > >(note: in the future the nfs-discuss at opensolaris.org list may be a > >better target for these discussions). > > Ah, didn''t know that it exists. OK, my next problem will go there, I am > not subscribed there yet. > > >Without the aid of a snoop trace of this interaction, I am going to > >guess that there may be something going on with the x86 server. > >The NFS client, especially in Solaris 10, is aggressive about read-aheads > >and the read sizes it generates are larger than ftp would generate. > >So the classic problem is usually one where the client/server is unable > >to handle the back-to-back network traffic. > > > >To confirm something like this, a snoop trace at both the client and server > >would be needed; if such a snoop trace is generated, a pointer to it would > >be best or you can send it to me directly. > > Will do. What do you want me to snoop? cp or dd? what block size? If > copying 300MB takes 2 hours to finish, do you want the whole snoop or > shall I limit it?Just a short (2 minutes) should suffice. HOwever, in the other mail you noted that there were the very high number of carrier_errors. That is likely the problem, right? Spencer
Dragan Cvetkovic
2005-Jun-30 02:21 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Wed, 29 Jun 2005, Spencer Shepler wrote:> On Wed, Dragan Cvetkovic wrote: >> On Wed, 29 Jun 2005, Spencer Shepler wrote: >> >>>> >>>> It seems that the transfer start quickly, but after some time it gets >>>> slower and slower. >> >> Will do. What do you want me to snoop? cp or dd? what block size? If >> copying 300MB takes 2 hours to finish, do you want the whole snoop or >> shall I limit it? > > Just a short (2 minutes) should suffice. HOwever, in the other mail > you noted that there were the very high number of carrier_errors. > That is likely the problem, right?OK, I have emailed you the link to snoop files. I have noticed quite a few carrier_errors on the server bash-3.00# kstat -n elxl0 | grep carrier carrier_errors 842217 but how came it affects NFS but doesn''t affect e.g. ftp? Bye, Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn''t. L. E. J. Brouwer
Spencer Shepler
2005-Jun-30 04:27 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Wed, Dragan Cvetkovic wrote:> On Wed, 29 Jun 2005, Spencer Shepler wrote: > > >On Wed, Dragan Cvetkovic wrote: > >>On Wed, 29 Jun 2005, Spencer Shepler wrote: > >> > >>>> > >>>>It seems that the transfer start quickly, but after some time it gets > >>>>slower and slower. > >> > >>Will do. What do you want me to snoop? cp or dd? what block size? If > >>copying 300MB takes 2 hours to finish, do you want the whole snoop or > >>shall I limit it? > > > >Just a short (2 minutes) should suffice. HOwever, in the other mail > >you noted that there were the very high number of carrier_errors. > >That is likely the problem, right? > > OK, I have emailed you the link to snoop files. I have noticed quite a few > carrier_errors on the server > > bash-3.00# kstat -n elxl0 | grep carrier > carrier_errors 842217 > > but how came it affects NFS but doesn''t affect e.g. ftp?I don''t have an answer for you. As mentioned, NFS does place a heavier load on networking components and I have never seen that induce carrier_errors but there is a correlation. In the short snoop trace you provided to me offline, for the 788 unique NFS READ requests there were 121 retransmits. 15 percent of the READs were retransmitted to the server and will result in the type of throughput you are seeing. The snoop shows gaps in traffic (for the retransmits) as large as 60 seconds. So, solve the carrier_errors and the situation will improve immensely. Spencer
Dragan Cvetkovic
2005-Jun-30 04:44 UTC
[dtrace-discuss] Re: [osol-discuss] b17 copying behaviour?
On Wed, 29 Jun 2005, Spencer Shepler wrote:> On Wed, Dragan Cvetkovic wrote: >> >> OK, I have emailed you the link to snoop files. I have noticed quite a few >> carrier_errors on the server >> >> bash-3.00# kstat -n elxl0 | grep carrier >> carrier_errors 842217 >> >> but how came it affects NFS but doesn''t affect e.g. ftp? > > I don''t have an answer for you. As mentioned, NFS does place > a heavier load on networking components and I have never seen > that induce carrier_errors but there is a correlation. > > In the short snoop trace you provided to me offline, for the > 788 unique NFS READ requests there were 121 retransmits. 15 percent > of the READs were retransmitted to the server and will result in > the type of throughput you are seeing. The snoop shows gaps in > traffic (for the retransmits) as large as 60 seconds. > > So, solve the carrier_errors and the situation will improve immensely. >Thanks Spencer, will talk to our network admin and will try to get a different NIC and see if that helps. Bye, Dragan -- Dragan Cvetkovic, To be or not to be is true. G. Boole No it isn''t. L. E. J. Brouwer