On Tue, Oct 25, 2016 at 12:48 PM, Matt Garman <matthew.garman at gmail.com> wrote:
> On Mon, Oct 24, 2016 at 6:09 PM, Larry Martell <larry.martell at gmail.com> wrote:
>> The machines are on a local network. I access them with putty from a
>> windows machine, but I have to be at the site to do that.
>
> So that means when you are offsite there is no way to access either
> machine? Does anyone have a means to access these machines from
> offsite?
>
>> Yes, the C6 instance is running on the C7 machine. What could be
>> mis-configured? What would I check to find out?
>
> OK, so these two machines are actually the same physical hardware, correct?

Yes.

> Do you know, is the networking between the two machines "soft", as in
> done locally on the machine (typically through NAT or bridging)? Or is
> it "hard", in that you have a dedicated NIC for the host and a
> separate dedicated NIC for the guest, and actual cables going out of
> each interface and connected to a switch/hub/router? I would expect
> the former...

I don't know, but would also guess the former.

> If it truly is a "soft" network between the machines, then that is
> more evidence of a configuration error. Now, unfortunately, with what
> to look for: I have virtually no experience setting up C6 guests on a
> C7 host; at least not enough to help you troubleshoot the issue. But
> in general, you should be able to hit up a web search and look for
> howtos and other documents on setting up networking between a C7 host
> and its guests. That will allow you to (1) understand how it's
> currently set up, (2) verify if there is any misconfig, and (3) correct
> or change if needed.
>
>> Yes, that is a potential solution I had not thought of. The issue with
>> this is that we have the same system installed at many, many sites,
>> and they all work fine. It is only this site that is having an issue.
>> We really do not want to have different SW running at just this one
>> site. Running the script on the C7 host is a change, but at least it
>> will be the same software as every place else.
>
> IIRC, you said this is the only C7 instance? That would mean it is
> already not the same as every other site. It may be conceptually the
> same, but "under the hood", there are a tremendous number of changes
> between C6 and C7. Effectively every single package is different,
> from the kernel all the way to trivial userspace tools.

Yes, of course it's different at that level. But I was talking about
our application software and setup. That is what I want to keep
consistent across deployments.

>> netperf is not installed.
>
> Again, if you can use putty (which is ssh) to access these systems,
> you implicitly have the ability to upload files (i.e. packages) to the
> systems. A simple tool like netperf should have few (if any)
> dependencies, so you don't have to mess with mirroring the whole
> centos repo. Just grab the netperf rpm file from wherever, then use
> scp (I believe it's called pscp when part of the Putty package) to
> copy to your servers, yum install and start testing.

Again, no machine on the internal network that my 2 CentOS hosts are
on is connected to the internet. I have no way to download anything.
There is an onerous and protracted process to get files into the
internal network and I will see if I can get netperf in.
On Tue, Oct 25, 2016 at 7:22 PM, Larry Martell <larry.martell at gmail.com> wrote:
> Again, no machine on the internal network that my 2 CentOS hosts are
> on is connected to the internet. I have no way to download anything.
> There is an onerous and protracted process to get files into the
> internal network and I will see if I can get netperf in.

Right, but do you have physical access to those machines? Do you have
physical access to the machine on which you use PuTTY to connect
to those machines? If yes to either question, then you can use
another system (that does have Internet access) to download the files
you want, put them on a USB drive (or burn to a CD, etc), and bring
the USB/CD to the C6/C7/PuTTY machines.

There's almost always a technical way to get files on to (or out of) a
system. :) Now, your company might have *policies* that forbid
skirting around the technical measures that are in place.

Here's another way you might be able to test network connectivity
between C6 and C7 without installing new tools: see if both machines
have "nc" (netcat) installed. I've seen this tool referred to as "the
swiss army knife of network testing tools", and that is indeed an apt
description. So if you have that installed, you can hit up the web
for various examples of its use. It's designed to be easily scripted,
so you can write your own tests, and in theory implement something
similar to netperf.

OK, I just thought of another "poor man's" way to at least do some
sanity testing between C6 and C7: scp. First generate a huge file.
General rule of thumb is at least 2x the amount of RAM in the C7 host.
You could create a tarball of /usr, for example (e.g. "tar czvf
/tmp/bigfile.tar.gz /usr" assuming your /tmp partition is big enough
to hold this). Then, first do this: "time scp /tmp/bigfile.tar.gz
localhost:/tmp/bigfile_copy.tar.gz". This will literally make a copy
of that big file, but will route through most of the network stack.
Make a note of how long it took. And also be sure your /tmp partition
is big enough for two copies of that big file.

Now, repeat that, but instead of copying to localhost, copy to the C6
box. Something like: "time scp /tmp/bigfile.tar.gz <IP address of C6
host>:/tmp/". Does the time reported differ greatly from when you
copied to localhost? I would expect them to be reasonably close.
(And this is another reason why you want a fairly large file, so the
transfer time is dominated by actual file transfer, rather than the
overhead.)

Lastly, do the reverse test: log in to the C6 box, and copy the file
back to C7, e.g. "time scp /tmp/bigfile.tar.gz <IP of C7
host>:/tmp/bigfile_copy2.tar.gz". Again, the time should be
approximately the same for all three transfers. If either or both of
the latter two copies take dramatically longer than the first, then
there's a good chance something is askew with the network config
between C6 and C7.

Oh... all this time I've been jumping to fancy tests. Have you tried
the simplest form of testing, that is, doing by hand what your scripts
do automatically? In other words, simply try copying files between C6
and C7 using the existing NFS config? Can you manually trigger the
errors/timeouts you initially posted? Is it when copying lots of
small files? Or when you copy a single huge file? Any kind of file
copying "profile" you can determine that consistently triggers the
error? That could be another clue.

Good luck!
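
The three timed scp copies described in this message could be wrapped in a small helper so that each run is measured the same way. This is only a sketch under assumptions: the function name timed_copy, the COPY_CMD override, and the C6_IP / C7_IP placeholders are made up for illustration and do not come from the thread. Elapsed time is in whole seconds from "date +%s", which is coarse but should be adequate when the file is several gigabytes.

```shell
#!/bin/sh
# Sketch only: time one copy and print the elapsed whole seconds.
# COPY_CMD is a hypothetical override (defaults to scp) so the function
# can be dry-run with cp before it is pointed at a real host.
timed_copy() {
    src=$1
    dest=$2
    start=$(date +%s)
    # Send any output of the copy command to stderr so that the only
    # thing on stdout is the elapsed time.
    ${COPY_CMD:-scp} "$src" "$dest" >&2 || return 1
    end=$(date +%s)
    echo $((end - start))
}

# Intended use on the C7 host (C6_IP is a placeholder, not a real address):
#   timed_copy /tmp/bigfile.tar.gz localhost:/tmp/bigfile_copy.tar.gz
#   timed_copy /tmp/bigfile.tar.gz "$C6_IP":/tmp/
# and then on the C6 guest, back toward the host (C7_IP is a placeholder):
#   timed_copy /tmp/bigfile.tar.gz "$C7_IP":/tmp/bigfile_copy2.tar.gz
```

If the second or third number comes out far larger than the localhost number, that would point in the same direction as the manual "time scp" comparison: something between the host and the guest, rather than the disks, is slow.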
I am sorry, I am stepping into the conversation late and may not fully
understand all aspects of the situation but I wonder if it may make
sense to set up a server process on the NFS server machine that simply
listens for incoming requests to perform a file copy and then does so
as requested - only locally. If files in question are large - which I
suspect they may be, given the timeouts becoming an issue - that may
resolve the issue and help speed things up at the same time.

Cheers,
Boris.
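
A rough illustration of that idea: a handler that reads one "SOURCE DEST" request per line and performs the copy with cp on the server's own disk, so the file data itself never travels over NFS. Everything specific here is an assumption made for the sketch and not something from the thread: the function name handle_copy_requests, the COPY_ROOT directory restriction, the one-line request format, and the port number in the comment. It has no authentication and does not handle paths containing spaces.

```shell
#!/bin/sh
# Sketch only: read "SRC DEST" requests (one per line) on stdin and copy
# each file locally. COPY_ROOT is a hypothetical base directory; both
# paths must live under it and must not contain "..".
handle_copy_requests() {
    root=${COPY_ROOT:-/export/data}
    while read -r src dest; do
        ok=yes
        for p in "$src" "$dest"; do
            case "$p" in
                *..*) ok=no ;;       # refuse attempts to climb out of root
                "$root"/*) ;;        # path is under the allowed directory
                *) ok=no ;;          # anything else (including empty) is refused
            esac
        done
        if [ "$ok" = yes ] && cp -- "$src" "$dest"; then
            echo "OK $src"
        else
            echo "FAIL $src"
        fi
    done
}

# Hypothetical wiring on the NFS server with netcat (option syntax differs
# between netcat variants, so check the local man page first):
#   nc -l 9999 | handle_copy_requests
# With this one-way pipe the OK/FAIL lines appear on the server's stdout,
# not back at the client.
```

The point of the design is that the client only sends a short request, and the bulk data moves disk to disk on one machine.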
On Wed, Oct 26, 2016 at 9:35 AM, Matt Garman <matthew.garman at gmail.com> wrote:
> On Tue, Oct 25, 2016 at 7:22 PM, Larry Martell <larry.martell at gmail.com> wrote:
>> Again, no machine on the internal network that my 2 CentOS hosts are
>> on is connected to the internet. I have no way to download anything.
>> There is an onerous and protracted process to get files into the
>> internal network and I will see if I can get netperf in.
>
> Right, but do you have physical access to those machines? Do you have
> physical access to the machine on which you use PuTTY to connect
> to those machines? If yes to either question, then you can use
> another system (that does have Internet access) to download the files
> you want, put them on a USB drive (or burn to a CD, etc), and bring
> the USB/CD to the C6/C7/PuTTY machines.

This site is locked down like no other I have ever seen. You cannot
bring anything into the site - no computers, no media, no phone. You
have to empty your pockets and go through an airport type naked body
scan.

> There's almost always a technical way to get files on to (or out of) a
> system. :) Now, your company might have *policies* that forbid
> skirting around the technical measures that are in place.

This is my client's client, and even if I could circumvent their
policy I would not do that. They have a zero tolerance policy and if
you are caught violating it you are banned for life from the company.
And that would not make my client happy.

> Here's another way you might be able to test network connectivity
> between C6 and C7 without installing new tools: see if both machines
> have "nc" (netcat) installed. I've seen this tool referred to as "the
> swiss army knife of network testing tools", and that is indeed an apt
> description. So if you have that installed, you can hit up the web
> for various examples of its use. It's designed to be easily scripted,
> so you can write your own tests, and in theory implement something
> similar to netperf.
>
> OK, I just thought of another "poor man's" way to at least do some
> sanity testing between C6 and C7: scp. First generate a huge file.
> General rule of thumb is at least 2x the amount of RAM in the C7 host.
> You could create a tarball of /usr, for example (e.g. "tar czvf
> /tmp/bigfile.tar.gz /usr" assuming your /tmp partition is big enough
> to hold this). Then, first do this: "time scp /tmp/bigfile.tar.gz
> localhost:/tmp/bigfile_copy.tar.gz". This will literally make a copy
> of that big file, but will route through most of the network stack.
> Make a note of how long it took. And also be sure your /tmp partition
> is big enough for two copies of that big file.
>
> Now, repeat that, but instead of copying to localhost, copy to the C6
> box. Something like: "time scp /tmp/bigfile.tar.gz <IP address of C6
> host>:/tmp/". Does the time reported differ greatly from when you
> copied to localhost? I would expect them to be reasonably close.
> (And this is another reason why you want a fairly large file, so the
> transfer time is dominated by actual file transfer, rather than the
> overhead.)
>
> Lastly, do the reverse test: log in to the C6 box, and copy the file
> back to C7, e.g. "time scp /tmp/bigfile.tar.gz <IP of C7
> host>:/tmp/bigfile_copy2.tar.gz". Again, the time should be
> approximately the same for all three transfers. If either or both of
> the latter two copies take dramatically longer than the first, then
> there's a good chance something is askew with the network config
> between C6 and C7.
>
> Oh... all this time I've been jumping to fancy tests. Have you tried
> the simplest form of testing, that is, doing by hand what your scripts
> do automatically? In other words, simply try copying files between C6
> and C7 using the existing NFS config? Can you manually trigger the
> errors/timeouts you initially posted? Is it when copying lots of
> small files? Or when you copy a single huge file? Any kind of file
> copying "profile" you can determine that consistently triggers the
> error? That could be another clue.

These are all good debugging techniques, and I have tried some of
them, but I think the issue is load related. There are 50 external
machines ftp-ing to the C7 server, 24/7, thousands of files a day. And
on the C6 client the script that processes them is running
continuously. It will sometimes run for 7 hours then hang, but it has
run for as long as 3 days before hanging. I have never been able to
reproduce the errors/hanging situation manually. And again, this is
only at this site. We have the same software deployed at 10 different
sites all doing the same thing, and it all works fine at all of those.
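
Since the hang seems to appear only after hours of sustained load, a one-off manual copy may simply be too short to trigger it. One possible approach, sketched here under assumptions, is a loop that writes many small files in sequence and reports every write that is unusually slow. The function name soak_write, the 64 KiB file size, and the idea of pointing it at a directory on the NFS mount of the C6 client are illustrative choices and not details from the thread. It uses only dd, date and rm, so nothing new would need to be brought into the site.

```shell
#!/bin/sh
# Sketch only: write COUNT small files into DIR one after another and
# report every write that takes LIMIT seconds or longer.
# Prints "slow <n> <secs>" for each slow write, then "done <count> slow=<k>".
soak_write() {
    dir=$1
    count=$2
    limit=$3
    slow=0
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        # 64 KiB per file; the size is an arbitrary stand-in for real data
        dd if=/dev/zero of="$dir/soak_$i.dat" bs=1024 count=64 2>/dev/null || return 1
        end=$(date +%s)
        took=$((end - start))
        if [ "$took" -ge "$limit" ]; then
            echo "slow $i $took"
            slow=$((slow + 1))
        fi
        # Remove each file again so the test does not fill the export
        rm -f "$dir/soak_$i.dat"
        i=$((i + 1))
    done
    echo "done $count slow=$slow"
}

# Hypothetical long run against an NFS-mounted directory (path is a placeholder):
#   soak_write /mnt/nfs/soaktest 100000 5 >> /tmp/soak.log
```

A write that never returns would not show up as a "slow" line, because the loop itself would be stuck; in that case the last file number written to the log at least shows how far the run got before the hang.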