Eric Marthinsen
2012-Nov-01 20:04 UTC
[Mechanize-users] getaddrinfo: Temporary failure in name resolution
Hi Everyone-

I've written a scraper to go through the websites of all of our clients and verify that each website URL is still active and to see if they have a Twitter link on their homepage. You can see the code I'm using to do the scraping here: https://gist.github.com/3996079

There are around 13,000 URLs that I'm trying to visit. I get through about 1,000 of them and then this error starts showing up for all of the requests:

getaddrinfo: Temporary failure in name resolution

It's extremely consistent. I'm running this off an EC2 instance. At first, I was using Amazon's DNS servers and thought that maybe it was an issue within their walls. So, I changed my DNS servers to point to Google's public DNS servers. The result was exactly the same, and the error presented itself at the same point.

Does anything stand out as a potential culprit here?

Regards,
Eric
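[The gist linked above is not reproduced in this archive. A minimal sketch of the kind of scraper described — visit each URL, confirm it responds, and look for a Twitter link on the homepage — might look like the following; `urls` stands in for the list of ~13,000 client URLs and is an assumption, as is the exact error handling:]

```ruby
require 'mechanize'  # gem install mechanize

agent = Mechanize.new
agent.open_timeout = 10   # seconds to wait for the connection
agent.read_timeout = 10   # seconds to wait for the response body

# 'urls' is a placeholder for the real list of client URLs.
urls = ['http://example.com']

urls.each do |url|
  begin
    page = agent.get(url)
    # Treat any link whose href mentions twitter.com as a Twitter link.
    has_twitter = page.links.any? { |l| l.href.to_s.include?('twitter.com') }
    puts "#{url}: active, twitter=#{has_twitter}"
  rescue StandardError => e
    # A failed fetch (DNS error, timeout, HTTP error) marks the URL inactive.
    puts "#{url}: failed (#{e.class}: #{e.message})"
  end
end
```

Note that each `agent.get` may leave a persistent (keep-alive) connection open to the host it just visited, which becomes relevant later in this thread.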
Eric Hodel
2012-Nov-02 01:39 UTC
[Mechanize-users] getaddrinfo: Temporary failure in name resolution
On Nov 1, 2012, at 14:04, Eric Marthinsen <eric at sortfolio.com> wrote:

> There are around 13,000 urls that I'm trying to visit. I get through about 1000 of them and then this error starts showing up for all of the requests:
>
> getaddrinfo: Temporary failure in name resolution
>
> Does anything stand out as a potential culprit here?

Try require 'resolv-replace' as a temporary workaround. This enables a pure-ruby DNS resolver.

This message likely comes directly from resolv(3), making it an OS-level issue. I'll poke around in the ruby sources to see what I can find.
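[The workaround suggested above is a single require; resolv-replace ships in Ruby's standard library and swaps the socket classes' hostname lookups over to the pure-Ruby resolver:]

```ruby
# resolv-replace is part of Ruby's standard library. Requiring it
# monkey-patches TCPSocket, UDPSocket, and friends to resolve hostnames
# through the pure-Ruby Resolv resolver instead of the libc
# getaddrinfo(3) path that raises "Temporary failure in name resolution".
require 'resolv-replace'

# From here on, Mechanize (and anything else built on TCPSocket)
# resolves hostnames via Resolv::DNS rather than the system resolver.
```

Put the require at the top of the script, before Mechanize makes any requests.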
Eric Marthinsen
2012-Nov-02 03:56 UTC
[Mechanize-users] getaddrinfo: Temporary failure in name resolution
Hi Eric-

I might have figured it out. I'm re-running the script now and am about 7,000 records in. I made two changes. The first is that I set the idle timeout to 1 second. The second is that I set HTTP keep-alive to false.

What I think was happening was that as I was creating page objects, they were keeping a persistent connection to the remote server. These connections might have timed out after 5 seconds, but after cranking through enough records, the number of live connections grew to the point where there were no available connections with which to do a DNS lookup. This is all speculation (I've never heard of a fixed number of outbound connections), but it fits my mental model of what's going on.

Regards,
Eric

On Thu, Nov 1, 2012 at 9:39 PM, Eric Hodel <drbrain at segment7.net> wrote:

> Try require 'resolv-replace' as a temporary workaround. This enables a pure-ruby DNS resolver.
>
> This message likely comes directly from resolv(3), making it an OS-level issue. I'll poke around in the ruby sources to see what I can find.
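[The two changes described above map onto Mechanize's accessors roughly as follows — a sketch assuming Mechanize 2.x, where `idle_timeout=` and `keep_alive=` are forwarded to the underlying net-http-persistent connection:]

```ruby
require 'mechanize'  # gem install mechanize

agent = Mechanize.new
agent.idle_timeout = 1      # treat a persistent connection as stale after 1s idle
agent.keep_alive   = false  # request "Connection: close"; don't hold sockets open
```

With keep-alive off, each request's socket is closed as soon as the response is read, so connections can no longer accumulate across thousands of distinct hosts.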
Eric Marthinsen
2012-Nov-02 22:41 UTC
[Mechanize-users] getaddrinfo: Temporary failure in name resolution
As a follow-up, the script now runs. Changing the idle timeout and the keep-alive setting did the trick. I think either change in isolation would have been enough, but both together certainly got the job done.

On Thu, Nov 1, 2012 at 11:56 PM, Eric Marthinsen <eric at sortfolio.com> wrote:

> I might have figured it out. I'm re-running the script now and am about 7,000 records in. I made two changes. The first is that I set the idle timeout to 1 second. The second is that I set http keep alive to false.
Eric Hodel
2012-Nov-03 01:51 UTC
[Mechanize-users] getaddrinfo: Temporary failure in name resolution
On Nov 2, 2012, at 4:41 PM, Eric Marthinsen <eric at sortfolio.com> wrote:

> As a follow-up, the script now runs. Changing the idle timeout and keep-alive setting did the trick. I think either change in isolation would have done the trick, but both certainly got the job done.

I'll add some tuning information to the mechanize documentation for people using mechanize against many different servers, and I'll see what I can do in mechanize to improve the idle timeout.

> What I think was happening was that as I was creating page objects, they were keeping a persistent connection to the remote server. These connections might have timed out after 5 seconds, but after cranking through enough records, the number of live connections grew to the point where there were no available connections with which to do a DNS lookup.

I bet this is exactly what happened.

Setting the idle timeout low will not help on its own: the idle timeout only controls when a connection is considered stale and reset. Mechanize doesn't proactively clean up sockets that have passed the idle timeout; it lets the GC take care of them.

Disabling keep-alive will immediately close each connection, so this is what solved the problem.

If you leave keep-alive enabled and are making several requests to the same host, disabling the history may help performance without using up all your sockets. Mechanize relies on the GC to close the connections.
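[The history tuning mentioned above might look like the following sketch; `max_history=` caps how many Page objects stay reachable, and `Mechanize#shutdown` (present in Mechanize 2.x) is one way to close connections explicitly rather than waiting on the GC — both names should be checked against the Mechanize version in use:]

```ruby
require 'mechanize'  # gem install mechanize

agent = Mechanize.new
agent.max_history = 1   # keep only the current page, not every page visited,
                        # so old Page objects become eligible for GC sooner

# ... run the crawl ...

# When the crawl finishes, close any remaining persistent connections
# explicitly instead of relying on garbage collection:
agent.shutdown
```

For a crawl that touches ~13,000 distinct hosts, though, keep-alive buys little reuse anyway, which is why disabling it (as in the previous message) was the effective fix here.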