thr3ads.net - Xen devel - [Xen-devel] xenolinux /dev/random [May 2004]

Ian Pratt

2004-May-05 08:36 UTC

[Xen-devel] xenolinux /dev/random

I've checked in a fix to the 1.2 and unstable trees for a problem
we discovered yesterday with /dev/random. Basically, the virtual
drivers weren't adding entropy to the kernel entropy pool, which
tended to mean that /dev/random blocked for long periods of
time. The problem was particularly acute with NFS root systems,
where, with no entropy input, /dev/random blocked forever.
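
To check whether a domain is in this state, a minimal Python sketch (not part of the checked-in fix) might look like the following. It reads the kernel's entropy estimate from the standard procfs file and tests whether /dev/random has any data ready. The function names and the one-second timeout are assumptions made for illustration, and current kernels handle /dev/random blocking differently from the 2.4-era kernels discussed here, so treat the output as indicative only.

```python
import os
import select

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"  # standard Linux procfs path


def parse_entropy_avail(text):
    """Convert the contents of entropy_avail (a decimal bit count) to an int."""
    return int(text.strip())


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if unavailable."""
    try:
        with open(path) as f:
            return parse_entropy_avail(f.read())
    except (OSError, ValueError):
        return None


def random_would_block(device="/dev/random", timeout=1.0):
    """Return True if `device` has no data ready within `timeout` seconds.

    Returns None when the device cannot be opened (for example, off Linux).
    """
    try:
        fd = os.open(device, os.O_RDONLY | os.O_NONBLOCK)
    except OSError:
        return None
    try:
        readable, _, _ = select.select([fd], [], [], timeout)
        return not readable
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("entropy_avail (bits):", read_entropy_avail())
    print("/dev/random would block:", random_would_block())
```

An estimate that stays near zero, together with a read that never becomes ready, would be consistent with the symptom described above.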

If you were having problems with Apache being slow to start
(listening on port 80, but not servicing requests), you should
find the problem goes away with the latest tarballs.


Ian


-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. 
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel

Steve Traugott

2004-May-13 03:09 UTC

Re: [Xen-devel] xenolinux /dev/random

Hi All,

My goodness!  See the message I just now posted to xen-devel about NFS
root hangs; could this be what we're hitting?  The most recent hang we
saw happened while an rsync was running over ssh *and* someone restarted
apache...

This wouldn't cause the "NFS server not responding/NFS server OK"
messages on the domain's console, though (or does that show up as a
symptom of this too?)

Steve

Steven Hand

2004-May-13 07:13 UTC

Re: [Xen-devel] xenolinux /dev/random

> My goodness!  See the message I just now posted to xen-devel about NFS
> root hangs; could this be what we're hitting?  The most recent hang we
> saw happened while an rsync was running over ssh *and* someone restarted
> apache...
>
> This wouldn't cause the "NFS server not responding/NFS server OK"
> messages on the domain's console, though (or does that show up as a
> symptom of this too?)

I don't think this is the cause of the NFS hangs you've been seeing; that
appears to be a generic Linux thing (at least we see it with our regular
Linux boxes as well as with Xen boxes). However, if you want to test the
theory, the easiest thing to do is to change the /dev/random device node
to be an alias for /dev/urandom (a non-blocking but potentially weaker
source of randomness).
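
One way to read "alias" here is a character device node that carries /dev/urandom's device numbers. The Python sketch below is an illustration under that assumption, not the exact procedure used in the thread. On Linux, /dev/random is character device 1,8 and /dev/urandom is 1,9. The helper names are hypothetical, and creating the node needs root.

```python
import os
import stat

# Linux device numbers (character devices, major 1): random is minor 8,
# urandom is minor 9.
RANDOM_DEV = (1, 8)
URANDOM_DEV = (1, 9)


def device_numbers(path):
    """Return (major, minor) for a character device node, else None."""
    try:
        st = os.stat(path)
    except OSError:
        return None
    if not stat.S_ISCHR(st.st_mode):
        return None
    return (os.major(st.st_rdev), os.minor(st.st_rdev))


def is_urandom_alias(path):
    """True if `path` is a character device with urandom's device numbers."""
    return device_numbers(path) == URANDOM_DEV


def make_urandom_alias(path, mode=0o666):
    """Create `path` as a character device with urandom's numbers.

    Needs root, and the resulting permissions are subject to the umask.
    The caller must first move any existing node out of the way; this
    function deliberately does not delete anything.
    """
    os.mknod(path, mode | stat.S_IFCHR, os.makedev(*URANDOM_DEV))


if __name__ == "__main__":
    for node in ("/dev/random", "/dev/urandom"):
        print(node, device_numbers(node), "urandom alias:", is_urandom_alias(node))
```

If hangs persist after /dev/random reports 1,9, the entropy pool is unlikely to be the cause.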

The /dev/random bug only really manifested for us during boot, only on
Xen, and resulted in a permanent hang.

The "NFS server foo not responding" followed by later "NFS server foo OK"
messages from Linux appear to be due to a combination of stupid timeouts
in the Linux sunrpc code and another bug which can cause automounters
to fall into an uninterruptible sleep. If you check "ps auwwx" on a
machine which is having problems and notice processes in state 'D',
then this is biting you. Even if this doesn't occur, the crappy timeouts
in the regular Linux code mean that Linux performs very badly if it gets
any errors/loss/congestion during NFS operations.
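
The 'D' state check can be scripted. The following Python sketch assumes BSD-style "ps auwwx" output with the usual procps header and a START column that contains no spaces. The function name and the sample output are made up for illustration.

```python
def find_uninterruptible(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'.

    Expects BSD-style `ps auwwx` output with the header
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    cmd_col = header.index("COMMAND")
    stuck = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the splits.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue
        if fields[stat_col].startswith("D"):
            stuck.append((int(fields[pid_col]), fields[cmd_col].rstrip()))
    return stuck


# Made-up sample output for illustration only.
SAMPLE = """\
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  1272  440 ?        S    May12   0:04 init
root       412  0.0  0.2  1520  620 ?        D    May12   0:00 /usr/sbin/automount /home
www-data   733  0.1  1.4  9200 3600 ?        S    09:14   0:01 /usr/sbin/apache
"""

if __name__ == "__main__":
    for pid, command in find_uninterruptible(SAMPLE):
        print(pid, command)
```

A process that shows up in 'D' once is normal during I/O; one that stays there across repeated runs is the case described above.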

cheers, 

S.

Kip Macy

2004-May-13 14:54 UTC

Re: [Xen-devel] xenolinux /dev/random

Are you also using Linux as an NFS server? We use Linux extensively
in-house for client machines and have not seen this, although I'm sure
we don't use the default Linux settings.


				-Kip


Steve Traugott

2004-May-14 01:42 UTC

Re: [Xen-devel] xenolinux /dev/random

The server is Debian woody, 2.4.21.

I've never seen any obvious way to actually set any timeout, block
size, or other parameters for an NFS root partition -- and it seems to
ignore whatever's in fstab, which makes sense.

Right now the only reason I'm even using NFS is because a Xenoserver
provider needs to be able to do backups, migration, failover, and so on.
How are other people meeting these requirements?  Has the CoW
development stalled?  What about live migration?

Steve

-- 
Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org 
http://www.stevegt.com -- http://Infrastructures.Org

Ian Pratt

2004-May-14 08:03 UTC

Re: [Xen-devel] xenolinux /dev/random

> I've never seen any obvious way to actually set any timeout, block
> size, or other parameters for an NFS root partition -- and it seems to
> ignore whatever's in fstab, which makes sense.

What happens if you do a "mount -o remount" on an NFS root? Is it
just ignored?

It is possible to set some options on the kernel command line. See
linux-2.4.26/Documentation/nfsroot.txt
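
As a rough illustration, nfsroot.txt documents the parameter in the form nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]. The Python sketch below only assembles such a string. The server address, export path, and option values are hypothetical, and which options a given kernel version honours should be checked against that kernel's copy of nfsroot.txt.

```python
def build_nfsroot(root_dir, server_ip=None, options=None):
    """Build an `nfsroot=` kernel parameter.

    Follows the form nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>].
    `options` maps option names to values; a value of None emits a bare flag.
    """
    target = f"{server_ip}:{root_dir}" if server_ip else root_dir
    parts = [target]
    for name, value in (options or {}).items():
        parts.append(name if value is None else f"{name}={value}")
    return "nfsroot=" + ",".join(parts)


if __name__ == "__main__":
    # Hypothetical server address, export path, and option values.
    print(build_nfsroot("/export/domU-root", "10.0.0.1",
                        {"rsize": 8192, "wsize": 8192, "timeo": 600}))
```

The resulting string would be appended to the domain's kernel command line alongside root=/dev/nfs.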

> Right now the only reason I'm even using NFS is because a Xenoserver
> provider needs to be able to do backups, migration, failover, and so on.
> How are other people meeting these requirements?  Has the CoW
> development stalled?

NFS root should be a good strategy -- it's unfortunate the Linux
code has problems.  If you can find a reliable way of triggering
the Linux lockup, we'll have a sporting chance of being able to
fix it, and hopefully get a patch into the mainline tree.

Bin Ren developed a CoW block device driver, but I don't think
it's received a huge amount of testing. Bin: could you check this in?

We also have a user-space CoW NFS server that runs in domain0 and
exports file systems to other domains. This is undergoing testing
right now.

> What about live migration?

Live migration is now working nicely -- I've got "one last bug"
that affects SMP systems, then I'll check it in.

Ian

Kip Macy

2004-May-14 13:57 UTC

Re: [Xen-devel] xenolinux /dev/random

> Right now the only reason I'm even using NFS is because a Xenoserver
> provider needs to be able to do backups, migration, failover, and so on.
> How are other people meeting these requirements?  Has the CoW
> development stalled?  What about live migration?

I think iSCSI is the way to go. However, I don't know of any good open
source iSCSI targets.


					-Kip

Kip Macy

2004-May-14 14:14 UTC

Re: [Xen-devel] xenolinux /dev/random

> I've never seen any obvious way to actually set any timeout, block
> size, or other parameters for an NFS root partition -- and it seems to
> ignore whatever's in fstab, which makes sense.

Have you considered doing a -o remount early in boot, passing a different
set of options? These are the options we use on Linux:

defaults,intr,rsize=8192,wsize=8192,nfsvers=3,tcp,timeo=600
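
To make the option string concrete, here is a small Python sketch that splits it into its parts, converts timeo (which NFS expresses in tenths of a second) into seconds, and composes the remount command without running it. The helper names are hypothetical. Whether the kernel applies changed NFS options on a remount of the root is the open question raised earlier in the thread, so this shows only what would be passed to mount.

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Bare flags such as `intr` or `tcp` map to True; `name=value` pairs keep
    the value as a string.
    """
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        options[name] = value if sep else True
    return options


def timeo_seconds(options):
    """NFS `timeo` is given in tenths of a second; return it in seconds."""
    return int(options["timeo"]) / 10


def remount_command(option_string, mount_point="/"):
    """Return the argv for remounting `mount_point` with the given options."""
    return ["mount", "-o", "remount," + option_string, mount_point]


KIP_OPTIONS = "defaults,intr,rsize=8192,wsize=8192,nfsvers=3,tcp,timeo=600"

if __name__ == "__main__":
    opts = parse_mount_options(KIP_OPTIONS)
    print(opts)
    print("timeout per attempt:", timeo_seconds(opts), "seconds")
    print(" ".join(remount_command(KIP_OPTIONS)))
```

With these settings each request waits 60 seconds before the client retries, over TCP, with 8 KB read and write sizes.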

We obviously don't use Linux as an NFS server, so YMMV.


				-Kip
