thr3ads.net - Lustre discuss - [Lustre-discuss] clients gets EINTR from time to time [Feb 2011]


Francois Chassaing

2011-Feb-24 10:50 UTC

[Lustre-discuss] clients gets EINTR from time to time

Dear list members,

We are using Lustre 1.8.5 (upgraded from 1.8.4) running on 1 MGS and 3 OSSs over DDR
IB, and 2 patched clients mounted with the flock option.
We are experiencing issues with an application that gets an EINTR when trying to
write to a file.
These errors happen "randomly" on both clients, which makes it
difficult to pinpoint the problem. My app treats the error as if the disk
were full (which is what it means when dealing with a "normal" disk), when it
is not.
I've tried changing the IB switch, so the problem is most probably not coming
from there (even though it is a "cheap" switch). I've also tried
changing the client mount options and changed the striping policy from -1 to 1, but
neither made any difference.
And no log of any kind on the MDS or the OSSs is helpful.
I would really appreciate pointers or suggestions to debug this issue.

Thanks 

François CHASSAING
Directeur Technique - CTO 
WEBORAMA 


Brian J. Murrell

2011-Feb-24 12:17 UTC


[Lustre-discuss] clients gets EINTR from time to time

On 11-02-24 05:50 AM, Francois Chassaing wrote:
> Dear list members,

Hi,

> We are experiencing issues with an application that gets a EINTR when
> trying to write to a file.

If I understand that errno properly, that is to be expected.

> Those errors happens "randomly" on both clients,

Well, not "randomly".  It happens when a signal arrives.

> So my app treats the error as if the file was full

This is wrong.  Your app is broken and needs to be fixed.

> I've tryed to change the IB switch, so it is most probably not
> coming from here (while it is a "cheap" switch). I've also
> tried to change the client mount options, changed the stripping policy from -1
> to 1, but it did not change anything neither.

None of this is going to resolve your problem.  Yours is an application
programming defect, not a system fault.

> I would really appreciate pointers or suggestions to debug this issue.

Maybe some understanding of how signals can affect system calls would help.  A
quick Google search found this for me:

http://www.gnu.org/s/libc/manual/html_node/Interrupted-Primitives.html#Interrupted-Primitives

There is probably more detailed text out there to help you and your
application programmer handle this application programming fault
better.  But alas, it is an application programming problem and not a
Lustre filesystem or equipment problem.
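A minimal sketch of what handling this can look like in C, assuming a POSIX system: the hypothetical helper `write_all()` below retries write() when it fails with EINTR, continues after a short write, and only reports other errno values (such as ENOSPC or EIO) as real failures. It is an illustration, not code from the application discussed here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` to `fd`.
 * - write() failing with EINTR means a signal interrupted the call before
 *   any data was written, so it is simply retried.
 * - A short write (possibly caused by a signal mid-transfer) is continued.
 * Returns 0 on success, or -1 with errno set for any other error
 * (e.g. ENOSPC, EIO), which the caller should treat as a real failure. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;          /* genuine error: errno is left for the caller */
        }
        p += n;                 /* advance past what was written */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would then reserve its "disk full" handling for `errno == ENOSPC` after `write_all()` returns -1, instead of applying it to any failure.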

b.

-- 
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.


Francois Chassaing

2011-Feb-24 13:16 UTC


[Lustre-discuss] clients gets EINTR from time to time

Well, I understand your point, and I also understand that this signal is
not a malfunction;
my question was about the intrinsic "why" (and when) this
signal is sent to the client.

Thanks

François Chassaing, Directeur Technique - CTO
weborama.com - fch at weborama.com
T : +33 (0)1 53 19 21 51 F : +33 (0)1 53 19 21 41
Weborama - 15 rue Clavel 75019 Paris

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

Brian J. Murrell

2011-Feb-24 13:29 UTC


[Lustre-discuss] clients gets EINTR from time to time

On 11-02-24 08:16 AM, Francois Chassaing wrote:
> Well, as I understand your point and I do also understand that this signal
> is not a malfunction,

No, but not handling it properly is.  Interpreting an EINTR as "the disk
must be full" (i.e. a fatal error) is wrong.

> my question was regarding to the intrinsic "why" (and when) does
> this signal is sent to the client.

That's completely up to your application.  It's the way your application
has been written that determines how and why signals arrive.

b.

-- 
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.


Francois Chassaing

2011-Feb-24 14:35 UTC


[Lustre-discuss] clients gets EINTR from time to time

OK, the app is used to dealing with standard disks; that is why it is not handling
the EINTR signal properly.
But I assumed that Lustre is 'just' a filesystem, so
applications do not need to access it any differently from the usual
way.
Anyhow, the signal comes from the OS, not from the app... So it means that the OS
signals the app that it has encountered an error while trying to write to a file,
and it is the source of that error that I want to track down.
Because this app error only arises every few days, it is not a
normal condition: something somewhere in the FS causes it.
Interpreting it as a fatal error is certainly a mistake, but I still
don't know why I'm getting this EINTR signal from the OS...

Regards

François Chassaing, Directeur Technique - CTO


Ken Hornstein

2011-Feb-24 14:54 UTC


[Lustre-discuss] clients gets EINTR from time to time

>OK, the app is used to deal with standard disks, that is why it is not
>handling the EINTR signal propoerly.

I think you're misunderstanding what a "signal" is in the Unix sense.

EINTR isn't a signal; it's an error code from the write() system call
that says, "Hey, you got a signal in the middle of this write() call
and it didn't complete".  It doesn't mean that there was an error
writing the file; if that was happening, you'd get a (presumably
different) error code.  Signals can be sent by the operating system,
but those signals are things like SIGSEGV, which basically means, "your
program screwed up".  Programs can also send signals to each other,
with kill(2) and the like.

Now, NORMALLY system calls like write() are interrupted by signals
when you're writing to "slow" devices, like network sockets.  According
to the signal(7) man page, disks are not normally considered slow
devices, so I can understand the application not being used to handling
this.  And you know, now that I think about it, I'm not even sure that
network filesystems SHOULD allow I/O system calls to be interrupted by
signals... I'd have to think more about it.
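To see the "slow device" behavior described above, here is a sketch assuming Linux/POSIX semantics: it installs a SIGALRM handler without SA_RESTART, blocks in read() on an empty pipe (a slow device in the signal(7) sense), and lets alarm(1) interrupt it. The function name `demo_read_interrupted()` is hypothetical; on a typical Linux system the read() should fail with EINTR after about one second.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* The handler does nothing; its mere presence lets the signal interrupt read(). */
static void on_alarm(int sig) { (void)sig; }

/* Block in read() on an empty pipe while SIGALRM, handled WITHOUT
 * SA_RESTART, fires after one second.  Returns the errno left by the
 * interrupted read() (expected: EINTR), 0 if read() somehow succeeded,
 * or -1 if setup failed. */
int demo_read_interrupted(void)
{
    int fds[2];
    char c;
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;

    alarm(1);
    ssize_t n = read(fds[0], &c, 1);    /* write end stays open: blocks */
    int err = (n < 0) ? errno : 0;      /* capture errno before close() */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```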

I suspect what happened is that something changed between 1.8.5 and the
previous version of Lustre that you were using that allowed some operations
to be interruptible by signals.  Some things to try:

- Check whether you are, in fact, receiving a signal in your application,
  rather than Lustre returning EINTR for some other reason.
- If you are receiving a signal, when you set the signal handler for it
  you could use the SA_RESTART flag to restart the interrupted I/O; I think
  that would make everything work like it did before.
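A sketch of the SA_RESTART suggestion, again assuming POSIX semantics and using hypothetical names: the same blocking read() on a pipe is interrupted by SIGALRM after about one second, but because the handler was installed with SA_RESTART, the kernel restarts read() after the handler runs, and read() then returns the byte a child process writes after about two seconds. SA_RESTART only affects handlers installed with it, and signal(7) lists some interfaces that are never restarted regardless.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_alarm = 0;

/* Record that the signal arrived; nothing else (async-signal-safe). */
static void note_alarm(int sig) { (void)sig; got_alarm = 1; }

/* Install a SIGALRM handler WITH SA_RESTART, then block in read() on a
 * pipe.  The alarm fires at ~1 s, the handler runs, and read() is
 * restarted transparently; it returns when the child writes at ~2 s.
 * Returns read()'s result (expected: 1), or -1 on error. */
int demo_read_restarted(void)
{
    int fds[2];
    char c;
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;           /* restart interrupted system calls */
    if (sigaction(SIGALRM, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {                   /* child: write one byte after 2 s */
        close(fds[0]);
        sleep(2);
        if (write(fds[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    alarm(1);
    ssize_t n = read(fds[0], &c, 1);    /* interrupted at ~1 s, then restarted */
    close(fds[0]);
    waitpid(child, NULL, 0);
    return (int)n;
}
```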

--Ken

Francois Chassaing

2011-Feb-24 15:52 UTC

head link

[Lustre-discuss] clients gets EINTR from time to time

OK, thanks, that makes it clearer.
I had indeed mixed up signals and error return codes in my mind (and words).
I did understand that the write()/pwrite() system call was returning the EINTR error
code because it received a signal, but I supposed that the signal was sent
because of an error condition somewhere in the FS.
This is where I now think I'm wrong.

As for your questions:
- I should mention that I have always had this issue, and this is why
I upgraded from 1.8.4 to 1.8.5, hoping this would solve it.
- I will try to have that SA_RESTART flag set in the app... if I can find where
the signal handler is set.
- How can I check whether Lustre is returning EINTR for some other reason? As I said,
no logs show anything on either the MDS or the OSSs, but I haven't gone through
"lctl debug_kernel" yet... which I'm going to do right away...

My last question is: how can I tell which signal I am receiving? My
app doesn't say; it just dumps out the write/pwrite error code.
And if there is no signal handler, then it should follow the
"standard" actions (as described in man 7 signal). On the other hand, my app
does not stop or dump core, and the signal is not ignored either, so it has to be
handled in the code. Correct me if I'm wrong...

As you can see by now, I didn't write the app, nor am I a Linux guru ;-)

Thanks a lot.

François Chassaing, Directeur Technique - CTO


Ken Hornstein

2011-Feb-24 16:21 UTC


[Lustre-discuss] clients gets EINTR from time to time

>As for your questions :
>- I have to mention that I always had had this issue, and this is why
>I've upgraded from 1.8.4 to 1.8.5, hoping this would solve it.

Ah, okay, I misunderstood that; my apologies.

>- I will try to have that SA_RESTART flag set in the app... if I can
>find where the signal handler is set.

Searching the source for sigaction or signal should help there.

>- How can I see that lustre is returning EINTR for any other reason ?
>As I said no logs shows nothing neither on MDS or OSSs, but I didn't go
>through examining "lctl debug_kernel" yet... which I'm going to do
>right away...

Weeelll ... that was just a guess on my part.  I did a quick grep
through the Lustre sources and saw a few places where EINTR was
returned, but most of those seemed to deal with the case where I/O was
interrupted (those places were fairly far down in the stack; it
wasn't clear to me that those errors would ever bubble back up as a
return code from a system call).  If _that_ is the issue, then tracking
it down will be a challenge.

>my last question is : how can I tell which signal I am receiving ?
>because my app doesn't say, it just dumps outs the write/pwrite error
>code.

I think your easiest way is to use strace; something like "strace -e signal"
should do the right thing (that will only trace signals, not all system calls).
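Besides strace, the application itself can record which signal arrived and which process sent it. A sketch assuming POSIX sigaction() with SA_SIGINFO; `install_logger()` and `report_last_signal()` are hypothetical names. The handler only stores values in `volatile sig_atomic_t` variables, because calling fprintf() inside a signal handler is not async-signal-safe; the report is printed later from normal code, for example right after a write() that failed with EINTR.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Filled in by the handler; read later from normal code. */
static volatile sig_atomic_t last_signo = 0;
static volatile sig_atomic_t last_sender = 0;

static void record_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)ctx;
    last_signo = signo;
    /* si_pid identifies the sender for kill()/sigqueue(); for many
     * kernel-generated signals (e.g. SIGALRM from alarm()) it may be 0. */
    last_sender = (sig_atomic_t)info->si_pid;
}

/* Hypothetical helper: replace the handler for `signo` with one that logs
 * the signal number and sender.  Returns sigaction()'s result. */
int install_logger(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Call outside the handler, e.g. after write() failed with EINTR. */
void report_last_signal(FILE *out)
{
    if (last_signo != 0)
        fprintf(out, "got signal %d (%s) from pid %ld\n",
                (int)last_signo, strsignal(last_signo), (long)last_sender);
}
```

Installing this for a signal replaces any handler the application already has for it, so in a real program the logging would have to be merged into the existing handler.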
>And if there is no signal handler, then it should follow the "standard"
>actions (as of man 7 signal). On the other hand, my app does not stop
>or dump core, and is not ignored, so it has to be handled in the code.
>Correct me if I'm wrong...

That is my understanding as well; if you don't have a signal handler
installed, the default action should be taking place, and if the
default action is to ignore the signal, then you shouldn't be getting
EINTR.  But hey, I've been wrong before :-)
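To check whether the application has installed a handler at all, and whether it used SA_RESTART, the current disposition can be queried without changing it by passing NULL as the new action to sigaction(). A sketch assuming POSIX; `query_disposition()` and `list_handled_signals()` are hypothetical names, and the list of signals checked is an arbitrary selection of common ones.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum disposition { DISP_DEFAULT, DISP_IGNORED, DISP_HANDLED, DISP_ERROR };

/* Query how `signo` is currently handled without changing it:
 * a NULL new action makes sigaction() a read-only call. */
enum disposition query_disposition(int signo)
{
    struct sigaction old;

    if (sigaction(signo, NULL, &old) != 0)
        return DISP_ERROR;                  /* e.g. invalid signal number */
    if (!(old.sa_flags & SA_SIGINFO)) {     /* sa_handler is meaningful here */
        if (old.sa_handler == SIG_DFL)
            return DISP_DEFAULT;
        if (old.sa_handler == SIG_IGN)
            return DISP_IGNORED;
    }
    return DISP_HANDLED;                    /* a user-installed handler */
}

/* Print each of a few common signals that has a user-installed handler,
 * and whether that handler was installed with SA_RESTART. */
void list_handled_signals(FILE *out)
{
    static const int common[] = { SIGHUP, SIGINT, SIGQUIT, SIGPIPE, SIGALRM,
                                  SIGTERM, SIGUSR1, SIGUSR2, SIGCHLD };

    for (size_t i = 0; i < sizeof common / sizeof common[0]; i++) {
        struct sigaction old;

        if (query_disposition(common[i]) != DISP_HANDLED ||
            sigaction(common[i], NULL, &old) != 0)
            continue;
        fprintf(out, "%s: handled, %s\n", strsignal(common[i]),
                (old.sa_flags & SA_RESTART)
                    ? "SA_RESTART set"
                    : "no SA_RESTART (interrupted calls may fail with EINTR)");
    }
}
```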

--Ken

DEGREMONT Aurelien

2011-Feb-24 16:57 UTC


[Lustre-discuss] clients gets EINTR from time to time

Hello,

From my understanding, Lustre can return EINTR in some I/O error cases.
I think that when a client gets evicted in the middle of one of its RPCs,
it can return EINTR to the caller.
Could this explain your issue?

Can you verify that your clients were not evicted at the same time?

Aurélien


Brian J. Murrell

2011-Feb-24 17:34 UTC


[Lustre-discuss] clients gets EINTR from time to time

On 11-02-24 11:57 AM, DEGREMONT Aurelien wrote:
> Hello

Hi,

> From my understanding, Lustre can return EINTR for some I/O error cases.

I think that should/would be an EIO.

> I think that when a client gets evicted in the middle of one of its RPC,
> it can returns EINTR to the caller.

An evicted client should get an EIO on its I/O calls, IIRC.

b.

-- 
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.


Kevin Van Maren

2011-Feb-24 17:43 UTC


[Lustre-discuss] clients gets EINTR from time to time

No, in case of an eviction or IO errors, EIO is returned to the 
application, not EINTR.
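Whichever errno Lustre reports, the application's mistake was mapping every write failure to "disk full". A sketch of a hypothetical classification helper in C that separates the cases discussed in this thread: EINTR (retry), ENOSPC (out of space), and EIO (what Brian's and Kevin's replies say an evicted client sees). EDQUOT and EAGAIN are additions beyond the thread, included as common cases; the enum and function names are illustrative only.

```c
#include <errno.h>

/* Hypothetical classification of a write()/pwrite() errno value. */
enum write_failure { WF_RETRY, WF_NO_SPACE, WF_IO_ERROR, WF_OTHER };

enum write_failure classify_write_errno(int err)
{
    switch (err) {
    case EINTR:        /* interrupted by a signal: just retry */
    case EAGAIN:       /* non-blocking fd: wait (e.g. poll()) and retry */
        return WF_RETRY;
    case ENOSPC:       /* no space left on the filesystem */
#ifdef EDQUOT
    case EDQUOT:       /* quota exceeded */
#endif
        return WF_NO_SPACE;
    case EIO:          /* low-level I/O error */
        return WF_IO_ERROR;
    default:           /* EBADF, EFBIG, ...: report as-is */
        return WF_OTHER;
    }
}
```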

Kevin



Francois Chassaing

2011-Feb-25 11:18 UTC


[Lustre-discuss] clients gets EINTR from time to time

Thanks, but in any case, the logs on the MDS/MGS do not show any evicted
client.
Also, the log output by lctl debug_kernel on the clients does not show much: I can
only see the last administrative actions I've taken (such as
setting the striping policy on a directory, creating a new server pool, ...) and
four "Dropping PUT from" messages, which are unrelated because they did not
occur at the times of my problem.

I'll continue to parse the debug logs and keep you posted.

Thanks

François Chassaing, Directeur Technique - CTO


Brian J. Murrell

2011-Feb-25 13:28 UTC


[Lustre-discuss] clients gets EINTR from time to time

On 11-02-25 06:18 AM, Francois Chassaing wrote:
> Thanks, but anyway, logs on the MDS/MGS does not show evicted client of any
> kind.
> Also, the log output by lctl debug_kernel on clients does not show much, I
> can only see in there the last administrative actions I've taken (such
> as setting striping policy on a directory, creating a new server pool, ...) and
> four unrelated (because not happening at my problem hours) "Dropping PUT
> from"
>
> I continue to parse debug logs and keep them posted.

I don't understand why you don't just fix your application to handle a
perfectly valid and expected condition (that it's currently not
handling) instead of wasting time trying to find the cause of the
expected condition.  Even if you find it, it's likely not a bug and not
something that can/will be fixed.  It's your application that needs to
be fixed.
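Since the application uses pwrite() as well as write(), here is the same retry idea for pwrite(), as a sketch assuming POSIX: the hypothetical `pwrite_all()` retries on EINTR and advances both the buffer and the file offset after a short write. Because pwrite() takes an explicit offset, a retry after EINTR does not depend on the shared file position.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `buf` at `offset` in `fd`,
 * retrying on EINTR and continuing after short writes.
 * Returns 0 on success, or -1 with errno set for any other error. */
int pwrite_all(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry same offset */
            return -1;          /* real error (ENOSPC, EIO, ESPIPE, ...) */
        }
        p += n;                 /* advance buffer and file offset together */
        len -= (size_t)n;
        offset += n;
    }
    return 0;
}
```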

b.

-- 
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.


Francois Chassaing

2011-Feb-25 13:39 UTC

[Lustre-discuss] clients gets EINTR from time to time

Maybe, once you explain to me why it is a perfectly expected
condition, because that I still don't get.
How can it be expected that a file cannot be written because of an interrupted
system call, when all the conditions for writing successfully to this file
are apparently met: a single client writing to a single file, no locking or
concurrent access from other clients, no client eviction shown, no hardware
failure, nothing at all?
Please remember that this application traditionally deals with standard disks,
which DO NOT return EINTR except on error conditions...

Regards

François Chassaing
Technical Director - CTO
WEBORAMA
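
The application's current error handling treats every failed write() as "file full". Inspecting errno keeps the cases apart: EINTR is a different condition from a full file system (ENOSPC) or an exceeded quota (EDQUOT). A minimal classification sketch, assuming the application only needs three outcomes; the enum and function names are hypothetical.

```c
#include <errno.h>

/* Hypothetical outcome categories for a failed write(). */
enum write_failure {
    WF_RETRY,     /* transient: issue the same write again */
    WF_NO_SPACE,  /* the file system (or the user's quota) is full */
    WF_FATAL      /* anything else: report and give up */
};

/* Map the errno of a failed write() to an action. Only ENOSPC and EDQUOT
 * mean "full"; EINTR means a signal interrupted the call before any data
 * was written. */
enum write_failure classify_write_errno(int err)
{
    switch (err) {
    case EINTR:
        return WF_RETRY;
    case ENOSPC:
    case EDQUOT:
        return WF_NO_SPACE;
    default:
        return WF_FATAL;
    }
}
```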

----- Original Message -----
From: "Brian J. Murrell" <brian at whamcloud.com>
To: lustre-discuss at lists.lustre.org
Sent: Friday 25 February 2011 14:28:02 GMT +01:00 Amsterdam / Berlin / Bern
/ Rome / Stockholm / Vienna
Subject: Re: [Lustre-discuss] clients gets EINTR from time to time

Ken Hornstein

2011-Feb-25 14:00 UTC

[Lustre-discuss] clients gets EINTR from time to time

>I don't understand why you don't just fix your application to handle a
>perfectly valid and expected condition (that it's currently not
>handling) instead of wasting time trying to find the cause of the
>expected condition.  Even if you find it, it's likely not a bug and not
>something that can/will be fixed.  It's your application that needs to
>be fixed.

To be fair ... normally disk I/O operations are not interruptible by
signals, so it's not unreasonable behavior on the part of an
application.  I did check POSIX, and it doesn't say that this behavior is
restricted to network sockets, so yeah, it's TECHNICALLY allowable
behavior according to the standard (although the Linux signal(7) manpage
says that it will not happen for local disk I/O).  But honestly, I've seen
plenty of cases where applications handle this for network I/O; it's
normal, everyone knows it will happen there.  But for _disk_ I/O?
Never seen it done.  I'm not saying that there are no applications that
handle this case, but it's certainly very uncommon.  I freely admit
that network filesystems sort of mix the concepts of "network socket"
and "disk I/O" together, and what the "right" behavior is remains
unclear.
But calling this perfectly valid and expected is not quite accurate.
It would be interesting to see what other network filesystems do under
the same circumstances.

--Ken

Cory Spitz

2011-Feb-25 15:02 UTC

[Lustre-discuss] clients gets EINTR from time to time

Hi.

I think it would help if you knew what the signal was.  Do you have that
yet?

I have a report from a user that he is getting EINTR when a SIGALRM goes
off during his write().  It isn't unexpected to get SIGALRM because he
called alarm(), but he also has SA_RESTART set.  I can't remember
whose responsibility it is to restart the call, the syscall's or wherever,
but it seems that someone is dropping the ball, because if EINTR is
returned then SA_RESTART didn't seem to do the trick, right?

Thanks,
-Cory
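
One way to check Cory's scenario on a specific client is a small probe that reproduces his user's setup: a periodic SIGALRM delivered through a handler installed with SA_RESTART while the process repeatedly writes to a file. If any write still fails with EINTR, the interruption is not being restarted. This is a hedged sketch, not a tool from the thread: the function name, the 4 KiB block size and the use of pwrite() at offset 0 (to keep the test file small) are arbitrary choices, and it assumes a single-threaded process.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Counts handler invocations; only the handler writes it. */
static volatile sig_atomic_t alarms_seen = 0;

static void count_alarm(int sig)
{
    (void)sig;
    alarms_seen++;
}

/* Repeatedly rewrite a 4 KiB block at offset 0 of `path` while a periodic
 * SIGALRM (handler installed with SA_RESTART) fires every `interval_us`
 * microseconds (must be below 1000000). Returns how many writes failed
 * with EINTR, or -1 if setup failed or a write failed with another error. */
long count_eintr_writes(const char *path, long iterations, long interval_us)
{
    static char block[4096];
    struct sigaction sa, old_sa;
    struct itimerval tick, off;
    long eintr = 0;
    int failed = 0;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* interrupted calls are supposed to restart */
    if (sigaction(SIGALRM, &sa, &old_sa) != 0) {
        close(fd);
        return -1;
    }

    memset(&tick, 0, sizeof tick);
    tick.it_interval.tv_usec = interval_us;
    tick.it_value.tv_usec = interval_us;
    memset(&off, 0, sizeof off);
    if (setitimer(ITIMER_REAL, &tick, NULL) != 0)
        failed = 1;

    for (long i = 0; !failed && i < iterations; i++) {
        if (pwrite(fd, block, sizeof block, 0) < 0) {
            if (errno == EINTR)
                eintr++;        /* SA_RESTART did not hide the interruption */
            else
                failed = 1;
        }
    }

    setitimer(ITIMER_REAL, &off, NULL);   /* stop the timer first... */
    sigaction(SIGALRM, &old_sa, NULL);    /* ...then restore the old handler */
    close(fd);
    return failed ? -1 : eintr;
}
```

On a local disk the count should be 0, since signal(7) says local disk I/O is not interrupted by signals; a nonzero count for a file on the Lustre mount would point at the behavior Cory describes.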

Ken Hornstein

2011-Feb-25 15:47 UTC

[Lustre-discuss] clients gets EINTR from time to time

>I have a report from a user that he is getting EINTR when a SIGALRM goes
>off during his write().  It isn't unexpected to get SIGALRM because he
>called alarm(), but he also has SA_RESTART set.  I can't remember
>whose responsibility it is to restart the call, the syscall's or wherever,
>but it seems that someone is dropping the ball, because if EINTR is
>returned then SA_RESTART didn't seem to do the trick, right?

I would agree with you on that one; if you're setting SA_RESTART then
you shouldn't ever get EINTR.  It looks like what should be happening
is that if you get interrupted, the system call should return
ERESTARTSYS, and then after the signal handler is done the system call
should be re-run for you by the signal handling code.

I see that at least in some cases Lustre will use ERESTARTSYS; just a
guess, but maybe somewhere Lustre is returning EINTR itself instead of
ERESTARTSYS?

--Ken
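
Ken's description of ERESTARTSYS can be illustrated with a simplified userspace model of the decision the Linux signal-delivery code makes when a system call is interrupted. The constants are the kernel-internal values from include/linux/errno.h (never returned to userspace); the function and sentinel names are hypothetical, and the real kernel code is architecture-specific. The model shows why Ken's guess matters: a plain -EINTR bypasses the restart logic, so SA_RESTART cannot help.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Kernel-internal restart codes (include/linux/errno.h). */
#define ERESTARTSYS           512
#define ERESTARTNOINTR        513
#define ERESTARTNOHAND        514
#define ERESTART_RESTARTBLOCK 516

/* Hypothetical sentinel meaning "the kernel re-issues the system call";
 * LONG_MIN is never a real system call result. */
#define SYSCALL_RESTARTED LONG_MIN

/* Simplified model: given a system call's raw return value, whether a user
 * handler runs for the pending signal, and whether that handler has
 * SA_RESTART, return what the caller ends up observing. */
long model_syscall_result(long ret, bool handler_runs, bool sa_restart)
{
    switch (ret) {
    case -ERESTARTNOHAND:
    case -ERESTART_RESTARTBLOCK:
        /* restarted only when no handler runs */
        return handler_runs ? -EINTR : SYSCALL_RESTARTED;
    case -ERESTARTSYS:
        /* restarted unless a handler runs without SA_RESTART */
        return (handler_runs && !sa_restart) ? -EINTR : SYSCALL_RESTARTED;
    case -ERESTARTNOINTR:
        return SYSCALL_RESTARTED;   /* always restarted */
    default:
        return ret;                 /* includes a plain -EINTR: passed through */
    }
}
```

For example, model_syscall_result(-ERESTARTSYS, true, true) yields SYSCALL_RESTARTED, while model_syscall_result(-EINTR, true, true) yields -EINTR even though SA_RESTART is set.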

Andreas Dilger

2011-Feb-25 17:39 UTC

[Lustre-discuss] clients gets EINTR from time to time

On 2011-02-25, at 6:28, "Brian J. Murrell" <brian at whamcloud.com> wrote:
> On 11-02-25 06:18 AM, Francois wrote:
>>
>> I'll continue to parse the debug logs and keep you posted.
>
> I don't understand why you don't just fix your application to handle a
> perfectly valid and expected condition (that it's currently not
> handling) instead of wasting time trying to find the cause of the
> expected condition.  Even if you find it, it's likely not a bug and not
> something that can/will be fixed.  It's your application that needs to
> be fixed.

In all fairness Brian, it isn't always possible to fix an application
the way you suggest. It might be commercial (binary only), or it might be
complex code using 3rd-party libraries to do the I/O that would lose
support if modified, etc.

I think the first action to debug this is to run on the client with
"lctl set_param debug=+trace" or "lctl set_param debug=~0", which will
enable function entry/exit tracing in Lustre. Then, when the problem is
hit, run "lctl dk /tmp/debug" to dump the Lustre debug log, and search
for -4 (which is -EINTR) to see where this error first appears.
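
When searching the dump for -4, a plain substring match also hits values such as -40 or -45, so it helps to require that -4 stands alone. A minimal C filter for that, assuming the file written by "lctl dk" is plain text with one record per line; the exact record format is not shown in the thread, and the function name is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if `line` contains "-4" as a standalone number (EINTR is 4 on
 * Linux), i.e. not preceded or followed by another digit. */
bool mentions_minus_eintr(const char *line)
{
    for (const char *p = strstr(line, "-4"); p != NULL; p = strstr(p + 1, "-4")) {
        bool digit_before = p > line && isdigit((unsigned char)p[-1]);
        bool digit_after = isdigit((unsigned char)p[2]) != 0;
        if (!digit_before && !digit_after)
            return true;
    }
    return false;
}
```

Each line of the dump can be fed to this function, for instance from an fgets() loop, and the matching lines inspected to find the first function that produced the error.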

At that point we can determine where the error comes from, and whether
it is Lustre's fault. I know that at one time there was a related problem
in the l_wait_event() macro that was improperly masking signals, but I
thought it was fixed by 1.8.5.

Cheers, Andreas

John Hammond

2011-Feb-25 20:16 UTC

[Lustre-discuss] clients gets EINTR from time to time

On 02/25/2011 11:39 AM, Andreas Dilger wrote:
> On 2011-02-25, at 6:28, "Brian J. Murrell" <brian at whamcloud.com> wrote:
>> On 11-02-25 06:18 AM, Francois wrote:
>>>
>>> I'll continue to parse the debug logs and keep you posted.
>>
>> I don't understand why you don't just fix your application to handle a
>> perfectly valid and expected condition (that it's currently not
>> handling) instead of wasting time trying to find the cause of the
>> expected condition.  Even if you find it, it's likely not a bug and not
>> something that can/will be fixed.  It's your application that needs to
>> be fixed.
>
> In all fairness Brian, it isn't always possible to fix an application
> the way you suggest. It might be commercial (binary only), or it might be
> complex code using 3rd-party libraries to do the I/O that would lose
> support if modified, etc.
>
> I think the first action to debug this is to run on the client with
> "lctl set_param debug=+trace" or "lctl set_param debug=~0", which will
> enable function entry/exit tracing in Lustre. Then, when the problem is
> hit, run "lctl dk /tmp/debug" to dump the Lustre debug log, and search
> for -4 (which is -EINTR) to see where this error first appears.
>
> At that point we can determine where the error comes from, and whether
> it is Lustre's fault. I know that at one time there was a related problem
> in the l_wait_event() macro that was improperly masking signals, but I
> thought it was fixed by 1.8.5.

Setting aside the moral question of which calls should be interruptible,
I think that the handling of the LUSTRE_FATAL_SIGS (defined in
lustre_lib.h to be SIGKILL, SIGINT, SIGTERM, SIGQUIT, SIGALRM) is
slightly broken.  Under certain situations, Lustre will return -EINTR
although no signals were delivered.  That's probably not the end of the
world for most applications, but OTOH I don't think anybody assumes that
-EINTR will be delivered spuriously.

Consider the following sequence:

1) Process P has a Lustre file F open.

2) P has SIGALRM pending (but blocked).

3) P starts writing to F and ends up sleeping in (something like):

  sys_write()
   ...
    ll_extent_lock()
     ...
      osc_enqueue()
       ...
        ptlrpc_queue_wait().

4) The OST does not respond to the request before the deadline, so
l_wait_event() replaces the signal mask of P with the LUSTRE_FATAL_SIGS,
notices that SIGALRM is now deliverable, restores the signal mask of P,
and ptlrpc_queue_wait() returns -EINTR.

5) As P exits from sys_write(), SIGALRM is blocked (but still pending),
so it doesn't get delivered.

6) P spuriously returns -EINTR from sys_write().
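
Steps 2 and 5 rely on ordinary POSIX signal semantics: a blocked signal stays pending, and its handler does not run until the signal is unblocked. A minimal userspace sketch of just that behavior, assuming a single-threaded process; it does not reproduce the Lustre code path, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static volatile sig_atomic_t alarm_ran = 0;

static void note_alarm(int sig)
{
    (void)sig;
    alarm_ran = 1;
}

/* Block SIGALRM, raise it, and check that it stays pending without its
 * handler running; then unblock it and check that the handler runs.
 * Returns true if both observations hold. */
bool blocked_sigalrm_stays_pending(void)
{
    struct sigaction sa;
    sigset_t block, old, pending;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return false;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old);   /* step 2: SIGALRM blocked... */
    raise(SIGALRM);                         /* ...and now pending */

    sigpending(&pending);
    bool was_pending = sigismember(&pending, SIGALRM) == 1;
    bool ran_while_blocked = alarm_ran != 0;   /* step 5: not delivered */

    sigprocmask(SIG_SETMASK, &old, NULL);   /* unblocking delivers it here */
    return was_pending && !ran_while_blocked && alarm_ran == 1;
}
```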

I can reproduce this on 1.8.5/RHEL 5.5.  If the goal is to emulate NFS's
interruptibility during congestion, then returning -ERESTARTSYS would be
more appropriate.  Also, it might be worthwhile to make this extra
interruptibility a mount flag, as NFS does.

Best,

John

-- 
John L. Hammond, Ph.D.
TACC, The University of Texas at Austin
jhammond at tacc.utexas.edu

Francois Chassaing

2011-Mar-04 10:04 UTC

[Lustre-discuss] clients gets EINTR from time to time

Dear list,
Still investigating this issue, I am now struggling with debugging.
The issue arose once more yesterday, so I started to look into it more deeply and
decided that the "trace" debug output should be written to disk using
debug_daemon.
Alas, debugging with only the "trace" flag active produces more than 100
MB/s of log! (Yes, these are busy clients.)
I've tried several strategies, such as running debug_kernel from a cron job,
or while watching my product's error log, but even then dk would dump 70 MB
of data representing less than one second of debug log...
So my chances of tracing the signal seem very low.
Is there a less verbose debug flag that might still include the signal
I'm looking for?

Given John's answer, could I maybe use /proc/sys/lustre/dump_on_timeout
to dump the log only when a timeout happens? This will only work if my problem
matches what John can reproduce.

Please also note that I've looked for abnormal threads_started values:
everywhere it is equal to threads_min, except for one MDT entry, which is
at threads_min+1...

Regards

François Chassaing
Technical Director - CTO
WEBORAMA

----- Original Message -----
From: "John Hammond" <jhammond at tacc.utexas.edu>
To: "Andreas Dilger" <adilger at whamcloud.com>
Cc: lustre-discuss at lists.lustre.org
Sent: Friday 25 February 2011 21:16:36 GMT +01:00 Amsterdam / Berlin / Bern
/ Rome / Stockholm / Vienna
Subject: Re: [Lustre-discuss] clients gets EINTR from time to time

