Francois Chassaing
2011-Feb-24 10:50 UTC
[Lustre-discuss] clients gets EINTR from time to time
Dear list members,

We are using Lustre 1.8.5 (upgraded from 1.8.4) running on 1 MGS, 3 OSS over DDR IB, and 2 patched clients mounted with the flock option. We are experiencing issues with an application that gets an EINTR when trying to write to a file. These errors happen "randomly" on both clients, which makes it difficult to pin down the problem. My app treats the error as if the file were full (which is the case when dealing with a "normal" disk) when it is not.

I've tried changing the IB switch, so the problem is most probably not coming from there (even though it is a "cheap" switch). I've also tried changing the client mount options and changing the striping policy from -1 to 1, but that did not change anything either. And no log of any kind on the MDS or OSSs is helpful.

I would really appreciate pointers or suggestions to debug this issue.

Thanks
François CHASSAING
Directeur Technique - CTO
WEBORAMA
Brian J. Murrell
2011-Feb-24 12:17 UTC
[Lustre-discuss] clients gets EINTR from time to time
On 11-02-24 05:50 AM, Francois Chassaing wrote:
> Dear list members,

Hi,

> We are experiencing issues with an application that gets a EINTR when trying to write to a file.

If I understand that errno properly, that is to be expected.

> Those errors happens "randomly" on both clients,

Well, not "randomly". It happens when a signal arrives.

> So my app treats the error as if the file was full

This is wrong. Your app is broken and needs to be fixed.

> I've tried to change the IB switch, so it is most probably not coming from here (while it is a "cheap" switch). I've also tried to change the client mount options, changed the striping policy from -1 to 1, but it did not change anything neither.

None of this is going to resolve your problem. Yours is an application programming defect, not a system fault.

> I would really appreciate pointers or suggestions to debug this issue.

Maybe some understanding of how signals can affect system calls. A quick google found this for me:

http://www.gnu.org/s/libc/manual/html_node/Interrupted-Primitives.html#Interrupted-Primitives

Probably there is more detailed text out there to help you and your application programmer handle this application programming fault better. But alas, it is an application programming problem and not a Lustre filesystem or equipment problem.

b.
--
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.
Francois Chassaing
2011-Feb-24 13:16 UTC
[Lustre-discuss] clients gets EINTR from time to time
Well, I understand your point, and I also understand that this signal is not a malfunction. My question was about the intrinsic "why" (and when) this signal is sent to the client.

Thanks
François
Brian J. Murrell
2011-Feb-24 13:29 UTC
[Lustre-discuss] clients gets EINTR from time to time
On 11-02-24 08:16 AM, Francois Chassaing wrote:
> Well, as I understand your point and I do also understand that this signal is not a malfunction,

No, but not handling it properly is. Interpreting an EINTR as "the disk must be full" (i.e. a fatal error) is wrong.

> my question was regarding to the intrinsic "why" (and when) does this signal is sent to the client.

That's completely up to your application. It's the way your application has been written that determines the hows and whys of signals.

b.
Francois Chassaing
2011-Feb-24 14:35 UTC
[Lustre-discuss] clients gets EINTR from time to time
OK, the app is used to dealing with standard disks, which is why it is not handling the EINTR signal properly. But I assumed that Lustre is 'just' a filesystem, so applications should not need to access it any differently than usual. Anyhow, the signal is from the OS, not from the app... So it means that the OS signals the app that it has encountered an error while trying to write to a file, and it is the source of that error that I want to track down. Because this app error only arises every few days, it is not a normal condition: something somewhere in the FS causes it. Interpreting it as a fatal error is certainly a mistake, but I still don't know why I'm getting this EINTR signal from the OS...

Regards
François
>OK, the app is used to deal with standard disks, that is why it is not
>handling the EINTR signal properly.

I think you're misunderstanding what a "signal" is in the Unix sense.

EINTR isn't a signal; it's an error code (an errno value) from the write() system call that says, "Hey, you got a signal in the middle of this write() call and it didn't complete". It doesn't mean that there was an error writing the file; if that was happening, you'd get a (presumably different) error code. Signals can be sent by the operating system, but those signals are things like SIGSEGV, which basically means, "your program screwed up". Programs can also send signals to each other, with kill(2) and the like.

Now, NORMALLY system calls like write() are interrupted by signals when you're writing to "slow" devices, like network sockets. According to the signal(7) man page, disks are not normally considered slow devices, so I can understand the application not being used to handling this. And you know, now that I think about it I'm not even sure that network filesystems SHOULD allow I/O system calls to be interrupted by signals ... I'd have to think more about it.

I suspect what happened is that something changed between 1.8.5 and the previous version of Lustre that you were using that allowed some operations to be interruptible by signals. Some things to try:

- Check to see if you are, in fact, receiving a signal in your application and Lustre isn't returning EINTR for some other reason.
- If you are receiving a signal, when you set the signal handler for it you could use the SA_RESTART flag to restart the interrupted I/O; I think that would make everything work like it did before.

--Ken
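For readers who want to see what "handling EINTR" looks like on the application side, here is a minimal sketch in C. It assumes the application calls write() directly; `write_all` is a hypothetical helper name, not something from the application discussed in this thread. The idea is simply to retry when write() fails with EINTR and to continue after a short write, while still reporting every other error to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes to fd.
 * Retries when write() is interrupted by a signal (EINTR) and continues
 * after a short write. Returns 0 on success, or -1 on any other error,
 * leaving errno as set by write(). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written: retry */
            return -1;      /* a real error such as ENOSPC or EIO */
        }
        p += n;             /* short write: advance and write the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

With a loop like this, a signal arriving mid-write no longer looks like "file full"; only a genuine error code ends the write.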
Francois Chassaing
2011-Feb-24 15:52 UTC
[Lustre-discuss] clients gets EINTR from time to time
OK, thanks, that makes it clearer. I did indeed mix up signals and error return codes, both in my mind and in my words. I understood that the write()/pwrite() system call was returning the EINTR error code because it received a signal, but I supposed that the signal was sent because of an error condition somewhere in the FS. This is where I now think I was wrong.

As for your questions:
- I should mention that I have always had this issue, and this is why I upgraded from 1.8.4 to 1.8.5, hoping this would solve it.
- I will try to have that SA_RESTART flag set in the app... if I can find where the signal handler is set.
- How can I see whether Lustre is returning EINTR for some other reason? As I said, the logs show nothing on either the MDS or the OSSs, but I haven't gone through "lctl debug_kernel" yet... which I'm going to do right away...

My last question is: how can I tell which signal I am receiving? My app doesn't say; it just dumps out the write/pwrite error code. And if there is no signal handler, then it should follow the "standard" actions (as per man 7 signal). On the other hand, my app does not stop or dump core, and the signal is not ignored, so it has to be handled in the code. Correct me if I'm wrong...

At this point, you realize that I didn't write the app, nor am I a good Linux guru ;-)

Thanks a lot.
François
>As for your questions :
>- I have to mention that I always had had this issue, and this is why
>I've upgraded from 1.8.4 to 1.8.5, hoping this would solve it.

Ah, okay, I misunderstood that; my apologies.

>- I will try to have that SA_RESTART flag set in the app... if I can
>find where the signal handler is set.

Searching for sigaction or signal should help there.

>- How can I see that lustre is returning EINTR for any other reason ?
>As I said no logs shows nothing neither on MDS or OSSs, but I didn't go
>through examining "lctl debug_kernel" yet... which I'm going to do
>right away...

Weeelll ... that was just a guess on my part. I did a quick grep through the Lustre sources and saw a few places where EINTR was returned, but most of those seemed to deal with the case where I/O was interrupted (those places were fairly far down in the stack; it wasn't clear to me that those errors would ever bubble back up to a return code to a system call). If _that_ is the issue, then tracking that down will be a challenge.

>my last question is : how can I tell which signal I am receiving ?
>because my app doesn't say, it just dumps outs the write/pwrite error
>code.

I think your easiest way is to use strace; something like "strace -e signal" should do the right thing (that will only trace signals, not all system calls).

>And if there is no signal handler, then it should follow the "standard"
>actions (as of man 7 signal). On the other hand, my app does not stop
>or dump core, and is not ignored, so it has to be handled in the code.
>Correct me if I'm wrong...

That is my understanding as well; if you don't have a signal handler installed, the default action should be taking place, and if the default action is to ignore the signal then you shouldn't be getting EINTR. But hey, I've been wrong before :-)

--Ken
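As an illustration of the SA_RESTART suggestion, the following C sketch installs a handler through sigaction() with SA_RESTART set and records which signal arrived. The names `note_signal`, `last_signal`, and `install_restarting_handler` are hypothetical and chosen for this example only; the application in question may install its handlers quite differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Number of the most recently delivered signal, 0 if none yet.
 * sig_atomic_t is the type that is safe to write from a handler. */
static volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: only records which signal was delivered. */
static void note_signal(int signo)
{
    last_signal = signo;
}

/* Install note_signal for signo with SA_RESTART, which asks the kernel to
 * restart restartable system calls instead of failing them with EINTR.
 * Returns 0 on success, -1 on failure (errno set by sigaction). */
static int install_restarting_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}
```

Logging `last_signal` next to the failing write()'s errno would show which signal, if any, preceded the EINTR. Note that SA_RESTART only covers calls the kernel or filesystem treats as restartable, so a retry loop in the application remains the more robust fix.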
DEGREMONT Aurelien
2011-Feb-24 16:57 UTC
[Lustre-discuss] clients gets EINTR from time to time
Hello,

From my understanding, Lustre can return EINTR for some I/O error cases. I think that when a client gets evicted in the middle of one of its RPCs, it can return EINTR to the caller. Could this explain your issue?

Can you verify that your clients were not evicted at the same time?

Aurélien
Brian J. Murrell
2011-Feb-24 17:34 UTC
[Lustre-discuss] clients gets EINTR from time to time
On 11-02-24 11:57 AM, DEGREMONT Aurelien wrote:
> Hello

Hi,

> From my understanding, Lustre can return EINTR for some I/O error cases.

I think that should/would be an EIO.

> I think that when a client gets evicted in the middle of one of its RPC,
> it can returns EINTR to the caller.

An evicted client should get an EIO on its I/O calls, IIRC.

b.
Kevin Van Maren
2011-Feb-24 17:43 UTC
[Lustre-discuss] clients gets EINTR from time to time
No, in case of an eviction or I/O errors, EIO is returned to the application, not EINTR.

Kevin
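To make the distinction between these error codes concrete, here is a small C sketch of how an application might classify the errno left by a failed write(). The enum and function names are hypothetical, and the mapping is only an assumption about what a reasonable application policy could look like: EINTR means "retry", ENOSPC means the target really is full, and EIO is a genuine I/O failure (which, per the replies above, is what an eviction should produce).

```c
#include <errno.h>

/* Hypothetical categories an application might act on after write() fails. */
enum write_failure {
    WF_RETRY,      /* interrupted by a signal: safe to retry the call */
    WF_DISK_FULL,  /* no space left on the target */
    WF_IO_ERROR,   /* genuine I/O failure, e.g. after a client eviction */
    WF_OTHER       /* anything else: report errno as-is */
};

/* Map an errno value from a failed write() to one of the categories above. */
static enum write_failure classify_write_errno(int err)
{
    switch (err) {
    case EINTR:
        return WF_RETRY;
    case ENOSPC:
        return WF_DISK_FULL;
    case EIO:
        return WF_IO_ERROR;
    default:
        return WF_OTHER;
    }
}
```

The point of separating the cases is that only `WF_DISK_FULL` should trigger the "file is full" path that the application currently takes for every failure.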
Francois Chassaing
2011-Feb-25 11:18 UTC
[Lustre-discuss] clients gets EINTR from time to time
Thanks, but in any case the logs on the MDS/MGS do not show an evicted client of any kind. Also, the log output by lctl debug_kernel on the clients does not show much: I can only see the last administrative actions I've taken (such as setting the striping policy on a directory, creating a new server pool, ...) and four "Dropping PUT from" messages, which are unrelated because they did not occur at the times of my problem.

I will continue to parse the debug logs and keep you posted.

Thanks
François
Brian J. Murrell
2011-Feb-25 13:28 UTC
[Lustre-discuss] clients gets EINTR from time to time
On 11-02-25 06:18 AM, Francois Chassaing wrote:
> Thanks, but anyway, logs on the MDS/MGS does not show evicted client of any kind.
> Also, the log output by lctl debug_kernel on clients does not show much, I can only see in there the last administrative actions I've taken (such as setting striping policy on a directory, creating a new server pool, ...) and four unrelated (because not happening at my problem hours) "Dropping PUT from"
>
> I continue to parse debug logs and keep them posted.

I don't understand why you don't just fix your application to handle a perfectly valid and expected condition (that it's currently not handling) instead of wasting time trying to find the cause of the expected condition. Even if you find it, it's likely not a bug and not something that can/will be fixed. It's your application that needs to be fixed.

b.
Francois Chassaing
2011-Feb-25 13:39 UTC
[Lustre-discuss] clients gets EINTR from time to time
Maybe, once you explain to me why it is a perfectly expected condition, because that I still don't get. How can it be expected that a file cannot be written because of an interrupted system call, when all the conditions for writing successfully to this file are apparently met: a single client writing to a single file, no locking or concurrent access from other clients, no client eviction shown, no hardware failure, nothing. Please remember that this application traditionally deals with standard disks, which DO NOT get EINTR except on error conditions...

Regards
François
>I don't understand why you don't just fix your application to handle a
>perfectly valid and expected condition (that it's currently not
>handling) instead of wasting time trying to find the cause of the
>expected condition. Even if you find it, it's likely not a bug and not
>something that can/will be fixed. It's your application that needs to
>be fixed.

To be fair ... normally disk I/O operations are not interruptable by signals, so it's not an unreasonable behavior on the part of an application. I did check POSIX, and it doesn't say that behavior is restricted only to network sockets, so yeah, it's TECHNICALLY allowable behavior according to the standard (although the Linux manpage for signal(7) says that it will not happen). But honestly, I've seen plenty of cases where applications handle this for network I/O; it's normal, everyone knows it will happen there. But for _disk_ I/O? Never seen it done. I'm not saying that there are no applications that handle this case, but it's certainly very uncommon.

I freely admit that network filesystems sort of mix the concepts of "network socket" and "disk I/O" together, and what is the "right" behavior is unclear. But calling this perfectly valid and expected is not quite accurate. It would be interesting to see what other network filesystems do under the same circumstances.

--Ken
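For readers wondering what "handling EINTR" means in practice, the usual pattern is a retry loop around write(). The following is a minimal sketch in C, not code from any poster in this thread; the helper name write_all is hypothetical. It retries when write() fails with EINTR, continues after short writes, and reports every other errno (ENOSPC, EIO, ...) as a real error, so an interrupted call is no longer mistaken for a full disk.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes to fd.
 * Retries when write() is interrupted by a signal (EINTR) and
 * continues after short writes.
 * Returns 0 on success, -1 on a real error with errno set. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written: retry */
            return -1;      /* genuine error, e.g. ENOSPC or EIO */
        }
        p += n;             /* short write: advance and write the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

With a wrapper like this, only a failure other than EINTR would reach the application's "file is full" logic, and even then the application should inspect errno rather than assume ENOSPC.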
Hi. I think it would help if you knew what the signal was. Do you have that yet?

I have a report from a user that is getting EINTR when a SIGALRM goes off on his write(). It isn't unexpected to get SIGALRM because he called the alarm, but he also has SA_RESTART set. I can't remember whose responsibility it is to restart the call, syscall or wherever, but it seems that someone is dropping the ball because if EINTR is returned then SA_RESTART didn't seem to do the trick, right?

Thanks,
-Cory

On 2/25/2011 8:00 AM, Ken Hornstein wrote:
>> I don't understand why you don't just fix your application to handle a
>> perfectly valid and expected condition (that it's currently not
>> handling) instead of wasting time trying to find the cause of the
>> expected condition. Even if you find it, it's likely not a bug and not
>> something that can/will be fixed. It's your application that needs to
>> be fixed.
>
> To be fair ... normally disk I/O operations are not interruptable by
> signals, so it's not an unreasonable behavior on the part of an
> application. I did check POSIX, and it doesn't say that behavior is
> restricted only to network sockets, so yeah, it's TECHNICALLY allowable
> behavior according to the standard (although the Linux manpage for
> signal(7) says that it will not happen). But honestly, I've seen
> plenty of cases where applications handle this for network I/O; it's
> normal, everyone knows it will happen there. But for _disk_ I/O?
> Never seen it done. I'm not saying that there are no applications that
> handle this case, but it's certainly very uncommon. I freely admit
> that network filesystems sort of mix the concepts of "network socket"
> and "disk I/O" together, and what is the "right" behavior is unclear.
> But calling this perfectly valid and expected is not quite accurate.
> It would be interesting to see what other network filesystems do under
> the same circumstances.
>
> --Ken
>I have a report from a user that is getting EINTR when a SIGALRM goes
>off on his write(). It isn't unexpected to get SIGALRM because he
>called the alarm, but he also has SA_RESTART set. I can't remember
>whose responsibility it is to restart the call, syscall or wherever,
>but it seems that someone is dropping the ball because if EINTR is
>returned then SA_RESTART didn't seem to do the trick, right?

I would agree with you on that one; if you're setting SA_RESTART then you shouldn't ever get EINTR. It looks like what should be happening is that if you get interrupted, the system call should return ERESTARTSYS, and then after the signal handler is done the system call should be re-run for you by the signal handling code. I see that at least for some cases, Lustre will use ERESTARTSYS; just a guess, but maybe somewhere Lustre is returning EINTR itself instead of returning ERESTARTSYS?

--Ken
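As an illustration of the SA_RESTART setup Cory describes, here is a minimal sketch in C. The handler name on_alarm and the helper install_restarting_alarm_handler are hypothetical names, not taken from the user's program. On Linux, SA_RESTART asks the kernel to restart certain interrupted system calls after the handler returns, but only when the interrupted code path reports the interruption as restartable, which is the point Ken raises about ERESTARTSYS.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical handler: it does nothing, it only needs to exist so that
 * SIGALRM does not terminate the process. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical helper: install on_alarm for SIGALRM with SA_RESTART set.
 * Returns 0 on success, -1 on failure (errno set by sigaction). */
int install_restarting_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sa.sa_flags = SA_RESTART;   /* request automatic restart of interrupted calls */
    sigemptyset(&sa.sa_mask);   /* block no additional signals in the handler */
    return sigaction(SIGALRM, &sa, NULL);
}
```

Even with this flag set, a robust application would still keep an explicit EINTR retry around write(), since the flag cannot help when the kernel code path returns EINTR directly.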
Andreas Dilger
2011-Feb-25 17:39 UTC
[Lustre-discuss] clients gets EINTR from time to time
On 2011-02-25, at 6:28, "Brian J. Murrell" <brian at whamcloud.com> wrote:
> On 11-02-25 06:18 AM, Francois wrote:
>>
>> I continue to parse debug logs and keep them posted.
>
> I don't understand why you don't just fix your application to handle a
> perfectly valid and expected condition (that it's currently not
> handling) instead of wasting time trying to find the cause of the
> expected condition. Even if you find it, it's likely not a bug and not
> something that can/will be fixed. It's your application that needs to
> be fixed.

In all fairness Brian, it isn't always possible to fix an application like you suggest. It might be commercial (binary only), it might be complex code using 3rd party libraries to do the IO that would lose support if modified, etc.

I think the first action to debug this is to run on the client with "lctl set_param debug=+trace" or "=~0" which will enable function entry/exit tracing in Lustre. Then when the problem is hit, run "lctl dk /tmp/debug" to dump the Lustre debug log, and search for -4 (which is -EINTR) to see where this error is first appearing.

At that point we can make a determination where the source of the error is, and if it is Lustre's fault. I know at one time there was a related problem in the l_wait_event() macro that was improperly masking signals, but I thought it was fixed by 1.8.5.

Cheers, Andreas
On 02/25/2011 11:39 AM, Andreas Dilger wrote:
> On 2011-02-25, at 6:28, "Brian J. Murrell" <brian at whamcloud.com> wrote:
>> On 11-02-25 06:18 AM, Francois wrote:
>>>
>>> I continue to parse debug logs and keep them posted.
>>
>> I don't understand why you don't just fix your application to handle a
>> perfectly valid and expected condition (that it's currently not
>> handling) instead of wasting time trying to find the cause of the
>> expected condition. Even if you find it, it's likely not a bug and not
>> something that can/will be fixed. It's your application that needs to
>> be fixed.
>
> In all fairness Brian, it isn't always possible to fix an application like you suggest. It might be commercial (binary only), it might be complex code using 3rd party libraries to do the IO that would lose support if modified, etc.
>
> I think the first action to debug this is to run on the client with "lctl set_param debug=+trace" or "=~0" which will enable function entry/exit tracing in Lustre. Then when the problem is hit, run "lctl dk /tmp/debug" to dump the Lustre debug log, and search for -4 (which is -EINTR) to see where this error is first appearing.
>
> At that point we can make a determination where the source of the error is, and if it is Lustre's fault. I know at one time there was a related problem in the l_wait_event() macro that was improperly masking signals, but I thought it was fixed by 1.8.5.

Setting aside the moral question of which calls should be interruptible, I think that the handling of the LUSTRE_FATAL_SIGS (defined in lustre_lib.h to be SIGKILL, SIGINT, SIGTERM, SIGQUIT, SIGALRM) is slightly broken. Under certain situations, Lustre will return -EINTR although no signals were delivered. That's probably not the end of the world for most applications, but OTOH I don't think anybody assumes that -EINTR will be delivered spuriously. Consider the following sequence:

1) Process P has a Lustre file F open.
2) P has SIGALRM pending (but blocked).
3) P starts writing to F and ends up sleeping in (something like): sys_write() ... ll_extent_lock() ... osc_enqueue() ... ptlrpc_queue_wait().
4) The OST does not respond to the request before the deadline, so l_wait_event() replaces the signal mask of P with the LUSTRE_FATAL_SIGS, notices that SIGALRM is now deliverable, restores the signal mask of P, and ptlrpc_queue_wait() returns -EINTR.
5) P is exiting from sys_write(), SIGALRM is blocked (but still pending) so it doesn't get delivered.
6) P spuriously returns -EINTR from sys_write().

I can reproduce this on 1.8.5/RHEL 5.5.

If the goal is to emulate NFS's interruptibility during congestion then returning -ERESTARTSYS would be more appropriate. Also, it might be worthwhile to make this extra interruptibility a mount flag, as NFS does.

Best, John

--
John L. Hammond, Ph.D.
TACC, The University of Texas at Austin
jhammond at tacc.utexas.edu
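John's steps 2 and 5 describe a signal that is pending but blocked. An application can at least check for that state from user space when write() returns EINTR. The sketch below is a hypothetical diagnostic helper in C (the name signal_blocked_and_pending is an assumption); it only reports the calling thread's signal state and does not reproduce or confirm the Lustre behavior described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical diagnostic: returns 1 if signo is both blocked by the
 * calling thread and currently pending, 0 if not, and -1 on error. */
int signal_blocked_and_pending(int signo)
{
    sigset_t blocked, pending;

    /* A NULL "set" argument only queries the current signal mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &blocked) != 0)
        return -1;
    if (sigpending(&pending) != 0)
        return -1;

    return sigismember(&blocked, signo) == 1 &&
           sigismember(&pending, signo) == 1;
}
```

Logging the result for SIGALRM next to each EINTR from write() could hint at whether a given failure matches the sequence above, without enabling full Lustre tracing.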
Francois Chassaing
2011-Mar-04 10:04 UTC
[Lustre-discuss] clients gets EINTR from time to time
Dear list,

Still investigating this issue, I am now struggling with debugging. The issue arose once more yesterday, so I started to look at it more deeply and decided that the "trace" debug output should be written to disk using debug_daemon. Alas, with only the "trace" debug flag active, the clients produce more than 100 MB/s of log (yes, these are busy clients). I've tried several strategies, such as running debug_kernel from a cron job or while watching my product's error log, but even then dk would dump 70 MB of data representing less than one second of debug log. So my chances of tracing the signal seem low.

Is there a less verbose debug flag that would still include the signal I'm looking for? Given John's answer, I could maybe use /proc/sys/lustre/dump_on_timeout to dump the log only when a timeout happens, but this will work only if my problem matches what John can reproduce.

Please also note that I've looked around for abnormal threads_started numbers: everywhere it has the same value as threads_min, except for one mdt entry which is at threads_min+1.

Regards,
François Chassaing
Directeur Technique - CTO
weborama

----- Original Message -----
From: "John Hammond" <jhammond at tacc.utexas.edu>
To: "Andreas Dilger" <adilger at whamcloud.com>
Cc: lustre-discuss at lists.lustre.org
Sent: Friday, 25 February 2011 21:16:36 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [Lustre-discuss] clients gets EINTR from time to time

On 02/25/2011 11:39 AM, Andreas Dilger wrote:
> On 2011-02-25, at 6:28, "Brian J. Murrell" <brian at whamcloud.com> wrote:
>> On 11-02-25 06:18 AM, Francois wrote:
>>>
>>> I continue to parse debug logs and keep them posted.
>>
>> I don't understand why you don't just fix your application to handle a
>> perfectly valid and expected condition (that it's currently not
>> handling) instead of wasting time trying to find the cause of the
>> expected condition. Even if you find it, it's likely not a bug and not
>> something that can/will be fixed. It's your application that needs to
>> be fixed.
>
> In all fairness Brian, it isn't always possible to fix an application like you suggest. It might be commercial (binary only), it might be complex code using 3rd party libraries to do the IO that would lose support if modified, etc.
>
> I think the first action to debug this is to run on the client with "lctl set_param debug=+trace" or "=~0" which will enable function entry/exit tracing in Lustre. Then when the problem is hit, run "lctl dk /tmp/debug" to dump the Lustre debug log, and search for -4 (which is -EINTR) to see where this error is first appearing.
>
> At that point we can make a determination where the source of the error is, and if it is Lustre's fault. I know at one time there was a related problem in the l_wait_event() macro that was improperly masking signals, but I thought it was fixed by 1.8.5.

Setting aside the moral question of which calls should be interruptible, I think that the handling of the LUSTRE_FATAL_SIGS (defined in lustre_lib.h to be SIGKILL, SIGINT, SIGTERM, SIGQUIT, SIGALRM) is slightly broken. Under certain situations, Lustre will return -EINTR although no signals were delivered. That's probably not the end of the world for most applications, but OTOH I don't think anybody assumes that -EINTR will be delivered spuriously. Consider the following sequence:

1) Process P has a Lustre file F open.
2) P has SIGALRM pending (but blocked).
3) P starts writing to F and ends up sleeping in (something like): sys_write() ... ll_extent_lock() ... osc_enqueue() ... ptlrpc_queue_wait().
4) The OST does not respond to the request before the deadline, so l_wait_event() replaces the signal mask of P with the LUSTRE_FATAL_SIGS, notices that SIGALRM is now deliverable, restores the signal mask of P, and ptlrpc_queue_wait() returns -EINTR.
5) P is exiting from sys_write(), SIGALRM is blocked (but still pending) so it doesn't get delivered.
6) P spuriously returns -EINTR from sys_write().

I can reproduce this on 1.8.5/RHEL 5.5.

If the goal is to emulate NFS's interruptibility during congestion then returning -ERESTARTSYS would be more appropriate. Also, it might be worthwhile to make this extra interruptibility a mount flag, as NFS does.

Best, John

--
John L. Hammond, Ph.D.
TACC, The University of Texas at Austin
jhammond at tacc.utexas.edu