Hi everyone,

I'm running 1.6.4.1 and applied patches 14006, 14007 and 14008. I tried to apply the other patches (14363, 14693, 14442, 14591) recently recommended by Oleg, but they differ too much from the Lustre 1.6.4.1 sources (I tried 1.6.3 as well) to apply, even by hand. Any suggestions on how to apply them?

I receive countless thousands of errors very similar to:

Jan 10 11:59:37 wiglaf kernel: Lustre Error: 11-0: an error occurred while communicating with 0 at lo. The ost_write operation failed with -28

The same error also appears with -16.

The jobs writing to an NFS-exported Lustre mount fail silently, with no errors except what I've posted in the logs and sent to Oleg. Several tests are running now; more detailed errors to come.

Dan
Hello!

On Jan 15, 2008, at 7:29 PM, Dan wrote:
> I'm running 1.6.4.1 and applied the patches 14006, 14007 and 14008. I
> tried to apply the other patches (14363, 14693, 14442, 14591)
> recommended
> recently by Oleg but they are too different from the Lustre 1.6.4.1
> sources (tried 1.6.3 as well) to patch even by hand. Any
> suggestions on
> how to apply them?

My diffs are against a later b1_6 tree, but if nobody can apply them, that's obviously bad, and I guess I need to make a 1.6.4 port, if only to get wider test coverage from everybody interested.
It would be better if you referenced bug numbers instead of attachment ids, as that would help me find those bugs faster.

> I receive countless thousand of errors very similar to:
> Jan 10 11:59:37 wiglaf kernel: Lustre Error: 11-0: an error occurred
> while
> communicating with 0 at lo. The ost_write operation failed with -28

-28 is ENOSPC, which just tells you that you've run out of space on some of your OSTs.

> same error also with -16
> The jobs writing to a NFS exported Lustre mount fail silently w/o
> errors
> except what I've posted in the logs and sent to Oleg. Several tests
> are
> running now, more detailed errors to come.

The logs you provided indicate that your disk backend is overloaded and takes hundreds of seconds to process I/O requests. You need to lower the load somehow or improve disk backend performance.
Now, seeing that you do not have the patch from bug 13371, that might explain your high load: the NFS client generates a gazillion small requests when Lustre has no writev/readv support.

Bye,
Oleg
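To see which error codes dominate in a large syslog, the negative return codes can be decoded and tallied. The following is only a minimal sketch, assuming Python is available on the node and that every message is worded like the single log line quoted above; the names FAILED_OP, decode and tally are made up for illustration and are not part of Lustre. On Linux, 28 is ENOSPC and 16 is EBUSY.

```python
import errno
import os
import re
from collections import Counter

# Pattern derived from the one log line quoted in this thread (assumption:
# other Lustre versions may word the message differently).
FAILED_OP = re.compile(r"The (\w+) operation failed with (-\d+)")


def decode(rc):
    """Map a negative kernel-style return code to (symbolic name, description)."""
    code = abs(rc)
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def tally(lines):
    """Count failures per (operation, symbolic errno name)."""
    counts = Counter()
    for line in lines:
        match = FAILED_OP.search(line)
        if match:
            op, rc = match.group(1), int(match.group(2))
            counts[(op, decode(rc)[0])] += 1
    return counts


if __name__ == "__main__":
    # Illustrative input: the first line is quoted from the thread, the
    # second is the same message with the -16 code Dan also reported.
    sample = [
        "Jan 10 11:59:37 wiglaf kernel: Lustre Error: 11-0: an error occurred "
        "while communicating with 0 at lo. The ost_write operation failed with -28",
        "Jan 10 11:59:38 wiglaf kernel: Lustre Error: 11-0: an error occurred "
        "while communicating with 0 at lo. The ost_write operation failed with -16",
    ]
    for (op, name), n in sorted(tally(sample).items()):
        print(op, name, n)
```

To run it over a real log, pass an open file object to tally() instead of the sample list.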
Hello!

On Jan 15, 2008, at 7:29 PM, Dan wrote:
> I'm running 1.6.4.1 and applied the patches 14006, 14007 and 14008. I
> tried to apply the other patches (14363, 14693, 14442, 14591)
> recommended
> recently by Oleg but they are too different from the Lustre 1.6.4.1
> sources (tried 1.6.3 as well) to patch even by hand. Any
> suggestions on
> how to apply them?

Aha, I took a closer look.
You do not need all the patches from those bugs; you only need the patches that are applicable to the b1_6 tree (usually stated in a comment somewhere or in the landing flags).
So that leaves you with this list:

bug 14360/att 14006
bug 14379/att 14007, 14008
(these three you already have, good).

bug 13371/att 14591: this one applies to 1.6.4.1 with one minor problem in the autoconf script, which is trivial to resolve by hand. (Also, by mistake it contains an ldlm/ldlm_lock.c change that you had better avoid, as it undoes att 14008. You can work around this by not applying 14008 at all; when you apply this patch, patch will report that it looks like a reversed patch and ask whether to assume -R, and you just agree.)

You do not need 14693; it is for a different branch.
att 14363 is not strictly necessary for you, so if you cannot apply it, don't.
att 14442: you don't need this one either, as it is for a different branch.

Bye,
Oleg
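For anyone keeping track of the same set of attachments, the list above can be written down as a small lookup table. This is only a sketch that transcribes the message; the names PATCHES and select are hypothetical, and bug numbers are left as None where the message does not give one.

```python
# Attachment id -> (bug number, action, note), transcribed from the message
# above. Bug numbers are None where the message does not state one.
PATCHES = {
    "14006": ("14360", "apply", "already applied"),
    "14007": ("14379", "apply", "already applied"),
    "14008": ("14379", "apply",
              "undone by the ldlm/ldlm_lock.c hunk in 14591; see the workaround"),
    "14591": ("13371", "apply",
              "applies to 1.6.4.1; minor autoconf conflict to resolve by hand"),
    "14363": (None, "optional", "not strictly necessary"),
    "14693": (None, "skip", "for a different branch"),
    "14442": (None, "skip", "for a different branch"),
}


def select(action):
    """Return the sorted attachment ids whose action matches."""
    return sorted(att for att, (_, act, _) in PATCHES.items() if act == action)


if __name__ == "__main__":
    for action in ("apply", "optional", "skip"):
        print(action, select(action))
```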