João Miguel Neves
2006-Jun-26 03:32 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
Good morning,

The situation is strange (at least for me). Every file is getting padded with zeros (0x00) until its size is a multiple of 4096 bytes. Any hints on what might be happening?

I'm using 1 client and 1 MDS/OSS. The OSS has 7 OSTs. The kernel is a vanilla 2.6.12.6 with drbd 0.7.18 and Lustre 1.4.6.1.

Kernel, Lustre and drbd packages for Ubuntu/breezy and sources are at: http://mirror.bn.pt/~jneves/lustre-1.4.6.1/

--
Thanks in advance,
João Miguel Neves
João Miguel Neves
2006-Jun-26 03:43 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
Sorry, config files are at: http://mirror.bn.pt/~jneves/lustre-20060616/

And it happens only with samba...
João Miguel Neves
2006-Jun-26 04:10 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
I'm sorry. Please ignore this. It stopped happening over the weekend...
João Miguel Neves
2006-Jun-29 08:22 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
I have this problem again. Some files are appearing padded with 0x00. The only pattern I can find is that the resulting file size is a multiple of 4096.

The files seem to change on the filesystem. They are being copied through samba.

I have no further clues so far. Any hints for this problem will be very much appreciated.

Best regards,
João Miguel Neves
João Miguel Neves
2006-Jun-29 08:24 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
dmesg is showing several messages like:

Lustre: 12060:0:(rw.c:1380:ll_readpage()) ino 10581590 page 0 (0) not covered by a lock (mmap?). check debug logs.

Could this explain it?
Oleg Drokin
2006-Jun-29 08:58 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Thu, Jun 29, 2006 at 03:24:13PM +0100, João Miguel Neves wrote:
> dmesg is showing several messages like:
> Lustre: 12060:0:(rw.c:1380:ll_readpage()) ino 10581590 page 0 (0) not
> covered by a lock (mmap?). check debug logs.
> Could this explain it?

Does the problem go away if you set 'use sendfile = no' in smb.conf?

Bye,
Oleg
João Miguel Neves
2006-Jun-29 10:45 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
We just copied about 3.6GB of files and the problem seems to be gone with "use sendfile = no". Thanks a lot. We'll be copying some 150GB during the night, but at the moment I don't expect any errors to appear.

Is this a problem of the DLM vs. mmapped files? Or is there something else happening?

Doing without sendfile isn't a problem for us at the moment.

Best regards,
João Miguel Neves
João Miguel Neves
2006-Jun-29 10:50 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
Bad news. The same problem is happening. It happened in 2 files that were in our checksum database. :( Any other ideas?
Oleg Drokin
2006-Jun-29 11:12 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Thu, Jun 29, 2006 at 05:50:02PM +0100, João Miguel Neves wrote:
> Bad news. The same problem is happening. It happened in 2 files that we
> were in our checksum database. :( Any other ideas?

Do you still get that ll_readpage warning? This one should be gone now (this is bug 7020).

Now back to the padding: so you get files extended to the next 4k boundary, and that extra part is zero-filled? We have never heard of a problem like that. I know that Samba works like this when copying files:
1. create file
2. truncate(expected_file_size)
3. fill in file data.

So if for whatever reason the source reports a file size bigger than the data that is actually there, you will get the problem you describe. What is the source of your files? Can you check that file sizes are correctly reported there?

Bye,
Oleg
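The three-step copy sequence described above can be sketched in Python to show how an over-reported size turns into trailing zeros. This is only an illustration of the create/truncate/fill pattern, not Samba's actual code; the function names, the file name and the sizes (3331 bytes of data, 4096 bytes reported) are assumptions chosen for the example.

```python
import os
import tempfile


def copy_like_samba(data: bytes, reported_size: int, dest_path: str) -> None:
    """Create dest_path, pre-extend it to reported_size, then write data."""
    # 1. create file
    fd = os.open(dest_path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)
    try:
        # 2. truncate(expected_file_size): growing a file this way
        #    fills the new range with zero bytes
        os.ftruncate(fd, reported_size)
        # 3. fill in file data, starting at offset 0
        os.lseek(fd, 0, os.SEEK_SET)
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
    finally:
        os.close(fd)


def demo() -> bytes:
    """Copy 3331 bytes while the (hypothetical) source claims 4096."""
    payload = b"x" * 3331
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "copy.xml")  # hypothetical file name
        copy_like_samba(payload, 4096, dest)
        with open(dest, "rb") as fh:
            return fh.read()


if __name__ == "__main__":
    result = demo()
    print(len(result), result[3331:] == b"\x00" * (4096 - 3331))
```

If the reported size matched the real data length, step 2 would leave nothing to pad, which is why checking the sizes reported by the source is the first thing to rule out.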
João Miguel Neves
2006-Jun-29 11:37 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
On Thu, 2006-06-29 at 20:12 +0300, Oleg Drokin wrote:
> Do you still get that ll_readpage warning? This one should be gone now
> (this is bug 7020).

No. dmesg is at the same place it was.

> So if for whatever reason source file reports bigger file than there is data,
> you will get this problem you describe.
> What is the source of your files? Can you check that filesizes are correctly
> reported there?

They are reported correctly there. It's a Windows 2003 server, and several of the files copy just fine. It happens more regularly with files below 4096 bytes, and the files I'm detecting are XML. Other than that, I can't pinpoint the issue. Sometimes copying the same file works, sometimes I get a padded file.

Should I try a vanilla version of samba instead of using the distribution one?
Oleg Drokin
2006-Jun-29 11:55 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Thu, Jun 29, 2006 at 06:37:54PM +0100, João Miguel Neves wrote:
> > Do you still get that ll_readpage warning? This one should be gone now
> > (this is bug 7020).
> No. dmesg is at the same place it was.

If you continue to get those warnings, that means sendfile is still used. (Note that old messages won't disappear from dmesg, so you should check for new ones; this is probably best done by looking at the timestamps in /var/log/messages.)

I think samba rereads its config by itself when it notices a config change, so it should be picking up the change in any case, right?

> pinpoint the issue. Sometimes copying the same file works, sometimes I
> get a padded file.

So the file size is always right? Just the content is zero sometimes? Sounds like some sort of sendfile issue then.

> Should I try a vanilla version of samba instead of using the
> distribution one?

Make sure that no sendfile is used first.

Bye,
Oleg
João Miguel Neves
2006-Jun-29 12:00 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
On Thu, 2006-06-29 at 20:55 +0300, Oleg Drokin wrote:
> If you continue to get those warnings - that means sendfile is still used.
> (note old messages won't disappear from dmesg, so you should check for new
> ones and this is probably best done by looking in /var/log/messages on
> timestamps).

Sorry that I wasn't clearer: there were no new messages in dmesg, only the old ones. So samba isn't using sendfile anymore.

> So the file size is always right? Just the content is zero sometimes?
> Sounds like some sort of sendfile issue then.

No. The file content is always right (the 3331 bytes of content are there). But in some cases the file in Lustre has 4096 bytes, with the last bytes being all zeroes.

> Make sure that no sendfile is used first.

Done. Compiling samba at the moment.

Thanks for all your help.
João Miguel Neves
2006-Jun-30 04:45 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
FYI, samba 3.0.22 seems to be behaving well (compared with Ubuntu's 3.0.14). We're going to transfer a lot more files now to see if there's any change in behavior.

Best regards and thanks,
João Miguel Neves
João Miguel Neves
2006-Jul-07 04:05 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
The problem continues. It's worse than I thought.

I'm still blaming sendfile, but this time it seems to be in apache. From the logs, my best guess is:

Accessing a file through apache2, which uses sendfile to send the file from the Lustre filesystem, results in the padding being added to the file on the Lustre filesystem. Yes, I'm saying that reading a file is changing the file that is being read.

Disabling sendfile in apache to test.
João Miguel Neves
2006-Jul-07 04:55 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
OK, it seems like apache was the "culprit". First tests confirm it. In about 12 hours we'll have the filesystem "refilled". By that time I'll be sure.

Best regards,
João Miguel Neves
Oleg Drokin
2006-Jul-07 10:13 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Fri, Jul 07, 2006 at 11:55:36AM +0100, João Miguel Neves wrote:
> OK, seems like apache was the "culprit". First tests confirm it. In
> about 12 hours we'll have the filesystem "refilled". By that time I'll
> be sure.

Have you tried applying the patches from bug 7020? Or perhaps you can give 1.4.7beta3 a try as well, to see whether the patches from bug 7020 have already addressed this problem?

Bye,
Oleg
João Miguel Neves
2006-Jul-10 06:57 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
No, I haven't tried those patches.

Unfortunately the problem isn't with sendfile. Disabling sendfile in samba and apache removed the log entries like this one:

Lustre: 21924:0:(rw.c:1380:ll_readpage()) ino 10980186 page 0 (0) not covered by a lock (mmap?). check debug logs.

But I'm still seeing file corruption, with 0x00 bytes being appended to files until the file size reaches a multiple of 4096 bytes. These are files in the Lustre filesystem that are only read. After being copied they are OK; after some time (usually 24 to 48h) some of the files fail the checksum and trigger my alarms.

I can't replicate the system with CFS's kernels because I need drbd. Any suggestions on compiler tools and build environments would be great, so I can try to understand what's going on while minimizing the impact of the toolset I'm using.

Best regards,
João Miguel Neves
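A check for the specific symptom described here (size is a multiple of 4096 and the file ends in 0x00 bytes) could look like the Python sketch below. It is a heuristic only, and an assumption about how one might scan for affected files rather than the checksum setup used in this thread: a file that legitimately ends in a zero byte and happens to be page-aligned would also be flagged, although that is unlikely for text files such as XML. The name `looks_zero_padded` is made up for the example.

```python
import os

PAGE = 4096  # padding boundary reported in this thread


def looks_zero_padded(path: str) -> bool:
    """Return True if the file size is a non-zero multiple of PAGE
    and the file ends with at least one 0x00 byte."""
    size = os.path.getsize(path)
    if size == 0 or size % PAGE != 0:
        return False
    with open(path, "rb") as fh:
        # The padding is always shorter than one page, so the last
        # page is all that needs to be inspected.
        fh.seek(-PAGE, os.SEEK_END)
        tail = fh.read(PAGE)
    return tail.endswith(b"\x00")


def scan(root: str) -> list:
    """Walk a directory tree and list files matching the symptom."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and looks_zero_padded(full):
                hits.append(full)
    return hits
```

Running `scan()` periodically over the mount point would narrow down when a file changes, which complements a checksum database that only says that it changed.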
João Miguel Neves
2006-Jul-12 09:24 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
There are a couple of questions I'm dealing with:
1) What is the service inside Lustre that could be changing the files? Is there some kind of online file check?
2) Is there any situation where an ftruncate() can get nullified by such a service?

Well, I'm still diving through the docs trying to come up with answers, while waiting for a kernel with 1.4.7beta3 to compile.

Best regards,
João Miguel Neves
Andreas Dilger
2006-Jul-12 15:40 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
On Jul 12, 2006 16:24 +0100, João Miguel Neves wrote:
> There are a couple of doubts I'm dealing with:
> 1) What is the service inside lustre that could be changing the files?
> Is there some kind of online file check?

I doubt it is the file actually changing; rather, something is changing the file size, and that is being reset later when the locks are refreshed from the server.

> 2) Is there any situation where a ftruncate() can get nullified by such
> a service?
>
> Well, still diving through the docs trying to come up with answers,
> while waiting for a kernel with 1.4.7beta3 to compile.

If you can stomach it, you should look at the Lustre kernel debug logs to find the answer.

    lctl clear # clears existing log
    {reproduce problem in minimal steps}
    lctl dk /tmp/debug

VFS operations are marked "VFS", and size (sort of) is tracked by something called "kms" in the logs. Lock operations that relate to file size have " EXT " in the line.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
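Once the log has been dumped to /tmp/debug, the three markers mentioned above can be used to cut it down to the relevant lines. The Python sketch below does nothing more than substring matching on "VFS", "kms" and " EXT "; it assumes nothing about the actual layout of a debug log line, and the function names are hypothetical.

```python
import sys

# Markers named in the message above: VFS operations, the "kms"
# size tracking, and extent-lock operations (" EXT " with spaces).
MARKERS = ("VFS", "kms", " EXT ")


def interesting(line: str) -> bool:
    """True if the line contains any of the markers."""
    return any(marker in line for marker in MARKERS)


def filter_debug_log(path: str) -> list:
    """Return the matching lines of a dumped debug log, in order."""
    with open(path, errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if interesting(line)]


if __name__ == "__main__":
    # Default to the dump location used in the lctl example.
    log_path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/debug"
    for line in filter_debug_log(log_path):
        print(line)
```

Matching on plain substrings will also catch unrelated lines that happen to contain "VFS" or "kms", so the output is a starting point for reading the log, not a substitute for it.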
João Miguel Neves
2006-Jul-13 01:29 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
The problem occurred during the night with 1.4.7beta3. As it seems that truncating a file is causing the trouble, I'll try a script that creates small files with a size of 4096 bytes and then truncates them to a smaller size.

The scenario you describe makes sense to me. Is there a way to force lock refreshing? Maybe this:

    $ echo clear > /proc/fs/lustre/ldlm/ldlm/namespaces/<OSC name|MDC name>/lru_size

Thanks for all the help,
João Miguel Neves
João Miguel Neves
2006-Jul-13 06:22 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
I've put the dump here:
http://mirror.bn.pt/~jneves/dump-failed-lustre.log

It's about 6MB. The difficulty is reproducing the problem in minimal steps. So far the only way we can reproduce it is by copying dozens of gigabytes to the filesystem.

The test was:
* $ lctl clear
* Several hundred files were created with a size of 4096 bytes and then truncated to 5 bytes. Two scripts were used to do this: one in Python, another in Ruby. They use the ftruncate64() and truncate64() system calls, respectively. (*)
* In another directory of the same filesystem we copied 4.2GB in 432 files and 264 directories from a Windows 2003 server to the Lustre filesystem using samba 3.0.22.
* At the end of the transfer ALL of the test files created in the other directory were reporting a file size of 4096.
* $ lctl dk > /root/dump-failed-lustre.log

Changes or recommendations for different testing are very welcome.

(*) Roland Laifer has reported the same issue with ftruncate().

A quick analysis of the dump from the failed test doesn't reveal anything strange to me. I'll look at it more seriously in a few hours.

Thanks for everything,
João Miguel Neves
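The truncate step of the test above can be sketched roughly as follows. This is not the script that was actually used (it was not posted); the file names, the default count and the helper names are assumptions. It writes 4096 bytes, calls ftruncate() down to 5 bytes, and later reports any file whose size is no longer 5.

```python
import os


def make_truncated_files(directory, count=5, initial=4096, final=5):
    """Create `count` files of `initial` bytes, then ftruncate each to `final`."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"trunc-test-{i:04d}")  # hypothetical name
        with open(path, "wb") as fh:
            fh.write(b"x" * initial)
            fh.flush()                         # push the data out before truncating
            os.ftruncate(fh.fileno(), final)   # shrink the file to `final` bytes
        paths.append(path)
    return paths


def wrong_sizes(paths, expected=5):
    """Return {path: size} for every file whose size differs from `expected`."""
    bad = {}
    for path in paths:
        size = os.stat(path).st_size
        if size != expected:
            bad[path] = size
    return bad
```

Running `make_truncated_files()` in a directory on the Lustre mount before the large copy, and `wrong_sizes()` on the returned paths afterwards, would list the files that have grown back to 4096 bytes.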
Andreas Dilger
2006-Jul-13 10:40 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
On Jul 13, 2006 08:29 +0100, João Miguel Neves wrote:
> Problem occurred during the night with 1.4.7beta3. As it seems like
> truncating a file is causing the trouble, I'll just try a script that
> creates small files with a size of 4096 and then truncate them to a
> smaller size.
>
> The scenario you describe makes sense to me. Is there a way to force
> lock refreshing? Maybe this:
>
> $ echo clear > /proc/fs/lustre/ldlm/ldlm/namespaces/<OSC name|MDC name>/lru_size

Yes, this will flush all of the locks on the client.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
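For a test harness that needs to flush the locks between steps, the same write can be done from a script. The sketch below assumes the /proc layout quoted above (which is specific to this Lustre 1.4.x client), assumes every namespace should be flushed (hence the wildcard), and would need root on a real client. The function name is hypothetical.

```python
import glob

# Path pattern taken from the thread; "*" stands in for each OSC/MDC namespace.
LRU_GLOB = "/proc/fs/lustre/ldlm/ldlm/namespaces/*/lru_size"


def flush_client_locks(pattern=LRU_GLOB):
    """Write "clear" to every lru_size file matching `pattern`.

    Returns the list of paths that were written, so the caller can tell
    whether any namespace was found at all.
    """
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as fh:
            fh.write("clear\n")
        written.append(path)
    return written
```

An empty return value means no matching namespace was found, for example because Lustre is not mounted on that machine.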
Andreas Dilger
2006-Jul-14 04:37 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
On Jul 13, 2006 13:22 +0100, João Miguel Neves wrote:
> I've put the dump here:
> http://mirror.bn.pt/~jneves/dump-failed-lustre.log
>
> It's about 6MB. The problem is in reproducing the problem in minimal
> steps. So far the only way we can reproduce it is by copying dozens of
> gigabytes to the filesystem.

Sorry, I haven't had a chance to look at the log yet. It isn't clear whether there will be anything of use in it, given the number of operations that are being done.

> The test was:
> * $ lctl clear
> * Several hundred files were created with a size of 4096 and then were
> truncated to 5 bytes. Two scripts were used to do this: one in python,
> another in ruby. They use ftruncate64() and truncate64() system calls,
> respectively. (*)
> * In another directory of the same filesystem we copied 4.2GB in 432
> files and 264 directories from a Windows 2003 server to the Lustre
> filesystem using samba 3.0.22.
> * At the end of the transfer ALL of the test files created on the other
> directory were reporting a file size of 4096.

By default the client filesystem keeps 100 locks for each OST in the system. This would likely mean that the test files would have had their locks flushed from the system.

It is possible to determine what the actual size of the file is, by doing:

    lfs getstripe /path/to/4096-byte-file

It will report the OSTs and their integer "index", and for each file it will report the object ID(s) that this file is striped over. You then need to run on the OST(s) that this file is striped over:

    debugfs -c -R "stat /O/0/d$(({objid} % 32))/{objid}" /dev/ostNdev

This will report the file size of the object on disk, which is the "real" size. It will be important to know whether the 4096-byte size reported on the client is the correct value or if it is incorrectly being reported on the client.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
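The object path in the debugfs command above is just the object ID placed in a subdirectory numbered by the ID modulo 32. A small helper, with hypothetical names, makes the arithmetic explicit and builds the argument list for each object ID reported by `lfs getstripe`; the device name remains the placeholder from the message.

```python
def object_path(objid, subdirs=32):
    """Path of an OST object as used above: /O/0/d<objid % 32>/<objid>."""
    return f"/O/0/d{objid % subdirs}/{objid}"


def debugfs_stat_command(objid, device):
    """Argument list equivalent to: debugfs -c -R "stat <path>" <device>."""
    return ["debugfs", "-c", "-R", f"stat {object_path(objid)}", device]
```

For example, object ID 1234 gives 1234 % 32 = 18, so the path is /O/0/d18/1234. The list returned by `debugfs_stat_command()` can be passed to `subprocess.run()` on the OSS that holds the object.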
João Miguel Neves
2006-Jul-14 09:23 UTC
[Lustre-discuss] Files padded with zeros until reaching 4k multiple?!?!
On Fri, 2006-07-14 at 04:37 -0600, Andreas Dilger wrote:
> On Jul 13, 2006 13:22 +0100, João Miguel Neves wrote:
> > I've put the dump here:
> > http://mirror.bn.pt/~jneves/dump-failed-lustre.log
> >
> > It's about 6MB. The problem is in reproducing the problem in minimal
> > steps. So far the only way we can reproduce it is by copying dozens of
> > gigabytes to the filesystem.
>
> Sorry, I haven't had a chance to look at the log yet. It isn't clear
> whether there will be anything of use in it, given the amount of operations
> that are being done.

Agreed. Even taking the log every 10s, I always get ~5MB, which tells me I was getting nothing.

> > The test was:
> > * $ lctl clear
> > * Several hundred files were created with a size of 4096 and then were
> > truncated to 5 bytes. Two scripts were used to do this: one in python,
> > another in ruby. They use ftruncate64() and truncate64() system calls,
> > respectively. (*)
> > * In another directory of the same filesystem we copied 4.2GB in 432
> > files and 264 directories from a Windows 2003 server to the Lustre
> > filesystem using samba 3.0.22.
> > * At the end of the transfer ALL of the test files created on the other
> > directory were reporting a file size of 4096.
>
> By default the client filesystem keeps 100 locks for each OST in the system.
> This would likely mean that the test files would have had their locks flushed
> from the system.
>
> It is possible to determine what the actual size of the file is, by doing:
>
> lfs getstripe /path/to/4096-byte-file
>
> It will report the OSTs and their integer "index", and for each file it will
> report the object ID(s) that this file is striped over. You then need to
> run on the OST(s) that this file is striped over:
>
> debugfs -c -R "stat /O/0/d$(({objid} % 32))/{objid}" /dev/ostNdev
>
> This will report the file size of the object on disk, which is the "real"
> size. It will be important to know whether the 4096-byte size reported
> on the client is the correct value or if it is incorrectly being reported
> on the client.

Correctly reported on the client. The object file on the OST is also 4096 bytes and its contents are the 3 bytes of content padded with 0x00.

debugfs was refusing to open the filesystem ("Filesystem has unsupported feature(s) while opening filesystem", e2fsprogs 1.38), so I stopped lustre and drbd and did a mount:

    # mount -t ldiskfs -o mballoc,extents,ro /dev/sde2 1

Any more suggestions are welcome.

Forcing a flush of the locks on the client with:

    # echo clear > /proc/fs/lustre/ldlm/ldlm/namespaces/*/lru_size

has not caused the problem in any way that we could observe.

Thanks once again,
João Miguel Neves
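Since the object on the OST really does contain the original bytes followed by 0x00 up to 4096, affected files can be looked for directly instead of waiting for a checksum alarm. The sketch below is a heuristic with a hypothetical function name: it flags a file whose size is a non-zero multiple of 4096 and reports how many trailing zero bytes its last 4096-byte block holds. A file that legitimately ends in zeros would also be flagged, so the result is a hint, not proof of corruption.

```python
import os

PAGE = 4096  # padding granularity reported in this thread


def zero_padding(path, page=PAGE):
    """Return the number of trailing 0x00 bytes in the last page of `path`.

    Returns 0 when the size is zero or not a multiple of `page`, because
    such a file cannot have been padded up to a page boundary.
    """
    size = os.path.getsize(path)
    if size == 0 or size % page != 0:
        return 0
    with open(path, "rb") as fh:
        fh.seek(size - page)       # only the last page needs to be read
        tail = fh.read(page)
    return len(tail) - len(tail.rstrip(b"\x00"))
```

For the object described above (3 bytes of content in a 4096-byte file) this returns 4093.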