João Miguel Neves
2006-Jul-15 02:41 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Skipped content of type multipart/mixed.
João Miguel Neves
2006-Jul-15 04:34 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
On Sat, 2006-07-15 at 09:41 +0100, João Miguel Neves wrote:
> 2) Under the test conditions not all files' sizes become a multiple of 4096. Both files already on the filesystem and the recently transferred ones are changed. Tested with:
>
> # for i in `find -ls | tr -s ' ' | cut -d' ' -f7`; do echo $((i % 4096)); done | grep -vw 0

Just a note: all the files the script created before the transfer do get changed from 3 to 4096 bytes.

Is there anywhere else in Lustre where a file size is kept? The feeling I get is that the size is kept in two places and that a truncate only updates one of them. When some kind of check (upcall/recovery?) is issued, the wrong one becomes the file size on the OST.

--
Regards,
João Miguel Neves
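[Editor's note: the same check can be written more directly. This is a minimal sketch, assuming GNU findutils and awk are available (the -printf format is GNU-specific); it lists every regular file whose size is not a multiple of 4096 bytes:

#!/bin/bash
# List regular files under the current directory whose size is not a
# 4096-byte multiple (equivalent in spirit to the one-liner quoted above;
# assumes GNU find with -printf support).
find . -type f -printf '%s %p\n' | awk '$1 % 4096 != 0 { print }'
]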
João Miguel Neves
2006-Jul-16 01:50 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Skipped content of type multipart/mixed.
Andreas Dilger
2006-Jul-16 02:49 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
On Jul 16, 2006 08:50 +0100, João Miguel Neves wrote:
> If anyone is willing to try the attached script? I would really like to understand if this is an issue that just happens in some uses or in some lustre installations.
>
> The attached ruby script always recreates the problem for me. This test creates nearly half a million files and takes about 2GB of disk space. The /root/log in my system got a bit over 21GB of logs. The script takes some hours to run in my system.
>
> A version of this script that checked the size of the file 'a/a/aa' to try to identify a log that would have caused the change did not cause the problem. I'm not sure how useful the logs will be, but just comment out the lines with "lctl" if you only want to try to replicate the problem.

Please give the following test script a try:

#!/bin/bash
export LUSTRE={your lustre mountpoint}
lctl clear
lfs setstripe $LUSTRE/foo 0 0 1
ssh $OST0NODE sysctl -w lustre.fail_loc=0x80000214
dd if=/dev/zero of=$LUSTRE/foo bs=4k count=2
(sync; sleep 1; sync &)
sleep 6
START=`date +%s`
echo "$START: TRUNCATE START"; lctl mark TRUNCATE START
truncate $LUSTRE/foo 3   # this is in the lustre tests dir, it just
                         # calls truncate(filename, size)
END=`date +%s`
echo "$END: TRUNCATE END after $((END - START))s"; lctl mark TRUNCATE END
sleep 2
lctl dk /tmp/debug
lfs getstripe $LUSTRE/foo
# verify size of object on OST with debugfs as before

You should get some messages in the syslog on the OST that the write is being delayed, and the START and END markers should be about 20s+ apart.

The script is inducing an artificial delay in the write path on the OST, and then the truncate is trying to "pass" the write. This should be blocked on the client, because the truncate should block on the locked page while it is trying to do the truncate.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
João Miguel Neves
2006-Jul-16 02:50 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
BTW, in the last run, of the nearly half a million files created, 6599 kept the original size of 3 bytes. I'm really curious about how this bug works...

Best regards,
João Miguel Neves

On Sun, 2006-07-16 at 08:50 +0100, João Miguel Neves wrote:
> Possible related bugs:
> https://bugzilla.lustre.org/show_bug.cgi?id=4558
> https://bugzilla.lustre.org/show_bug.cgi?id=10730
>
> Should I enter a new bug with the info I've collected? I am almost sure that 10730 and my bug are the same, but I'm not under a support contract anymore and 10730 is marked as blocking 3840.
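[Editor's note: the Ruby reproducer itself was scrubbed from the archive, so the following is only a sketch of a reproducer in the same spirit, written in bash. Everything in it is an assumption reconstructed from the thread: roughly 26^4 (about 457,000) three-byte files in an a/b/cd directory layout (the thread mentions 'a/a/aa'), each written and then truncated to 3 bytes, with sizes rechecked afterwards.

#!/bin/bash
# Hypothetical reproducer sketch -- NOT the original Ruby script from this
# thread (that attachment was scrubbed). It creates ~457,000 small files
# under a Lustre mount by writing a 4 KiB block and truncating each file
# to 3 bytes, then reports any file whose size is no longer 3 bytes.
# Assumptions: $LUSTRE points at the mountpoint; GNU coreutils truncate
# and GNU find are available.
LUSTRE=${LUSTRE:-/mnt/lustre}
cd "$LUSTRE" || exit 1
for a in {a..z}; do
  for b in {a..z}; do
    mkdir -p "$a/$b"
    for c in {a..z}; do
      for d in {a..z}; do
        f="$a/$b/$c$d"
        dd if=/dev/zero of="$f" bs=4k count=1 2>/dev/null
        truncate -s 3 "$f"
      done
    done
  done
done
# Recheck sizes: anything other than 3 bytes indicates the padding problem.
find . -type f -printf '%s %p\n' | awk '$1 != 3 { print }'
]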
João Miguel Neves
2006-Jul-16 03:16 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Doesn't trigger the bug.

Tried 3 times: once it took 0s to truncate, the other two times it took 18s.

Thanks,
João Miguel Neves

On Sun, 2006-07-16 at 02:49 -0600, Andreas Dilger wrote:
> Please give the following test script a try:
João Miguel Neves
2006-Jul-16 03:33 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Just a random shot:

In file.c:ll_extent_lock there's the following comment:

        /* XXX phil: can we do this? won't it screw the file size up? */
        if ((fd && (fd->fd_flags & LL_FILE_IGNORE_LOCK)) ||
            (sbi->ll_flags & LL_SBI_NOLCK))
                RETURN(0);

I was looking at that function because of the line:

        inode->i_size = lvb.lvb_size;

But I'm too ignorant about Lustre internals to understand this fully; I just checked that ll_extent_lock was called in the log (not that it helps much - if I understand correctly, this is the usual way to request a lock in the filesystem).

Best regards,
João Miguel Neves

On Sun, 2006-07-16 at 10:16 +0100, João Miguel Neves wrote:
> Doesn't trigger the bug.
Andreas Dilger
2006-Jul-16 03:44 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
On Jul 16, 2006 10:33 +0100, João Miguel Neves wrote:
> In file.c:ll_extent_lock there's the following comment:
>
>         /* XXX phil: can we do this? won't it screw the file size up? */
>         if ((fd && (fd->fd_flags & LL_FILE_IGNORE_LOCK)) ||
>             (sbi->ll_flags & LL_SBI_NOLCK))
>                 RETURN(0);

No, this is for the case where someone calls an ioctl to ignore locking on the file. Since the locking is disabled (and the size depends on the locking), the size on the client will be incorrect. That ioctl should only be called by applications that know what they are doing.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
João Miguel Neves
2006-Jul-16 03:46 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
After the line "inode->i_size = lvb.lvb_size;", I noticed the update of the inode timestamps. I went to the problem files and noticed something very strange: -rw-r--r-- 1 root root 4096 2006-07-16 09:07 ga -rw-r--r-- 1 root root 4096 2006-07-16 09:07 gb -rw-r--r-- 1 root root 3 2006-07-16 00:05 gc -rw-r--r-- 1 root root 4096 2006-07-16 09:07 gd -rw-r--r-- 1 root root 4096 2006-07-16 09:07 ge As you can see, gc is the original 3 bytes, while the others have changed to 4096. The files were created sequentially, but the timestamps reflect a change later. atime = 09:07 for all mtime = ctime = 00:05 for gc and 09:07 for the others Best regards, Jo?o Miguel Neves Dom, 2006-07-16 ?s 10:33 +0100, Jo?o Miguel Neves escreveu:> Just a random shot: > > In file.c:ll_extent_lock there''s the following comment: > > /* XXX phil: can we do this? won''t it screw the file size up? */ > if ((fd && (fd->fd_flags & LL_FILE_IGNORE_LOCK)) || > (sbi->ll_flags & LL_SBI_NOLCK)) > RETURN(0); > > I was looking at that function because of the line: > > inode->i_size = lvb.lvb_size; > > But I''m too ignorant about lustre internals to understand this totally, > I just checked that ll_extent_lock was called in the log (not that it > helps much - if I understand correctly, this is the usual way to call > for a lock in the filesystem). > > Best regards, > Jo?o Miguel Neves > > Dom, 2006-07-16 ?s 10:16 +0100, Jo?o Miguel Neves escreveu: > > Doesn''t trigger the bug. > > > > Tried 3 times, one it took 0s to truncate, the other two, it took 18s. > > > > Thanks, > > Jo?o Miguel Neves > > > > Dom, 2006-07-16 ?s 02:49 -0600, Andreas Dilger escreveu: > > > On Jul 16, 2006 08:50 +0100, Jo?o Miguel Neves wrote: > > > > If anyone is willing to try the attached script? I would really like to > > > > understand if this is an issue that just happens in some uses or in some > > > > lustre installations. > > > > > > > > The attached ruby script always recreates the problem for me. This test > > > > creates nearly half a million files and takes about 2GB of disk space. > > > > The /root/log in my system got a bit over 21GB of logs. The script takes > > > > some hours to run in my system. > > > > > > > > A version of this script that checked the size of the file ''a/a/aa'' to > > > > try to identify a log that would have caused the change did not cause > > > > the problem. I''m not sure how useful the logs will be, but just comment > > > > out the lines with "lctl" if you only want to try to replicate the > > > > problem. > > > > > > Please give the following test script a try: > > > > > > #!/bin/bash > > > export LUSTRE={your lustre mountpoint} > > > lctl clear > > > lfs setstripe $LUSTRE/foo 0 0 1 > > > ssh $OST0NODE sysctl -w lustre.fail_loc=0x80000214 > > > dd if=/dev/zero of=$LUSTRE/foo bs=4k count=2 > > > (sync; sleep 1; sync &) > > > sleep 6 > > > START=`date +%s` > > > echo "$START: TRUNCATE START"; lctl mark TRUNCATE START > > > truncate $LUSTRE/foo 3 # this is in the lustre tests dir, it just > > > # calls truncate(filename, size) > > > END=`date +%s` > > > echo "$END: TRUNCATE END after $((END - START))s"; lctl mark TRUNCATE END > > > sleep 2 > > > lctl dk /tmp/debug > > > lfs getstripe $LUSTRE/foo > > > # verify size of object on OST with debugfs as before > > > > > > You should get some messages in the syslog on the OST that the write > > > is being delayed, and the START and END markers should be about 20s+ > > > apart. 
> > > > > > The script is inducing an aritficial delay in the write path on the OST, > > > and then the truncate is trying to "pass" the write. This should be > > > blocked on the client, because the truncate should block on the locked > > > page while it is trying to do the truncate. > > > > > > Cheers, Andreas > > > -- > > > Andreas Dilger > > > Principal Software Engineer > > > Cluster File Systems, Inc. > > > > > > _______________________________________________ > > > Lustre-discuss mailing list > > > Lustre-discuss@clusterfs.com > > > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss > > > > > _______________________________________________ > > Lustre-discuss mailing list > > Lustre-discuss@clusterfs.com > > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss@clusterfs.com > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Esta =?ISO-8859-1?Q?=E9?= uma parte de mensagem assinada digitalmente Url : http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20060716/10a1c01f/attachment-0001.bin
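[Editor's note: to compare all three timestamps of the padded files against the untouched one in a single pass, a small sketch like the following works; the file names and the use of GNU coreutils stat (%x/%y/%z print atime, mtime and ctime) are the only assumptions:

#!/bin/bash
# Print size, atime, mtime and ctime for each of the files listed above
# (assumes GNU coreutils stat).
for f in ga gb gc gd ge; do
    stat -c '%n  size=%s  atime=%x  mtime=%y  ctime=%z' "$f"
done
]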
João Miguel Neves
2006-Jul-16 04:50 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Ignore the time-related issues. I've just found out that NTP synchronization isn't working on the OSS/MDS. Could this be the cause of all of this? The OSS and MDS are using the same clock; the client is the one that is different.

I'm going to fix that and run the scripts again.

On Sun, 2006-07-16 at 10:46 +0100, João Miguel Neves wrote:
> After the line "inode->i_size = lvb.lvb_size;", I noticed the update of the inode timestamps.
João Miguel Neves
2006-Jul-16 06:36 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Just in case it matters: the OST/MDS was ~8h ahead of the client. The test script should take a few more hours to finish, so I won't have news soon.

On Sun, 2006-07-16 at 11:50 +0100, João Miguel Neves wrote:
> Ignore the time-related issues. I've just found out that NTP synchronization isn't working on the OSS/MDS.
João Miguel Neves
2006-Jul-16 10:09 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
The time synchronization made no difference. The script still triggers the bug.

Trying to regenerate the logs. Last time I reduced the logs that caused the trouble to 10MB with a 1-minute resolution, and now I'm trying with a 1-second resolution.

Best regards,
João Miguel Neves

On Sun, 2006-07-16 at 13:36 +0100, João Miguel Neves wrote:
> Just in case it matters: the OST/MDS was ~8h ahead of the client.
João Miguel Neves
2006-Jul-16 23:37 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
New data: every time I put in a check for the size of a single fixed file, the problem does not appear. I've tried doing a read at the creation of every file, or once every 676 files. The file I tested is the first one created by the script ('a/a/aa'). My idea was to stop the creation script as soon as possible after that file was affected, but it seems I've hit the uncertainty principle.

On the other hand, it does point to a problem that only happens under load, probably lock-related. I'm capturing a Lustre log for each file creation. The number of lines in the log follows a pattern: normally it has something like 173 lines, and once every 10/15 files it grows. In the version that triggers the bug it grows to over 2000 lines, while in the version without the bug it grows to around 257.

Any clues about what can cause a truncate to be cancelled would probably help me. I'll keep you posted; in the worst case it helps me think about this.

Best regards,
João Miguel Neves
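[Editor's note: the per-file log capture described above can be sketched roughly as below, reusing the lctl commands from Andreas's test script (lctl clear resets the client debug buffer, lctl dk dumps it). The mountpoint, log directory, file name and the use of coreutils truncate are all illustrative assumptions:

#!/bin/bash
# Sketch of per-file debug capture: clear the Lustre debug buffer, create
# one file (write then truncate to 3 bytes), dump the buffer and count its
# lines. Paths and the truncate tool are illustrative assumptions.
LUSTRE=${LUSTRE:-/mnt/lustre}
LOGDIR=${LOGDIR:-/root/log}
mkdir -p "$LOGDIR" "$LUSTRE/a/a"
f="$LUSTRE/a/a/aa"                      # example file name from the thread
lctl clear
dd if=/dev/zero of="$f" bs=4k count=1 2>/dev/null
truncate -s 3 "$f"                      # or the truncate helper from the
                                        # lustre tests directory
lctl dk "$LOGDIR/aa.log"
wc -l "$LOGDIR/aa.log"                  # ~173 lines normally; much larger
                                        # in the runs that hit the bug
]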
Oleg Drokin
2006-Jul-17 10:11 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Mon, Jul 17, 2006 at 06:37:28AM +0100, João Miguel Neves wrote:
> On the other hand, it does point to a problem that only happens under load, probably lock-related. I'm capturing a Lustre log for each file creation. The number of lines in the log follows a pattern: normally it has something like 173 lines, and once every 10/15 files it grows. In the version that triggers the bug it grows to over 2000 lines, while in the version without the bug it grows to around 257.

And can we have the debug log for the case where the bug happened, please?

Bye,
Oleg
João Miguel Neves
2006-Jul-17 13:12 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
You can have the logs, I just don't think they are useful :(. It's 22.6GB of logs and I still can't pinpoint which of them caused the problems. If you have some way to analyze this much data, I'll find a way to send them to you (even by FedEx, if needed). But other than that, they are just half a million logs without the culprit being known :(.

Best regards,
João Miguel Neves

On Mon, 2006-07-17 at 19:11 +0300, Oleg Drokin wrote:
> And can we have the debug log for the case where the bug happened, please?
Oleg Drokin
2006-Jul-17 13:23 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Mon, Jul 17, 2006 at 08:12:06PM +0100, João Miguel Neves wrote:
> You can have the logs, I just don't think they are useful :(. It's 22.6GB of logs and I still can't pinpoint which of them caused the problems. If you have some way to analyze this much data, I'll find a way to send them to you (even by FedEx, if needed).

Hm, I thought you had found that when the problem occurs, the logfile is bigger than when the problem did not occur? If I got you wrong, then never mind.

Bye,
Oleg
João Miguel Neves
2006-Jul-17 13:48 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
I capture logs for each file creation (writing + truncate). Every 10/15 files there are longer logs (created by locking, mostly). When the problem happens, those logs are bigger, but I can't pinpoint exactly which log contains the culprit. So no, I still don't have a sure culprit, just some hundreds of candidates.

I still find it odd that the modified (incorrect) files get the atime/ctime/mtime from the OSS/MDS, while the non-modified (correct) files keep the atime/ctime/mtime from the client. I found this out thanks to a time difference between the two machines (client and MDS/OSS).

Every time I included size queries in my tests, I wasn't able to replicate the bug.

Best regards,
João Miguel Neves

On Mon, 2006-07-17 at 22:23 +0300, Oleg Drokin wrote:
> Hm, I thought you had found that when the problem occurs, the logfile is bigger than when the problem did not occur?
João Miguel Neves
2006-Sep-06 16:14 UTC
[Lustre-discuss] Solved - Files padded with zeros until reaching 4k multiple?!?!
Just to leave a clue in the list archive: this problem has been solved. More information and a patch at:

https://bugzilla.lustre.org/show_bug.cgi?id=10730

The patch should become part of Lustre 1.4.7.2.

Thanks to all the people involved,
João Miguel Neves