João Miguel Neves
2006-Jul-15 02:41 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Skipped content of type multipart/mixed.
João Miguel Neves
2006-Jul-15 04:34 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
On Sat, 2006-07-15 at 09:41 +0100, João Miguel Neves wrote:
> 2) Under the test conditions not all files' sizes become a multiple of 4096. Both files already on the filesystem and the recently transferred ones are changed. Tested with:
>
> # for i in `find -ls | tr -s ' ' | cut -d' ' -f7`; do echo $((i % 4096)); done | grep -vw 0

Just a note: all the files the script created before the transfer do get changed from 3 to 4096 bytes.

Is there anywhere else in Lustre where a file size is kept? The feeling I get is that the size is kept in two places and that a truncate only updates one of them. When some kind of check (upcall/recovery?) is issued, the wrong one becomes the file size on the OST.

--
Regards,
João Miguel Neves
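[Editor's note: the same check can be written more directly. This is a minimal sketch, assuming GNU findutils and awk are available (the -printf format is GNU-specific); it lists every regular file whose size is not a multiple of 4096 bytes:

#!/bin/bash
# List regular files under the current directory whose size is not a
# 4096-byte multiple (equivalent in spirit to the one-liner quoted above;
# assumes GNU find with -printf support).
find . -type f -printf '%s %p\n' | awk '$1 % 4096 != 0 { print }'
]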
João Miguel Neves
2006-Jul-16 01:50 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Skipped content of type multipart/mixed.
Andreas Dilger
2006-Jul-16 02:49 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
On Jul 16, 2006 08:50 +0100, João Miguel Neves wrote:
> If anyone is willing to try the attached script? I would really like to understand if this is an issue that just happens in some uses or in some lustre installations.
>
> The attached ruby script always recreates the problem for me. This test creates nearly half a million files and takes about 2GB of disk space. The /root/log in my system got a bit over 21GB of logs. The script takes some hours to run in my system.
>
> A version of this script that checked the size of the file 'a/a/aa' to try to identify a log that would have caused the change did not cause the problem. I'm not sure how useful the logs will be, but just comment out the lines with "lctl" if you only want to try to replicate the problem.

Please give the following test script a try:

#!/bin/bash
export LUSTRE={your lustre mountpoint}
lctl clear
lfs setstripe $LUSTRE/foo 0 0 1
ssh $OST0NODE sysctl -w lustre.fail_loc=0x80000214
dd if=/dev/zero of=$LUSTRE/foo bs=4k count=2
(sync; sleep 1; sync &)
sleep 6
START=`date +%s`
echo "$START: TRUNCATE START"; lctl mark TRUNCATE START
truncate $LUSTRE/foo 3   # this is in the lustre tests dir, it just
                         # calls truncate(filename, size)
END=`date +%s`
echo "$END: TRUNCATE END after $((END - START))s"; lctl mark TRUNCATE END
sleep 2
lctl dk /tmp/debug
lfs getstripe $LUSTRE/foo
# verify size of object on OST with debugfs as before

You should get some messages in the syslog on the OST that the write is being delayed, and the START and END markers should be about 20s+ apart.

The script is inducing an artificial delay in the write path on the OST, and then the truncate is trying to "pass" the write. This should be blocked on the client, because the truncate should block on the locked page while it is trying to do the truncate.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
João Miguel Neves
2006-Jul-16 02:50 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
BTW, in the last run, of the nearly half a million files created, 6599 kept the original size of 3 bytes. I'm really curious about how this bug works...

Best regards,
João Miguel Neves

On Sun, 2006-07-16 at 08:50 +0100, João Miguel Neves wrote:
> Possible related bugs:
> https://bugzilla.lustre.org/show_bug.cgi?id=4558
> https://bugzilla.lustre.org/show_bug.cgi?id=10730
>
> Should I enter a new bug with the info I've collected? I am almost sure that 10730 and my bug are the same, but I'm not under a support contract anymore and 10730 is marked as blocking 3840.
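[Editor's note: the Ruby reproducer itself was scrubbed from the archive, so the following is only a sketch of a reproducer in the same spirit, written in bash. Everything in it is an assumption reconstructed from the thread: roughly 26^4 (about 457,000) three-byte files in an a/b/cd directory layout (the thread mentions 'a/a/aa'), each written and then truncated to 3 bytes, with sizes rechecked afterwards.

#!/bin/bash
# Hypothetical reproducer sketch -- NOT the original Ruby script from this
# thread (that attachment was scrubbed). It creates ~457,000 small files
# under a Lustre mount by writing a 4 KiB block and truncating each file
# to 3 bytes, then reports any file whose size is no longer 3 bytes.
# Assumptions: $LUSTRE points at the mountpoint; GNU coreutils truncate
# and GNU find are available.
LUSTRE=${LUSTRE:-/mnt/lustre}
cd "$LUSTRE" || exit 1
for a in {a..z}; do
  for b in {a..z}; do
    mkdir -p "$a/$b"
    for c in {a..z}; do
      for d in {a..z}; do
        f="$a/$b/$c$d"
        dd if=/dev/zero of="$f" bs=4k count=1 2>/dev/null
        truncate -s 3 "$f"
      done
    done
  done
done
# Recheck sizes: anything other than 3 bytes indicates the padding problem.
find . -type f -printf '%s %p\n' | awk '$1 != 3 { print }'
]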
João Miguel Neves
2006-Jul-16 03:16 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Doesn't trigger the bug.

Tried 3 times: once it took 0s to truncate, the other two times it took 18s.

Thanks,
João Miguel Neves

On Sun, 2006-07-16 at 02:49 -0600, Andreas Dilger wrote:
> Please give the following test script a try:
João Miguel Neves
2006-Jul-16 03:33 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Just a random shot:

In file.c:ll_extent_lock there's the following comment:

        /* XXX phil: can we do this? won't it screw the file size up? */
        if ((fd && (fd->fd_flags & LL_FILE_IGNORE_LOCK)) ||
            (sbi->ll_flags & LL_SBI_NOLCK))
                RETURN(0);

I was looking at that function because of the line:

        inode->i_size = lvb.lvb_size;

But I'm too ignorant about Lustre internals to understand this fully; I just checked that ll_extent_lock was called in the log (not that it helps much - if I understand correctly, this is the usual way to request a lock in the filesystem).

Best regards,
João Miguel Neves

On Sun, 2006-07-16 at 10:16 +0100, João Miguel Neves wrote:
> Doesn't trigger the bug.
Andreas Dilger
2006-Jul-16 03:44 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
On Jul 16, 2006 10:33 +0100, João Miguel Neves wrote:
> In file.c:ll_extent_lock there's the following comment:
>
>         /* XXX phil: can we do this? won't it screw the file size up? */
>         if ((fd && (fd->fd_flags & LL_FILE_IGNORE_LOCK)) ||
>             (sbi->ll_flags & LL_SBI_NOLCK))
>                 RETURN(0);

No, this is for the case where someone calls an ioctl to ignore locking on the file. Since the locking is disabled (and the size depends on the locking), the size on the client will be incorrect. That ioctl should only be called by applications that know what they are doing.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
João Miguel Neves
2006-Jul-16 03:46 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
After the line "inode->i_size = lvb.lvb_size;", I noticed the update of the inode timestamps. I went to the problem files and noticed something very strange: -rw-r--r-- 1 root root 4096 2006-07-16 09:07 ga -rw-r--r-- 1 root root 4096 2006-07-16 09:07 gb -rw-r--r-- 1 root root 3 2006-07-16 00:05 gc -rw-r--r-- 1 root root 4096 2006-07-16 09:07 gd -rw-r--r-- 1 root root 4096 2006-07-16 09:07 ge As you can see, gc is the original 3 bytes, while the others have changed to 4096. The files were created sequentially, but the timestamps reflect a change later. atime = 09:07 for all mtime = ctime = 00:05 for gc and 09:07 for the others Best regards, Jo?o Miguel Neves Dom, 2006-07-16 ?s 10:33 +0100, Jo?o Miguel Neves escreveu:> Just a random shot: > > In file.c:ll_extent_lock there''s the following comment: > > /* XXX phil: can we do this? won''t it screw the file size up? */ > if ((fd && (fd->fd_flags & LL_FILE_IGNORE_LOCK)) || > (sbi->ll_flags & LL_SBI_NOLCK)) > RETURN(0); > > I was looking at that function because of the line: > > inode->i_size = lvb.lvb_size; > > But I''m too ignorant about lustre internals to understand this totally, > I just checked that ll_extent_lock was called in the log (not that it > helps much - if I understand correctly, this is the usual way to call > for a lock in the filesystem). > > Best regards, > Jo?o Miguel Neves > > Dom, 2006-07-16 ?s 10:16 +0100, Jo?o Miguel Neves escreveu: > > Doesn''t trigger the bug. > > > > Tried 3 times, one it took 0s to truncate, the other two, it took 18s. > > > > Thanks, > > Jo?o Miguel Neves > > > > Dom, 2006-07-16 ?s 02:49 -0600, Andreas Dilger escreveu: > > > On Jul 16, 2006 08:50 +0100, Jo?o Miguel Neves wrote: > > > > If anyone is willing to try the attached script? I would really like to > > > > understand if this is an issue that just happens in some uses or in some > > > > lustre installations. > > > > > > > > The attached ruby script always recreates the problem for me. This test > > > > creates nearly half a million files and takes about 2GB of disk space. > > > > The /root/log in my system got a bit over 21GB of logs. The script takes > > > > some hours to run in my system. > > > > > > > > A version of this script that checked the size of the file ''a/a/aa'' to > > > > try to identify a log that would have caused the change did not cause > > > > the problem. I''m not sure how useful the logs will be, but just comment > > > > out the lines with "lctl" if you only want to try to replicate the > > > > problem. > > > > > > Please give the following test script a try: > > > > > > #!/bin/bash > > > export LUSTRE={your lustre mountpoint} > > > lctl clear > > > lfs setstripe $LUSTRE/foo 0 0 1 > > > ssh $OST0NODE sysctl -w lustre.fail_loc=0x80000214 > > > dd if=/dev/zero of=$LUSTRE/foo bs=4k count=2 > > > (sync; sleep 1; sync &) > > > sleep 6 > > > START=`date +%s` > > > echo "$START: TRUNCATE START"; lctl mark TRUNCATE START > > > truncate $LUSTRE/foo 3 # this is in the lustre tests dir, it just > > > # calls truncate(filename, size) > > > END=`date +%s` > > > echo "$END: TRUNCATE END after $((END - START))s"; lctl mark TRUNCATE END > > > sleep 2 > > > lctl dk /tmp/debug > > > lfs getstripe $LUSTRE/foo > > > # verify size of object on OST with debugfs as before > > > > > > You should get some messages in the syslog on the OST that the write > > > is being delayed, and the START and END markers should be about 20s+ > > > apart. 
> > > > > > The script is inducing an aritficial delay in the write path on the OST, > > > and then the truncate is trying to "pass" the write. This should be > > > blocked on the client, because the truncate should block on the locked > > > page while it is trying to do the truncate. > > > > > > Cheers, Andreas > > > -- > > > Andreas Dilger > > > Principal Software Engineer > > > Cluster File Systems, Inc. > > > > > > _______________________________________________ > > > Lustre-discuss mailing list > > > Lustre-discuss@clusterfs.com > > > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss > > > > > _______________________________________________ > > Lustre-discuss mailing list > > Lustre-discuss@clusterfs.com > > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss > _______________________________________________ > Lustre-discuss mailing list > Lustre-discuss@clusterfs.com > https://mail.clusterfs.com/mailman/listinfo/lustre-discuss-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Esta =?ISO-8859-1?Q?=E9?= uma parte de mensagem assinada digitalmente Url : http://mail.clusterfs.com/pipermail/lustre-discuss/attachments/20060716/10a1c01f/attachment-0001.bin
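[Editor's note: to compare all three timestamps of the padded files against the untouched one in a single pass, a small sketch like the following works; the file names and the use of GNU coreutils stat (%x/%y/%z print atime, mtime and ctime) are the only assumptions:

#!/bin/bash
# Print size, atime, mtime and ctime for each of the files listed above
# (assumes GNU coreutils stat).
for f in ga gb gc gd ge; do
    stat -c '%n  size=%s  atime=%x  mtime=%y  ctime=%z' "$f"
done
]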
João Miguel Neves
2006-Jul-16 04:50 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Ignore the time-related issues. I've just found out that NTP synchronization isn't working on the OSS/MDS. Could this be the cause of all of this? The OSS and MDS are using the same clock; the client is the one that is different.

I'm going to fix that and run the scripts again.

On Sun, 2006-07-16 at 10:46 +0100, João Miguel Neves wrote:
> After the line "inode->i_size = lvb.lvb_size;", I noticed the update of the inode timestamps.
João Miguel Neves
2006-Jul-16 06:36 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Just in case it matters: the OST/MDS was ~8h ahead of the client. The test script should take a few more hours to finish, so I won't have news soon.

On Sun, 2006-07-16 at 11:50 +0100, João Miguel Neves wrote:
> Ignore the time-related issues. I've just found out that NTP synchronization isn't working on the OSS/MDS.
João Miguel Neves
2006-Jul-16 10:09 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
The time synchronization made no difference. The script still triggers the bug.

Trying to regenerate the logs. Last time I reduced the logs that caused the trouble to 10MB with a 1-minute resolution, and now I'm trying with a 1-second resolution.

Best regards,
João Miguel Neves

On Sun, 2006-07-16 at 13:36 +0100, João Miguel Neves wrote:
> Just in case it matters: the OST/MDS was ~8h ahead of the client.
João Miguel Neves
2006-Jul-16 23:37 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
New data: every time I put in a check for the size of a single fixed file, the problem does not appear. I've tried doing a read at the creation of every file, or once every 676 files. The file I tested is the first one created by the script ('a/a/aa'). My idea was to stop the creation script as soon as possible after that file was affected, but it seems I've hit the uncertainty principle.

On the other hand, it does point to a problem that only happens under load, probably lock-related. I'm capturing a Lustre log for each file creation. The number of lines in the log follows a pattern: normally it has something like 173 lines, and once every 10/15 files it grows. In the version that triggers the bug it grows to over 2000 lines, while in the version without the bug it grows to around 257.

Any clues about what can cause a truncate to be cancelled would probably help me. I'll keep you posted; in the worst case it helps me think about this.

Best regards,
João Miguel Neves
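[Editor's note: the per-file log capture described above can be sketched roughly as below, reusing the lctl commands from Andreas's test script (lctl clear resets the client debug buffer, lctl dk dumps it). The mountpoint, log directory, file name and the use of coreutils truncate are all illustrative assumptions:

#!/bin/bash
# Sketch of per-file debug capture: clear the Lustre debug buffer, create
# one file (write then truncate to 3 bytes), dump the buffer and count its
# lines. Paths and the truncate tool are illustrative assumptions.
LUSTRE=${LUSTRE:-/mnt/lustre}
LOGDIR=${LOGDIR:-/root/log}
mkdir -p "$LOGDIR" "$LUSTRE/a/a"
f="$LUSTRE/a/a/aa"                      # example file name from the thread
lctl clear
dd if=/dev/zero of="$f" bs=4k count=1 2>/dev/null
truncate -s 3 "$f"                      # or the truncate helper from the
                                        # lustre tests directory
lctl dk "$LOGDIR/aa.log"
wc -l "$LOGDIR/aa.log"                  # ~173 lines normally; much larger
                                        # in the runs that hit the bug
]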
Oleg Drokin
2006-Jul-17 10:11 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Mon, Jul 17, 2006 at 06:37:28AM +0100, João Miguel Neves wrote:
> On the other hand, it does point to a problem that only happens under load, probably lock-related. I'm capturing a Lustre log for each file creation. The number of lines in the log follows a pattern: normally it has something like 173 lines, and once every 10/15 files it grows. In the version that triggers the bug it grows to over 2000 lines, while in the version without the bug it grows to around 257.

And can we have the debug log for the case where the bug happened, please?

Bye,
Oleg
João Miguel Neves
2006-Jul-17 13:12 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
You can have the logs, I just don't think they are useful :(. It's 22.6GB of logs and I still can't pinpoint which of them caused the problems. If you have some way to analyze this much data, I'll find a way to send them to you (even by FedEx, if needed). But other than that, they are just half a million logs without the culprit being known :(.

Best regards,
João Miguel Neves

On Mon, 2006-07-17 at 19:11 +0300, Oleg Drokin wrote:
> And can we have the debug log for the case where the bug happened, please?
Oleg Drokin
2006-Jul-17 13:23 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
Hello!

On Mon, Jul 17, 2006 at 08:12:06PM +0100, João Miguel Neves wrote:
> You can have the logs, I just don't think they are useful :(. It's 22.6GB of logs and I still can't pinpoint which of them caused the problems. If you have some way to analyze this much data, I'll find a way to send them to you (even by FedEx, if needed).

Hm, I thought you had found that when the problem occurs, the logfile is bigger than when the problem did not occur? If I got you wrong, then never mind.

Bye,
Oleg
João Miguel Neves
2006-Jul-17 13:48 UTC
[Lustre-discuss] Status - Files padded with zeros until reaching 4k multiple?!?!
I capture logs for each file creation (writing + truncate). Every 10/15 files there are longer logs (created by locking, mostly). When the problem happens, those logs are bigger, but I can't pinpoint exactly which log contains the culprit. So no, I still don't have a sure culprit, just some hundreds of candidates.

I still find it odd that the modified (incorrect) files get the atime/ctime/mtime from the OSS/MDS, while the non-modified (correct) files keep the atime/ctime/mtime from the client. I found this out thanks to a time difference between the two machines (client and MDS/OSS).

Every time I included size queries in my tests, I wasn't able to replicate the bug.

Best regards,
João Miguel Neves

On Mon, 2006-07-17 at 22:23 +0300, Oleg Drokin wrote:
> Hm, I thought you had found that when the problem occurs, the logfile is bigger than when the problem did not occur?
João Miguel Neves
2006-Sep-06 16:14 UTC
[Lustre-discuss] Solved - Files padded with zeros until reaching 4k multiple?!?!
Just to leave a clue in the list archive: this problem has been solved. More information and a patch at:

https://bugzilla.lustre.org/show_bug.cgi?id=10730

The patch should become part of Lustre 1.4.7.2.

Thanks to all the people involved,
João Miguel Neves