thr3ads.net - Lustre discuss - [Lustre-discuss] Bad lmm_size during open replay for inode [Nov 2010]


Frederik Ferner

2010-Nov-23 12:01 UTC

[Lustre-discuss] Bad lmm_size during open replay for inode

Hi List,

during a planned MDT fail over today, we got a number of these messages 
below, can anyone explain what this could be?
> Nov 23 08:33:26 cs04r-sc-mds01-01 kernel: Lustre: 21054:0:(mds_open.c:367:mds_create_objects()) Bad lmm_size during open replay for inode 111003141
> Nov 23 08:33:26 cs04r-sc-mds01-01 kernel: Lustre: 21043:0:(mds_open.c:367:mds_create_objects()) Bad lmm_size during open replay for inode 111003144
> Nov 23 08:33:27 cs04r-sc-mds01-01 kernel: Lustre: 21059:0:(mds_open.c:367:mds_create_objects()) Bad lmm_size during open replay for inode 110642714
> Nov 23 08:33:27 cs04r-sc-mds01-01 kernel: Lustre: 21059:0:(mds_open.c:367:mds_create_objects()) Skipped 7 previous similar messages

Searching for this message produced only Lustre source code.

This is using Lustre 1.8.3-ddn3.3 on all servers and most clients, some 
clients use 1.8.4.

So far we've not noticed any ill effect but would like to know what
that message is and if we can safely ignore it.

Kind regards,
Frederik
-- 
Frederik Ferner
Computer Systems Administrator		phone: +44 1235 77 8624
Diamond Light Source Ltd.		mob:   +44 7917 08 5110
(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)

Andreas Dilger

2010-Nov-23 17:12 UTC


[Lustre-discuss] Bad lmm_size during open replay for inode

On 2010-11-23, at 05:01, Frederik Ferner wrote:
> during a planned MDT fail over today, we got a number of these messages 
> below, can anyone explain what this could be?
> 
>> Nov 23 08:33:26 cs04r-sc-mds01-01 kernel: Lustre: 21054:0:(mds_open.c:367:mds_create_objects()) Bad lmm_size during open replay for inode 111003141

This means that the client (trying to recreate a file that was not saved to disk
during the MDS failover) sent the layout information, but the size it reported
for the layout information did not match the size that the MDS thought it should
be for that kind of layout.

Unfortunately, the error message doesn't report what those sizes are,
so it is hard to know what the impact might be.  The message is only a warning,
and it is not necessarily a problem if the client-specified size is larger than
the size expected, but it might be a problem if the client-specified size is
smaller than expected (which I think is the less likely case).
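
To make the check described above concrete, here is a minimal C sketch of how an expected layout size can be derived from the stripe count and compared with the size a client sends during open replay. It assumes the Lustre 1.8 v1 layout: a 32-byte `lov_mds_md_v1` header followed by one 24-byte `lov_ost_data_v1` entry per stripe. The structures below only mirror those definitions for illustration, and `expected_lmm_size()` and `check_replay_lmm_size()` are hypothetical helpers, not the actual `mds_create_objects()` code. Unlike the real message, the sketch prints both sizes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative mirror of the v1 layout structures (assumption: field
 * names and widths follow Lustre 1.8; this is not the real header). */
struct lov_ost_data_v1 {
    uint64_t l_object_id;
    uint64_t l_object_gr;
    uint32_t l_ost_gen;
    uint32_t l_ost_idx;
};

struct lov_mds_md_v1 {
    uint32_t lmm_magic;
    uint32_t lmm_pattern;
    uint64_t lmm_object_id;
    uint64_t lmm_object_gr;
    uint32_t lmm_stripe_size;
    uint32_t lmm_stripe_count;
    struct lov_ost_data_v1 lmm_objects[]; /* one entry per stripe */
};

static_assert(sizeof(struct lov_mds_md_v1) == 32, "v1 header is 32 bytes");
static_assert(sizeof(struct lov_ost_data_v1) == 24, "per-stripe entry is 24 bytes");

/* Size expected for a v1 layout with this many stripes (hypothetical helper). */
static size_t expected_lmm_size(uint32_t stripe_count)
{
    return sizeof(struct lov_mds_md_v1) +
           (size_t)stripe_count * sizeof(struct lov_ost_data_v1);
}

/* Hypothetical replay check: warns with both sizes (the real message
 * omits them) and returns the client size minus the expected size. */
static long check_replay_lmm_size(uint32_t client_size, uint32_t stripe_count,
                                  unsigned long ino)
{
    size_t expected = expected_lmm_size(stripe_count);

    if (client_size != expected)
        fprintf(stderr,
                "Bad lmm_size during open replay for inode %lu "
                "(client %" PRIu32 ", expected %zu)\n",
                ino, client_size, expected);
    return (long)client_size - (long)expected;
}
```

Under these assumptions a file with a stripe count of 1 should have a 56-byte layout, so `check_replay_lmm_size(80, 1, 111003144UL)` would warn and return +24: the "client-specified size is larger than expected" case.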
> This is using Lustre 1.8.3-ddn3.3 on all servers and most clients, some 
> clients use 1.8.4.
I can't comment on what changes are in the DDN release, so I
don't know if this is specific to that release or not.  In any case,
I've never seen these messages before.
> So far we've not noticed any ill effect but would like to know what that
> message is and if we can safely ignore it.
It would only affect the listed inodes, if at all.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

Alexey Lyashkov

2010-Nov-24 09:23 UTC


[Lustre-discuss] Bad lmm_size during open replay for inode

On Nov 23, 2010, at 20:12, Andreas Dilger wrote:
> On 2010-11-23, at 05:01, Frederik Ferner wrote:
>> during a planned MDT fail over today, we got a number of these messages
>> below, can anyone explain what this could be?
>> 
>>> Nov 23 08:33:26 cs04r-sc-mds01-01 kernel: Lustre: 21054:0:(mds_open.c:367:mds_create_objects()) Bad lmm_size during open replay for inode 111003141
> 
> This means that the client (trying to recreate a file that was not saved to
> disk during the MDS failover) sent the layout information, but the size it
> reported for the layout information did not match the size that the MDS thought
> it should be for that kind of layout.

if you don't have PPC clients, that say MDS forget to shrink LOV EA
buffer before send to client or someone break code to shrink replay buffer on
client side.
(client trust LOV EA size from MDS reply)
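
The failure mode described here can be modelled in a few lines. This C sketch assumes the same v1 sizes (a 32-byte header plus 24 bytes per stripe) and a reply buffer allocated for a maximum stripe count, 160 in the usage example; every function name is invented for illustration and none corresponds to a Lustre function. If the MDS sends the buffer without shrinking it and the client keeps that length for replay, the replayed size exceeds what the stripe count implies, which is the mismatch the warning reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed v1 layout sizes: 32-byte header plus 24 bytes per stripe. */
#define LMM_V1_HDR_SIZE 32u
#define LMM_V1_OBJ_SIZE 24u

static size_t lmm_size_for(uint32_t stripes)
{
    return LMM_V1_HDR_SIZE + (size_t)stripes * LMM_V1_OBJ_SIZE;
}

/* Hypothetical MDS side: the reply buffer is sized for max_stripes; the
 * advertised length is only correct if the MDS shrinks it first. */
static size_t mds_reply_ea_len(uint32_t stripes, uint32_t max_stripes, int shrink)
{
    assert(stripes <= max_stripes);
    return shrink ? lmm_size_for(stripes) : lmm_size_for(max_stripes);
}

/* Hypothetical client side: trusts the length from the MDS reply and keeps
 * it unchanged in the saved open request used for replay. */
static size_t client_replay_ea_len(size_t reply_ea_len)
{
    return reply_ea_len;
}

/* Hypothetical replay check on the MDS: nonzero means "Bad lmm_size". */
static int replay_size_mismatch(size_t replay_len, uint32_t stripes)
{
    return replay_len != lmm_size_for(stripes);
}
```

With a stripe count of 1, a shrunk reply replays 56 bytes and passes the check, while an unshrunk reply sized for 160 stripes replays 3872 bytes and fails it. The sketch does not model the PPC-specific case mentioned above.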



--------------------------------------
Alexey Lyashkov
alexey.lyashkov at clusterstor.com

Frederik Ferner

2010-Nov-24 16:19 UTC


[Lustre-discuss] Bad lmm_size during open replay for inode

Alexey Lyashkov wrote:
> On Nov 23, 2010, at 20:12, Andreas Dilger wrote:
> 
>> On 2010-11-23, at 05:01, Frederik Ferner wrote:
>>> during a planned MDT fail over today, we got a number of these
>>> messages below, can anyone explain what this could be?
>>> 
>>>> Nov 23 08:33:26 cs04r-sc-mds01-01 kernel: Lustre:
>>>> 21054:0:(mds_open.c:367:mds_create_objects()) Bad lmm_size
>>>> during open replay for inode 111003141
>> This means that the client (trying to recreate a file that was not
>> saved to disk during the MDS failover) sent the layout information,
>> but the size it reported for the layout information did not match
>> the size that the MDS thought it should be for that kind of layout.
> if you don't have PPC clients, that say MDS forget to shrink LOV EA
> buffer before send to client or someone break code to shrink replay
> buffer on client side. (client trust LOV EA size from MDS reply)
No PPC clients here. Other than that, I'm not sure I understand that
paragraph. Do you mean that PPC clients misinterpret the data sent from
the MDS during replay, and that these warnings could happen if the
replay buffer on the client side somehow shrinks?

Cheers,
Frederik


Frederik Ferner

2010-Nov-24 16:27 UTC


[Lustre-discuss] Bad lmm_size during open replay for inode

Andreas Dilger wrote:
> On 2010-11-23, at 05:01, Frederik Ferner wrote:
>> during a planned MDT fail over today, we got a number of these
>> messages below, can anyone explain what this could be?
>> 
>>> Nov 23 08:33:26 cs04r-sc-mds01-01 kernel: Lustre:
>>> 21054:0:(mds_open.c:367:mds_create_objects()) Bad lmm_size during
>>> open replay for inode 111003141
> 
> This means that the client (trying to recreate a file that was not
> saved to disk during the MDS failover) sent the layout information,
> but the size it reported for the layout information did not match the
> size that the MDS thought it should be for that kind of layout.
> 
> Unfortunately, the error message doesn't report what those sizes are,
> so it is hard to know what the impact might be.  The message is only
> a warning, and it is not necessarily a problem if the
> client-specified size is larger than the size expected, but it might
> be a problem if the client-specified size is smaller than expected
> (which I think is the less likely case).
Thanks for this. I don't think I'll worry too much about it now, as the
clients were all fairly quiet at the time of failover, so I don't think
many important files were written then.

We tried to suspend all cluster jobs about 10 minutes before the
failover, and at least some of the files/inodes now seem to belong to
cluster jobs. So I'm not sure whether the inodes still refer to the same
files, or what was going on then.

Does this relate to the stripe layout? Most files should have a stripe
count of 1; would this make a difference?
>> This is using Lustre 1.8.3-ddn3.3 on all servers and most clients,
>> some clients use 1.8.4.
> 
> I can't comment on what changes are in the DDN release, so I don't
> know if this is specific to that release or not.  In any case, I've
> never seen these messages before.
I'll test this later on our test file system, but I can't promise that
I'll be able to reproduce similar conditions.
>> So far we've not noticed any ill effect but would like to know what
>> that message is and if we can safely ignore it.
> 
> It would only affect the listed inodes, if at all.
Unfortunately I don't have the full list of inodes, as syslog has
skipped some "similar messages", but as mentioned above, I'm not that
worried at the moment.

Thanks,
Frederik