thr3ads.net - Lustre devel - [Lustre-devel] storing SOM epoch in EA [Feb 2008]

Alex Zhuravlev

2008-Feb-19 09:48 UTC

[Lustre-devel] storing SOM epoch in EA

Good day,

some time ago we discussed that it would be very helpful to
store the epoch in the inode on the MDS. the perfect solution would be
to store the epoch in the old inode body, but there is not much space
for this in the body, and with DMU we'll have this problem
again.

given that the minimal inode size we use on the MDS is 512 bytes, we
can store up to 13 stripes in the body. larger EAs go to a
dedicated block. if we add an 8-byte epoch, then we can store
up to 12 stripes in the body. so, an epoch stored in an EA affects
only files with exactly 13 stripes. files with other stripe
counts are not affected at all.
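
The stripe arithmetic above can be sketched as follows. The byte sizes are illustrative assumptions, not values taken from this thread: a 32-byte LOV header, 24 bytes per stripe, and an in-inode EA budget picked so that exactly 13 stripes fit. A real EA entry may also carry a header and a name, so the true cost of the epoch could exceed 8 bytes.

```python
# Illustrative sizes only; apart from EPOCH_BYTES none come from the thread.
LOV_HEADER_BYTES = 32      # assumed fixed LOV EA header
PER_STRIPE_BYTES = 24      # assumed size of one stripe entry
EA_BUDGET_BYTES = 344      # assumed in-inode EA space (exactly 13 stripes fit)
EPOCH_BYTES = 8            # epoch size stated in the message


def max_inline_stripes(budget: int) -> int:
    """Largest stripe count whose LOV EA still fits in `budget` bytes."""
    return max(0, (budget - LOV_HEADER_BYTES) // PER_STRIPE_BYTES)


def spills_to_block(stripes: int, budget: int) -> bool:
    """True if a file with `stripes` stripes needs a separate EA block."""
    return stripes > max_inline_stripes(budget)


before = max_inline_stripes(EA_BUDGET_BYTES)
after = max_inline_stripes(EA_BUDGET_BYTES - EPOCH_BYTES)

# Stripe counts that fit today but would spill once the epoch is added.
affected = [n for n in range(1, 33)
            if spills_to_block(n, EA_BUDGET_BYTES - EPOCH_BYTES)
            and not spills_to_block(n, EA_BUDGET_BYTES)]
```

Under these assumed sizes only the 13-stripe case changes, which matches the claim in the message. With more slack in the budget, no stripe count would be affected at all.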

couple lesser concerns are:
1) cpu usage
2) epoch on old filesystem with insufficient inode space

any objections to using an EA to store the SOM epoch?

thanks, Alex

Yuriy Umanets

2008-Feb-19 10:28 UTC

[Lustre-devel] storing SOM epoch in EA

hi!

Can we use IAM for storing the epoch? It is fast and does not have such
strong size limitations. We could create an "epoch" index at mkfs time
(like it is done for the existing indexes now) and use the object's fid
as the key and the epoch as the value.

Thanks.

-- 
umka

Alex Zhuravlev

2008-Feb-19 10:30 UTC

[Lustre-devel] storing SOM epoch in EA

hmm. not sure I got it. epoch is per-inode. and we don't need >1 epoch
for any inode.

thanks, Alex

Yuriy Umanets

2008-Feb-19 10:38 UTC

[Lustre-devel] storing SOM epoch in EA

Alex Zhuravlev wrote:
> hmm. not sure I got it. epoch is per-inode. and we don't need >1 epoch
> for any inode.

Yes, right. We will not have several epochs for an inode. I think we need
Nikita here, as he is the author of IAM and may help us.

In HEAD we have the OI (Object Index), whose purpose is to map object
fids to object store cookies (inode + generation). The fid is the key
and the inode store info is the value. We have only one such mapping entry
for any inode. I propose a similar mapping that stores the SOM epoch for
the inode in the same way: use the fid as the key and the epoch as the value.
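
A minimal sketch of the two mappings described here, using plain in-memory dictionaries as a stand-in for IAM index files. The `Fid` shape, the `StoreCookie` type, and the function names are hypothetical and only illustrate the one-entry-per-fid idea; they are not the actual OI or IAM interfaces.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Fid = Tuple[int, int]          # hypothetical (sequence, object id) pair


class StoreCookie(NamedTuple):
    ino: int                   # inode number
    gen: int                   # inode generation


object_index: Dict[Fid, StoreCookie] = {}   # OI-like: fid -> (inode, generation)
epoch_index: Dict[Fid, int] = {}            # proposed: fid -> SOM epoch


def create_object(fid: Fid, ino: int, gen: int, epoch: int) -> None:
    """Register one object in both indexes; each fid has exactly one entry."""
    object_index[fid] = StoreCookie(ino, gen)
    epoch_index[fid] = epoch


def lookup_epoch(fid: Fid) -> Optional[int]:
    """Return the epoch for `fid`, or None if the object is unknown."""
    return epoch_index.get(fid)


create_object((0x200000400, 1), ino=1234, gen=7, epoch=42)
```

Note that the two dictionaries are separate structures, so reading both the cookie and the epoch means two lookups.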

Nikita, is this a correct use of IAM?

Thanks.

-- 
umka

Vitaly Fertman

2008-Feb-19 10:59 UTC

[Lustre-devel] storing SOM epoch in EA

> hi!
>
> Can we use IAM for storing epoch? It is fast and does not have such
> strong size limitations.

there are no size limitations, EA can be stored in a separate
block, we just want to minimize IO.

> We could make "epoch" index in mkfs time (like
> it is done for existing indexes now) and use object's fid as a key and
> epoch as value.

this looks like it will double IO/seeks for each inode.

-- 
Vitaly

Yuriy Umanets

2008-Feb-19 11:11 UTC

[Lustre-devel] storing SOM epoch in EA

Vitaly Fertman wrote:
>> hi!
>>
>> Can we use IAM for storing epoch? It is fast and does not have such
>> strong size limitations.
>
> there are no size limitations, EA can be stored in a separate
> block, we just want to minimize IO.

An EA in a separate block is evil. It makes things slow.

>> We could make "epoch" index in mkfs time (like
>> it is done for existing indexes now) and use object's fid as a key and
>> epoch as value.
>
> this looks like it will double IO/seeks for each inode.

Well, it did not in cmd3 :)


-- 
umka

Alex Zhuravlev

2008-Feb-19 11:18 UTC

[Lustre-devel] storing SOM epoch in EA

Yuriy Umanets wrote:
> EA in separate block is evil. It makes things slow.

we have had fast EAs (stored in the inode, this is why we make them large) for years.

> Well, it did not in cmd3 :)

if it isn't stored in the inode, it's a seek.

thanks, Alex

Yuriy Umanets

2008-Feb-19 12:02 UTC

[Lustre-devel] storing SOM epoch in EA

Alex Zhuravlev wrote:
> Yuriy Umanets wrote:
>> EA in separate block is evil. It makes things slow.
>
> we have fast EAs (stored in inode, this is why we make them large) for years.

Well, people used horses for ages, but this did not stop them from
building cars :) Guys, I gave you an idea that is no worse than using EAs.
I will not insist it is great. If you can't estimate its value yourself, well,
let it be. We have such a nice thing as IAM and you keep talking about
EAs...

Seriously, IMHO, here is what is bad about EAs:

1. You need to control their size, you need to bother;
2. Large, fast inodes make create/lookup slow. You need to load this
thing into memory after all. I think this is comparable to the additional
seeks caused by IAM;
3. Storing the epoch in an EA makes you use this chain to access it:
fid->inode->epoch (in EA); IAM makes it shorter: fid->epoch (in IAM);
4. Large inodes consume more RAM;
5. There are others... but they are less related to technical
downsides/advantages, so I will omit them.

Thanks.

-- 
umka

Alex Zhuravlev

2008-Feb-19 12:09 UTC

[Lustre-devel] storing SOM epoch in EA

I guess there is some sort of misunderstanding here.

we don't need a fid->epoch mapping. we only need the epoch along with the
other inode attributes. the epoch is fixed size (8 bytes, probably a few
more for flags in the future).


thanks, Alex


Alex Zhuravlev

2008-Feb-19 12:13 UTC

[Lustre-devel] storing SOM epoch in EA

btw, are you proposing to store LOV in global IAM?

thanks, Alex

Yuriy Umanets

2008-Feb-19 14:28 UTC

[Lustre-devel] storing SOM epoch in EA

Alex Zhuravlev wrote:
> I guess there is some sort of misunderstanding here.
>
> we don't need fid->epoch mapping. we only need epoch along with other
> inode attributes. epoch is fixed size (8 bytes, probably few more for
> flags in future)

Alex,

Yes, this is what I understand as well. And we were discussing that the EA
approach has some downsides. In fact, what you propose, that is, storing it
in an EA, is logical, taking into account that the epoch is a kind of
extension to the inode fields. It is a property of the inode object. It is
logical to store it with the inode, I see your point. But as we saw, this
has (or may have) some downsides which may be solved with IAM. Just keep this
in mind when you think about or work on it. I do not see why IAM is so bad here.

Thanks.

-- 
umka

Yuriy Umanets

2008-Feb-19 14:30 UTC

[Lustre-devel] storing SOM epoch in EA

Alex Zhuravlev wrote:
> btw, are you proposing to store LOV in global IAM?

By "LOV" you mean the LOV EA? If yes, well, this seems too radical an idea,
but it may be worse to think on. Finally, using IAM for it will cost
almost nothing in terms of additional development. IAM should be ready
for that.

Nikita, are there any limitations on value size in IAM?

Thanks.

-- 
umka

Nikita Danilov

2008-Feb-19 14:36 UTC

[Lustre-devel] storing SOM epoch in EA

Yuriy Umanets writes:
 > Alex Zhuravlev wrote:
 > > btw, are you proposing to store LOV in global IAM?
 > by "LOV" you mean LOV EA? If yes, well, this is too radical idea seems,
 > but it may be worse to think on. Finally using IAM with it will cost 
 > almost nothing in meaning of additional development. IAM should be ready 
 > for that.
 > 
 > Nikita, is there any limitations for value size in IAM?

Htree shift code will be upset if key+value are larger than one fourth
of a block, but that's easy to fix.
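
As a rough illustration of that limit, the check below tests whether a key+value pair stays within one fourth of a block. The function name, the 4096-byte default block size, and the example sizes are assumptions made for this sketch; they are not taken from the IAM code.

```python
def fits_htree_shift(key_bytes: int, value_bytes: int,
                     block_bytes: int = 4096) -> bool:
    """True if key+value is no larger than one fourth of a block."""
    # Multiply instead of dividing to stay in exact integer arithmetic.
    return 4 * (key_bytes + value_bytes) <= block_bytes


# Assumed sizes: a 16-byte fid key with an 8-byte epoch value.
small_record_ok = fits_htree_shift(16, 8)

# Assumed sizes: a 16-byte fid key with a 3872-byte striping value.
large_record_ok = fits_htree_shift(16, 3872)
```

With these assumed sizes a small fixed-size value such as the epoch fits comfortably, while a large striping value would run into the limit mentioned above.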

 > 
 > Thanks.

Nikita.

Yuriy Umanets

2008-Feb-19 14:39 UTC

[Lustre-devel] storing SOM epoch in EA

Yuriy Umanets wrote:
> Alex Zhuravlev wrote:
>> btw, are you proposing to store LOV in global IAM?
>
> by "LOV" you mean LOV EA? If yes, well, this is too radical idea seems,
> but it may be worse to think on. Finally using IAM with it will cost

s/worse/valuable/



-- 
umka

Alex Zhuravlev

2008-Feb-19 14:42 UTC

[Lustre-devel] storing SOM epoch in EA

Yuriy Umanets wrote:
> Alex Zhuravlev wrote:
>> btw, are you proposing to store LOV in global IAM?
> by "LOV" you mean LOV EA? If yes, well, this is too radical idea seems,
> but it may be worse to think on. Finally using IAM with it will cost
> almost nothing in meaning of additional development. IAM should be ready
> for that.

it will cost an additional seek to access something through IAM.
the same applies to LOV and to epoch.

thanks, Alex

Yuriy Umanets

2008-Feb-19 14:44 UTC

[Lustre-devel] storing SOM epoch in EA

Nikita Danilov wrote:
> Yuriy Umanets writes:
>  > Alex Zhuravlev wrote:
>  > > btw, are you proposing to store LOV in global IAM?
>  > by "LOV" you mean LOV EA? If yes, well, this is too radical idea seems,
>  > but it may be worse to think on. Finally using IAM with it will cost
>  > almost nothing in meaning of additional development. IAM should be ready
>  > for that.
>  >
>  > Nikita, is there any limitations for value size in IAM?
>
> Htree shift code will be upset if key+value are larger than one fourth
> of a block, but that's easy to fix.

This is in fact an interesting idea. An object (inode + EA, etc.) always
gets more and more info as new features are added, and one day, it seems,
we will face the need to get rid of the EA because it's too big.

Thanks.

-- 
umka

Alex Zhuravlev

2008-Feb-19 14:47 UTC

[Lustre-devel] storing SOM epoch in EA

Yuriy Umanets wrote:
> Yes, this is what I understand as well. And we were discussing that EA
> approach has some downsides. In fact what you propose, that is, store it
> in EA is logical taking into account that epoch is kind of extension to
> inode fields. It is property of inode-object. It is logical to store it
> with inode, I see your point. But as we saw, this has/may have some
> downsides which may be solved with IAM. Just take this in mind when you
> think/work on it. I do not see why IAM is such a bad here.

1) additional seek(s)
2) shared structure (additional cost on concurrent access)
3) inode is already 512 bytes

thanks, Alex

Mikhail Pershin

2008-Feb-19 14:59 UTC

[Lustre-devel] storing SOM epoch in EA

On Tue, 19 Feb 2008 15:02:02 +0300, Yuriy Umanets <Yury.Umanets at Sun.COM> wrote:
> Seriously, IMHO what is bad about EAs:
>
> 1. You need to control their size, you need to bother;
> 2. Large-fast inodes make create/lookup slow. You need to load this
> thing to memory after all. I think this is complement to additional
> seeks caused by IAM;

but this is still better than an extra block for EA or IAM. Btw, IAM data is
also in memory and possibly takes no less of it than the extra inode size.

> 3. Storing epoch in EA makes you use this chain to access epoch:
> fid->inode->epoch (in EA), IAM makes it shorter: fid->epoch (in IAM);

not true, actually. the inode will be read anyway unless you are proposing to
put the whole inode body in IAM, so there are no benefits. Moreover, inode->ea
is a direct mapping, while fid->epoch will need an index lookup and may require
reading several blocks if the IAM is large, and it will be large in this case,
so the IO will be no better than even an EA in an extra block.

> 4. Large inodes consume more RAM;

this is the same as 2.
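
The I/O argument above can be put into a back-of-the-envelope model. This is only a sketch under stated assumptions: a completely cold cache, one block read per index level, and a hypothetical fanout of 100 entries per index block. Real systems usually keep the upper index levels cached, so the warm-cache cost of the index path would be lower.

```python
def index_depth(entries: int, fanout: int) -> int:
    """Levels of a balanced index with `fanout` entries per block."""
    depth, capacity = 1, fanout
    while capacity < entries:
        capacity *= fanout
        depth += 1
    return depth


def cold_reads_ea_in_inode() -> int:
    return 1                      # the inode block already holds the EA


def cold_reads_ea_in_block() -> int:
    return 2                      # inode block + one external EA block


def cold_reads_iam(entries: int, fanout: int) -> int:
    return 1 + index_depth(entries, fanout)   # inode block + index path


reads = {
    "EA in inode": cold_reads_ea_in_inode(),
    "EA in extra block": cold_reads_ea_in_block(),
    "IAM index": cold_reads_iam(entries=1_000_000, fanout=100),
}
```

In this model the index path costs more reads than even an EA in an extra block once the index has more than one level, which is the point made in the reply to item 3.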

Guys, don't forget about DMU as well.

-- 
Mikhail Pershin
Staff Engineer
Lustre Group
Sun Microsystems, Inc

Yuriy Umanets

2008-Feb-19 15:10 UTC

[Lustre-devel] storing SOM epoch in EA

Alex Zhuravlev wrote:
> 1) additional seek(s)
> 2) shared structure (additional cost on concurrent access)
> 3) inode is already 512 bytes

Agreed, but none of this has been measured, and it may turn out that IAM is
no worse and is handier in many respects.

Thanks.

-- 
umka

Kalpak Shah

2008-Feb-19 15:11 UTC

[Lustre-devel] storing SOM epoch in EA

On Tue, 2008-02-19 at 17:59 +0300, Mikhail Pershin wrote:
> Guys, don't forget about DMU as well.

For the DMU, we will be using 1024-byte dnodes by default to store the
striping information. So the epoch can be stored in the in-dnode system
attributes. The epoch will need to be stored in an external block or
FatZap (depending on the implementation of in-dnode EAs) only in case the
file is striped across more than 10-15 OSTs. (The exact number of
stripes will again depend on the design of in-dnode EAs.)

Thanks,
Kalpak.

Yuriy Umanets

2008-Feb-19 15:14 UTC

[Lustre-devel] storing SOM epoch in EA

Mikhail Pershin wrote:
>> 2. Large-fast inodes make create/lookup slow. You need to load this
>> thing to memory after all. I think this is complement to additional
>> seeks caused by IAM;
>
> but this is still better than extra block for EA or IAM. Btw IAM data is
> also in memory and takes it no less than extra inode size possibly

If it is in memory, it will generate fewer seeks :-)

>> 3. Storing epoch in EA makes you use this chain to access epoch:
>> fid->inode->epoch (in EA), IAM makes it shorter: fid->epoch (in IAM);
>
> not true actually. inode will be read anyway until you are proposing to
> put whole inode body in IAM, so there is no benefits. Moreover inode->ea
> is direct mapping while fid->epoch will need index lookup and may invoke
> several blocks to read if IAM is large and it will be large in this case,
> so IO will be not better than even EA in extra block.

I did not mean to put the whole inode in IAM. I meant only putting the fid
there as the key and the epoch as the value. So the way to access the epoch
is shorter with IAM, as there is no need to load the inode. But all of this
needs to be well thought out, as you all mention more seeks, new reads, etc.

>> 4. Large inodes consume more RAM;
>
> this is the same as 2.
>
> Guys, don't forget about DMU as well.

-- 
umka

Ricardo M. Correia

2008-Feb-19 15:18 UTC

[Lustre-devel] storing SOM epoch in EA

On Tue, 2008-02-19 at 17:59 +0300, Mikhail Pershin wrote:
> Guys, don't forget about DMU as well.

For the DMU, we haven't yet reached a consensus with the ZFS team on a final
design for EAs in the dnode.
The ZFS team proposed having variably-sized system attributes (with
integer indexes) instead of having name-value attributes like ext3.

I guess this is another good point to discuss in today's ZFS team
meeting.

Thanks,
Ricardo

--

Ricardo Manuel Correia
Lustre Engineering

Sun Microsystems, Inc.
Portugal
Phone +351.214134023 / x58723
Mobile +351.912590825
Email Ricardo.M.Correia at Sun.COM

Alex Zhuravlev

2008-Feb-19 15:19 UTC

[Lustre-devel] storing SOM epoch in EA

Yuriy Umanets wrote:
> I did not mean to put whole inode in IAM. I meant only put there fid as
> key and epoch as value. So way to access epoch is shorter with IAM as no
> need to load inode. But these all need to be well thought as all your
> mention more seeks, new reads, etc.

I don't understand the benefits of this approach. the idea is to pack
frequently accessed data together so that we don't need additional seeks
and can load/store this data with a single contiguous IO.

thanks, Alex

Ricardo M. Correia

2008-Feb-19 15:23 UTC

[Lustre-devel] storing SOM epoch in EA

On Tue, 2008-02-19 at 20:41 +0530, Kalpak Shah wrote:
> The epoch will need to be stored in an external block or
> FatZap (depending on implementation of in-dnode EAs) only in-case the
> file is striped across more than 10-15 OSTs.

That may not be true.
For example, with Matthew's proposed design, we could put the epoch as a
system attribute with "higher priority" (lower index) than LOV data,
which means it would always fit in the dnode even if we have lots of
LOVs.
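
The priority idea can be illustrated with a small placement model. Everything in it is a hypothetical sketch: the greedy in-index-order policy, the attribute sizes, and the available dnode space are assumptions, since the thread notes that the in-dnode attribute design was not final.

```python
from typing import List, Tuple

Attr = Tuple[int, str, int]   # (index, name, size in bytes); lower index first


def place_attributes(attrs: List[Attr],
                     dnode_space: int) -> Tuple[List[str], List[str]]:
    """Place attributes in index order; whatever does not fit is spilled."""
    inline: List[str] = []
    spilled: List[str] = []
    free = dnode_space
    for _index, name, size in sorted(attrs):
        if size <= free:
            inline.append(name)
            free -= size
        else:
            spilled.append(name)
    return inline, spilled


# Epoch has the lower index, so it is placed before the large LOV data.
epoch_first = place_attributes([(1, "lov", 992), (0, "epoch", 8)], 512)

# Reversed priority: LOV data takes the space and the epoch is pushed out.
lov_first = place_attributes([(0, "lov", 500), (1, "epoch", 8)], 504)
```

In the first case the epoch stays inline no matter how large the LOV data grows, which is the property described above.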

Regards,
Ricardo

--

Ricardo Manuel Correia
Lustre Engineering

Sun Microsystems, Inc.
Portugal
Phone +351.214134023 / x58723
Mobile +351.912590825
Email Ricardo.M.Correia at Sun.COM

Vitaly Fertman

2008-Feb-19 15:28 UTC

[Lustre-devel] storing SOM epoch in EA

> I did not mean to put whole inode in IAM. I meant only put there fid as
> key and epoch as value. So way to access epoch is shorter with IAM as no
> need to load inode. But these all need to be well thought as all your
> mention more seeks, new reads, etc.

as Alex already mentioned, we do not need a fid->ioepoch mapping.
ioepoch is a tag for inode attributes and all of them need to be loaded
together, i.e. there is a need to load the inode anyway.

-- 
Vitaly

Eric Barton

2008-Feb-19 15:31 UTC

[Lustre-devel] storing SOM epoch in EA

Rather than discussing this one EA at a time, should we not
consider any other EAs (e.g. being considered in current 
architecture work) that might contend for space in the inode?

Alex Zhuravlev

2008-Feb-19 15:42 UTC

[Lustre-devel] storing SOM epoch in EA

well, this was one of the reasons I asked lustre-devel@ for input:
what else do we consider important to store in the inode?

probably we should list all existing and planned EAs and rank them.

also, I've come to think that in some cases we don't need to load
the LOV EA. for example, for getattr in the case of SOM (size/blocks are
cached on the MDS), or for revalidation when the client already has the LOV EA.

thanks, Alex

Nikita Danilov

2008-Feb-19 16:21 UTC

[Lustre-devel] storing SOM epoch in EA

Alex Zhuravlev writes:
 > 
 > Yuriy Umanets wrote:
 > > EA is separate block is evil. It makes things slow.
 > 
 > we have fast EAs (stored in inode, this is why we make them large) for years.
 > 
 > > Well, it did not in cmd3 :)
 > 
 > if it isn't stored in inode, it's a seek.

One possible point here is that OSD has to do fid->ino translation
anyway, and it makes sense to use the same index to store other
information besides the inode number. That is, we can use the "object index"
iam file to map a fid into (ino, gen, ioepoch, LOV, ...) records, and this
would not cause any additional seeks. The downside here is that the object
index is so heavily used that making its records larger is going to
increase the amount of IO significantly, so it is only worth placing there
things that we are absolutely sure will be needed for every inode.
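
The trade-off can be made concrete with a rough density calculation. The record sizes below are assumptions chosen for illustration (a 16-byte fid key, 4-byte inode number and generation, an 8-byte ioepoch, and a 128-byte LOV value); the model also ignores per-block headers and interior index nodes.

```python
def records_per_block(record_bytes: int, block_bytes: int = 4096) -> int:
    """How many fixed-size records fit in one leaf block."""
    return block_bytes // record_bytes


def leaf_blocks_needed(objects: int, record_bytes: int,
                       block_bytes: int = 4096) -> int:
    """Leaf blocks required to index `objects` records (rounded up)."""
    per_block = records_per_block(record_bytes, block_bytes)
    return -(-objects // per_block)   # ceiling division


OBJECTS = 1_000_000
FID_INO_GEN = 16 + 4 + 4              # assumed base record size
WITH_EPOCH = FID_INO_GEN + 8          # plus an 8-byte ioepoch
WITH_EPOCH_AND_LOV = WITH_EPOCH + 128 # plus an assumed 128-byte LOV value

density = [records_per_block(size)
           for size in (FID_INO_GEN, WITH_EPOCH, WITH_EPOCH_AND_LOV)]
blocks = [leaf_blocks_needed(OBJECTS, size)
          for size in (FID_INO_GEN, WITH_EPOCH, WITH_EPOCH_AND_LOV)]
```

Under these assumptions, adding the epoch grows the index moderately, while adding striping data multiplies the number of leaf blocks several times over, in line with the downside described above.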

 > 
 > thanks, Alex

Nikita.

Alex Zhuravlev

2008-Feb-19 16:27 UTC

[Lustre-devel] storing SOM epoch in EA

Nikita Danilov wrote:
> One possible point here is that OSD has to do fid->ino translation
> anyway, and it makes sense to use the same index to store other
> information besides inode number. That is, we can use "object index" iam
> file to map fid into (ino, gen, ioepoch, LOV, ...) records, and this
> would not cause any additional seeks. The downside here is that object
> index is so heavily used, that making its records larger is going to
> increase the amount of IO significantly, so it only worth to place there
> things that we are absolutely sure will be needed for every inode.

well, we also need to update ioepoch when we update size/blocks.

thanks, Alex

Andreas Dilger

2008-Feb-19 20:13 UTC

[Lustre-devel] storing SOM epoch in EA

On Feb 19, 2008  16:30 +0200, Yuriy Umanets wrote:
> Alex Zhuravlev wrote:
> by "LOV" you mean LOV EA? If yes, well, this is too radical idea seems,
> but it may be worse to think on. Finally using IAM with it will cost
> almost nothing in meaning of additional development. IAM should be ready
> for that.
>
> Nikita, is there any limitations for value size in IAM?

One of the major problems with IAM is that e2fsck doesn't work with it,
it will only exist for ldiskfs (though ZAP works for DMU), and there is
a consistency issue between items stored in IAM and in the rest of the
filesystem.

If e2fsck deletes an inode, it will not delete the entry in IAM, so now we
have to patch e2fsck to understand not only IAM, but also specific uses
of IAM that link items there to inodes in another place.  I don't think
that introducing a dependence on IAM is practical.
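
The consistency problem can be shown with a toy model: an inode table and a separate index that refers to inodes by number. The names and data here are hypothetical and do not reflect any e2fsck or IAM interface; the sketch only shows how a deletion that is unaware of the index leaves a dangling entry behind.

```python
from typing import Dict, List, Set


def orphaned_index_entries(inodes: Set[int], index: Dict[str, int]) -> List[str]:
    """Return index keys whose inode number no longer exists."""
    return sorted(fid for fid, ino in index.items() if ino not in inodes)


# Hypothetical state: three inodes, each referenced once from the index.
inodes = {11, 12, 13}
index = {"fid-a": 11, "fid-b": 12, "fid-c": 13}

# A checker that knows nothing about the index removes a damaged inode.
inodes.discard(12)

dangling = orphaned_index_entries(inodes, index)
```

A checker would need this kind of cross-reference pass, plus knowledge of what each index entry means, to repair both sides together.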

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Andreas Dilger

2008-Feb-19 20:19 UTC

[Lustre-devel] storing SOM epoch in EA

On Feb 19, 2008  15:31 +0000, Eric Barton wrote:
> Rather than discussing this one EA at a time, should we not
> consider any other EAs (e.g. being considered in current
> architecture work) that might contend for space in the inode?

I wouldn't object to this.  There were several other proposals to add
EAs to the inode, but individually the overhead is high.  If we have
a single aggregated EA struct for Lustre that would be more reasonable.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Vitaly Fertman

2008-Feb-19 20:33 UTC

[Lustre-devel] on-disk SOM attributes [former storing SOM epoch in EA]

Hi All,

Besides the question Alex asked, there are some more issues
I would like to discuss, so let me list all of them here.

1) where to store on-disk IOepoch on MDS -- this question
was described in Alex's initial "storing SOM epoch in EA"
email, so I will not repeat it here.

2) where to store SOM-ENABLE flag in inode?
currently it is stored in inode flags, but that may not be
acceptable for DMU. If so, we will probably need to move it
to the place where we store the on-disk IOepoch (EA?).

I also want to mention that on-disk IOepoch is needed at the
attribute update time only, to be sure we write newer attributes.
Whereas SOM-ENABLE flag is needed more often, since it is also
checked at getattr when we tell a client that the size is valid.

3) how to avoid e2fsck zeroing i_blocks on MDS?
we could patch e2fsck, or alternatively store a copy of i_blocks
in a part of the inode that fsck does not know about, e.g. in the same EA.

As i_blocks is needed on each getattr, it is worth storing
it along with SOM-ENABLE flag.

Please advise.

-- 
Vitaly
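
The three items above could be combined in one aggregated EA along the following lines. This is a minimal sketch under stated assumptions: the struct, field names, flag value, and helper functions are all hypothetical and are not an existing Lustre on-disk format. It also assumes that an update carrying an older IOepoch is simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOM_EA_ENABLED 0x1u  /* hypothetical flag: size/blocks on MDS are valid */

/* Hypothetical aggregated EA body holding all on-disk SOM attributes. */
struct som_ea {
    uint64_t se_ioepoch;  /* IOepoch of the last attribute update */
    uint64_t se_blocks;   /* copy of i_blocks that e2fsck does not touch */
    uint32_t se_flags;    /* SOM_EA_ENABLED and future flags */
    uint32_t se_padding;  /* keeps the struct 8-byte aligned */
};

/* Attribute update path: accept only updates that are not from an older epoch. */
static bool som_ea_update(struct som_ea *ea, uint64_t ioepoch, uint64_t blocks)
{
    assert(ea != NULL);
    if (ioepoch < ea->se_ioepoch)
        return false;  /* stale update, keep the newer attributes */
    ea->se_ioepoch = ioepoch;
    ea->se_blocks = blocks;
    ea->se_flags |= SOM_EA_ENABLED;
    return true;
}

/* getattr path: needs only the flag and the blocks copy, not the epoch. */
static bool som_ea_getattr(const struct som_ea *ea, uint64_t *blocks)
{
    assert(ea != NULL && blocks != NULL);
    if (!(ea->se_flags & SOM_EA_ENABLED))
        return false;  /* caller must not report size/blocks as valid */
    *blocks = ea->se_blocks;
    return true;
}
```

Keeping the flag and the blocks copy next to the epoch means getattr and attribute update read the same EA, at the price of reading the epoch on every getattr even though it is only needed on update.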

Nikita Danilov

2008-Feb-20 16:10 UTC

[Lustre-devel] storing SOM epoch in EA

Alex Zhuravlev writes:
 > Nikita Danilov wrote:
 > > One possible point here is that OSD has to do fid->ino translation
 > > anyway, and it makes sense to use the same index to store other
 > > information besides inode number. That is, we can use "object index" iam
 > > file to map fid into (ino, gen, ioepoch, LOV, ...) records, and this
 > > would not cause any additional seeks. The downside here is that object
 > > index is so heavily used that making its records larger is going to
 > > increase the amount of IO significantly, so it is only worth placing there
 > > things that we are absolutely sure will be needed for every inode.
 > 
 > well, we also need to update ioepoch when we update size/blocks.

Indeed. We can still do this for read-only and read-mostly attributes,
like LOV, and avoid dirtying extra blocks in the common case.

 > 
 > thanks, Alex

Nikita.
