Nochum Klein
2010-Jan-08 05:12 UTC
[Lustre-discuss] Newbie question: File locking, synchronicity, order, and ownership
Hi Everyone,

Apologies for what is likely a simple question for anyone who has been working with Lustre for a while. I am evaluating Lustre as part of a fault-tolerant failover solution for an application component. Based on our design, which uses heartbeats between the hot primary and warm secondary components, we have four basic requirements of the clustered file system:

1. *Write Order* - The storage solution must write data blocks to shared storage in the same order as they occur in the data buffer. Solutions that write data blocks in any other order (for example, to enhance disk efficiency) do not satisfy this requirement.
2. *Synchronous Write Persistence* - Upon return from a synchronous write call, the storage solution guarantees that all the data have been written to durable, persistent storage.
3. *Distributed File Locking* - Application components must be able to request and obtain an exclusive lock on the shared storage. The storage solution must not assign the lock to two servers simultaneously.
4. *Unique Write Ownership* - The application component that holds the file lock must be the only server process that can write to the file. Once the system transfers the lock to another server, pending writes queued by the previous owner must fail.

Can anyone confirm that these requirements would be met by Lustre 1.8?

Thanks a lot!

Nochum
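[The synchronous write persistence requirement (item 2 above) essentially asks that a write call not return until the data has reached durable storage. Purely as an illustration, and not taken from the thread, here is a minimal POSIX sketch of such a write using O_SYNC plus an explicit fsync(); the path /mnt/lustre/appstate is a placeholder, and whether Lustre 1.8 gives these calls the required durability semantics is exactly the question being asked.]

    /* Sketch: a synchronous write in the sense of requirement 2.
     * Opening with O_SYNC (or calling fsync() after write()) is the
     * POSIX way to ask that the call not return until the data is on
     * stable storage.  The path is a placeholder. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char buf[] = "application state record\n";

        int fd = open("/mnt/lustre/appstate",
                      O_WRONLY | O_CREAT | O_SYNC, 0644);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        /* With O_SYNC, write() returns only after the data (and the
         * metadata needed to retrieve it) has been committed. */
        if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
            perror("write");
            close(fd);
            return EXIT_FAILURE;
        }

        /* Belt and braces: an explicit fsync() also flushes any data
         * still cached for this file descriptor. */
        if (fsync(fd) < 0) {
            perror("fsync");
            close(fd);
            return EXIT_FAILURE;
        }

        close(fd);
        return EXIT_SUCCESS;
    }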
Atul Vidwansa
2010-Jan-08 06:30 UTC
[Lustre-discuss] Newbie question: File locking, synchronicity, order, and ownership
Some comments inline...

Nochum Klein wrote:
> 3. *Distributed File Locking* - Application components must be able to
>    request and obtain an exclusive lock on the shared storage. The
>    storage solution must not assign the lock to two servers
>    simultaneously.

AFAIK Lustre does support distributed locking. From wiki.lustre.org:

    flock/lockf

    POSIX and BSD flock/lockf system calls will be completely coherent
    across the cluster, using the Lustre lock manager, but are not
    enabled by default today. It is possible to enable client-local
    flock locking with the -o localflock mount option, or cluster-wide
    locking with the -o flock mount option. If/when this becomes the
    default, it is also possible to disable flock for a client with the
    -o noflock mount option.

> 4. *Unique Write Ownership* - The application component that holds the
>    file lock must be the only server process that can write to the
>    file. Once the system transfers the lock to another server, pending
>    writes queued by the previous owner must fail.

It depends on what level of locking you do. Lustre supports byte-range locking, so unless writes overlap, multiple writers can write to the same file.

Cheers,
_Atul
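[For concreteness, a minimal sketch (not from the thread) of the BSD-style flock() call the quoted wiki text refers to. Per that text, the lock is coherent across the cluster only if the client is mounted with -o flock; with -o localflock it is honoured only within the local node. The path is a placeholder.]

    /* Sketch: BSD-style whole-file exclusive lock, as mentioned in the
     * quoted wiki passage.  Cluster-wide coherence is said to require
     * the -o flock client mount option.  The path is a placeholder. */
    #include <sys/file.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int fd = open("/mnt/lustre/appstate", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        /* LOCK_EX | LOCK_NB: try to take an exclusive lock without
         * blocking.  Failure with EWOULDBLOCK means another process
         * (possibly on another node, when mounted -o flock) holds it. */
        if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
            perror("flock");
            close(fd);
            return EXIT_FAILURE;
        }

        /* ... exclusive access to the file while the lock is held ... */

        flock(fd, LOCK_UN);
        close(fd);
        return EXIT_SUCCESS;
    }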
Nochum Klein
2010-Jan-08 13:55 UTC
[Lustre-discuss] Newbie question: File locking, synchronicity, order, and ownership
Hi Atul,

Thanks a lot -- this is very helpful!

So assuming the application is performing the following fcntl() call to set a file segment lock:

    struct flock fl;
    int err;

    fl.l_type   = F_WRLCK;    /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;          /* len = 0 means until end of file */

    err = fcntl(file, F_SETLK, &fl);   /* file is an open descriptor */

I should be able to achieve the desired behavior if I enable cluster-wide locking with the -o flock mount option. Is this correct?

Thanks again!

Nochum
Atul Vidwansa
2010-Jan-11 04:30 UTC
[Lustre-discuss] Newbie question: File locking, synchronicity, order, and ownership
Comments inline...

Nochum Klein wrote:
> So assuming the application is performing the following fcntl() call
> to set a file segment lock:
>
>     struct flock fl;
>     int err;
>
>     fl.l_type   = F_WRLCK;
>     fl.l_whence = SEEK_SET;
>     fl.l_start  = 0;
>     fl.l_len    = 0;          /* len = 0 means until end of file */
>
>     err = fcntl(file, F_SETLK, &fl);
>
> I should be able to achieve the desired behavior

What is your desired behavior?

> if I enable cluster-wide locking with the -o flock mount option. Is
> this correct?

Is your application writing to the same file from multiple nodes? If yes, do writes from different nodes overlap? The code above will work fine if each node is writing to its own file OR multiple nodes are writing to different sections of the file. Otherwise, it will result in lock pingpong.

Cheers,
_Atul
Nochum Klein
2010-Jan-11 13:32 UTC
[Lustre-discuss] Newbie question: File locking, synchronicity, order, and ownership
Hi Atul,

Thanks, I appreciate your help. I apologize for not being clearer about my intent. The goal is to implement a fault-tolerant pair of application component instances in a hot-warm configuration, where the primary and secondary instances exchange heartbeats to determine the health/availability of their partner. The partners would be on different nodes. While the primary instance is running, it should have exclusive write access to the application state. If the configured number of heartbeats is missed, the secondary component instance will try to acquire the lock on the application state (thereby becoming primary). Given that networks are often unreliable, the design goal is that the clustered file system should ensure that the secondary instance does not assume the primary role while the actual primary instance is still alive during a network disruption. So in a sense, a controlled pingpong is actually the desired effect (where the secondary and primary instances change roles whenever the current primary instance fails). Am I correct that the configuration referenced below could support this behavior?

Thanks again for your help and your patience!

Nochum

On Sun, Jan 10, 2010 at 10:30 PM, Atul Vidwansa <Atul.Vidwansa at sun.com> wrote:
>> if I enable cluster-wide locking with the -o flock mount option. Is
>> this correct?
>
> Is your application writing to the same file from multiple nodes? If yes,
> do writes from different nodes overlap? The code above will work fine if
> each node is writing to its own file OR multiple nodes are writing to
> different sections of the file. Otherwise, it will result in lock
> pingpong.
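[As an illustration of the takeover described above, here is a rough sketch (not from the thread) of how the secondary might try to claim the application-state lock using the non-blocking F_SETLK call from the earlier message: while the primary's client still holds the lock, fcntl() fails with EACCES or EAGAIN and the secondary stays warm; once the lock is released, or the primary's client is evicted, the call succeeds and the secondary can act as primary. It assumes the file system is mounted with -o flock; the path and retry interval are placeholders.]

    /* Sketch: secondary instance trying to take over the application-
     * state lock after missed heartbeats.  Assumes a client mounted
     * with -o flock; path and retry interval are placeholders. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int try_acquire_state_lock(int fd)
    {
        struct flock fl;

        fl.l_type   = F_WRLCK;   /* exclusive write lock */
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 0;         /* 0 = to end of file, i.e. whole file */

        /* F_SETLK is non-blocking: it fails immediately if a
         * conflicting lock is held, possibly by another node. */
        if (fcntl(fd, F_SETLK, &fl) == 0)
            return 1;                       /* lock acquired            */
        if (errno == EACCES || errno == EAGAIN)
            return 0;                       /* primary still holds it   */
        perror("fcntl(F_SETLK)");
        exit(EXIT_FAILURE);                 /* unexpected error         */
    }

    int main(void)
    {
        int fd = open("/mnt/lustre/appstate", O_RDWR);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        /* After the configured number of heartbeats has been missed,
         * poll for the lock until the primary releases it or its
         * client is evicted. */
        while (!try_acquire_state_lock(fd))
            sleep(1);

        /* The lock is held until this process exits or closes fd; from
         * here on, this instance acts as primary and writes the state. */
        return EXIT_SUCCESS;
    }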
Nicolas Williams
2010-Jan-11 19:10 UTC
[Lustre-discuss] Newbie question: File locking, synchronicity, order, and ownership
On Mon, Jan 11, 2010 at 07:32:24AM -0600, Nochum Klein wrote:
> Thanks, I appreciate your help. I apologize for not being clearer about my
> intent. The goal is to implement a fault-tolerant pair of application
> component instances in a hot-warm configuration, where the primary and
> secondary instances exchange heartbeats to determine the
> health/availability of their partner. The partners would be on different
> nodes. While the primary instance is running, it should have exclusive
> write access to the application state. If the configured number of
> heartbeats is missed, the secondary component instance will try to acquire
> the lock on the application state (thereby becoming primary). Given that
> networks are often unreliable, the design goal is that the clustered file
> system should ensure that the secondary instance does not assume the
> primary role while the actual primary instance is still alive during a
> network disruption. So in a sense, a controlled pingpong is actually the
> desired effect (where the secondary and primary instances change roles
> whenever the current primary instance fails). Am I correct that the
> configuration referenced below could support this behavior?

If the client running the primary dies, eventually it will be evicted from the cluster, its locks will be dropped, and the secondary will be able to take over. If the application running on the primary hangs while holding the lock, then the secondary will not be able to take over.

I would recommend implementing your own lock system. A simple lockfile, opened with O_EXCL|O_CREAT, should suffice.

Nico
--
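[A rough sketch (not from the thread) of the lockfile approach Nico suggests: O_CREAT|O_EXCL makes creation atomic, so at most one node's open() succeeds and that node becomes primary. The path is a placeholder, and a real implementation would also need a way to detect and remove a stale lockfile left behind by a crashed primary, e.g. after the heartbeat timeout.]

    /* Sketch: lockfile created with O_CREAT|O_EXCL so that exactly one
     * node can create it.  The path is a placeholder; stale lockfiles
     * from a crashed primary must be cleaned up by other means. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int fd = open("/mnt/lustre/appstate.lock",
                      O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0) {
            if (errno == EEXIST) {
                /* Another instance already holds the lockfile. */
                fprintf(stderr, "lockfile exists, staying secondary\n");
                return EXIT_FAILURE;
            }
            perror("open");
            return EXIT_FAILURE;
        }

        /* Record who holds the lock, to help with manual recovery. */
        char who[64];
        int n = snprintf(who, sizeof(who), "pid %ld\n", (long)getpid());
        if (n > 0)
            (void)write(fd, who, (size_t)n);
        close(fd);

        /* ... act as primary ... */

        /* On clean shutdown or hand-over, remove the lockfile. */
        unlink("/mnt/lustre/appstate.lock");
        return EXIT_SUCCESS;
    }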