Hello!
    Attached you can find a 'high level design' of the group locks
    feature present in Lustre. It includes information about what group
    locks are, their semantics, and the ways you can access this feature.
    In short, the group locks feature allows a certain group of processes
    (on many nodes) to lock files for exclusive access by that group only.
Bye,
    Oleg
-------------- next part --------------
High Level Design for group locks.
0.1 Functionality specification
The ability to lock files (inodes) for exclusive access 
by only those processes that belong to a logical group of processes.
0.2 Logic specification
To support certain HPC installations, Lustre supports a 
group I/O lock. The semantics of the lock are as follows:
1. All processes in a group of cooperating processes:
  (a) share a group id, a 32-bit integer, which is 
    generated in a way outside the scope of this document.
  (b) mark the file as not requiring normal extent 
    locks, and mark the file descriptor (as "usual") as 
    blocking or non-blocking.
  (c) take a concurrent GROUP lock on a [0,EOF] extent 
    associated with a file. The concurrent GROUP lock 
    is passed the group id.
  (d) explicitly release this lock when done with their 
    I/O, preceded by a flush of cached data.
  (e) have their group locks dropped when the file is 
    closed, deliberately or through exit.
2. Readers/writers on other nodes take [a,b] R/W locks, 
  which cannot be granted while group locks are present. 
  The possible outcomes are: 
  (a) the reader or writer can be made to wait forever, 
    interruptibly. This is good for blocking file descriptors.
  (b) the reader or writer can get -EWOULDBLOCK. This is good 
    for file descriptors that have been marked as non-blocking.
  (c) group enqueues with a different group id must 
    wait for the current group and PR/PW locks to be released.
In case (a), a waiting reader would force further group 
lock requests to wait until the read is satisfied. This is 
not desirable, so we will let group locks jump over the 
waiting list if other group locks have already been granted.
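
The grant rules above can be sketched as a small C model. This is only an illustration under stated assumptions, not Lustre's actual lock manager code: the names (mode, lock, compatible, can_grant, enqueue) are hypothetical, extents are ignored (all locks are treated as overlapping), PR locks are assumed compatible with each other, and non-group requests are assumed to queue in FIFO order behind existing waiters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock modes for this sketch. */
enum mode { MODE_PR, MODE_PW, MODE_GROUP };

struct lock {
    enum mode mode;
    uint32_t  gid;   /* meaningful only for MODE_GROUP */
};

enum outcome { GRANTED, MUST_WAIT, WOULD_BLOCK };

/* GROUP locks are compatible only with GROUP locks of the same group id.
 * Assumption: PR is compatible with PR, PW with nothing. */
static bool compatible(const struct lock *a, const struct lock *b)
{
    if (a->mode == MODE_GROUP || b->mode == MODE_GROUP)
        return a->mode == b->mode && a->gid == b->gid;
    return a->mode == MODE_PR && b->mode == MODE_PR;
}

/* Decide whether `req` can be granted right now. */
static bool can_grant(const struct lock *req,
                      const struct lock *granted, size_t ngranted,
                      size_t nwaiting)
{
    assert(req != NULL);
    for (size_t i = 0; i < ngranted; i++)
        if (!compatible(req, &granted[i]))
            return false;
    /* Every granted lock is compatible, so for a GROUP request they all
     * belong to the same group: jump over the waiting list. */
    if (req->mode == MODE_GROUP && ngranted > 0)
        return true;
    /* Assumption: everything else queues behind existing waiters. */
    return nwaiting == 0;
}

/* Map the decision to cases (a) and (b) of the list above. */
static enum outcome enqueue(const struct lock *req,
                            const struct lock *granted, size_t ngranted,
                            size_t nwaiting, bool nonblocking)
{
    if (can_grant(req, granted, ngranted, nwaiting))
        return GRANTED;
    return nonblocking ? WOULD_BLOCK : MUST_WAIT;
}
```

In this model, a request for group 7 is granted immediately while group 7 already holds the lock, even with readers waiting, whereas a request for group 8 or a PR request has to wait (or fails with WOULD_BLOCK on a non-blocking descriptor).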
0.3 State management
This lock can be held for an unspecified amount of time by 
a client, so the usual lock revocation timeouts are not 
applicable to these locks.
0.4 Protocol, APIs, disk format
A new lock mode, LCK_GROUP, is added to support this kind 
of locking. This lock mode is only used for EXTENT 
locks. These locks are accessed through ioctls:
* LL_IOC_GROUP_LOCK gets a group lock on a file. 
  The ioctl's arg argument represents the 32-bit "group id".
* LL_IOC_GROUP_UNLOCK releases a group lock 
  previously granted on a file. The ioctl's arg argument 
  represents the 32-bit "group id" and should match the one 
  used at LL_IOC_GROUP_LOCK time.
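
A minimal sketch of how a process in the group might use these ioctls, following steps 1(c) and 1(d) of the logic specification. The helper names group_lock, group_unlock and group_write are hypothetical. The fallback ioctl numbers are an assumption and should be taken from the installed Lustre user header instead, and using fsync() as the flush of cached data is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Normally provided by the Lustre user header. The values below are an
 * assumption for this sketch; verify them against your installation. */
#ifndef LL_IOC_GROUP_LOCK
#define LL_IOC_GROUP_LOCK   _IOW('f', 158, long)
#define LL_IOC_GROUP_UNLOCK _IOW('f', 159, long)
#endif

/* Take the group lock for `gid` on `fd`. Returns 0 or -errno. */
static int group_lock(int fd, uint32_t gid)
{
    return ioctl(fd, LL_IOC_GROUP_LOCK, (unsigned long)gid) < 0 ? -errno : 0;
}

/* Flush cached data, then release the group lock. Returns 0 or -errno.
 * The gid must match the one passed to group_lock(). */
static int group_unlock(int fd, uint32_t gid)
{
    int flush_err = fsync(fd) < 0 ? -errno : 0;
    int rc = ioctl(fd, LL_IOC_GROUP_UNLOCK, (unsigned long)gid) < 0 ? -errno : 0;
    return flush_err != 0 ? flush_err : rc;
}

/* Write `len` bytes at the current offset while holding the group lock.
 * Short writes are not retried in this sketch. */
static int group_write(int fd, uint32_t gid, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    int rc = group_lock(fd, gid);
    if (rc != 0)
        return rc;
    int io_err = write(fd, buf, len) < 0 ? -errno : 0;
    rc = group_unlock(fd, gid);   /* always release, even if the write failed */
    return io_err != 0 ? io_err : rc;
}
```

Every cooperating process would call group_write() with the same gid. On a file system that does not implement these ioctls, the helpers simply return a negative errno value.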
0.5 Scalability and performance
It is believed that certain applications using these 
locks will see a speedup, because several nodes 
can read and write the file at the same time without 
any locks bouncing between them. Applications must be 
specially written to make use of this feature and to 
avoid any possible races.
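
One way an application might avoid races under a group lock is to give each cooperating process its own disjoint byte range of the file. The sketch below only illustrates that idea. The names byte_range and rank_range are hypothetical, and the design does not prescribe any particular partitioning scheme.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/* Split `file_size` bytes into `nranks` contiguous, non-overlapping
 * ranges and return the one owned by `rank` (0-based). The first
 * file_size % nranks ranks each get one extra byte. */
static struct byte_range rank_range(uint64_t file_size,
                                    uint32_t nranks, uint32_t rank)
{
    assert(nranks > 0 && rank < nranks);
    uint64_t base  = file_size / nranks;
    uint64_t extra = file_size % nranks;
    uint64_t start = (uint64_t)rank * base + (rank < extra ? rank : extra);
    uint64_t len   = base + (rank < extra ? 1 : 0);
    struct byte_range r = { start, start + len };
    return r;
}
```

Because the ranges never overlap, processes holding the same group lock can write concurrently without needing extent locks to order their writes.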
0.6 Recovery
No implications.
0.7 Alternatives
None known.
0.8 Concerns
If a node holding such a lock dies, normal access 
to a file locked by this lock is stalled until 
the node is evicted.
-------------- next part --------------
#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass newbook
\begin_preamble
\usepackage{fancyhdr}
\usepackage{array}
\usepackage{epsfig}
\usepackage{applegar}
\usepackage{makeidx}
\usepackage{multicol}
\usepackage{longtable}
\usepackage{listings}
\usepackage{color}
\usepackage{coz}
\setlength{\parindent}{0pt}
\parskip 5pt
\newcommand{\tm}{\symbol{'252}}
\newcommand{\rt}{\symbol{'250}}
\newcommand{\cpr}{\symbol{'251}}
\newcommand{\gt}{\symbol{'074}}
\newcommand{\lt}{\symbol{'076}}
\newcommand{\verbar}{\symbol{'174}}
\newcommand{\hdr}[1]{{\bf #1.\ }}
\newcommand{\dbs}{$\backslash$}
\newcommand{\centre}[1]{ \begin{center} #1 \end{center}}
\newcommand{\cfs}{Cluster File System}
\newcommand{\WS}{IWS}
\newenvironment{tscreen}%
 {\begin{quote}\bgroup\small\tt}%
 {\egroup\end{quote}}
\newenvironment{summarybox}[1]{\framebox{{\bf
#1}}\penalty500\begin{enumerate}}{\end{enumerate}}
\renewcommand{\chaptermark}[1]{\markboth{\sf \bf \thechapter\  #1}{}}
\renewcommand{\sectionmark}[1]{\markright{\bfseries #1 \ \thesection}}
\lhead[]{\leftmark}
\rhead[\rightmark]{}
\setlength{\unitlength}{18mm}
\newcommand{\blob}{\rule[-.2\unitlength]{2\unitlength}{.5\unitlength}}
\newcommand\rblob{\thepage
  \begin{picture}(0,0)
    \put(.25,-\value{chapter}){\blob}
  \end{picture}}
\newcommand\lblob{%
  \begin{picture}(0,0)
    \put(-3,-\value{chapter}){\blob}
  \end{picture}%
  \thepage}
\newcounter{line}
\newcommand{\secname}[1]{\addtocounter{line}{1}%
  \put(1,-\value{line}){\blob}
  \put(-7.5,-\value{line}){\Large \arabic{line}}
  \put(-7,-\value{line}){\Large #1}}
\newcommand{\overview}{\thepage
  \begin{picture}(0,0)
    \secname{Introduction}
    \secname{The first year}
    \secname{Specialisation}
  \end{picture}}
\newcounter{itemnum}
\renewenvironment{enumerate}{\begin{list}{{\bf \arabic{itemnum}. }} {
\usecounter{itemnum}
\setlength{\labelwidth}{0.4cm}
\setlength{\labelsep}{0.25cm}
\setlength{\leftmargin}{0.65cm}
\setlength{\rightmargin}{1.0cm}
\setlength{\itemsep}{1pt}
\setlength{\parsep}{3pt}
\setlength{\itemindent}{0pt}
\setlength{\listparindent}{0pt}
\setlength{\topsep}{0.5ex} }}
 {\end{list}}
\renewenvironment{itemize}{\begin{list}{\rule{0.15cm}{0.15cm}}{
\setlength{\labelwidth}{0.25cm}
\setlength{\labelsep}{0.25cm}
\setlength{\leftmargin}{0.65cm}
\setlength{\rightmargin}{1.0cm}
\setlength{\itemsep}{1pt}
\setlength{\parsep}{3pt}
\setlength{\itemindent}{0pt}
\setlength{\listparindent}{0pt}
\setlength{\topsep}{0.5ex}}}
{\end{list}}
\makeindex
\newcommand{\lst}[2] {
        \noindent\rule[-0.3mm]{\textwidth}{0.3mm}\vspace{-0.3mm}
        \lstinputlisting[caption={#2},
        label={#1},
        showstringspaces=false, 
        numbers=left, 
        stepnumber=1,
        frame=bottomline,
        extendedchars=true,
        basicstyle=\small\tt,
        numberstyle=\tiny,
        keywordstyle=\color{red},
        language=C,
        emph={1, 2, 3, 4, 5, 6, 7, 8, 9, 0, NULL, lustre, CFS},
        emphstyle=\color{blue},
        commentstyle=\color{cyan},
        stringstyle=\color{green},
        directivestyle=\color{magenta}, 
        breaklines=true]{#1}
        \vspace{0.3mm}       
}
\end_preamble
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\spacing single 
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle fancy
\layout Title
High Level Design for group locks.
\layout Section
Functionality specification
\layout Standard
The ability to lock files (inodes) for exclusive access by only those processes
 that belong to a logical group of processes.
\layout Section
Logic specification
\layout Standard
To support certain HPC installations, Lustre supports a group I/O lock.
 The semantics of the lock are as follows:
\layout Enumerate
All processes in a group of cooperating processes:
\begin_deeper 
\layout Enumerate
share a group id, a 32-bit integer, which is generated in a way outside
 the scope of this document.
\layout Enumerate
mark the file as not requiring normal extent locks, and mark the file descriptor
 (as 
\begin_inset Quotes eld
\end_inset 
usual
\begin_inset Quotes erd
\end_inset 
) as blocking or non-blocking.
\layout Enumerate
take a concurrent GROUP lock on a [0,EOF] extent associated with a file.
 The concurrent GROUP lock is passed the group id.
\layout Enumerate
explicitly release this lock when done with their I/O, preceded by a flush
 of cached data.
\layout Enumerate
have their group locks dropped when the file is closed, deliberately or
 through exit.
\end_deeper 
\layout Enumerate
Readers/writers on other nodes take [a,b] R/W locks, which cannot be granted
 while group locks are present.
 The possible outcomes are: 
\begin_deeper 
\layout Enumerate
the reader or writer can be made to wait forever, interruptibly.
 This is good for blocking file descriptors.
\layout Enumerate
the reader or writer can get -EWOULDBLOCK.
 This is good for file descriptors that have been marked as non-blocking.
\layout Enumerate
group enqueues with a different group id must wait for the current group
 and PR/PW locks to be released.
\end_deeper 
\layout Standard
In case (a), a waiting reader would force further group lock requests to wait
 until the read is satisfied.
 This is not desirable, so we will let group locks jump over the waiting
 list if other group locks have already been granted.
\layout Section
State management
\layout Standard
This lock can be held for an unspecified amount of time by a client, so the
 usual lock revocation timeouts are not applicable to these locks.
\layout Section
Protocol, APIs, disk format
\layout Standard
A new lock mode, LCK_GROUP, is added to support this kind of locking.
 This lock mode is only used for EXTENT locks.
 These locks are accessed through ioctls:
\layout Itemize
LL_IOC_GROUP_LOCK gets a group lock on a file.
 The ioctl's arg argument represents the 32-bit 
\begin_inset Quotes eld
\end_inset 
group id
\begin_inset Quotes erd
\end_inset 
.
\layout Itemize
LL_IOC_GROUP_UNLOCK releases a group lock previously granted on a file.
 The ioctl's arg argument represents the 32-bit 
\begin_inset Quotes eld
\end_inset 
group id
\begin_inset Quotes erd
\end_inset 
 and should match that used at LL_IOC_GROUP_LOCK time.
\layout Section
Scalability and performance
\layout Standard
It is believed that certain applications using these locks will see
 a speedup, because several nodes can read and write the file at the same
 time without any locks bouncing between them.
 Applications must be specially written to make use of this feature and
 to avoid any possible races.
\layout Section
Recovery
\layout Standard
No implications.
\layout Section
Alternatives
\layout Standard
None known.
\layout Section
Concerns
\layout Standard
If a node holding such a lock dies, normal access to a file locked
 by this lock is stalled until the node is evicted.
\the_end