Hello!
Attached you can find a "high level design" of the group locks
feature present
in Lustre. It includes information about what group locks are, their
semantics, and the ways you can access this feature.
In short, the group locks feature allows a certain group of processes (on
many nodes) to lock files for exclusive access by that group only.
Bye,
Oleg
-------------- next part --------------
High Level Design for group locks.
0.1 Functionality specification
The ability to lock files (inodes) for exclusive access
by only those processes that belong to a logical group of processes.
0.2 Logic specification
To support certain HPC installations, Lustre supports a
group I/O lock. The semantics of the lock are as follows:
1. All processes in a group of cooperating processes:
(a) share a group id, a 32-bit integer, which is
generated in a way outside the scope of this
document.
(b) mark the file as not requiring normal extent
locks, and mark the file descriptor (as "usual") as
blocking or non-blocking.
(c) take a concurrent GROUP lock on a [0,EOF] extent
associated with the file. The concurrent GROUP lock
is passed the group id.
(d) explicitly release this lock when done with their
I/O, preceded by a flush of cached data.
(e) have their group locks dropped when the file is
closed, deliberately or through exit.
2. Readers/writers on other nodes take [a,b] R/W locks,
which cannot be granted while group locks are present.
Such requests are handled in one of the following ways:
(a) the reader or writer can be made to wait forever,
interruptibly. This suits blocking file descriptors.
(b) the reader or writer can get -EWOULDBLOCK. This suits
file descriptors that have been marked as non-blocking.
(c) group enqueues with a different group id must
wait for the current group and PR/PW locks to be released.
In case (a), the waiting request would force further group
locks to wait until the read is satisfied. This is not
desirable, so group locks are allowed to jump over the
waiting list if other group locks have already been granted.
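The rules above can be sketched as a small decision function.
This is only an illustration, not the Lustre lock manager code:
the names (struct lock, decide, the verdict values) are
hypothetical, extents are ignored, and two things are assumed
rather than taken from the text: that "other group locks"
means locks with the same group id, and that a request with no
conflict still queues behind existing waiters unless it is a
group lock joining an already granted group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; only the distinction between
 * GROUP and ordinary read/write locks from the text is modelled. */
enum mode { MODE_PR, MODE_PW, MODE_GROUP };
enum verdict { GRANT, WAIT, WOULDBLOCK };

struct lock {
    enum mode mode;
    uint32_t  gid;   /* meaningful only for MODE_GROUP */
};

/*
 * Decide what happens to a new request given the granted locks and
 * the number of requests already waiting.  Extent overlap and PR/PW
 * compatibility among ordinary locks are not modelled.
 */
static enum verdict decide(const struct lock *granted, size_t ngranted,
                           size_t nwaiting, const struct lock *req,
                           bool nonblocking)
{
    bool conflict = false;

    assert(req != NULL);
    assert(granted != NULL || ngranted == 0);

    for (size_t i = 0; i < ngranted; i++) {
        const struct lock *g = &granted[i];

        if (req->mode == MODE_GROUP) {
            /* GROUP conflicts with PR/PW and with other group ids. */
            if (g->mode != MODE_GROUP || g->gid != req->gid)
                conflict = true;
        } else if (g->mode == MODE_GROUP) {
            /* Ordinary readers/writers conflict with any GROUP lock. */
            conflict = true;
        }
    }

    if (!conflict) {
        /* Joining an already granted group: jump the waiting list. */
        if (req->mode == MODE_GROUP && ngranted > 0)
            return GRANT;
        /* Assumption: otherwise queue behind existing waiters. */
        if (nwaiting == 0)
            return GRANT;
    }
    /* Case (b) for non-blocking descriptors, case (a) otherwise. */
    return nonblocking ? WOULDBLOCK : WAIT;
}
```

With group 7 holding the lock, a reader gets WAIT or
WOULDBLOCK depending on its descriptor, while another member of
group 7 is granted at once even if that reader is already
queued.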
0.3 State management
This lock can be held for an unspecified amount of time by
a client, so the usual lock revocation timeouts are not
applicable to these locks.
0.4 Protocol, APIs, disk format
A new lock mode, LCK_GROUP, is added to support this
kind of locking. This lock mode is only used for EXTENT
locks. Group locks are accessed through ioctls:
* LL_IOC_GROUP_LOCK takes a group lock on a file. The
ioctl's arg argument represents the 32-bit "group id".
* LL_IOC_GROUP_UNLOCK releases a group lock
previously granted on a file. The ioctl's arg argument
represents the 32-bit "group id" and should match the one
used at LL_IOC_GROUP_LOCK time.
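From user space, the two ioctls might be wrapped roughly as
follows. This is a sketch under stated assumptions: the request
codes should come from the Lustre user header, and the fallback
values below are an assumption that must be checked against
the installed headers. The helper names are hypothetical, the
group id is assumed to be passed by value as the ioctl
argument, and fsync() is used here as one possible way to flush
cached data before the release.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Normally provided by the Lustre user header.  The fallback values
 * are an ASSUMPTION for this sketch; verify them before use. */
#ifndef LL_IOC_GROUP_LOCK
#define LL_IOC_GROUP_LOCK   _IOW('f', 158, long)
#define LL_IOC_GROUP_UNLOCK _IOW('f', 159, long)
#endif

/* Take the group lock on fd for group gid.  Returns 0 or -errno. */
static int group_lock(int fd, uint32_t gid)
{
    return ioctl(fd, LL_IOC_GROUP_LOCK, (unsigned long)gid) < 0 ? -errno : 0;
}

/* Release the group lock; gid must match the one used when locking. */
static int group_unlock(int fd, uint32_t gid)
{
    return ioctl(fd, LL_IOC_GROUP_UNLOCK, (unsigned long)gid) < 0 ? -errno : 0;
}

/* Run io(fd, ctx) under the group lock, then flush and release. */
static int with_group_lock(int fd, uint32_t gid,
                           int (*io)(int fd, void *ctx), void *ctx)
{
    int rc, urc;

    assert(io != NULL);

    rc = group_lock(fd, gid);
    if (rc < 0)
        return rc;

    rc = io(fd, ctx);

    /* The design asks for a flush of cached data before the release. */
    if (fsync(fd) < 0 && rc == 0)
        rc = -errno;

    urc = group_unlock(fd, gid);
    return rc != 0 ? rc : urc;
}
```

Every process in the group would call with_group_lock() with
the same gid on its own descriptor for the file. On a file
system without group lock support, the lock call is expected to
fail and the I/O callback is never run.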
0.5 Scalability and performance
It is believed that certain processes using this sort
of lock will see a speedup, because several nodes
can read and write the file at the same time without
any locks bouncing around. Applications should be
specially written to make use of this feature and to
avoid any possible races.
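One common way for an application to avoid races among the
members of a group is to give each process its own disjoint
region of the file. The helper below is a hypothetical sketch
of that idea and is not part of this design: the names, the
notion of a per-process rank, and the alignment parameter are
all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range [offset, offset + length) owned by one process. */
struct region {
    uint64_t offset;
    uint64_t length;
};

/*
 * Split [0, total) into nprocs contiguous, non-overlapping regions
 * whose boundaries are multiples of align, and return the region
 * that belongs to the given rank.  Trailing ranks may get a short
 * or empty region when the file does not divide evenly.
 */
static struct region region_for_rank(uint64_t total, uint64_t align,
                                     unsigned nprocs, unsigned rank)
{
    uint64_t blocks, per, start, end;

    assert(align > 0 && nprocs > 0 && rank < nprocs);

    blocks = (total + align - 1) / align;      /* aligned blocks in file */
    per = (blocks + nprocs - 1) / nprocs;      /* blocks per rank, rounded up */
    start = (uint64_t)rank * per * align;
    end = start + per * align;

    /* Clamp both ends to the end of the file. */
    if (start > total)
        start = total;
    if (end > total)
        end = total;

    return (struct region){ start, end - start };
}
```

Each process would then read and write only inside its own
region while the group lock is held, so no two members touch
the same bytes.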
0.6 Recovery
No implications.
0.7 Alternatives
None known.
0.8 Concerns
If a node holding such a lock dies, normal access
to a file locked by this lock is stalled until
the node is evicted.
-------------- next part --------------
#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass newbook
\begin_preamble
\usepackage{fancyhdr}
\usepackage{array}
\usepackage{epsfig}
\usepackage{applegar}
\usepackage{makeidx}
\usepackage{multicol}
\usepackage{longtable}
\usepackage{listings}
\usepackage{color}
\usepackage{coz}
\setlength{\parindent}{0pt}
\parskip 5pt
\newcommand{\tm}{\symbol{'252}}
\newcommand{\rt}{\symbol{'250}}
\newcommand{\cpr}{\symbol{'251}}
\newcommand{\gt}{\symbol{'074}}
\newcommand{\lt}{\symbol{'076}}
\newcommand{\verbar}{\symbol{'174}}
\newcommand{\hdr}[1]{{\bf #1.\ }}
\newcommand{\dbs}{$\backslash$}
\newcommand{\centre}[1]{ \begin{center} #1 \end{center}}
\newcommand{\cfs}{Cluster File System}
\newcommand{\WS}{IWS}
\newenvironment{tscreen}%
{\begin{quote}\bgroup\small\tt}%
{\egroup\end{quote}}
\newenvironment{summarybox}[1]{\framebox{{\bf
#1}}\penalty500\begin{enumerate}}{\end{enumerate}}
\renewcommand{\chaptermark}[1]{\markboth{\sf \bf \thechapter\ #1}{}}
\renewcommand{\sectionmark}[1]{\markright{\bfseries #1 \ \thesection}}
\lhead[]{\leftmark}
\rhead[\rightmark]{}
\setlength{\unitlength}{18mm}
\newcommand{\blob}{\rule[-.2\unitlength]{2\unitlength}{.5\unitlength}}
\newcommand\rblob{\thepage
\begin{picture}(0,0)
\put(.25,-\value{chapter}){\blob}
\end{picture}}
\newcommand\lblob{%
\begin{picture}(0,0)
\put(-3,-\value{chapter}){\blob}
\end{picture}%
\thepage}
\newcounter{line}
\newcommand{\secname}[1]{\addtocounter{line}{1}%
\put(1,-\value{line}){\blob}
\put(-7.5,-\value{line}){\Large \arabic{line}}
\put(-7,-\value{line}){\Large #1}}
\newcommand{\overview}{\thepage
\begin{picture}(0,0)
\secname{Introduction}
\secname{The first year}
\secname{Specialisation}
\end{picture}}
\newcounter{itemnum}
\renewenvironment{enumerate}{\begin{list}{{\bf \arabic{itemnum}. }} {
\usecounter{itemnum}
\setlength{\labelwidth}{0.4cm}
\setlength{\labelsep}{0.25cm}
\setlength{\leftmargin}{0.65cm}
\setlength{\rightmargin}{1.0cm}
\setlength{\itemsep}{1pt}
\setlength{\parsep}{3pt}
\setlength{\itemindent}{0pt}
\setlength{\listparindent}{0pt}
\setlength{\topsep}{0.5ex} }}
{\end{list}}
\renewenvironment{itemize}{\begin{list}{\rule{0.15cm}{0.15cm}}{
\setlength{\labelwidth}{0.25cm}
\setlength{\labelsep}{0.25cm}
\setlength{\leftmargin}{0.65cm}
\setlength{\rightmargin}{1.0cm}
\setlength{\itemsep}{1pt}
\setlength{\parsep}{3pt}
\setlength{\itemindent}{0pt}
\setlength{\listparindent}{0pt}
\setlength{\topsep}{0.5ex}}}
{\end{list}}
\makeindex
\newcommand{\lst}[2] {
\noindent\rule[-0.3mm]{\textwidth}{0.3mm}\vspace{-0.3mm}
\lstinputlisting[caption={#2},
label={#1},
showstringspaces=false,
numbers=left,
stepnumber=1,
frame=bottomline,
extendedchars=true,
basicstyle=\small\tt,
numberstyle=\tiny,
keywordstyle=\color{red},
language=C,
emph={1, 2, 3, 4, 5, 6, 7, 8, 9, 0, NULL, lustre, CFS},
emphstyle=\color{blue},
commentstyle=\color{cyan},
stringstyle=\color{green},
directivestyle=\color{magenta},
breaklines=true]{#1}
\vspace{0.3mm}
}
\end_preamble
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\spacing single
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle fancy
\layout Title
High Level Design for group locks.
\layout Section
Functionality specification
\layout Standard
The ability to lock files (inodes) for exclusive access by only those processes
that belong to a logical group of processes.
\layout Section
Logic specification
\layout Standard
To support certain HPC installations, Lustre supports a group I/O lock.
The semantics of the lock are as follows:
\layout Enumerate
All processes in a group of cooperating processes:
\begin_deeper
\layout Enumerate
share a group id, a 32-bit integer, which is generated in
a way outside the scope of this document.
\layout Enumerate
mark the file as not requiring normal extent locks, and mark the file descriptor
(as
\begin_inset Quotes eld
\end_inset
usual
\begin_inset Quotes erd
\end_inset
) as blocking or non-blocking.
\layout Enumerate
take a concurrent GROUP lock on a [0,EOF] extent associated with a file.
The concurrent GROUP lock is passed the group id.
\layout Enumerate
explicitly release this lock when done with their I/O, preceded by a flush
of cached data.
\layout Enumerate
have their group locks dropped when the file is closed, deliberately or
through exit.
\end_deeper
\layout Enumerate
Readers/writers on other nodes take [a,b] R/W locks, which cannot be granted
while group locks are present.
Such requests are handled in one of the following ways:
\begin_deeper
\layout Enumerate
The reader or writer can be made to wait forever, interruptibly.
This suits blocking file descriptors.
\layout Enumerate
The reader or writer can get -EWOULDBLOCK.
This suits file descriptors that have been marked as non-blocking.
\layout Enumerate
Group enqueues with a different group id must wait for the current group
and PR/PW locks to be released.
\end_deeper
\layout Standard
In case (a), the waiting request would force further group locks to wait until
the read is satisfied.
This is not desirable, so group locks are allowed to jump over the waiting
list if other group locks have already been granted.
\layout Section
State management
\layout Standard
This lock can be held for an unspecified amount of time by a client, so the usual
lock revocation timeouts are not applicable to these locks.
\layout Section
Protocol, APIs, disk format
\layout Standard
A new lock mode, LCK_GROUP, is added to support this kind of locking.
This lock mode is only used for EXTENT locks.
Group locks are accessed through ioctls:
\layout Itemize
LL_IOC_GROUP_LOCK takes a group lock on a file.
The ioctl's arg argument represents the 32-bit
\begin_inset Quotes eld
\end_inset
group id
\begin_inset Quotes erd
\end_inset
.
\layout Itemize
LL_IOC_GROUP_UNLOCK releases a group lock previously granted on a file.
The ioctl's arg argument represents the 32-bit
\begin_inset Quotes eld
\end_inset
group id
\begin_inset Quotes erd
\end_inset
and should match that used at LL_IOC_GROUP_LOCK time.
\layout Section
Scalability and performance
\layout Standard
It is believed that certain processes using this sort of lock will see
a speedup, because several nodes can read and write the file at the same
time without any locks bouncing around.
Applications should be specially written to make use of this feature and
to avoid any possible races.
\layout Section
Recovery
\layout Standard
No implications.
\layout Section
Alternatives
\layout Standard
None known.
\layout Section
Concerns
\layout Standard
If a node holding such a lock dies, normal access to a file locked
by this lock is stalled until the node is evicted.
\the_end