I see a relatively blank diskless booting wiki page at

http://wiki.lustre.org/index.php?title=Diskless_Booting

Are there any details on this, other than "University of Colorado at Boulder has performed interesting work"?

I have a couple of interests here:

1) compare Lustre netboot to AFS netboot
2) install Lustre MDS & OSS servers on Debian booted from AFS
3) understand enough about Lustre to know whether netbooting a Lustre server from a Lustre filesystem is completely wacked, or not

FYI, I haven't tried it yet, but I'd be comfortable booting an AFS server from a replicated AFS read-only volume as the root filesystem. I believe I understand the failure modes of AFS well enough that this would work.
Troy Benjegerdes wrote:
> Are there any details on this, other than "University of Colorado at
> Boulder has performed interesting work"?
> [...]
> 3) understand enough about Lustre to know whether netbooting a Lustre
> server from a Lustre filesystem is completely wacked, or not

We've actually experimented a bit with this, since most of the nodes in our system are diskless. In fact, they're -less just about everything except for processors, memory, and a connection to our internal fabric.

We've tried loading "canned" MDT/OST images into memory on some nodes and serving from there, and it does seem to work. There are two downsides, though. One is that the Linux loopback driver is a real performance bottleneck, ever since some bright person had the idea to make it less multi-threaded than it had been. The other is that booting tends to involve metadata-heavy access patterns, which are not exactly Lustre's strength - a situation made worse when you have nearly a thousand clients doing it at the same time and your MDS is a relatively small node like the others.

So far we've found that NBD serves us better in the boot/root filesystem role, though that means a read-only root, which involves its own complexity. Your mileage will almost certainly vary.
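For concreteness, a file-backed target looks roughly like the sketch below. Treat it as a sketch only, not our exact recipe: the fsname, MGS NID, size, and paths are all invented, and the flags should be checked against your mkfs.lustre version.

    # Format an OST that lives in a plain file; mkfs.lustre sets it up
    # through a loop device ("testfs", "mgs@tcp0", and the paths are
    # hypothetical).
    mkfs.lustre --fsname=testfs --ost --mgsnode=mgs@tcp0 \
        --device-size=2097152 /ramdisk/ost0.img

    # Mount it as a Lustre target. "-o loop" wraps the file in a loop
    # device - exactly where the single-threaded loopback driver
    # becomes the bottleneck under load.
    mount -t lustre -o loop /ramdisk/ost0.img /mnt/ost0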
> We've actually experimented a bit with this, since most of the nodes
> in our system are diskless. [...] So far we've found that NBD serves
> us better in the boot/root filesystem role, though that means a
> read-only root, which involves its own complexity. Your mileage will
> almost certainly vary.

Does the metadata-updates problem go away in a read-only root environment? That makes life a lot easier. I don't want some random node that's either hacked or out to lunch to be able to make changes to the filesystem. When I used NFS root, we had one node that was allowed to write, and all the rest were read-only mounts. I find AFS much more convenient, since I can log in as root and authenticate to the filesystem from any node to make changes.
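The NFS setup was just an /etc/exports split along these lines (the hostnames and path here are hypothetical, not the ones we actually used):

    # one trusted admin node may write; every other node mounts read-only
    /export/root  admin1(rw,no_root_squash,sync)  *.cluster(ro,no_root_squash,sync)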
Jeff/Troy,

----- "Jeff Darcy" <jeffd at sicortex.com> wrote:
> We've tried loading "canned" MDT/OST images into memory on some nodes
> and serving from there, and it does seem to work. [...] So far we've
> found that NBD serves us better in the boot/root filesystem role,
> though that means a read-only root, which involves its own complexity.
> Your mileage will almost certainly vary.

A good trick with Lustre to get around the metadata bottleneck is to put disk image files on Lustre (e.g. SquashFS) and mount them using the loopback driver on each compute node (or "lctl attach_device"?). Instead of having to bother the MDS, you then only need to seek through the file on an OST. By either striping the read-only image across all your OSTs, or keeping a round-robin image per OST, you can get pretty good scalability.

I tried this with our 700-node compute cluster, but to be honest the overall booting performance was not that different from a couple of NFS servers serving a read-only root, so it was not really worth the extra complexity in the end.

We do still use SquashFS on Lustre from time to time, when we have a directory tree with 30,000 small files in it that needs to be read by every farm machine. It's rare, but it does happen, and traditionally NFS does much better than Lustre with such workloads.

Daire
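P.S. In case it helps anyone trying this, the recipe is roughly the sketch below. Again, a sketch only: the paths are made up, and the lfs/mount options should be checked against your Lustre and SquashFS versions.

    # Build a SquashFS image from the directory tree (paths hypothetical)
    mksquashfs /stage/tree tree.squashfs

    # Pre-create the destination file striped across all OSTs
    # (-c -1 = use every OST), then write into the existing file so the
    # stripe layout chosen at create time is kept
    lfs setstripe -c -1 /lustre/images/tree.squashfs
    cat tree.squashfs > /lustre/images/tree.squashfs

    # On each compute node, loop-mount the image read-only; reads then
    # go straight to the OSTs and barely touch the MDS
    mount -t squashfs -o loop,ro /lustre/images/tree.squashfs /mnt/tree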