I see a relatively blank diskless booting wiki page at

http://wiki.lustre.org/index.php?title=Diskless_Booting

Are there any details on this, other than "University of Colorado at Boulder has performed interesting work"?

I have a couple of interests here:

1) compare Lustre netboot to AFS netboot
2) install Lustre MDS & OSS servers on Debian booted from AFS
3) understand enough about Lustre to know whether netbooting a Lustre server from a Lustre filesystem is completely wacked, or not

FYI, I haven't tried it yet, but I'd be comfortable booting an AFS server from a replicated AFS read-only volume as the root filesystem. I believe I understand the failure modes of AFS well enough that this would work.
Troy Benjegerdes wrote:
> Are there any details on this, other than "University of Colorado at
> Boulder has performed interesting work"?
> [...]
> 3) understand enough about Lustre to know whether netbooting a Lustre
> server from a Lustre filesystem is completely wacked, or not

We've actually experimented a bit with this, since most of the nodes in our system are diskless. In fact, they're -less just about everything except for processors, memory, and a connection to our internal fabric.

We've tried loading "canned" MDT/OST images into memory on some nodes and serving from there, and it does seem to work. There are two downsides, though. One is that the Linux loopback driver is a real performance bottleneck, ever since some bright person had the idea to make it less multi-threaded than it had been. The other is that booting tends to involve metadata-heavy access patterns, which are not exactly Lustre's strength - a situation made worse when you have nearly a thousand clients doing it at the same time and your MDS is a relatively small node like the others.

So far we've found that NBD serves us better in the boot/root filesystem role, though that means a read-only root, which involves its own complexity. Your mileage will almost certainly vary.
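For concreteness, a file-backed target looks roughly like the sketch below. Treat it as a sketch only, not our exact recipe: the fsname, MGS NID, size, and paths are all invented, and the flags should be checked against your mkfs.lustre version.

    # Format an OST that lives in a plain file; mkfs.lustre sets it up
    # through a loop device ("testfs", "mgs@tcp0", and the paths are
    # hypothetical).
    mkfs.lustre --fsname=testfs --ost --mgsnode=mgs@tcp0 \
        --device-size=2097152 /ramdisk/ost0.img

    # Mount it as a Lustre target. "-o loop" wraps the file in a loop
    # device - exactly where the single-threaded loopback driver
    # becomes the bottleneck under load.
    mount -t lustre -o loop /ramdisk/ost0.img /mnt/ost0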
> We've actually experimented a bit with this, since most of the nodes
> in our system are diskless. [...] So far we've found that NBD serves
> us better in the boot/root filesystem role, though that means a
> read-only root, which involves its own complexity. Your mileage will
> almost certainly vary.

Does the metadata-updates problem go away in a read-only root environment? That makes life a lot easier. I don't want some random node that's either hacked or out to lunch to be able to make changes to the filesystem. When I used NFS root, we had one node that was allowed to write, and all the rest were read-only mounts. I find AFS much more convenient, since I can log in as root and authenticate to the filesystem from any node to make changes.
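The NFS setup was just an /etc/exports split along these lines (the hostnames and path here are hypothetical, not the ones we actually used):

    # one trusted admin node may write; every other node mounts read-only
    /export/root  admin1(rw,no_root_squash,sync)  *.cluster(ro,no_root_squash,sync)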
Jeff/Troy,

----- "Jeff Darcy" <jeffd at sicortex.com> wrote:
> We've tried loading "canned" MDT/OST images into memory on some nodes
> and serving from there, and it does seem to work. [...] So far we've
> found that NBD serves us better in the boot/root filesystem role,
> though that means a read-only root, which involves its own complexity.
> Your mileage will almost certainly vary.

A good trick with Lustre to get around the metadata bottleneck is to put disk image files on Lustre (e.g. SquashFS) and mount them using the loopback driver on each compute node (or "lctl attach_device"?). Instead of having to bother the MDS, you then only need to seek through the file on an OST. By either striping the read-only image across all your OSTs, or keeping a round-robin image per OST, you can get pretty good scalability.

I tried this with our 700-node compute cluster, but to be honest the overall booting performance was not that different from a couple of NFS servers serving a read-only root, so it was not really worth the extra complexity in the end.

We do still use SquashFS on Lustre from time to time, when we have a directory tree with 30,000 small files in it that needs to be read by every farm machine. It's rare, but it does happen, and traditionally NFS does much better than Lustre with such workloads.

Daire
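P.S. In case it helps anyone trying this, the recipe is roughly the sketch below. Again, a sketch only: the paths are made up, and the lfs/mount options should be checked against your Lustre and SquashFS versions.

    # Build a SquashFS image from the directory tree (paths hypothetical)
    mksquashfs /stage/tree tree.squashfs

    # Pre-create the destination file striped across all OSTs
    # (-c -1 = use every OST), then write into the existing file so the
    # stripe layout chosen at create time is kept
    lfs setstripe -c -1 /lustre/images/tree.squashfs
    cat tree.squashfs > /lustre/images/tree.squashfs

    # On each compute node, loop-mount the image read-only; reads then
    # go straight to the OSTs and barely touch the MDS
    mount -t squashfs -o loop,ro /lustre/images/tree.squashfs /mnt/tree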