Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] lustre client and ost on the same machine...
jobarjo wrote:
> However, I have some questions about Lustre architecture.
>
> The typical Lustre architecture shows that clients and OSTs are on
> separate nodes.
> Is there a specific reason for this separation?

Our typical customers today are running clusters with at least 1,000 client nodes and dozens of object storage servers. They want their data access to be as fast as possible, and that means putting the servers on separate nodes.

> Do the OSTs consume a lot of CPU?

Not particularly; for example, a single-CPU Opteron machine can move many hundreds of megabytes per second, given adequate disk and network hardware.

Nevertheless, in a configuration with thousands of nodes, there are not really extra CPU cycles for client applications.

> What is the problem with a fully symmetric cluster, with each client
> containing an OST?
> Does it have a real performance problem?
> I've read somewhere on this list that there was a bug when the client and
> OST are on the same node.
> Is it corrected in a more recent available version?

There is a bug, but it has nothing to do with performance; it is a memory management issue. In summary, when a client and an OSS are on the same node, the client may have dirty data in its page cache that must be written out before the kernel memory allocator can satisfy an allocation request. In that case, the OSS code must be able to receive that data and write it to disk without needing to allocate any memory (or extremely little, in a non-blocking way that must succeed).

The kernel leaves a little bit of room for file systems to allocate memory in this situation, but Lustre will need to take more extreme measures to make sure that these allocations neither block nor fail. This is pretty subtle work, and none of our current customers have expressed an interest in funding it.

We will get to it eventually, because we want to support this configuration, but I can't promise a specific version or date.

> What about caching?
> Does every file open require an access to the MDS? Is there any callback like
> AFS for cache consistency?
> Is there any OST read caching?

Clients cache aggressively, both file data (reading and writing) and metadata (reading only, in 1.x). All cached data is fully protected by our lock manager, with callbacks to selectively invalidate cached data as necessary. Today, all file opens require an RPC to the MDS. In Lustre 1.4 the clients will cache file handles, and repeated opens in a short period will not require additional RPCs.

-Phil
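The memory constraint described in this message (the OSS must accept dirty data and write it out without allocating memory, or only in a non-blocking way that cannot fail) is the kind of problem usually approached by reserving buffers ahead of time. The following is only a minimal sketch of that idea, not Lustre code: the names ReservePool and write_out are hypothetical, a Python list stands in for the disk, and Python itself allocates freely, so the sketch models the control flow rather than real kernel behavior.

```python
from collections import deque


class ReservePool:
    """Hypothetical pool: every buffer is allocated up front, never later."""

    def __init__(self, count, size):
        # All allocation happens here, before any memory pressure exists.
        self.size = size
        self._free = deque(bytearray(size) for _ in range(count))

    def try_get(self):
        # Non-blocking: return a reserved buffer, or None if none is free.
        return self._free.popleft() if self._free else None

    def put(self, buf):
        # Return a buffer to the reserve once its write has completed.
        self._free.append(buf)


def write_out(pool, dirty_pages, disk):
    """Flush dirty pages using only reserved buffers.

    Returns the pages that could not be written because the reserve was
    empty; the caller would retry them once a buffer is returned.
    """
    pending = deque(dirty_pages)
    while pending:
        buf = pool.try_get()
        if buf is None:
            break  # never block and never allocate on this path
        page = pending.popleft()
        if len(page) > pool.size:
            raise ValueError("page larger than reserved buffer")
        buf[:len(page)] = page               # stage the data in the reserve
        disk.append(bytes(buf[:len(page)]))  # stand-in for the disk write
        pool.put(buf)
    return list(pending)
```

With a reserve of two 4096-byte buffers, write_out(ReservePool(2, 4096), pages, disk) drains every page; with an empty reserve it returns all pages untouched instead of blocking, which is the property the message says the allocations must have.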
Paul Nowoczynski
2006-May-19 07:36 UTC
[Lustre-discuss] lustre client and ost on the same machine...
Hi Phil,

When you say, "a single-cpu Opteron machine can move many hundreds of megabytes per second, given adequate disk and network hardware.", how many is "many"? Lower hundreds, or somewhere approaching 800 or 900 MB/s? Can you tell us the best speeds you have witnessed to a single OST?

Paul
Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] lustre client and ost on the same machine...
Paul Nowoczynski wrote:
> Hi Phil,
> When you say, "a single-cpu Opteron machine can move many hundreds of
> megabytes per second, given adequate disk and network hardware.", how
> many is "many"? Lower hundreds, or somewhere approaching 800 or
> 900 MB/s? Can you tell us the best speeds you have witnessed to a
> single OST?

The best speed we have demonstrated with a single OST, with all of the pieces in place, is ~270 MB/s, on a dual-CPU IA-32 machine with Quadrics Elan 3. The bottleneck at that point is pretty much the PCI bus.

When we did our 10-GigE tests, we did not have a nearly large enough disk array, so we had to simulate it. On a single-CPU Opteron with a 10-GigE adapter, we sustained 550 MB/s. Given that we did this at only 50% CPU utilization, there is more than enough CPU remaining to drive real disks at these speeds.

On the client side, we have demonstrated 660 MB/s on a dual-CPU ia64 client with Quadrics Elan 4, doing I/O to multiple slow OSTs (I don't know how many, but each is capable of at most ~75 MB/s).

Thanks--

-Phil
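Two back-of-the-envelope figures follow from the numbers in this message. The sketch below is only arithmetic on the quoted values; the function names are hypothetical, and the linear CPU extrapolation is an assumption that ignores the CPU cost of driving real disks, so it should be read as a rough ceiling rather than a measured result.

```python
import math


def min_osts(total_mbps, per_ost_max_mbps):
    # Fewest OSTs that could supply the aggregate rate if each ran flat out.
    return math.ceil(total_mbps / per_ost_max_mbps)


def linear_cpu_ceiling(mbps, cpu_fraction):
    # ASSUMPTION: throughput scales linearly with CPU utilization.
    return mbps / cpu_fraction


# 660 MB/s on the client, with each OST capable of at most ~75 MB/s.
print(min_osts(660, 75))             # 9

# 550 MB/s sustained at 50% CPU on the single-CPU Opteron.
print(linear_cpu_ceiling(550, 0.5))  # 1100.0
```

So the client-side test must have involved at least 9 OSTs, and under the linear assumption the Opteron's CPU would not become the limit until roughly 1,100 MB/s.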
jobarjo
2006-May-19 07:36 UTC
[Lustre-discuss] lustre client and ost on the same machine...
Hi All,

I've been following Lustre development for a while. It seems that Lustre's evolution is very fast! Congratulations.

However, I have some questions about Lustre architecture.

The typical Lustre architecture shows that clients and OSTs are on separate nodes. Is there a specific reason for this separation? Do the OSTs consume a lot of CPU?

What is the problem with a fully symmetric cluster, with each client containing an OST? Does it have a real performance problem? I've read somewhere on this list that there was a bug when the client and OST are on the same node. Is it corrected in a more recent available version?

What about caching? Does every file open require an access to the MDS? Is there any callback like AFS for cache consistency? Is there any OST read caching?

Thanks