Klaus Steden
2008-Jan-14 20:41 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Hi there,

I was asked by a friend of a business contact of mine the other day to share some information about Lustre; it seems he's planning to build what will eventually be about a 3 PB file system.

The CFS website doesn't appear to have any information on field deployments worth bragging about, so I figured I'd ask, just for fun. Does anyone know:

- the size of the largest working Lustre file system currently in the field?
- the highest sustained number of IOPS seen with Lustre, and what the backend was?

cheers,
Klaus
D. Marc Stearman
2008-Jan-14 21:36 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Klaus,

We currently have a 1.2 PB Lustre filesystem that we will be expanding to 2.4 PB in the near future. I'm not sure about the highest sustained IOPS, but we did have a user peak at 19 GB/s to one of our 500 TB filesystems recently. The backend for that was 16 DDN 8500 couplets with write-cache turned OFF.

-Marc

----
D. Marc Stearman
LC Lustre Systems Administrator
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641
D. Marc Stearman
2008-Jan-14 21:41 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Sorry, I confused myself. We have a 1.2 PB filesystem that will remain that size. We will be expanding our 958 TB filesystem to ~2.4 PB in the near future.

-Marc

----
D. Marc Stearman
LC Lustre Systems Administrator
marc at llnl.gov
925.423.9670
Pager: 1.888.203.0641
Canon, Richard Shane
2008-Jan-14 21:48 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Klaus,

Here are some that I know are pretty large:

* RedStorm - I think it has two roughly 50 GB/s file systems. The capacity may not be quite as large, though. I think they used FC drives. It was DDN 8500, although that may have changed.
* CEA - I think they have a file system approaching 100 GB/s. I think it is DDN 9550. Not sure about the capacities.
* TACC has a large Thumper-based system. Not sure of the specs.
* ORNL - We have a 44 GB/s file system with around 800 TB of total capacity. That is DDN 9550. We also have two new file systems (20 GB/s and 10 GB/s, on LSI XBB2 and DDN 9550 respectively). Those have around 800 TB each (after RAID6).
* We are planning a 200 GB/s, roughly 10 PB file system now.

--Shane
Kennedy, Jeffrey
2008-Jan-14 21:55 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Any specs on IOPS rather than throughput?

Thanks.

Jeff Kennedy
QCT Engineering Compute
858-651-6592
Canon, Richard Shane
2008-Jan-14 23:11 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Jeff,

I'm not aware of any. For parallel file systems the focus is usually on bandwidth rather than IOPS.

--Shane
mlbarna
2008-Jan-16 17:43 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Could you elaborate on the benchmarking application(s) that produced these bandwidth numbers? I have a particular interest in MPI-coded programs that perform collective I/O. In discussions I find this topic sometimes confused; what I mean is streamed, appending output in which all the data from all the processors goes into a single, atomic write operation filling disjoint sections of the same file. In MPI-IO, the MPI_File_write_all* family defines my focus area, run with or without two-phase aggregation. Imitating the operation with simple POSIX I/O is acceptable, as far as I am concerned.

In tests on redstorm last year, I appended to a single open file at a rate of 26 GB/s. I had to use exceptional parameters to achieve this, however: the file had an lfs stripe count of 160, and I sent a 20 MB buffer from each rank of a 160-processor job, for an aggregate of 3.2 GB per write_all operation. I consider this configuration outside the range of any normal usage.

I believe that a faster rate could be achieved by a similar program that wrote independently -- that is, one file per processor -- such as via NetCDF. For this case, I would set the lfs stripe count down to one.

Marty Barnaby
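For readers less familiar with the MPI_File_write_all* family, a minimal sketch of the access pattern described above (each rank contributing one disjoint 20 MB block to a shared file through a single collective call) might look like the following. The file path, buffer contents, and lack of error checking are placeholders for illustration, not the actual Red Storm test code:

    #include <mpi.h>
    #include <stdlib.h>

    #define CHUNK (20 * 1024 * 1024)    /* 20 MB contributed by each rank */

    int main(int argc, char **argv)
    {
        int rank;
        char *buf;
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        buf = malloc(CHUNK);            /* payload omitted; contents do not matter here */

        /* All ranks open the same (hypothetical) Lustre file. */
        MPI_File_open(MPI_COMM_WORLD, "/lustre/scratch/shared_dump",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank owns a disjoint CHUNK-sized region of the shared file,
         * and all ranks write their region in one collective operation. */
        offset = (MPI_Offset)rank * CHUNK;
        MPI_File_write_at_all(fh, offset, buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }

To reproduce the striping described, the file would also need to be created with a wide stripe count beforehand (roughly, lfs setstripe with a stripe count of 160 on the target file or directory), so that the 160 writers land on distinct OSTs.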
Canon, Richard Shane
2008-Jan-30 21:09 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Marty,

Our benchmark measurements were made using IOR doing POSIX I/O to a single shared file (I believe).

Since you mentioned MPI-IO... Weikuan Yu (at ORNL) has done some work to improve the MPI-IO Lustre ADIO driver. Also, we have been sponsoring work through a Lustre Centre of Excellence to further improve the ADIO driver. I'm optimistic that this can make collective I/O perform at the level one would expect. File-per-process runs often do run faster, up until the metadata activity associated with creating 10k+ files starts to slow things down. I'm a firm believer that collective I/O through libraries like MPI-IO, HDF5, and pNetCDF is the way things should move. It should be possible to embed enough intelligence in these middle layers to do good stripe alignment, automatically tune stripe counts and widths, etc. Some of this will hopefully be accomplished with the improvements being made to the ADIO driver.

--Shane
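As a rough illustration of what "intelligence in the middle layers" can look like from the application side today, ROMIO-based MPI-IO implementations accept striping hints at file-open time; whether a given Lustre ADIO driver honors them depends on the version, so treat the hint names below (striping_factor, striping_unit) as an assumption rather than a guaranteed interface:

    #include <mpi.h>

    /* Open a shared file while asking the MPI-IO layer to create it with a
     * given Lustre stripe count and stripe size. Hints the underlying ADIO
     * driver does not recognize are silently ignored. */
    MPI_File open_with_striping_hints(const char *path)
    {
        MPI_File fh;
        MPI_Info info;

        MPI_Info_create(&info);
        MPI_Info_set(info, "striping_factor", "160");    /* stripe count */
        MPI_Info_set(info, "striping_unit", "1048576");  /* 1 MB stripe size */

        MPI_File_open(MPI_COMM_WORLD, (char *)path,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        MPI_Info_free(&info);
        return fh;
    }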
Weikuan Yu
2008-Jan-31 17:13 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
I would throw in some of my experience for discussion, since Shane mentioned my name here :)

(1) First, I am not under the impression that collective I/O is designed to reveal the peak performance of a particular system. There are publications claiming that collective I/O might be a preferred case for some particular architectures, e.g., for BG/P (HPCA06), but I am not fully positive or clear on the context of that claim.

(2) With the intended disjoint access as Marty mentioned, using collective I/O would only add a phase of coordination before each process streams its data to the file system.

(3) The test Marty describes below seems small, both in terms of process count and the amount of data. Aside from whether this is good for revealing the system's best performance, I can confirm that there are applications at ORNL with much larger process counts and bigger data volumes from each process.

> In tests on redstorm last year, I appended to a single open file at a rate
> of 26 GB/s. I had to use exceptional parameters to achieve this, however:
> the file had an lfs stripe count of 160, and I sent a 20 MB buffer from
> each rank of a 160-processor job, for an aggregate of 3.2 GB per write_all
> operation. I consider this configuration outside the range of any normal
> usage.

IIRC, something regarding collective I/O has been discussed earlier, also with Marty (?).

In the upcoming IPDPS08 conference, ORNL has two papers on the I/O performance of Jaguar, so you may find the numbers interesting. The final versions of the papers should be available from me or Mark Fahey (the author of the other paper).

--Weikuan
Marty Barnaby
2008-Jan-31 19:25 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
I concur that at 160 my processor count was low. At the time I had access to as many as 1000 processors on our big Cray XT3, Redstorm, but now it is not available to me at all. Through my trials I found that, for appending to a single shared file, matching the lfs maximum stripe count of 160 with an equivalent job size was the only combination that got me up to this rate. My interest in this case ends here, because real usage involves processor counts in the thousands.

I'm not certain about the distinctions between collective I/O and shared files. It seems the latter is to be determined by the application authors and their users. With, for instance, a mesh-type application there might be trade-offs; but having, at least, all the data for one computed field or dynamic, for one time step that is saved to the output (we usually call these dumps), stored in the same file is regularly more advantageous for the entire work cycle. Actually, having all the output data in a single file seems like the most desirable approach.

Though MPI-IO independent write operations can be on the lowest level, mustn't there still be some global coordination to determine where each processor's chunk of a communicator's complete vector of values is written, respectively, in a shared file? Further, what we now call two-phase aggregation, as a means of turning many tiny block-size writes into a small number of large ones to leverage the greater efficiency of the FS at responding to this type of activity, has potential benefits. However, I've seen the considerable ROMIO provisions for this technique used incorrectly, delivering a decrease in performance.

Marty Barnaby
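To make the coordination question concrete: even with independent (non-collective) MPI-IO writes to a shared file, the ranks typically agree on disjoint offsets up front, for example with an exclusive prefix sum over their chunk sizes. A minimal sketch with a hypothetical helper (assuming each chunk fits in an int count):

    #include <mpi.h>

    /* Write one rank's chunk into a shared file at an offset computed from
     * the sizes of all lower-ranked chunks. The coordination is a single
     * MPI_Exscan; the write itself is independent (no collective buffering). */
    void write_shared_chunk(MPI_File fh, const void *chunk, long long nbytes)
    {
        int rank;
        long long my_offset = 0;

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Exclusive prefix sum: rank r receives the total bytes of ranks 0..r-1. */
        MPI_Exscan(&nbytes, &my_offset, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            my_offset = 0;   /* MPI_Exscan leaves rank 0's result undefined */

        MPI_File_write_at(fh, (MPI_Offset)my_offset, (void *)chunk,
                          (int)nbytes, MPI_BYTE, MPI_STATUS_IGNORE);
    }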
Tom.Wang
2008-Jan-31 21:11 UTC
[Lustre-discuss] Off-topic: largest existing Lustre file system?
Marty Barnaby wrote:
> Further, what we now call two-phase aggregation, as a means of turning
> many tiny block-size writes into a small number of large ones to leverage
> the greater efficiency of the FS at responding to this type of activity,
> has potential benefits. However, I've seen the considerable ROMIO
> provisions for this technique used incorrectly, delivering a decrease in
> performance.

The collective I/O layer should reorganize the data among the I/O nodes into some kind of pattern the underlying file system prefers, IMHO. Lustre is sometimes sensitive to I/O size (or alignment), especially on redstorm: since there is no client cache, the client throws the I/O requests at the server as soon as it gets them from the application. So collective I/O is the only layer in which I/O requests might be optimized on the client, apart from the application itself.

But the current two-phase aggregation does not take much file-system-specific information into account. For example, it splits data evenly across all the I/O clients instead of considering the stripe size (alignment) and which OST the data goes to (sometimes an even client I/O load != an even OST I/O load); it sometimes makes improper use of read-modify-write in data sieving; and it does no collection for non-interleaved data. Our current experience is to be careful with collective I/O and data sieving unless we are very clear about how the ADIO driver will reorganize the data. For example, for the HDF5 library, the mpiposix layer (no collective I/O or data sieving) is favorable. But I do agree with Shane: collective I/O with HDF5 and pNetCDF should be the way forward.

WangDi
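For anyone who wants to experiment with the behaviors WangDi describes, ROMIO exposes collective buffering (two-phase aggregation) and data sieving as per-file info hints. A minimal sketch, assuming a ROMIO-based MPI-IO with the usual hint names (romio_cb_write, romio_ds_write); hint handling varies by implementation and version:

    #include <mpi.h>

    /* Open a file for writing with two-phase aggregation and write-side data
     * sieving switched off, so each rank's requests go to the file system as
     * issued. Unrecognized hints are ignored by the MPI-IO layer. */
    MPI_File open_without_aggregation(const char *path)
    {
        MPI_File fh;
        MPI_Info info;

        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "disable");  /* no collective buffering on writes */
        MPI_Info_set(info, "romio_ds_write", "disable");  /* no read-modify-write data sieving */

        MPI_File_open(MPI_COMM_WORLD, (char *)path,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        MPI_Info_free(&info);
        return fh;
    }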