Frank Riley
2012-May-04 20:53 UTC
[Lustre-discuss] Not sure how we should configure our RAID arrays (HW limitation)
Hello,

We are using Nexsan E18s for our storage systems, and we are in the process of setting them up for Lustre. Each E18 has 18 disks total (maxed out). According to the Lustre docs, I want to have a stripe width of 1MB. Unfortunately, these E18s have a max stripe size of 128K. As I see it, for RAID6 this leaves us two options:

1) One array 16+2 with a stripe size of 64K, for a stripe width of 1MB. I'm hesitant with this option because of the increased chance that we could have more than 2 disks fail.

2) Two arrays 7+2 with a stripe size of 128K, for a stripe width of 896K. I'd then modify the max_pages_per_rpc tunable to match the 896K. I'm not sure what to do with the flex_bg filesystem option, since it has to be a power of 2.

What is the better option here? Or is there an option I'm missing? I've pretty much ruled out RAID5 arrays at 8+1 due to data loss risk, and RAID1+0 wastes too much disk for our use.

Thank you,
Frank
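[Editor's note: the stripe-width arithmetic behind the two options, and the max_pages_per_rpc value a 896K stripe would imply (assuming 4K pages), can be sketched as follows. Illustrative arithmetic only.]

```shell
# Full-stripe width = number of data disks x per-disk chunk (stripe) size.
opt1=$(( 16 * 64 ))    # option 1: 16+2 RAID6 with 64K chunks
opt2=$(( 7 * 128 ))    # option 2: 7+2 RAID6 with 128K chunks
echo "option 1 stripe width: ${opt1}K"    # 1024K = 1MB
echo "option 2 stripe width: ${opt2}K"    # 896K

# A 896K RPC expressed in 4K pages, i.e. the max_pages_per_rpc to match it:
echo "pages per 896K RPC: $(( 896 / 4 ))"
```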
Kevin Van Maren
2012-May-04 21:05 UTC
[Lustre-discuss] Not sure how we should configure our RAID arrays (HW limitation)
On May 4, 2012, at 2:53 PM, Frank Riley wrote:

> We are using Nexsan E18s for our storage systems, and we are in the process of setting them up for Lustre. Each E18 has 18 disks total (maxed out). According to the Lustre docs, I want to have a stripe width of 1MB. Unfortunately, these E18s have a max stripe size of 128K. As I see it, for RAID6 this leaves us two options:
>
> 1) One array 16+2 with a stripe size of 64K, for a stripe width of 1MB. I'm hesitant with this option because of the increased chance that we could have more than 2 disks fail.
>
> 2) Two arrays 7+2 with a stripe size of 128K, for a stripe width of 896K. I'd then modify the max_pages_per_rpc tunable to match the 896K. I'm not sure what to do with the flex_bg filesystem option, since it has to be a power of 2.

Note that you need to set the Lustre stripe size to match the 896K, as otherwise you will send 896KB and 128KB to each OST. Additional tuning of the mkfs options is also necessary so that the file system understands the layout (see -E in the Lustre manual); otherwise all the block allocations will start mid-stripe. This is not ideal for applications that expect power-of-2 sizes to be optimal.

> What is the better option here? Or is there an option I'm missing? I've pretty much ruled out RAID5 arrays at 8+1 due to data loss risk, and RAID1+0 wastes too much disk for our use.

8+1 is the best option from a Lustre performance standpoint. You should get better performance from two 7+2 arrays than with one 16+2 simply because you can have twice the number of independent IOs.

How about doing 3 4+2 RAIDs? 12 usable disks, instead of 14 or 16, but still better than 8 with RAID1. Doing 4*128KB, resulting in 2 full-stripe writes for each 1MB IO, is not that bad.

Kevin

This e-mail message, its contents and any attachments to it are confidential to the intended recipient, and may contain information that is privileged and/or exempt from disclosure under applicable law. If you are not the intended recipient, please immediately notify the sender and destroy the original e-mail message and any attachments (and any copies that may have been made) from your system or otherwise. Any unauthorized use, copying, disclosure or distribution of this information is strictly prohibited. Email addresses that end with "-c" identify the sender as a Fusion-io contractor.
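[Editor's note: the RAID-aware mkfs tuning Kevin refers to can be sketched for the 7+2/128K case. The -E stride/stripe_width values are in filesystem blocks (4K for ldiskfs by default); the device path, fsname, and index below are placeholders, not from the thread.]

```shell
# stride = per-disk chunk / fs block size; stripe_width = stride * data disks.
chunk_kb=128; block_kb=4; data_disks=7
stride=$(( chunk_kb / block_kb ))        # 32 blocks per 128K disk chunk
width=$(( stride * data_disks ))         # 224 blocks per 896K full stripe
echo "stride=${stride} stripe_width=${width}"

# Hypothetical OST format command passing these through to ldiskfs:
# mkfs.lustre --ost --fsname=lfs01 --index=0 \
#   --mkfsoptions="-E stride=${stride},stripe_width=${width}" /dev/sdb
```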
Frank Riley
2012-May-04 21:14 UTC
[Lustre-discuss] Not sure how we should configure our RAID arrays (HW limitation)
> How about doing 3 4+2 RAIDs? 12 usable disks, instead of 14 or 16, but still better than 8 with RAID1. Doing 4*128KB, resulting in 2 full-stripe writes for each 1MB IO, is not that bad.

Yes, of course. I had thought of this option earlier but forgot to include it. Thanks for reminding me. So using a stripe width of 512K will not harm performance that much? Note also that the E18s have two active/active controllers in them, so one controller will be handling I/O requests for two arrays, which will reduce performance somewhat. Would this affect your decision between 3 4+2 (512K) and 2 7+2 (896K)?
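[Editor's note: the appeal of the 3x(4+2) layout is that a 1MB client IO divides evenly into full stripes, avoiding read-modify-write on the array. A quick sketch of that arithmetic; the lctl command and osc name shown are illustrative, not from the thread.]

```shell
# A 4+2 array with 128K chunks has a 512K full stripe.
full_stripe=$(( 4 * 128 ))
echo "full stripe: ${full_stripe}K"
# A 1MB IO lands as exactly 2 full-stripe writes, no partial stripes:
echo "full-stripe writes per 1MB IO: $(( 1024 / full_stripe ))"

# By contrast, the 896K option requires shrinking the client RPC to match,
# e.g. (hypothetical target name):
# lctl set_param osc.lfs01-OST0000-osc.max_pages_per_rpc=224
```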
Kevin Van Maren
2012-May-07 14:48 UTC
[Lustre-discuss] Not sure how we should configure our RAID arrays (HW limitation)
The 512K stripe size should be fine for Lustre, and 128KB per disk is enough to get good performance from the underlying hard drive.

I don't know anything about the E18s beyond what you've posted, so I can't guess which configuration is more optimal. I would suggest you create the RAID arrays, format the LUNs for Lustre, and run the Lustre iokit to see how the various configurations perform (3 * 4+2, 2 * 8+1, 2 * 7+2). Then please post the results (with mkfs, etc. command lines) here so others can benefit from your experiments and/or suggest additional tunings.

Kevin

On May 4, 2012, at 3:14 PM, Frank Riley wrote:

>> How about doing 3 4+2 RAIDs? 12 usable disks, instead of 14 or 16, but still better than 8 with RAID1. Doing 4*128KB, resulting in 2 full-stripe writes for each 1MB IO, is not that bad.
>
> Yes, of course. I had thought of this option earlier but forgot to include it. Thanks for reminding me. So using a stripe width of 512K will not harm performance that much? Note also that the E18s have two active/active controllers in them, so one controller will be handling I/O requests for two arrays, which will reduce performance somewhat. Would this affect your decision between 3 4+2 (512K) and 2 7+2 (896K)?
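[Editor's note: the benchmarking pass Kevin suggests might look like the sketch below. The candidate layouts and their usable-disk counts come from the thread; the sgpdd-survey invocation is a hypothetical example, and its parameter names and units should be checked against the iokit documentation for your Lustre version. /dev/sdb is a placeholder.]

```shell
# Candidate layouts under test and their usable data disks (from the thread):
for cfg in "3x(4+2):12" "2x(7+2):14" "2x(8+1):16"; do
  echo "${cfg%%:*} usable disks: ${cfg##*:}"
done

# Hypothetical raw-LUN survey with the Lustre iokit's sgpdd-survey,
# sweeping thread and region counts (check names/units in your iokit docs):
# size=8 crglo=1 crghi=16 thrlo=1 thrhi=16 scsidevs="/dev/sdb" sgpdd-survey
```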