We are doing some testing.

For an OST on an Xserve RAID connected to Linux, is it better to not have a partition table:

    mkfs.lustre /dev/sda

or to have a partition?

    fdisk /dev/sda
    mkfs.lustre /dev/sda1

Thank you :-)

Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
My personal opinion would be to create a partition. I'm not sure of the exact logistical reasons, but there are some good ones (I just can't recall them).

-Aaron

On Oct 17, 2007, at 3:35 PM, Brock Palen wrote:
> For a OST with a xserve raid connected to linux, is it better to not
> have a partition table
>
> mkfs.lustre /dev/sda
>
> or to have a partition?
> fdisk /dev/sda
> mkfs.lustre /dev/sda1

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water
(301) 595-7001
aaron at iges.org
On Oct 17, 2007 15:35 -0400, Brock Palen wrote:
> For a OST with a xserve raid connected to linux, is it better to not
> have a partition table
>
> mkfs.lustre /dev/sda
>
> or to have a partition?
> fdisk /dev/sda
> mkfs.lustre /dev/sda1

For RAID 5/6 devices we recommend NOT having a partition table. The reason is that the partition table offsets the data partitions by a small amount (512 bytes usually) and this causes writes to span multiple RAID chunks and unnecessary read-modify-write activity.

For best performance, pick a RAID chunk size that divides evenly into 1MB (e.g. 4 or 8 data disks + parity). The ldiskfs mballoc code works to align the allocation with the RAID chunk size for best performance.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
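The alignment argument above can be illustrated with a little arithmetic. The following is only a sketch under stated assumptions: the 256 KiB chunk size and the 4 data disks are hypothetical values chosen so that one full stripe is 1 MiB, and the 512-byte offset is the figure quoted in the message. It counts how many RAID chunks a 1 MiB write touches, and how many stripes it covers only partially (the ones that would need read-modify-write), with and without the offset. Real controllers with write caches may behave better than this simple model suggests.

```python
KIB = 1024
MIB = 1024 * KIB

CHUNK = 256 * KIB            # assumed RAID chunk size (hypothetical)
DATA_DISKS = 4               # assumed number of data disks (hypothetical)
STRIPE = CHUNK * DATA_DISKS  # one full stripe = 1 MiB in this setup


def chunks_touched(start, length, chunk=CHUNK):
    """Number of RAID chunks overlapped by the byte range [start, start+length)."""
    first = start // chunk
    last = (start + length - 1) // chunk
    return last - first + 1


def partial_stripes(start, length, stripe=STRIPE):
    """Number of stripes that the write covers only partially.

    A partially covered stripe cannot have its parity computed from the
    new data alone, so it needs a read-modify-write cycle.
    """
    end = start + length
    count = 0
    for s in range(start // stripe, (end - 1) // stripe + 1):
        s_start = s * stripe
        s_end = s_start + stripe
        if start > s_start or end < s_end:
            count += 1
    return count


if __name__ == "__main__":
    for offset in (0, 512):
        print(
            f"offset {offset:>3} bytes: "
            f"{chunks_touched(offset, MIB)} chunks touched, "
            f"{partial_stripes(offset, MIB)} partial stripes"
        )
```

In this model the unshifted 1 MiB write is a single full-stripe write, while the shifted one spills into a fifth chunk and leaves two stripes partially written.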
On Oct 18, 2007, at 4:42 AM, Andreas Dilger wrote:
> For RAID 5/6 devices we recommend NOT having a partition table.

Thanks, I will keep this in mind.

I did some basic tests: 1 MDS, 1 OST, 1 RAID5 (half an Xserve RAID). Using tiobench on 1 client, having no partition table netted about 5MB/s faster for streaming read/write. I will scale up my tests, though, and try some other RAID configurations. Thanks for the help.
On Thu, 2007-10-18 at 13:02 -0400, Brock Palen wrote:
> I did some basics test, 1MDS 1OST 1raid5 (half a xserve raid) Using
> tiobench on 1 client, using no partition table netted about 5MB/s
> faster for streaming read/write. I will scale up my tests though and
> try some other raid configurations. Thanks for the help.

You could consider using the sgpdd_survey in our iokit. It was written specifically to test raw disk throughput and can cover a large number of concurrent threads and I/O sizes to show you the characteristics of your disk.

Some disks, DDNs specifically (maybe there are others), are recommended to be used without partition tables, because the partition table at the beginning of the disk interferes with the optimal alignment characteristics of the disk and performance suffers as a result.

b.
Just to add a note: if we do use partitions on the DDN storage, then the partition should be created with the first cylinder as 9, which will make sure that the alignment is optimal.

Thanks
Anand

On Oct 18, 2007, at 10:11 AM, Brian J. Murrell wrote:
> Some disks, such as DDNs specifically (maybe there are others) are
> recommended to use without partition tables because the partition
> table at the beginning of the disk interferes with optimal alignment
> characteristics of the disk and performance suffers as a result.
On Oct 18, 2007 13:02 -0400, Brock Palen wrote:
> I did some basics test, 1MDS 1OST 1raid5 (half a xserve raid) Using
> tiobench on 1 client, using no partition table netted about 5MB/s
> faster for streaming read/write. I will scale up my tests though and
> try some other raid configurations. Thanks for the help.

The other important note - don't use RAID5 for the MDS if at all possible. The MDS generates largely small, random 4kB IO and is much better served by RAID1 or RAID1+0.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
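A rough way to see why small random writes favour RAID1 or RAID1+0 is the textbook write-penalty model. This is a back-of-the-envelope sketch, not a measurement: it assumes a sub-stripe RAID5 write costs four disk I/Os (read old data, read old parity, write new data, write new parity) and a mirrored write costs two (one per mirror). The disk count and the per-disk IOPS figure are hypothetical.

```python
# Disk I/Os needed per small random write in the simple textbook model.
RAID5_IOS_PER_WRITE = 4   # read data, read parity, write data, write parity
RAID10_IOS_PER_WRITE = 2  # one write to each side of the mirror


def random_write_iops(disks, per_disk_iops, ios_per_write):
    """Aggregate small-random-write IOPS for an array, ignoring caches."""
    return disks * per_disk_iops / ios_per_write


if __name__ == "__main__":
    disks = 4            # hypothetical array size
    per_disk_iops = 150  # hypothetical per-spindle random IOPS
    raid5 = random_write_iops(disks, per_disk_iops, RAID5_IOS_PER_WRITE)
    raid10 = random_write_iops(disks, per_disk_iops, RAID10_IOS_PER_WRITE)
    print(f"RAID5:   {raid5:.0f} write IOPS")
    print(f"RAID1+0: {raid10:.0f} write IOPS")
```

Under these assumptions the mirrored layout sustains twice the small-write rate of RAID5 on the same number of disks; controller caches and full-stripe writes change the picture for large streaming I/O.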
Yes :-) the MDS is a software RAID 1. Thanks for the tip, though.

> The other important note - don't use RAID5 for the MDS if at all
> possible.
> It generates largely small, random 4kB IO and is much better served by
> RAID1 or RAID1+0.
On Thu, 18 Oct 2007, Andreas Dilger wrote:
> For RAID 5/6 devices we recommend NOT having a partition table. The reason
> is that the partition table offsets the data partitions by a small amount
> (512 bytes usually) and this causes writes to span multiple RAID chunks and
> unnecessary read-modify-write activity.

I found http://insights.oetiker.ch/linux/raidoptimization.html a while ago; it discusses the alignment issue. I don't agree with the "Linux Kernel Config Parameters" section, but the rest of the article regarding alignment is OK.

The quick summary is:

When using RAID5/6, use LVM or no partitioning at all. Stay away from PC partition tables.

When using hardware RAID, use the correct mkfs parameters to communicate stripe-size info to the FS.

/Nikke
--
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke at hpc2n.umu.se
I used to read books. Now I read .qwk files.
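As an illustration of "communicating stripe-size info to the FS", here is a small sketch that derives the values usually passed to an ext-family mkfs. It assumes the `stride` and `stripe_width` extended options of `mke2fs -E`, both expressed in filesystem blocks; whether a given e2fsprogs or mkfs.lustre version accepts them (for example via `--mkfsoptions`) should be checked against the installed man pages. The chunk size, disk count and block size below are hypothetical.

```python
def ext_stripe_options(chunk_kib, data_disks, block_size=4096):
    """Return (stride, stripe_width) in filesystem blocks.

    stride       = RAID chunk size / filesystem block size
    stripe_width = stride * number of data (non-parity) disks
    """
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_size:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_bytes // block_size
    return stride, stride * data_disks


if __name__ == "__main__":
    # Hypothetical array: 256 KiB chunks, 4 data disks + parity, 4 KiB blocks.
    stride, width = ext_stripe_options(chunk_kib=256, data_disks=4)
    print(f"-E stride={stride},stripe_width={width}")
```

The printed string is only a candidate option fragment; it is not taken from the thread and has not been validated against any particular mkfs.lustre release.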