I'm looking for advice and considerations on how to optimally set up and deploy an NFS-based home directory server. In particular: (1) how to determine hardware requirements, and (2) how to best set up and configure the server. We actually have a system in place, but the performance is pretty bad: the users often experience a fair amount of lag (1-5 seconds) when doing anything on their home directories, including an "ls" or writing a small text file. So now I'm trying to step back and determine: is it simply a configuration issue, or is the hardware inadequate?

Our scenario: we have about 25 users, mostly software developers and analysts. The users log in to one or more of about 40 development servers. All users' home directories live on a single server (no login except root); that server does an NFSv4 export which is mounted by all the dev servers.

The home directory server hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5" 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. The RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as an LSI MegaSAS 9260). The OS is CentOS 5.6, and the home directory partition is ext3 with options "rw,data=journal,usrquota".

I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. I'm fairly certain I did not align the partitions optimally:

[root at lnxutil1 ~]# parted -s /dev/sda unit s print
Model: DELL PERC H700 (scsi)
Disk /dev/sda: 134217599s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End         Size        Type     File system  Flags
 1      63s      465884s     465822s     primary  ext2         boot
 2      465885s  134207009s  133741125s  primary               lvm

[root at lnxutil1 ~]# parted -s /dev/sdb unit s print
Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 5720768639s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name  Flags
 1      34s    5720768606s  5720768573s                     lvm

Can anyone confirm that the partitions are not aligned correctly, as I suspect? If this is true, is there any way to *quantify* the effects of partition mis-alignment on performance? In other words, what kind of improvement could I expect if I rebuilt this server with the partitions aligned optimally?

In general, what is the best way to determine the source of our performance issues? Right now, I'm running "iostat -dkxt 30" redirected to a file. I intend to let this run for a day or so, and then write a script to produce some statistics. Here is one iteration from the iostat process:

Time: 09:37:28 AM
Device:  rrqm/s   wrqm/s   r/s    w/s      rkB/s  wkB/s     avgrq-sz  avgqu-sz  await   svctm  %util
sda      0.00     44.09    0.03   107.76   0.13   607.40    11.27     0.89      8.27    7.27   78.35
sda1     0.00     0.00     0.00   0.00     0.00   0.00      0.00      0.00      0.00    0.00   0.00
sda2     0.00     44.09    0.03   107.76   0.13   607.40    11.27     0.89      8.27    7.27   78.35
sdb      0.00     2616.53  0.67   157.88   2.80   11098.83  140.04    8.57      54.08   4.21   66.68
sdb1     0.00     2616.53  0.67   157.88   2.80   11098.83  140.04    8.57      54.08   4.21   66.68
dm-0     0.00     0.00     0.03   151.82   0.13   607.26    8.00      1.25      8.23    5.16   78.35
dm-1     0.00     0.00     0.00   0.00     0.00   0.00      0.00      0.00      0.00    0.00   0.00
dm-2     0.00     0.00     0.67   2774.84  2.80   11099.37  8.00      474.30    170.89  0.24   66.84
dm-3     0.00     0.00     0.67   2774.84  2.80   11099.37  8.00      474.30    170.89  0.24   66.84

What I observe is that whenever sdb (the home directory partition) becomes loaded, sda (the OS disk) often does as well. Why is this? I would expect sda to be generally idle, or to have minimal utilization. According to both "free" and "vmstat", this server is not swapping at all.

At one point, our problems were due to a random user writing a huge file to their home directory. We built a second server specifically for people to use for writing large temporary files. Furthermore, on all the dev servers, I used the following tc commands to rate-limit how quickly any one server can write to the home directory server (8 Mbps, i.e. 1 MB/s):

ETH_IFACE=$( route -n | grep "^0.0.0.0" | awk '{ print $8 }' )
IFACE_RATE=1000mbit
LIMIT_RATE=8mbit
TARGET_IP=1.2.3.4  # home directory server IP

tc qdisc add dev $ETH_IFACE root handle 1: htb default 1
tc class add dev $ETH_IFACE parent 1: classid 1:1 htb rate $IFACE_RATE ceil $IFACE_RATE
tc class add dev $ETH_IFACE parent 1: classid 1:2 htb rate $LIMIT_RATE ceil $LIMIT_RATE
tc filter add dev $ETH_IFACE parent 1: protocol ip prio 16 u32 match ip dst $TARGET_IP flowid 1:2
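To confirm the shaping is actually catching the NFS traffic, the per-class counters can be watched on a dev server while someone writes a large file (a quick sanity check, not part of the setup itself):

# The "Sent ... bytes" counter under class 1:2 should grow at ~1 MB/s
# during a large write to the home directory server, while unrelated
# traffic accumulates under the default class 1:1.
tc -s class show dev $ETH_IFACE
tc -s filter show dev $ETH_IFACE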
The other interesting thing is that the second server I mentioned, the one specifically built for users to "torture", shows very low IO utilization, practically never going above 10%. That server is fairly different, though: dual E5-2430 CPUs (more cores, but a lower clock), 32 GB RAM. The disk subsystem is a Dell PERC H710 (LSI MegaRAID SAS 2208), and the drives are 7200 RPM 1 TB (Seagate ST1000NM0001) in RAID-6. The OS is CentOS 6.3, and the NFS partition is ext4 with options "rw,relatime,barrier=1,data=ordered,usrquota".

Ultimately, I plan to rebuild the home directory server with CentOS 6 (instead of 5), and align the partitions properly. But as of now, I don't have a rational reason for doing that other than the fact that the other server with this config doesn't have performance problems. I'd like to be able to say specifically (i.e. quantify) exactly where the problems are and how they will be addressed by the upgrade/config change.

I'll add that we want to use the "sec=krb5p" (i.e. encrypt everything) mount option for the home directories. We tried that with the home directory server, and it became virtually unusable. But we use that option on the other server with no issue. For now, as a stop-gap, we are just using the "sec=krb5" mount option (Kerberos authentication only, no encryption). The server is still laggy, but at least usable.

Here is the output of "nfsstat -v" on the home directory server:

[root at lnxutil1 ~]# nfsstat -v
Server packet stats:
packets    udp        tcp        tcpconn
12560989   0          12544002   17146

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
12516995   922        0          922        0

Server reply cache:
hits       misses     nocache
0          0          12512316

Server file handle cache:
lookup     anon       ncachedir  ncachedir  stale
0          0          0          0          160

Server nfs v4:
null         compound
86        0% 12516096 99%

Server nfs v4 operations:
op0-unused   op1-unused   op2-future   access       close        commit
0         0% 0         0% 0         0% 449630    1% 1131528   2% 191998    0%
create       delegpurge   delegreturn  getattr      getfh        link
2053      0% 0         0% 62931     0% 11210081 29% 1638995   4% 275       0%
lock         lockt        locku        lookup       lookup_root  nverify
196       0% 0         0% 196       0% 557606    1% 0         0% 0         0%
open         openattr     open_conf    open_dgrd    putfh        putpubfh
1274780   3% 0         0% 72561     0% 618       0% 12357089 32% 0         0%
putrootfh    read         readdir      readlink     remove       rename
160       0% 1548999   4% 44760     0% 625       0% 140946    0% 4229      0%
renew        restorefh    savefh       secinfo      setattr      setcltid
134103    0% 1157086   3% 1281276   3% 0         0% 133212    0% 143       0%
setcltidconf verify       write        rellockowner
113       0% 0         0% 4896102  12% 196       0%

Let me know if I can provide any more useful information. Thanks in advance for any pointers!
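P.S. The statistics script I have in mind for the iostat log would be roughly this (an untested awk sketch, assuming the exact column layout shown above, with %util as the last field):

# Per-device average and peak %util over the whole log.
awk '/^(sd|dm-)/ {
    util[$1] += $NF; n[$1]++
    if ($NF > peak[$1]) peak[$1] = $NF
} END {
    for (d in util)
        printf "%-6s avg %%util %6.2f   peak %6.2f\n", d, util[d]/n[d], peak[d]
}' iostat.log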
m.roth at 5-cent.us
2012-Dec-10 18:50 UTC
[CentOS] home directory server performance issues
Matt Garman wrote:
> I'm looking for advice and considerations on how to optimally set up
> and deploy an NFS-based home directory server. In particular: (1) how
> to determine hardware requirements, and (2) how to best set up and
> configure the server. We actually have a system in place, but the
> performance is pretty bad: the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.
>
> So now I'm trying to step back and determine: is it simply a
> configuration issue, or is the hardware inadequate?
<snip>

Without poring over your info, let me give you something that bit us here: our home directory servers are all 5.x (in this case, 5.8). Here's the reason: when we tried 6.x, if you were in an NFS-mounted directory, working from the same or another NFS-mounted directory, it was *slow*. Unzipping a file that was about 120M or so took 6.5-7 *minutes*, as opposed to 1 minute. After extensive testing (the numbers are still on our whiteboard here, from when I did it many months ago), it didn't seem to matter what the workstation was running, but it did matter what the NFS server was.

You *can* solve it by changing the exports from sync to async... if you're not worried about possible data loss or corruption. We do have to worry, since in some cases our researchers might be dumping many gigs of data into their home directories from a job that's been running for days, and no one wants to rerun that.

mark
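The sync/async trade-off mark describes is set per export in /etc/exports. A minimal sketch, with an illustrative path and client subnet (not mark's actual config):

# /etc/exports
# sync: the server commits each write to stable storage before replying
# to the client -- safe, but slower for bursty small writes.
/home  192.168.0.0/24(rw,sync,no_subtree_check)

# async: the server may reply before data reaches disk -- faster, but a
# server crash can silently lose recently "written" data.
#/home  192.168.0.0/24(rw,async,no_subtree_check)

After editing the file, "exportfs -ra" re-reads it without restarting the NFS service.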
On Mon, Dec 10, 2012 at 6:37 PM, Matt Garman <matthew.garman at gmail.com> wrote:
> I'm looking for advice and considerations on how to optimally set up
> and deploy an NFS-based home directory server. In particular: (1) how
> to determine hardware requirements, and (2) how to best set up and
> configure the server. We actually have a system in place, but the
> performance is pretty bad: the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.

I know this is the CentOS forum; however, if you are still in a testing phase, then I can recommend you try Solaris derivatives like Nexenta or OmniOS. NFS server performance on Linux is simply not the same as on those systems on the same hardware. You also get true ACLs (not POSIX, but NFSv4 ACLs, comparable to those in NTFS), deduplication, compression, and snapshots (ZFS!). Nexenta is free as in beer up to 18 TB and has a great web interface; OmniOS is just free, but you need to know how to use Solaris.

If you stay with the Linux NFS server, look into the I/O scheduler setting of the disks. I managed to double the performance of a ProLiant RAID controller (don't remember which model, sorry) by changing the standard cfq to noop. Shortly after that I came across Nexenta and moved all our NFS loads there. Later we got a NetApp cluster, but the Nexenta filers are still kicking around.

--
groet,
natxo
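A minimal sketch of the scheduler change natxo describes, assuming the home directory array is /dev/sdb as in the original post:

# Show the available schedulers; the active one is in brackets, e.g. [cfq]:
cat /sys/block/sdb/queue/scheduler

# Switch to noop at runtime (takes effect immediately, lost on reboot):
echo noop > /sys/block/sdb/queue/scheduler

# To make it permanent, add "elevator=noop" to the kernel line in grub.conf.

noop is often a reasonable choice when a battery-backed RAID controller is already reordering and caching writes underneath the kernel.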
On Mon, Dec 10, 2012 at 11:37:50AM -0600, Matt Garman wrote:
> OS is CentOS 5.6, home directory partition is ext3, with options
> "rw,data=journal,usrquota".

Is the data=journal option really wanted here? Did you try the other journalling modes available (data=ordered, data=writeback)? I also think you are missing the noatime option here. The wiki has some information about RAID math and ext3 journalling modes:

http://wiki.centos.org/HowTos/Disk_Optimization

> At one point, our problems were due to a random user writing a huge
> file to their home directory.

That is the worst case for data=journal mode: the server has to write all the data twice, once to the journal and then again to its final location on disk.

--
Nicolas
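A hedged example of what that suggestion would look like in /etc/fstab (the device path and mount point are illustrative, not from the original post):

# /etc/fstab -- home directory filesystem with the default journalling
# mode plus noatime. Note that ext3 cannot switch data= modes on a live
# remount; this needs an unmount/mount (or reboot) to take effect.
/dev/VolGroup01/LogVol00  /export/home  ext3  rw,noatime,data=ordered,usrquota  1 2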
From: Matt Garman <matthew.garman at gmail.com>
> I'm fairly certain I did not align the partitions optimally:
>
> Number  Start    End         Size        Type     File system  Flags
>  1      63s      465884s     465822s     primary  ext2         boot
>  2      465885s  134207009s  133741125s  primary               lvm
>
> Number  Start  End          Size         File system  Name  Flags
>  1      34s    5720768606s  5720768573s                     lvm
>
> Can anyone confirm that the partitions are not aligned correctly, as I
> suspect? If this is true, is there any way to *quantify* the effects
> of partition mis-alignment on performance? In other words, what kind
> of improvement could I expect if I rebuilt this server with the
> partitions aligned optimally?

They indeed do not look aligned... First, I am no expert, but: at one point, the minimum to do was to at least start on sector 64 instead of 63. Now, if you add RAID stripes and 4k disks, it is more complicated:

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newstorage-iolimits.html

You can see the effects of non-alignment in images such as these:

http://www.ateamsystems.com/blog/FreeBSD-Partition-Alignment-RAID-SSD-4k-Drive

Formatting also takes alignment parameters, for example stride and stripe-width for ext filesystems.

JD
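To make that concrete, here is a sketch of an aligned rebuild of the home directory disk. The geometry values are assumptions for illustration only: a 64 KiB RAID chunk and 6 data spindles (8 drives minus 2 parity); the real chunk size should be read from the PERC configuration. Then stride = 64 KiB / 4 KiB blocks = 16, and stripe-width = 16 x 6 = 96:

# Start the first partition at 1 MiB, which is a multiple of every
# common RAID chunk size (needs the newer parted shipped with CentOS 6):
parted -s -a optimal /dev/sdb mklabel gpt mkpart primary 1MiB 100%

# Tell the filesystem about the RAID geometry (values assume the
# 64 KiB chunk / 6 data disks above):
mkfs.ext4 -b 4096 -E stride=16,stripe-width=96 /dev/sdb1

If LVM sits in between, as in the current layout, pvcreate's --dataalignment option addresses the same concern at the PV level.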
On Mon, Dec 10, 2012 at 9:37 AM, Matt Garman <matthew.garman at gmail.com> wrote:
> I'm looking for advice and considerations on how to optimally set up
> and deploy an NFS-based home directory server. In particular: (1) how
> to determine hardware requirements, and (2) how to best set up and
> configure the server. We actually have a system in place, but the
> performance is pretty bad: the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.

Just going to throw this out there: what is RPCNFSDCOUNT in /etc/sysconfig/nfs?

--
Dan Young
On 2012-12-11, Dan Young <danielmyoung at gmail.com> wrote:
> On Mon, Dec 10, 2012 at 9:37 AM, Matt Garman <matthew.garman at gmail.com> wrote:
>
>> I'm looking for advice and considerations on how to optimally set up
>> and deploy an NFS-based home directory server. In particular: (1) how
>> to determine hardware requirements, and (2) how to best set up and
>> configure the server. We actually have a system in place, but the
>> performance is pretty bad: the users often experience a fair amount
>> of lag (1-5 seconds) when doing anything on their home directories,
>> including an "ls" or writing a small text file.
>
> Just going to throw this out there: what is RPCNFSDCOUNT in
> /etc/sysconfig/nfs?

I was also bitten by this issue after a recent migration. The default in CentOS 6 is 8, which was too small even for my group, which has only 10 or so NFS clients, and only a handful active at any one time.

It is easy to change the number of nfsd kernel threads on the fly: just run "rpc.nfsd NN", where NN is the number of threads you want. The kernel will adjust the number of running threads immediately. If that solves your performance issue, then you can adjust RPCNFSDCOUNT accordingly.

--keith

--
kkeller at wombat.san-francisco.ca.us
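One way to judge whether the thread count is actually the bottleneck, sketched here on the assumption that the CentOS 5/6 kernels expose the "th" statistics line (later kernels dropped the histogram):

# The "th" line shows the thread count, the number of times all threads
# were busy at once, and a ten-bucket histogram of thread utilization;
# large values in the right-most buckets suggest more threads are needed.
grep ^th /proc/net/rpc/nfsd

# Raise the running thread count without restarting the service:
rpc.nfsd 32

# Then persist it for the next boot in /etc/sysconfig/nfs:
#   RPCNFSDCOUNT=32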
On 12/10/2012 09:37 AM, Matt Garman wrote:
> In particular: (1) how
> to determine hardware requirements

That may be difficult at this point, because you really want to start by measuring the number of IOPS. That's difficult to do if your applications demand more than your hardware currently provides.

> the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.

This might not be the result of your NFS server performance. You might actually be seeing bad performance in your directory service. What are you using for that service? LDAP? NIS? Are you running nscd or sssd on the clients?

> There are eight 15k 2.5" 600 GB
> drives (Seagate ST3600057SS) configured in hardware RAID-6 with a
> single hot spare. RAID controller is a Dell PERC H700 w/512MB cache
> (Linux sees this as an LSI MegaSAS 9260).

RAID-6 is good for $/GB, but bad for performance. If you find that your performance is bad, RAID-10 will offer you a lot more IOPS.

Mixing 15k drives with RAID-6 is probably unusual. Typically 15k drives are used when the system needs maximum IOPS, and RAID-6 is used when storage capacity is more important than performance. It's also unusual to see a RAID-6 array with a hot spare: you already have two disks of parity. At this point, your available storage capacity is only 600 GB greater than a RAID-10 configuration would give, but your performance is MUCH worse.

> OS is CentOS 5.6, home
> directory partition is ext3, with options "rw,data=journal,usrquota".

data=journal actually offers better performance than the default in some workloads, but not all. You should try the default and see which is better. With a hardware RAID controller that has battery-backed write cache, data=journal should not perform any better than the default, but probably not any worse.

> I have the HW RAID configured to present two virtual disks to the OS:
> /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for
> the home directories. I'm fairly certain I did not align the
> partitions optimally:

If your drives really have 4k sectors, rather than the reported 512B, then they're not aligned optimally and writes will suffer. The best policy is to start your first partition at a 1 MiB offset. parted should be aligning things well if it's up to date; if your partition start sectors are divisible by 8, you should be in good shape.

> Here is one iteration from the iostat process:
>
> Time: 09:37:28 AM
> Device:  rrqm/s   wrqm/s   r/s    w/s      rkB/s  wkB/s     avgrq-sz  avgqu-sz  await  svctm  %util
> sda      0.00     44.09    0.03   107.76   0.13   607.40    11.27     0.89      8.27   7.27   78.35
> sdb      0.00     2616.53  0.67   157.88   2.80   11098.83  140.04    8.57      54.08  4.21   66.68

If that's normal, you need a faster array configuration. That iteration caught both disks with a very high % of maximum utilization. Consider using RAID-10.

> What I observe, is that whenever sdb (home directory partition)
> becomes loaded, sda (OS) often does as well. Why is this?

Regardless of what you export to the OS, if the RAID controller really only has one big RAID-6 array, you'd expect saturation of either virtual disk to affect both.
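A back-of-envelope comparison of the two layouts, using common rules of thumb (assumed figures: roughly 180 random IOPS per 15k SAS drive, a RAID-6 random write costing about 6 backend I/Os, and a RAID-10 write costing 2):

# 8 active drives, ~180 IOPS each (assumed per-drive figure)
echo "RAID-6  random-write IOPS ~ $(( 8 * 180 / 6 ))"   # ~240
echo "RAID-10 random-write IOPS ~ $(( 8 * 180 / 2 ))"   # ~720

Read throughput is closer between the two, but for a small-file, metadata-heavy home directory workload, the write path usually dominates.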
On Tue, Dec 11, 2012 at 2:24 PM, Dan Young <danielmyoung at gmail.com> wrote:
> Just going to throw this out there: what is RPCNFSDCOUNT in
> /etc/sysconfig/nfs?

It was 64 (upped from the default of... 8, I think).