I'm looking for advice and considerations on how to optimally set up and deploy an NFS-based home directory server. In particular: (1) how to determine hardware requirements, and (2) how to best set up and configure the server. We actually have a system in place, but the performance is pretty bad: the users often experience a fair amount of lag (1-5 seconds) when doing anything on their home directories, including an "ls" or writing a small text file. So now I'm trying to step back and determine: is it simply a configuration issue, or is the hardware inadequate?

Our scenario: we have about 25 users, mostly software developers and analysts. The users log in to one or more of about 40 development servers. All users' home directories live on a single server (no login except root); that server does an NFSv4 export which is mounted by all the dev servers.

The home directory server hardware is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5" 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. The RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as an LSI MegaSAS 9260). The OS is CentOS 5.6, and the home directory partition is ext3 with options "rw,data=journal,usrquota".

I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for the home directories. I'm fairly certain I did not align the partitions optimally:

[root at lnxutil1 ~]# parted -s /dev/sda unit s print
Model: DELL PERC H700 (scsi)
Disk /dev/sda: 134217599s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End         Size        Type     File system  Flags
 1      63s      465884s     465822s     primary  ext2         boot
 2      465885s  134207009s  133741125s  primary               lvm

[root at lnxutil1 ~]# parted -s /dev/sdb unit s print
Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 5720768639s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name  Flags
 1      34s    5720768606s  5720768573s                     lvm

Can anyone confirm that the partitions are not aligned correctly, as I suspect? If this is true, is there any way to *quantify* the effects of partition mis-alignment on performance? In other words, what kind of improvement could I expect if I rebuilt this server with the partitions aligned optimally?

In general, what is the best way to determine the source of our performance issues? Right now, I'm running "iostat -dkxt 30" redirected to a file. I intend to let this run for a day or so, and then write a script to produce some statistics. Here is one iteration from the iostat process:

Time: 09:37:28 AM
Device:  rrqm/s   wrqm/s   r/s    w/s      rkB/s  wkB/s     avgrq-sz  avgqu-sz  await   svctm  %util
sda      0.00     44.09    0.03   107.76   0.13   607.40    11.27     0.89      8.27    7.27   78.35
sda1     0.00     0.00     0.00   0.00     0.00   0.00      0.00      0.00      0.00    0.00   0.00
sda2     0.00     44.09    0.03   107.76   0.13   607.40    11.27     0.89      8.27    7.27   78.35
sdb      0.00     2616.53  0.67   157.88   2.80   11098.83  140.04    8.57      54.08   4.21   66.68
sdb1     0.00     2616.53  0.67   157.88   2.80   11098.83  140.04    8.57      54.08   4.21   66.68
dm-0     0.00     0.00     0.03   151.82   0.13   607.26    8.00      1.25      8.23    5.16   78.35
dm-1     0.00     0.00     0.00   0.00     0.00   0.00      0.00      0.00      0.00    0.00   0.00
dm-2     0.00     0.00     0.67   2774.84  2.80   11099.37  8.00      474.30    170.89  0.24   66.84
dm-3     0.00     0.00     0.67   2774.84  2.80   11099.37  8.00      474.30    170.89  0.24   66.84

What I observe is that whenever sdb (the home directory partition) becomes loaded, sda (the OS disk) often does as well. Why is this? I would expect sda to be generally idle, or to have minimal utilization. According to both "free" and "vmstat", this server is not swapping at all.

At one point, our problems were due to a random user writing a huge file to their home directory. We built a second server specifically for people to use for writing large temporary files. Furthermore, on all the dev servers, I used the following tc commands to rate-limit how quickly any one server can write to the home directory server (8 Mbps, i.e. 1 MB/s):

ETH_IFACE=$( route -n | grep "^0.0.0.0" | awk '{ print $8 }' )
IFACE_RATE=1000mbit
LIMIT_RATE=8mbit
TARGET_IP=1.2.3.4  # home directory server IP

tc qdisc add dev $ETH_IFACE root handle 1: htb default 1
tc class add dev $ETH_IFACE parent 1: classid 1:1 htb rate $IFACE_RATE ceil $IFACE_RATE
tc class add dev $ETH_IFACE parent 1: classid 1:2 htb rate $LIMIT_RATE ceil $LIMIT_RATE
tc filter add dev $ETH_IFACE parent 1: protocol ip prio 16 u32 match ip dst $TARGET_IP flowid 1:2
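To confirm the shaping is actually catching the NFS traffic, the per-class counters can be watched on a dev server while someone writes a large file (a quick sanity check, not part of the setup itself):

# The "Sent ... bytes" counter under class 1:2 should grow at ~1 MB/s
# during a large write to the home directory server, while unrelated
# traffic accumulates under the default class 1:1.
tc -s class show dev $ETH_IFACE
tc -s filter show dev $ETH_IFACE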
The other interesting thing is that the second server I mentioned, the one specifically built for users to "torture", shows very low IO utilization, practically never going above 10%. That server is fairly different, though: dual E5-2430 CPUs (more cores, but a lower clock), 32 GB RAM. The disk subsystem is a Dell PERC H710 (LSI MegaRAID SAS 2208), and the drives are 7200 RPM 1 TB (Seagate ST1000NM0001) in RAID-6. The OS is CentOS 6.3, and the NFS partition is ext4 with options "rw,relatime,barrier=1,data=ordered,usrquota".

Ultimately, I plan to rebuild the home directory server with CentOS 6 (instead of 5), and align the partitions properly. But as of now, I don't have a rational reason for doing that other than the fact that the other server with this config doesn't have performance problems. I'd like to be able to say specifically (i.e. quantify) exactly where the problems are and how they will be addressed by the upgrade/config change.

I'll add that we want to use the "sec=krb5p" (i.e. encrypt everything) mount option for the home directories. We tried that with the home directory server, and it became virtually unusable. But we use that option on the other server with no issue. For now, as a stop-gap, we are just using the "sec=krb5" mount option (Kerberos authentication only, no encryption). The server is still laggy, but at least usable.

Here is the output of "nfsstat -v" on the home directory server:

[root at lnxutil1 ~]# nfsstat -v
Server packet stats:
packets    udp        tcp        tcpconn
12560989   0          12544002   17146

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
12516995   922        0          922        0

Server reply cache:
hits       misses     nocache
0          0          12512316

Server file handle cache:
lookup     anon       ncachedir  ncachedir  stale
0          0          0          0          160

Server nfs v4:
null         compound
86        0% 12516096 99%

Server nfs v4 operations:
op0-unused   op1-unused   op2-future   access       close        commit
0         0% 0         0% 0         0% 449630    1% 1131528   2% 191998    0%
create       delegpurge   delegreturn  getattr      getfh        link
2053      0% 0         0% 62931     0% 11210081 29% 1638995   4% 275       0%
lock         lockt        locku        lookup       lookup_root  nverify
196       0% 0         0% 196       0% 557606    1% 0         0% 0         0%
open         openattr     open_conf    open_dgrd    putfh        putpubfh
1274780   3% 0         0% 72561     0% 618       0% 12357089 32% 0         0%
putrootfh    read         readdir      readlink     remove       rename
160       0% 1548999   4% 44760     0% 625       0% 140946    0% 4229      0%
renew        restorefh    savefh       secinfo      setattr      setcltid
134103    0% 1157086   3% 1281276   3% 0         0% 133212    0% 143       0%
setcltidconf verify       write        rellockowner
113       0% 0         0% 4896102  12% 196       0%

Let me know if I can provide any more useful information. Thanks in advance for any pointers!
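P.S. The statistics script I have in mind for the iostat log would be roughly this (an untested awk sketch, assuming the exact column layout shown above, with %util as the last field):

# Per-device average and peak %util over the whole log.
awk '/^(sd|dm-)/ {
    util[$1] += $NF; n[$1]++
    if ($NF > peak[$1]) peak[$1] = $NF
} END {
    for (d in util)
        printf "%-6s avg %%util %6.2f   peak %6.2f\n", d, util[d]/n[d], peak[d]
}' iostat.log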
m.roth at 5-cent.us
2012-Dec-10 18:50 UTC
[CentOS] home directory server performance issues
Matt Garman wrote:
> I'm looking for advice and considerations on how to optimally set up
> and deploy an NFS-based home directory server. In particular: (1) how
> to determine hardware requirements, and (2) how to best set up and
> configure the server. We actually have a system in place, but the
> performance is pretty bad: the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.
>
> So now I'm trying to step back and determine: is it simply a
> configuration issue, or is the hardware inadequate?
<snip>

Without poring over your info, let me give you something that bit us here: our home directory servers are all 5.x (in this case, 5.8). Here's the reason: when we tried 6.x, if you were in an NFS-mounted directory, working from the same or another NFS-mounted directory, it was *slow*. Unzipping a file that was about 120M or so took 6.5-7 *minutes*, as opposed to 1 minute. After extensive testing (the numbers are still on our whiteboard here, from when I did it many months ago), it didn't seem to matter what the workstation was running, but it did matter what the NFS server was.

You *can* solve it by changing the exports from sync to async... if you're not worried about possible data loss or corruption. We do have to worry, since in some cases our researchers might be dumping many gigs of data into their home directories from a job that's been running for days, and no one wants to rerun that.

mark
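The sync/async trade-off mark describes is set per export in /etc/exports. A minimal sketch, with an illustrative path and client subnet (not mark's actual config):

# /etc/exports
# sync: the server commits each write to stable storage before replying
# to the client -- safe, but slower for bursty small writes.
/home  192.168.0.0/24(rw,sync,no_subtree_check)

# async: the server may reply before data reaches disk -- faster, but a
# server crash can silently lose recently "written" data.
#/home  192.168.0.0/24(rw,async,no_subtree_check)

After editing the file, "exportfs -ra" re-reads it without restarting the NFS service.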
On Mon, Dec 10, 2012 at 6:37 PM, Matt Garman <matthew.garman at gmail.com> wrote:
> I'm looking for advice and considerations on how to optimally set up
> and deploy an NFS-based home directory server. In particular: (1) how
> to determine hardware requirements, and (2) how to best set up and
> configure the server. We actually have a system in place, but the
> performance is pretty bad: the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.

I know this is the CentOS forum; however, if you are still in a testing phase, then I can recommend you try Solaris derivatives like Nexenta or OmniOS. NFS server performance on Linux is simply not the same as on those systems on the same hardware. You also get true ACLs (not POSIX, but NFSv4 ACLs, comparable to those in NTFS), deduplication, compression, and snapshots (ZFS!). Nexenta is free as in beer up to 18 TB and has a great web interface; OmniOS is just free, but you need to know how to use Solaris.

If you stay with the Linux NFS server, look into the I/O scheduler setting of the disks. I managed to double the performance of a ProLiant RAID controller (don't remember which model, sorry) by changing the standard cfq to noop. Shortly after that I came across Nexenta and moved all our NFS loads there. Later we got a NetApp cluster, but the Nexenta filers are still kicking around.

--
groet,
natxo
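A minimal sketch of the scheduler change natxo describes, assuming the home directory array is /dev/sdb as in the original post:

# Show the available schedulers; the active one is in brackets, e.g. [cfq]:
cat /sys/block/sdb/queue/scheduler

# Switch to noop at runtime (takes effect immediately, lost on reboot):
echo noop > /sys/block/sdb/queue/scheduler

# To make it permanent, add "elevator=noop" to the kernel line in grub.conf.

noop is often a reasonable choice when a battery-backed RAID controller is already reordering and caching writes underneath the kernel.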
On Mon, Dec 10, 2012 at 11:37:50AM -0600, Matt Garman wrote:
> OS is CentOS 5.6, home directory partition is ext3, with options
> "rw,data=journal,usrquota".

Is the data=journal option really wanted here? Did you try the other journalling modes available (data=ordered, data=writeback)? I also think you are missing the noatime option here. The wiki has some information about RAID math and ext3 journalling modes:

http://wiki.centos.org/HowTos/Disk_Optimization

> At one point, our problems were due to a random user writing a huge
> file to their home directory.

That is the worst case for data=journal mode: the server has to write all the data twice, once to the journal and then again to its final location on disk.

--
Nicolas
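A hedged example of what that suggestion would look like in /etc/fstab (the device path and mount point are illustrative, not from the original post):

# /etc/fstab -- home directory filesystem with the default journalling
# mode plus noatime. Note that ext3 cannot switch data= modes on a live
# remount; this needs an unmount/mount (or reboot) to take effect.
/dev/VolGroup01/LogVol00  /export/home  ext3  rw,noatime,data=ordered,usrquota  1 2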
From: Matt Garman <matthew.garman at gmail.com>
> I'm fairly certain I did not align the partitions optimally:
>
> Number  Start    End         Size        Type     File system  Flags
>  1      63s      465884s     465822s     primary  ext2         boot
>  2      465885s  134207009s  133741125s  primary               lvm
>
> Number  Start  End          Size         File system  Name  Flags
>  1      34s    5720768606s  5720768573s                     lvm
>
> Can anyone confirm that the partitions are not aligned correctly, as I
> suspect? If this is true, is there any way to *quantify* the effects
> of partition mis-alignment on performance? In other words, what kind
> of improvement could I expect if I rebuilt this server with the
> partitions aligned optimally?

They indeed do not look aligned... First, I am no expert, but: at one point, the minimum to do was to at least start on sector 64 instead of 63. Now, if you add RAID stripes and 4k disks, it is more complicated:

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newstorage-iolimits.html

You can see the effects of non-alignment in images such as these:

http://www.ateamsystems.com/blog/FreeBSD-Partition-Alignment-RAID-SSD-4k-Drive

Formatting also takes alignment parameters, for example stride and stripe-width for ext filesystems.

JD
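To make that concrete, here is a sketch of an aligned rebuild of the home directory disk. The geometry values are assumptions for illustration only: a 64 KiB RAID chunk and 6 data spindles (8 drives minus 2 parity); the real chunk size should be read from the PERC configuration. Then stride = 64 KiB / 4 KiB blocks = 16, and stripe-width = 16 x 6 = 96:

# Start the first partition at 1 MiB, which is a multiple of every
# common RAID chunk size (needs the newer parted shipped with CentOS 6):
parted -s -a optimal /dev/sdb mklabel gpt mkpart primary 1MiB 100%

# Tell the filesystem about the RAID geometry (values assume the
# 64 KiB chunk / 6 data disks above):
mkfs.ext4 -b 4096 -E stride=16,stripe-width=96 /dev/sdb1

If LVM sits in between, as in the current layout, pvcreate's --dataalignment option addresses the same concern at the PV level.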
On Mon, Dec 10, 2012 at 9:37 AM, Matt Garman <matthew.garman at gmail.com> wrote:
> I'm looking for advice and considerations on how to optimally set up
> and deploy an NFS-based home directory server. In particular: (1) how
> to determine hardware requirements, and (2) how to best set up and
> configure the server. We actually have a system in place, but the
> performance is pretty bad: the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.

Just going to throw this out there: what is RPCNFSDCOUNT in /etc/sysconfig/nfs?

--
Dan Young
On 2012-12-11, Dan Young <danielmyoung at gmail.com> wrote:
> On Mon, Dec 10, 2012 at 9:37 AM, Matt Garman <matthew.garman at gmail.com> wrote:
>
>> I'm looking for advice and considerations on how to optimally set up
>> and deploy an NFS-based home directory server. In particular: (1) how
>> to determine hardware requirements, and (2) how to best set up and
>> configure the server. We actually have a system in place, but the
>> performance is pretty bad: the users often experience a fair amount
>> of lag (1-5 seconds) when doing anything on their home directories,
>> including an "ls" or writing a small text file.
>
> Just going to throw this out there: what is RPCNFSDCOUNT in
> /etc/sysconfig/nfs?

I was also bitten by this issue after a recent migration. The default in CentOS 6 is 8, which was too small even for my group, which has only 10 or so NFS clients, and only a handful active at any one time.

It is easy to change the number of nfsd kernel threads on the fly: just run "rpc.nfsd NN", where NN is the number of threads you want. The kernel will adjust the number of running threads immediately. If that solves your performance issue, then you can adjust RPCNFSDCOUNT accordingly.

--keith

--
kkeller at wombat.san-francisco.ca.us
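One way to judge whether the thread count is actually the bottleneck, sketched here on the assumption that the CentOS 5/6 kernels expose the "th" statistics line (later kernels dropped the histogram):

# The "th" line shows the thread count, the number of times all threads
# were busy at once, and a ten-bucket histogram of thread utilization;
# large values in the right-most buckets suggest more threads are needed.
grep ^th /proc/net/rpc/nfsd

# Raise the running thread count without restarting the service:
rpc.nfsd 32

# Then persist it for the next boot in /etc/sysconfig/nfs:
#   RPCNFSDCOUNT=32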
On 12/10/2012 09:37 AM, Matt Garman wrote:
> In particular: (1) how
> to determine hardware requirements

That may be difficult at this point, because you really want to start by measuring the number of IOPS. That's difficult to do if your applications demand more than your hardware currently provides.

> the users often experience a fair amount
> of lag (1-5 seconds) when doing anything on their home directories,
> including an "ls" or writing a small text file.

This might not be the result of your NFS server performance. You might actually be seeing bad performance in your directory service. What are you using for that service? LDAP? NIS? Are you running nscd or sssd on the clients?

> There are eight 15k 2.5" 600 GB
> drives (Seagate ST3600057SS) configured in hardware RAID-6 with a
> single hot spare. RAID controller is a Dell PERC H700 w/512MB cache
> (Linux sees this as an LSI MegaSAS 9260).

RAID-6 is good for $/GB, but bad for performance. If you find that your performance is bad, RAID-10 will offer you a lot more IOPS.

Mixing 15k drives with RAID-6 is probably unusual. Typically 15k drives are used when the system needs maximum IOPS, and RAID-6 is used when storage capacity is more important than performance. It's also unusual to see a RAID-6 array with a hot spare: you already have two disks of parity. At this point, your available storage capacity is only 600 GB greater than a RAID-10 configuration would give, but your performance is MUCH worse.

> OS is CentOS 5.6, home
> directory partition is ext3, with options "rw,data=journal,usrquota".

data=journal actually offers better performance than the default in some workloads, but not all. You should try the default and see which is better. With a hardware RAID controller that has battery-backed write cache, data=journal should not perform any better than the default, but probably not any worse.

> I have the HW RAID configured to present two virtual disks to the OS:
> /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for
> the home directories. I'm fairly certain I did not align the
> partitions optimally:

If your drives really have 4k sectors, rather than the reported 512B, then they're not aligned optimally and writes will suffer. The best policy is to start your first partition at a 1 MiB offset. parted should be aligning things well if it's up to date; if your partition start sectors are divisible by 8, you should be in good shape.

> Here is one iteration from the iostat process:
>
> Time: 09:37:28 AM
> Device:  rrqm/s   wrqm/s   r/s    w/s      rkB/s  wkB/s     avgrq-sz  avgqu-sz  await  svctm  %util
> sda      0.00     44.09    0.03   107.76   0.13   607.40    11.27     0.89      8.27   7.27   78.35
> sdb      0.00     2616.53  0.67   157.88   2.80   11098.83  140.04    8.57      54.08  4.21   66.68

If that's normal, you need a faster array configuration. That iteration caught both disks with a very high % of maximum utilization. Consider using RAID-10.

> What I observe, is that whenever sdb (home directory partition)
> becomes loaded, sda (OS) often does as well. Why is this?

Regardless of what you export to the OS, if the RAID controller really only has one big RAID-6 array, you'd expect saturation of either virtual disk to affect both.
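A back-of-envelope comparison of the two layouts, using common rules of thumb (assumed figures: roughly 180 random IOPS per 15k SAS drive, a RAID-6 random write costing about 6 backend I/Os, and a RAID-10 write costing 2):

# 8 active drives, ~180 IOPS each (assumed per-drive figure)
echo "RAID-6  random-write IOPS ~ $(( 8 * 180 / 6 ))"   # ~240
echo "RAID-10 random-write IOPS ~ $(( 8 * 180 / 2 ))"   # ~720

Read throughput is closer between the two, but for a small-file, metadata-heavy home directory workload, the write path usually dominates.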
On Tue, Dec 11, 2012 at 2:24 PM, Dan Young <danielmyoung at gmail.com> wrote:
> Just going to throw this out there: what is RPCNFSDCOUNT in
> /etc/sysconfig/nfs?

It was 64 (upped from the default of... 8, I think).