Hello,

I have strange behaviour on a server that I can't get a handle on. I have a
reasonably powerful server running VMware Server 1.0.4-56528. It has a RAID5
built with mdadm on 5 SATA drives, masses of RAM and 2 Xeon CPUs. But it
stutters.

Example: fire up vi, press and keep a finger on "i". After filling 2-3 lines,
the display stops for 2-12 seconds, then the characters continue. This happens
even on the host OS, at the console.

Host system running CentOS 5.2 x86-64:

  CPU : 2x Xeon E5430 @ 2.66GHz
  RAM : 24GB
  Mobo : DSBV-DX
   HD : 5 x SATA ST3750330AS 750GB in RAID5

There are 5 VMs, detailed at http://www.awale.qc.ca/vmware/stj1.txt to keep
this mail shorter.

Seems to me this system should be more than adequate to handle the load.

This is what vmstat on the host looks like when the server is "unhappy":
  http://www.awale.qc.ca/vmware/vmstat.txt
It is spending a lot of time in 'wa', but 'bo' and 'bi' are minuscule.

This looks like a disk problem. I grow to suspect that SATA isn't ready for
the big time. I also grow to dislike RAID5.

Questions:

- Does anyone have a clue or a pointer on how to track down my bottleneck?

- SATA NCQ is limited to a queue depth of 15. Is this per SATA port or per
  SATA chip? Or does this question make no sense?

- I realise there are more recent versions of CentOS out. Are there specific
  items in the changelogs that would affect my problem?

Thank you for any help,

-Philip
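On the NCQ question: the kernel exposes a per-device queue depth under sysfs,
so one way to see what depth each drive is actually running at (and to
experiment with lowering it) is a sketch like the one below. The device names
sda-sde are only assumed names for the five array members, and whether the
file is writable depends on the SATA controller driver:

    # show the current queue depth for each member of the array
    for d in sda sdb sdc sdd sde; do
        echo -n "$d: "; cat /sys/block/$d/device/queue_depth
    done

    # temporarily lower it on one drive (as root) to see whether latency changes
    echo 4 > /sys/block/sda/device/queue_depth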
Philip Gwyn <liste at artware.qc.ca> writes:

> The problem seems like a disk problem. I grow to suspect that SATA isn't
> ready for the big time. I also grow to dislike RAID5.

Personally, I will use RAID5, and I will use SATA, but I will not use SATA
with RAID5 except in 'tape replacement' roles. The weak bits of RAID5 (the
read/write cycle on sub-stripe writes) are often exacerbated by the weak bits
of SATA (slow seek time, slow rotational speed), creating a perfect storm of
suck.

Not to say that's your primary problem. Actually, it sounds a whole lot like
the problems I get with Xen on heavily used servers if I don't assign a core
exclusively to the dom0 (or at least give it a very high priority). But I
have little knowledge of or experience with VMware, so I don't know if you
have a similar problem.

--
Luke S. Crawford
http://prgmr.com/xen/ - Hosting for the technically adept
http://nostarch.com/xen.htm - We don't assume you are stupid.
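For anyone hitting the Xen variant of this, giving dom0 a dedicated core or a
larger scheduler share can be done roughly as follows with the xm tooling and
credit scheduler that ship with CentOS 5 Xen; the weight value and CPU numbers
are only examples:

    # pin dom0's vcpus to physical CPU 0
    xm vcpu-pin Domain-0 all 0

    # or just give dom0 a much larger share of CPU time
    # (the credit scheduler's default weight is 256)
    xm sched-credit -d Domain-0 -w 512

    # to keep guests off CPU 0, their domain config files can restrict
    # which physical CPUs they may run on, e.g.:
    #   cpus = "1-7"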
I have been using a 3ware 9690SE SATA card with RAID5. I have been running
CentOS using Xen and have had no problems. I wrote my virtual machines to the
raw RAID5 device. It seems to have worked fine for me.

On Thu, Sep 24, 2009 at 9:18 PM, Philip Gwyn <liste at artware.qc.ca> wrote:

> I have strange behaviour on a server that I can't get a handle on. I have a
> reasonably powerful server running VMware server 1.0.4-56528. It has a RAID5
> built with mdadm on 5 SATA drives. Masses of ram and 2 XEON CPUs. But it
> stutters.
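For reference, pointing a Xen guest at a raw block device (rather than a file-
backed disk image) is done with a phy: entry in the domain config. The device
path below is only an example; the actual device depends on how the array is
partitioned or carved up with LVM:

    # in /etc/xen/<guest> - hand the raw array device to the guest as xvda
    disk = [ 'phy:/dev/md0,xvda,w' ]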
On Thu, Sep 24, 2009 at 8:18 PM, Philip Gwyn <liste at artware.qc.ca> wrote:

> - I realise there are more recent versions of CentOS out. Are there
>   specific items in the changelogs that would affect my problem?

VMware Server 1.0.x was never supported on RHEL/CentOS 5.x, especially a
release as early as 1.0.4. Not that it can't be made to work, but it just
wasn't made for newer kernel versions.

We run up to 10 guests in VMware Server 1.0.9 on a single quad-core Xeon with
the host running CentOS 4 on SATA hardware RAID1. Admittedly, our guests are
pretty low-CPU, low-throughput, but it works just fine for us. If your guests
are not really hammering the disk system, then you may be on a wild goose
chase blaming RAID5.

In my time on the VMware forums, it was always suggested to use single-CPU
guests running non-SMP kernels with Server 1.0.x. It might help to convert
the one SMP guest you have.

If you can afford some downtime, reconfigure the host to use compatible
CentOS/VMware versions (4.x with 1.0.x, or 5.x with 2.x). At the very least,
get the latest VMware Server 1.0.9.

--
Jeff
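Checking whether a CentOS 4 guest is running the SMP kernel, and switching it
to the uniprocessor one, looks roughly like this; the kernel version shown is
only an example, and the grub step assumes the stock /boot/grub/grub.conf:

    # inside the guest: an "smp" suffix here means the SMP kernel is running
    uname -r                    # e.g. 2.6.9-89.ELsmp

    # install the uniprocessor kernel package alongside it
    yum install kernel

    # then point the "default=" line in /boot/grub/grub.conf at the
    # non-smp entry and reboot the guest

    # on the host, the guest's .vmx can also be forced to a single vcpu:
    #   numvcpus = "1"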
Hi,

On Thu, Sep 24, 2009 at 21:18, Philip Gwyn <liste at artware.qc.ca> wrote:

> The problem seems like a disk problem. I grow to suspect that SATA isn't
> ready for the big time. I also grow to dislike RAID5.
>
> Questions :
> - Anyone have a clue or other on how to track down my bottle neck?

You can use the command "iostat -kx 1 /dev/sd?", which will give you more
information about what is happening. In particular it will show %util, which
tells you how often the drive is busy, and you can correlate that with rkB/s
and wkB/s to see how much data is being read from or written to that specific
drive. You also get averages for request size (to know whether you have many
small operations or a few big ones), queue size, service time and wait time.
See "man iostat" for more details. It's not installed by default on CentOS 5
but it's available from the base repositories; just run "yum install sysstat"
if you don't have it yet.

If you are using RAID5 you might want to check whether the chunk size you are
using is a good fit. You can specify it when you create a new array using the
"-c" option to mdadm; I don't think you can change it after the array is
created. The default is 64kB, which sounds sane enough, but you might want to
check whether yours was created with that value or not.

The problem is basically this: if you have operations that are larger than
the chunk size, they will require operations on all the disks, which means
all of them will have to seek to a specific position to complete your
request, and while they are doing that they will not be able to work on any
other requests. If you have high usage and random access, the disks will
spend a lot of time seeking. If that is the case, you might want to increase
the chunk size so that most operations can be fulfilled by one disk only,
leaving the others free to work on other requests at the same time.

On the other hand, if specific areas of your filesystem that are hit more
often always fall on the same disk, that disk will be used more than the
other ones, so your performance will effectively be limited by that one disk
instead of multiplied by the number of disks due to the striped access. In
that case it might make sense to reduce the chunk size in order to spread the
access more evenly across disks.

I read some time ago that ext2/ext3 has a way of allocating blocks that can
create such an unfair distribution when you are striping across a certain
number of disks. I don't know exactly how that works, but you might want to
look into it. I remember that when you create the ext2/ext3 filesystem you
can use an option such as "stride=..." to give a hint about the disk layout
so that the filesystem can spread those blocks enough to balance the load
across the disks. But I could never exactly figure out what "stride=..."
number would make sense for me... the documentation is kind of scarce in this
area, but check the mke2fs manpage anyway if you have a disk that is more
"hot" than the others and you think that might be the problem.

You can also experiment with other filesystems such as XFS, which is
available in the extras repository.

And of course, make sure "cat /proc/mdstat" shows everything OK; make sure
you aren't running a degraded array before you start investigating its
performance.

I'm sure there are performance tunings that can be done with, e.g., hdparm,
tweaking numbers in the /proc and /sys filesystems, or changing the kernel
I/O scheduler, but I'm not really experienced with that so I couldn't really
advise you on that.
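To make the stride hint concrete: for this particular array (5-disk RAID5, so
4 data disks) with the default 64kB chunk and 4kB ext3 blocks, the numbers
work out as in the sketch below. The mdadm line only shows where the chunk
size gets set (recreating an array destroys it, so this is for a rebuild, not
an in-place change), and the stripe-width option needs a newer e2fsprogs than
stock CentOS 5 may ship, so check the mke2fs manpage before relying on it:

    # chunk size is fixed at array creation time (64kB shown here)
    mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=64 \
          /dev/sd[abcde]1

    # stride       = chunk size / filesystem block size = 64kB / 4kB = 16
    # stripe-width = stride * number of data disks      = 16 * 4     = 64
    mke2fs -j -b 4096 -E stride=16,stripe-width=64 /dev/md0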
I'm sure others will have such experience and will be able to give you
pointers on that. You might want to ask on the main list in that case,
instead of the -virt one.

HTH,
Filipe
Philip Gwyn wrote:

> I have strange behaviour on a server that I can't get a handle on. I have a
> reasonably powerful server running VMware server 1.0.4-56528. It has a RAID5
> built with mdadm on 5 SATA drives. Masses of ram and 2 XEON CPUs. But it
> stutters.

This will double your memory usage, but it should fix your I/O.

Take a look at http://vmfaq.com/?View=entry&EntryID=25

In particular, putting your temporary directory in a ramdisk will improve
your I/O profile immensely.

Edit /etc/vmware/config and add:

  tmpDirectory = "/tmp/vmware"
  mainMem.useNamedFile = "FALSE"
  sched.mem.pshare.enable = "FALSE"
  MemTrimRate = "0"
  MemAllowAutoScaleDown = "FALSE"
  prefvmx.useRecommendedLockedMemSize = "TRUE"
  prefvmx.minVmMemPct = "100"

Edit /etc/fstab and add:

  tmpfs /tmp/vmware tmpfs defaults,size=100% 0 0

Edit /etc/cron.daily/tmpwatch and add '-x /tmp/vmware' to the tmpwatch
command line for /tmp.

Make your mount point for /tmp/vmware, mount /tmp/vmware, and restart VMware.

That is how I run my systems.

--
Benjamin Franz
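Spelling out those last steps, the command sequence on the host would look
roughly like this (assuming VMware Server's stock "vmware" init script):

    mkdir -p /tmp/vmware
    mount /tmp/vmware            # picks up the new /etc/fstab entry
    df -h /tmp/vmware            # confirm it is mounted as tmpfs
    service vmware restart       # restart the VMware Server host services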
A short follow-up to indicate how I solved my problem:

- moved all the RAM files to /dev/shm
- downgraded the host to CentOS 4.8 (was 5.2)
- moved the virtual disks to RAID1 (was RAID5)
- spread the virtual disks over various raidsets (was all on the same raidset)

The first element alone was not helpful. I was not able to test RAID1 vs
RAID5 in isolation from 4.8 vs 5.2, which would have been nice. I might be
downgrading all the other hosts to 4.8, in which case I might be able to test
it in isolation.

-Philip
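For reference, splitting a set of SATA drives into independent RAID1 pairs so
that different guests' virtual disks can live on different raidsets can be
done with mdadm along these lines; the device names and pairings are only
illustrative, not the actual layout described above:

    # two independent RAID1 pairs instead of one large RAID5
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[ab]1
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sd[cd]1

    # then put different guests' virtual disks on different arrays, e.g.
    #   /vm/guest1 on /dev/md1, /vm/guest2 on /dev/md2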