thr3ads.net - CentOS - [CentOS] IO causing major performance issues [Nov 2007]

Ross S. W. Walker

2007-Nov-15 22:17 UTC

[CentOS] IO causing major performance issues

Antonio Varni wrote:
> 
> Hello everyone.
> 
> I'm wondering what other people's experiences are WRT systems becoming
> unresponsive (unable to ssh in, etc) for brief periods of time when
> a large amount of IO is being performed.  It's really starting to
> cause a problem for us.  We're on Dell PowerEdge 1955 blades - but this same
> issue has caused us problems on PE1950, PE1850, PE1750 servers.
> 
> We're running CentOS 4.5 right now. I know CentOS 5 includes ionice, more
> io scheduler/elevator selections like deadline/etc. Perhaps that would
> fix this issue.  We're running the latest PERC firmware.
> 
> The specific issue I'm referring to at this point is on a system running
> mysql. All mysql data files are on a netapp filer but mysql's tmp directory
> is on local disk.  Whenever a lot of temp tables are created (and thus
> written and deleted from local disk quickly) we can't even log in to the
> machine - and our monitoring system gets all freaked out and we get
> lots of pages, etc... FYI this is two disks with hardware raid 1.
> 
> Is it just me? Or is this specific to Dell systems, or is this just
> the state of the Linux kernel these days? Is there some magical patch
> I can apply to make this issue go away :)
> 
> Thanks in advance for any insight into this issue.

Yes, IO starvation can occur under heavy load.

Don't put database temp tables on system disks (or data tables, for that
matter).

How much memory do you have in this box, and how big does the temp directory
usage get?

The reason I ask is that you could create a tmpfs and have mysql use it. Just
make sure you have enough memory that you can spare X (whatever your temp
table usage is) for a memory-backed filesystem.

You would also notice a dramatic speed increase in MySQL.
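
As a rough way to sanity-check the sizing before trying this, here is a minimal
Python sketch. It is only an illustration: the 1024 MB reserve, the mount point
/var/lib/mysqltmp and the mode=1777 mount option are assumptions of mine, not
values from this thread. MySQL would then be pointed at the mount with the
tmpdir setting in my.cnf.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def tmpfs_fits(meminfo_kb, tmp_usage_mb, reserve_mb=1024):
    """True if total RAM covers the temp-table usage plus a reserve.

    reserve_mb is an assumed safety margin for the OS and mysqld;
    pick a figure that matches your own box.
    """
    total_mb = meminfo_kb["MemTotal"] // 1024
    return tmp_usage_mb + reserve_mb <= total_mb


def fstab_line(mount_point, size_mb):
    """Build an /etc/fstab entry for a size-capped tmpfs mount."""
    return "tmpfs %s tmpfs size=%dm,mode=1777 0 0" % (mount_point, size_mb)


# Example with made-up numbers: about 4 GB of RAM, 512 MB peak temp usage.
sample = "MemTotal:        4096000 kB\nMemFree:          512000 kB\n"
if tmpfs_fits(parse_meminfo(sample), 512):
    print(fstab_line("/var/lib/mysqltmp", 512))  # hypothetical mount point
```

On the real machine you would feed parse_meminfo() the contents of
/proc/meminfo and use the peak size you actually observe for the temp
directory. Keep in mind that tmpfs pages can be pushed to swap under memory
pressure, so leave real headroom.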

-Ross

______________________________________________________________________
This e-mail, and any attachments thereto, is intended only for use by
the addressee(s) named herein and may contain legally privileged
and/or confidential information. If you are not the intended recipient
of this e-mail, you are hereby notified that any dissemination,
distribution or copying of this e-mail, and any attachments thereto,
is strictly prohibited. If you have received this e-mail in error,
please immediately notify the sender and permanently delete the
original and any copy or printout thereof.

redhat at mckerrs.net

2007-Nov-15 22:39 UTC

[CentOS] IO causing major performance issues


I have noticed similar behaviour on all sorts of Linux systems (in particular,
ssh into the box is really slow when it's doing IO) and wondered why, but never
really thought about investigating any further.

Unfortunately, I do a lot of work with Solaris, and the funny thing is that I
have *never* seen a Solaris kernel exhibit this sort of behaviour, even if it is
installed on normal IDE/SATA disks and, in fact, even if installed on the exact
same hardware.

Now I'm curious.....especially given that I'm right in the middle of
pushing to get rid of Solaris in favour of RHEL.

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Ross S. W. Walker

2007-Nov-15 22:45 UTC

[CentOS] IO causing major performance issues

redhat at mckerrs.net wrote:
> Antonio Varni wrote:
> > 
> > Hello everyone.
> > 
> > I'm wondering what other people's experiences are WRT systems becoming
> > unresponsive (unable to ssh in, etc) for brief periods of time when
> > a large amount of IO is being performed.  It's really starting to
> > cause a problem for us.  We're on Dell PowerEdge 1955 blades - but this same
> > issue has caused us problems on PE1950, PE1850, PE1750 servers.
> > 
> > We're running CentOS 4.5 right now. I know CentOS 5 includes ionice, more
> > io scheduler/elevator selections like deadline/etc. Perhaps that would
> > fix this issue.  We're running the latest PERC firmware.
> > 
> > The specific issue I'm referring to at this point is on a system running
> > mysql. All mysql data files are on a netapp filer but mysql's tmp directory
> > is on local disk.  Whenever a lot of temp tables are created (and thus
> > written and deleted from local disk quickly) we can't even log in to the
> > machine - and our monitoring system gets all freaked out and we get
> > lots of pages, etc... FYI this is two disks with hardware raid 1.
> > 
> > Is it just me? Or is this specific to Dell systems, or is this just
> > the state of the Linux kernel these days? Is there some magical patch
> > I can apply to make this issue go away :)
> > 
> > Thanks in advance for any insight into this issue.
> > 
> > Antonio
> 
> I have noticed similar behaviour on all sorts of Linux systems (in
> particular, ssh into the box is really slow when it's doing
> IO) and wondered why, but never really thought about
> investigating any further.
> 
> Unfortunately, I do a lot of work with Solaris, and the funny
> thing is that I have *never* seen a Solaris kernel exhibit
> this sort of behaviour, even if it is installed on normal
> IDE/SATA disks and, in fact, even if installed on the exact
> same hardware.
> 
> Now I'm curious.....especially given that I'm right in the
> middle of pushing to get rid of Solaris in favour of RHEL.

It really depends what the system is doing, what services you are
running and how you have it configured.

You had Solaris installed, what services was it running?

You had Linux installed, what services was it running?

Database temp tables and logs can generate an enormous amount of
io which can swamp the file systems of any system.

I have seen it on Windows and Linux, so I don't see why Solaris
would be any different.

You could always try a different scheduler to see if that helps,
for instance if you are using 'cfq' try 'deadline'.
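
To see which scheduler a disk is using before you change anything, a small
Python sketch follows. It assumes the kernel exposes the per-device file
/sys/block/<dev>/queue/scheduler, where the active scheduler is shown in
square brackets; the helper names are made up for illustration. Older kernels
may not allow switching at runtime, in which case I believe the elevator=
boot parameter is the way to select one.

```python
def scheduler_path(device):
    """Path of the sysfs file that lists a block device's I/O schedulers."""
    return "/sys/block/%s/queue/scheduler" % device


def available_schedulers(sysfs_text):
    """All scheduler names listed in the sysfs file, brackets stripped."""
    return [token.strip("[]") for token in sysfs_text.split()]


def active_scheduler(sysfs_text):
    """Name of the scheduler shown in [brackets], or None if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def switch_scheduler(device, wanted):
    """Select a scheduler by writing its name to sysfs (needs root)."""
    path = scheduler_path(device)
    with open(path) as handle:
        listed = available_schedulers(handle.read())
    if wanted not in listed:
        raise ValueError("%s not offered for %s: %s" % (wanted, device, listed))
    with open(path, "w") as handle:
        handle.write(wanted)


# Example on a captured line rather than the live system:
line = "noop anticipatory deadline [cfq]"
print(active_scheduler(line), available_schedulers(line))
```

A change made this way does not survive a reboot, which makes it convenient
for trying 'deadline' under load and going back to 'cfq' if it does not help.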

-Ross

Les Mikesell

2007-Nov-15 22:54 UTC

[CentOS] IO causing major performance issues

redhat at mckerrs.net wrote:
> 
> The specific issue I'm referring to at this point is on a system running
> mysql. All mysql data files are on a netapp filer but mysql's tmp directory
> is on local disk.  Whenever a lot of temp tables are created (and thus
> written and deleted from local disk quickly) we can't even log in to the
> machine - and our monitoring system gets all freaked out and we get
> lots of pages, etc... FYI this is two disks with hardware raid 1.
> 
> Is it just me? Or is this specific to Dell systems, or is this just
> the state of the Linux kernel these days? Is there some magical patch
> I can apply to make this issue go away :)

Does the Dell have a raid controller?  I saw something like this long
ago on a Dell with a raid card that appeared to queue up thousands of 
operations, then hit some kind of high water mark and stay busy 
(basically locking the system) for several minutes while it caught up. 
It seemed pretty fast as long as you never completely filled its 
queue...  These days I mostly run software raid1.

-- 
   Les Mikesell
    lesmikesell at gmail.com

Antonio Varni

2007-Nov-15 23:06 UTC

[CentOS] IO causing major performance issues

Hello everyone.

I'm wondering what other people's experiences are WRT systems becoming
unresponsive (unable to ssh in, etc) for brief periods of time when
a large amount of IO is being performed.  It's really starting to
cause a problem for us.  We're on Dell PowerEdge 1955 blades - but this same
issue has caused us problems on PE1950, PE1850, PE1750 servers.

We're running CentOS 4.5 right now. I know CentOS 5 includes ionice, more
io scheduler/elevator selections like deadline/etc. Perhaps that would
fix this issue.  We're running the latest PERC firmware.

The specific issue I'm referring to at this point is on a system running
mysql. All mysql data files are on a netapp filer but mysql's tmp directory
is on local disk.  Whenever a lot of temp tables are created (and thus
written and deleted from local disk quickly) we can't even log in to the
machine - and our monitoring system gets all freaked out and we get
lots of pages, etc... FYI this is two disks with hardware raid 1.

Is it just me? Or is this specific to Dell systems, or is this just
the state of the Linux kernel these days? Is there some magical patch
I can apply to make this issue go away :)


Thanks in advance for any insight into this issue.

Antonio



-- 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
antonio varni
[ technology ]

ESTALEA, L.P.
629 State Street #222
Santa Barbara, CA 93101
v 805.252.0115
f 805.899.2697
e avarni at estalea.com
w www.estalea.com

Ross S. W. Walker

2007-Nov-15 23:52 UTC

[CentOS] IO causing major performance issues

Well, the CFQ scheduler tries to do just that, but if the amount of IO is
overwhelming, even it cannot compensate.

Writes take longer than reads, and for a mirror they have to happen on both
drives before another request can be serviced.

I suggest you put the temp database on a tmpfs to avoid the problem.

Set the cfq scheduler on /dev/sda if it isn't already.

Other than that, you could try a different file system like xfs or jfs to see
if that helps.

-Ross



-----Original Message-----
From: centos-bounces at centos.org <centos-bounces at centos.org>
To: CentOS mailing list <centos at centos.org>
Sent: Thu Nov 15 19:29:20 2007
Subject: RE: [CentOS] IO causing major performance issues

Of course IO can swamp the file system. My point is that the kernel should
at least give enough time-slices to the other processes (like sshd) so
we can still log in.  It's not asking a lot from the kernel - to just log in
via ssh really.
