I'd like to know what the cause of a particular DB server's slowdown might be. We've ruled out IOPS for the disks (~ 20%) and raw CPU load (top shows perhaps 1/2 of cores busy), but the system slows to a crawl.

We suspect that we're simply running out of memory bandwidth but have no way to confirm this suspicion. Is there a way to test for this? Think: iostat, but for memory bandwidth instead of disk IO.

So far, searching has found intel-cmt-cat-master, which isn't supported on our CPU, and oprofile, which *sounds* like it does what I want from their website, but I can't seem to get output that, in any way, tells me what the bandwidth usage is.

Any ideas?
On 2/2/2016 5:34 PM, Benjamin Smith wrote:
> I'd like to know what the cause of a particular DB server's slowdown might be.
> We've ruled out IOPs for the disks (~ 20%) and raw CPU load (top shows perhaps
> 1/2 of cores busy, but the system slows to a crawl.
>
> We're suspecting that we're simply running out of memory bandwidth but have no
> way to confirm this suspicion. Is there a way to test for this? Think: iostat
> but for memory bandwidth instead of disk IO.

Memory bandwidth would show up as CPU busy; there's no distinction.

50% of your cores 100% busy: how many cores, and how many waiting database tasks are there? Typically, with most database servers, one user connection == one core at a time. So if you have 16 cores and only 8 busy/active database connections, that will tie up those 8 cores and leave the other 8 free. Now, the 8 processes will probably get bounced around between the cores, so it could end up looking like all 16 cores are 50% busy averaged over some sample rate, but that's the same net result.

-- 
john r pierce, recycling bits in santa cruz
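The averaging effect described above can be illustrated with a toy model. This is only a sketch under loud assumptions: the 16 cores and 8 busy tasks are the example figures from the message, and a real scheduler does not reshuffle tasks at random on every tick the way `per_core_utilization` (a hypothetical helper) does.

```python
import random


def per_core_utilization(cores=16, busy_tasks=8, ticks=10_000, seed=0):
    """Return the fraction of ticks each core spent running a task.

    Toy model: on every tick the busy tasks land on a random set of
    distinct cores. Real schedulers migrate far less aggressively.
    """
    rng = random.Random(seed)
    busy_ticks = [0] * cores
    for _ in range(ticks):
        for core in rng.sample(range(cores), busy_tasks):
            busy_ticks[core] += 1
    return [count / ticks for count in busy_ticks]


if __name__ == "__main__":
    util = per_core_utilization()
    print("per-core:", " ".join(f"{u:.0%}" for u in util))
    print("overall :", f"{sum(util) / len(util):.0%}")
```

Every core ends up looking roughly half busy, even though at any instant 8 tasks are each saturating one core. A per-process view (for example top sorted by CPU) is what would tell the two situations apart.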
On 02/02/2016 05:34 PM, Benjamin Smith wrote:
> We've ruled out IOPs for the disks (~ 20%)

How did you measure that? What filesystem are you using? What is the disk / array configuration? Which database?

If you run "iostat -x 2" what does a representative summary look like?

> and raw CPU load (top shows perhaps
> 1/2 of cores busy, but the system slows to a crawl.

Define "busy"?
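One way to pin down what "busy" means is to split CPU time into its states (user, system, iowait, idle and so on) between two samples of the aggregate `cpu` line in /proc/stat. The sketch below is illustrative only; the helper names and the 2-second interval are assumptions, and it requires a Linux /proc filesystem when run as a script.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(x) for x in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels report fewer fields
    return dict(zip(FIELDS, values))


def breakdown(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values()) or 1  # avoid dividing by zero on identical samples
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    with open(path) as stat_file:
        return stat_file.readline()


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(2)
    second = parse_cpu_line(read_cpu_line())
    for state, pct in breakdown(first, second).items():
        print(f"{state:8s} {pct:5.1f}%")
```

A high iowait share would point back at storage, whereas a high user share with poor throughput is at least consistent with cores stalled on memory, though this breakdown alone cannot confirm that.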
On Tue, 2 Feb 2016 at 20:34 -0000, Benjamin Smith wrote:
> Any idea?

Wild guessing...

How old a system? ~5 year old Nehalem? If so try:

    echo 0 > /proc/sys/vm/zone_reclaim_mode

For some memory performance diagnosing try 'sar':

    sar -B 10

There are lots of other sar options which might be useful.

Stuart
-- 
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone
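If sar is not installed, a rough stand-in for some of the paging figures can be had by sampling /proc/vmstat twice and dividing the deltas by the interval. This is a sketch, not a reimplementation of sar: the helper names are hypothetical, only four counters are read, and the 10-second interval simply mirrors the `sar -B 10` suggestion above.

```python
import time

# Cumulative paging counters exposed by the kernel in /proc/vmstat.
COUNTERS = ("pgpgin", "pgpgout", "pgfault", "pgmajfault")


def parse_vmstat(text):
    """Parse 'name value' lines from /proc/vmstat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def rates(before, after, seconds):
    """Per-second change of each counter between two samples."""
    return {
        name: (after.get(name, 0) - before.get(name, 0)) / seconds
        for name in COUNTERS
    }


def read_vmstat(path="/proc/vmstat"):
    with open(path) as vmstat_file:
        return parse_vmstat(vmstat_file.read())


if __name__ == "__main__":
    interval = 10
    first = read_vmstat()
    time.sleep(interval)
    second = read_vmstat()
    for name, per_second in rates(first, second, interval).items():
        print(f"{name:12s} {per_second:12.1f}/s")
```

A sustained major-fault rate would suggest the box is paging rather than starved for memory bandwidth.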
Benjamin Smith <lists at ...> writes:
> So far, searching has found intel-cmt-cat-master which isn't supported on our
> CPU and oprofile which *sounds* like it does what I want from their website but
> I can't seem to get output that, in any way, tells me what the bandwidth usage
> is.
>
> Any idea?
>

Perhaps the Intel Performance Counter Monitor tool can help here:

https://software.intel.com/en-us/articles/intel-performance-counter-monitor

A quick CPU model check on ark.intel.com will indicate the maximum CPU memory bandwidth.
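For a sanity check on the figure ark.intel.com lists, the theoretical peak per socket can be estimated as transfer rate times bus width times number of channels. The numbers below (DDR3-1333, three channels, 64-bit channels) are illustrative assumptions, not the poster's hardware, and `peak_bandwidth_gb_s` is a hypothetical helper.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, channels, bus_width_bits=64):
    """Theoretical peak memory bandwidth in GB/s (decimal) for one socket.

    mega_transfers_per_sec: DIMM data rate in MT/s, e.g. 1333 for DDR3-1333.
    channels: populated memory channels per socket.
    bus_width_bits: data width of one channel (64 for standard DDR DIMMs).
    """
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    # Assumed example configuration: three channels of DDR3-1333.
    print(f"{peak_bandwidth_gb_s(1333, 3):.1f} GB/s theoretical peak")
```

Sustained bandwidth measured by a counter-based tool will sit noticeably below this ceiling, so the useful comparison is how close the measured figure gets to it during the slowdown.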
Hello,

Try installing collectd and check its RAM metrics.

Best regards,

On 03/02/2016 at 2:51 a.m., "John R Pierce" <pierce at hogranch.com> wrote:
> On 2/2/2016 5:34 PM, Benjamin Smith wrote:
>
>> I'd like to know what the cause of a particular DB server's slowdown
>> might be.
>> We've ruled out IOPs for the disks (~ 20%) and raw CPU load (top shows
>> perhaps
>> 1/2 of cores busy, but the system slows to a crawl.
>>
>> We're suspecting that we're simply running out of memory bandwidth but
>> have no
>> way to confirm this suspicion. Is there a way to test for this? Think:
>> iostat
>> but for memory bandwidth instead of disk IO.
>>
>
> memory bandwidth would show up as CPU busy, there's no distinction.
>
> 50% of your cores 100% busy, how many cores and how many waiting database
> tasks are there? typically with most database servers, one user connection
> == one core at a time. so if you have 16 cores, and only 8 busy/active
> database connections, that will tie up those 8 cores and leave the other 8
> free. now the 8 processes will probably get bounced around between the
> cores, so it could end up looking like all 16 cores are 50% busy averaged
> over some sample rate, but thats the same net difference..
>
> --
> john r pierce, recycling bits in santa cruz
On Tue, Feb 2, 2016 at 7:34 PM, Gordon Messmer <gordon.messmer at gmail.com> wrote:
> On 02/02/2016 05:34 PM, Benjamin Smith wrote:
>>
>> We've ruled out IOPs for the disks (~ 20%)
>
> How did you measure that? What filesystem are you using? What is the disk
> / array configuration?
> Which database?
>
> If you run "iostat -x 2" what does a representative summary look like?
>
>> and raw CPU load (top shows perhaps
>> 1/2 of cores busy, but the system slows to a crawl.
>
> Define "busy"?

Yeah. It'd be nice to see the output from top so we can see what is consuming most of the CPU, or anything consuming less than it should because it's waiting for something else that's slower.

It might be useful to see 'perf top' if perf is installed; if not, install it, reproduce the problem, and let perf top run for a minute, then post the output on fpaste or pastebin so the formatting stays semi-sane.

-- 
Chris Murphy