Jim Mauro
2009-Sep-22 16:57 UTC
[zfs-discuss] URGENT: very high busy and average service time with ZFS and USP1100
Cross-posting to zfs-discuss. This does not need to be on the
confidential alias. It's a performance query - there's nothing
confidential in here. Other folks post performance queries to
zfs-discuss....

Forget %b - it's useless.

It's not the bandwidth that's hurting you, it's the IOPS.
One of the hot devices did 1515.8 reads-per-second, the other
did over 500.

Is this Oracle?

You never actually tell us what the huge performance problem is -
what's the workload, and what's the delivered level of performance?

IO service times in the 22-36 millisecond range are not great,
but not the worst I've seen. Do you have any data that connects
the delivered performance of the workload to an IO latency issue,
or did the customer just run "iostat", see "100 %b", and assume
this was the problem?

I need to see zpool stats.

Is each of these c3tXXXX devices actually a raid 7+1 (which means
7 data disks and 1 parity disk)?

There's nothing here that tells us there's something that needs
to be done on the ZFS side. Not enough data.

It looks like a very lopsided IO load distribution problem. You
have 8 c3tXXXX LUNs, 2 of which are getting slammed with IOPS;
the other 6 are relatively idle.

Thanks,
/jim


Javier Conde wrote:
>
> Hello,
>
> IHAC with a huge performance problem on a newly installed M8000
> configured with a USP1100 and ZFS.
>
> From what we can see, 2 LUNs used in different zpools are 100%
> busy, and the average service time is also quite high (23-36 ms).
>
>    r/s    w/s    kr/s   kw/s wait  actv wsvc_t asvc_t  %w  %b device
>    0.0   11.4     0.0  224.1  0.0   0.2    0.0   20.7   0   5 c3t5000C5000F94A607d0
>    0.0   11.8     0.0  224.1  0.0   0.3    0.0   24.2   0   6 c3t5000C5000F94E38Fd0
>    0.2    0.0    25.6    0.0  0.0   0.0    0.0    7.9   0   0 c3t60060E8015321F000001321F00000032d0
>    0.0    3.6     0.0   20.8  0.0   0.0    0.0    0.5   0   0 c3t60060E8015321F000001321F00000020d0
>    0.2   24.0    25.6  488.0  0.0   0.0    0.0    0.6   0   1 c3t60060E8015321F000001321F0000001Cd0
>   11.4    0.8    92.8    8.0  0.0   0.0    0.0    3.9   0   4 c3t60060E8015321F000001321F00000019d0
>  573.4    0.0 73395.5    0.0  0.0  20.6    0.0   36.0   0 100 c3t60060E8015321F000001321F0000000Bd0
>    0.8    0.8   102.4    8.0  0.0   0.0    0.0   22.8   0   4 c3t60060E8015321F000001321F00000008d0
> 1515.8   10.2 30420.9  148.0  0.0  34.9    0.0   22.9   1 100 c3t60060E8015321F000001321F00000006d0
>    0.4    0.4    51.2    1.6  0.0   0.0    0.0    5.1   0   0 c3t60060E8015321F000001321F00000055d0
>
> The USP1100 is configured with raid 7+1, which is the default
> recommendation.
>
> The data transferred is not very high, between 50 and 150 MB/sec.
>
> Is it normal to see these disks busy at 100% all the time, with
> the average service time always greater than 30 ms?
>
> Is there something we can do from the ZFS side?
>
> We have followed the recommendations regarding the block size for
> the database file systems: we use 4 different zpools for the DB,
> indexes, redo logs and archive logs, and vdev_cache_bshift is set
> to 13 (8k blocks)...
>
> Can someone help me to troubleshoot this issue?
>
> Thanks in advance and best regards,
>
> Javier
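
P.S. When you grab the zpool stats, something like the following is
what I'm after (run it per pool if you prefer; without a pool name
it covers all of them):

    # how the c3tXXXX LUNs map to pools and vdevs
    zpool status

    # per-vdev read/write ops and bandwidth, 5-second samples, 6 of them
    zpool iostat -v 5 6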
Richard Elling
2009-Sep-22 17:43 UTC
[zfs-discuss] URGENT: very high busy and average service time with ZFS and USP1100
comment below...

On Sep 22, 2009, at 9:57 AM, Jim Mauro wrote:
> [...]
>
> Javier Conde wrote:
>> [...]
>>
>>    r/s    w/s    kr/s   kw/s wait  actv wsvc_t asvc_t  %w  %b device
>>  573.4    0.0 73395.5    0.0  0.0  20.6    0.0   36.0   0 100 c3t60060E8015321F000001321F0000000Bd0

avg read size ~128 kBytes (73395.5 kr/s / 573.4 r/s)... which is good

>> 1515.8   10.2 30420.9  148.0  0.0  34.9    0.0   22.9   1 100 c3t60060E8015321F000001321F00000006d0

avg read size ~20 kBytes (30420.9 / 1515.8)... not so good
These look like single-LUN pools. What is the workload?

>> The USP1100 is configured with raid 7+1, which is the default
>> recommendation.

Check the starting sector for the partition. For older OpenSolaris
and Solaris 10 installations, the default starting sector is 34,
which has the unfortunate effect of misaligning with most hardware
RAID arrays. For newer installations, the default starting sector
is 256, which has a better chance of aligning with hardware RAID
arrays. This will be more pronounced when using RAID-5.

To check, look at the partition table in format(1M) or prtvtoc(1M).

BTW, the customer is surely not expecting super database performance
from RAID-5, are they?
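
For example (device name taken from the iostat output above; the
slice number may differ on your system):

    # print the disk label; check the first sector of the slice ZFS uses
    prtvtoc /dev/rdsk/c3t60060E8015321F000001321F00000006d0s0

If the data slice starts at sector 34, an 8 kByte read is not aligned
to the array's stripe unit and can touch two back-end stripe units,
roughly doubling the disk IOPS behind the LUN.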
>> We have followed the recommendations regarding the block size for
>> the database file systems: we use 4 different zpools for the DB,
>> indexes, redo logs and archive logs, and vdev_cache_bshift is set
>> to 13 (8k blocks)...

hmmm... what OS release? The vdev cache should only read metadata,
unless you are running on an old OS. In other words, the solution
which suggests changing vdev_cache_bshift has been superseded by
later OS releases. You can check this via the kstats for the vdev
cache.

The big knob for databases is recordsize. Clearly, the recordsize
is set to the default on the LUN with 128 kByte average reads.
-- richard
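
P.S. the vdev cache counters can be read with kstat(1M); the module
and statistic names here are as shipped in OpenSolaris-era ZFS:

    # hits, misses and delegations for the vdev-level readahead cache
    kstat -m zfs -n vdev_cache_stats

If this shows little traffic, tuning vdev_cache_bshift is not buying
you anything.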
Javier Conde
2009-Sep-23 21:23 UTC
[zfs-discuss] URGENT: very high busy and average service time with ZFS and USP1100
Thanks Richard and Jim,

Your answers helped me show the customer that there was no issue
with ZFS or the HDS.

I went onsite to see the problem, and, as Jim suspected, the
customer had just seen the %b and average service time and assumed
there was a problem.

The server is running an Oracle DB, and the 2 ZFS file systems
showing a lot of activity were the one with the database files and
the one for the redo logs.

For the DB file system the recordsize is set to 8k, which is why we
see around 2000 IOPS with an asvc_t of 10 ms.

The redo log file system has the default recordsize of 128k, so we
see far fewer IOPS, the same transfer rate, and an asvc_t of 30 ms.

Everything is normal on this system. I showed that the IO activity
during the period when the DB was not responding correctly was no
different from the rest of the day.

I still don't know where the underlying problem was, but it seems
to have been solved now.

Regards,

Javier
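
For reference, the recordsize arrangement described above would look
something like this; the pool and dataset names are illustrative,
not the actual ones:

    # 8k records for datafiles, matching the Oracle db_block_size
    # (recordsize only affects files written after it is set)
    zfs set recordsize=8k dbpool/oradata

    # redo logs are large sequential writes, so the 128k default stays
    zfs get recordsize dbpool/oradata redopool/redo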