Felix Buenemann
2010-Feb-23 17:16 UTC
[zfs-discuss] controller cache instead of dedicated ZIL device
Hi,

as it turns out to be pretty difficult (or expensive) to find high performance dedicated ZIL devices, I had another thought:

If using a RAID controller with a large cache, e.g. 4GB and battery backup, in JBOD mode and using the on-disk ZIL, wouldn't the controller cache work as a great ZIL accelerator, requiring no dedicated ZIL?

If my understanding is correct, a battery-backed RAID controller will ignore cache flush commands, and thus the controller cache would be a very low latency intermediate cache which is preserved over power failure.

Best Regards,
Felix Buenemann
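One way to sanity-check the idea would be to drive a sync-heavy workload (e.g. an untar over NFS) at a test pool sitting behind such a controller and watch per-vdev activity while it runs. This is only a sketch; the pool name is a placeholder:

# zpool iostat -v tank 1    # "tank" is a placeholder pool name; 1-second samples per vdev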
Richard Elling
2010-Feb-23 17:25 UTC
[zfs-discuss] controller cache instead of dedicated ZIL device
On Feb 23, 2010, at 9:16 AM, Felix Buenemann wrote:

> Hi,
>
> as it turns out to be pretty difficult (or expensive) to find high performance dedicated ZIL devices, I had another thought:
>
> If using a RAID controller with a large cache, e.g. 4GB and battery backup, in JBOD mode and using the on-disk ZIL, wouldn't the controller cache work as a great ZIL accelerator, requiring no dedicated ZIL?

Yes. ZIL is a performance problem for HDD JBODs, not so much on devices with fast, nonvolatile write caches.

> If my understanding is correct, a battery-backed RAID controller will ignore cache flush commands, and thus the controller cache would be a very low latency intermediate cache which is preserved over power failure.

Cache flush latency is orthogonal to slog latency.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
Hi all,

I'm currently evaluating the possibility of migrating an NFS server based on Linux (CentOS 5.4 / RHEL 5.4, x64-32) to an OpenSolaris box, and I'm seeing some huge CPU usage on the OpenSolaris box.

The ZFS box is a Dell R710 with 2 quad-core CPUs (Intel E5506 @ 2.13GHz), 16GB RAM, and 2 Sun non-RAID HBAs connected to two J4400 JBODs, while the Linux box is a 2x Xeon 3.0GHz with 8GB RAM and an Areca HBA with 512MB cache. Both servers have an Intel 10GbE card with jumbo frames enabled.

This ZFS box has one pool built from raidz2 vdevs, with multipath enabled (to make use of the 2 HBAs and 2 J4400s) and 20 disks (SATA 7,200rpm Seagate enterprise drives, as supplied by Sun). The pool has 5 raidz2 vdevs with 4 disks each.

The test is made by mounting on the Linux box one NFS share from the ZFS box and copying around 1.1TB of data. This data is users' home directories, so thousands of small files. During the copy from the Linux box to the ZFS box the load on the ZFS box is between 8 and 10, while on the Linux box it never goes over 1.

Could the fact of having a RAIDZ2 configuration be the cause for such a big load on the ZFS box, or maybe am I missing something?

Thanks for all your time,
Bruno

Here are some more specs from the ZFS box:

root at zfsbox01:/var/adm# zpool status -v RAIDZ2
  pool: RAIDZ2
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        RAIDZ2                     ONLINE       0     0     0
          raidz2-0                 ONLINE       0     0     0
            c0t5000C5001A101764d0  ONLINE       0     0     0
            c0t5000C5001A315D0Ad0  ONLINE       0     0     0
            c0t5000C5001A10EC6Bd0  ONLINE       0     0     0
            c0t5000C5001A0FFF4Bd0  ONLINE       0     0     0
          raidz2-1                 ONLINE       0     0     0
            c0t5000C50019C0A04Ed0  ONLINE       0     0     0
            c0t5000C5001A0FA028d0  ONLINE       0     0     0
            c0t5000C50019FCF180d0  ONLINE       0     0     0
            c0t5000C5001A11E657d0  ONLINE       0     0     0
          raidz2-2                 ONLINE       0     0     0
            c0t5000C5001A104A30d0  ONLINE       0     0     0
            c0t5000C5001A316841d0  ONLINE       0     0     0
            c0t5000C5001A0FF92Ed0  ONLINE       0     0     0
            c0t5000C50019EB02FDd0  ONLINE       0     0     0
          raidz2-3                 ONLINE       0     0     0
            c0t5000C5001A0FDBDCd0  ONLINE       0     0     0
            c0t5000C5001A0F2197d0  ONLINE       0     0     0
            c0t5000C50019BDBB8Dd0  ONLINE       0     0     0
            c0t5000C5001A3152A0d0  ONLINE       0     0     0
          raidz2-4                 ONLINE       0     0     0
            c0t5000C5001A100DA0d0  ONLINE       0     0     0
            c0t5000C5001A31544Cd0  ONLINE       0     0     0
            c0t5000C50019F03AF6d0  ONLINE       0     0     0
            c0t5000C50019FC3055d0  ONLINE       0     0     0

###############

root at zfsbox01:~# zpool iostat RAIDZ2 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
RAIDZ2      2.29T  15.8T     43    305  3.03M  14.6M
RAIDZ2      2.29T  15.8T    114    663  12.7M  18.6M
RAIDZ2      2.29T  15.8T    129    595  14.0M  11.2M
RAIDZ2      2.29T  15.8T    881    623  13.0M  10.4M
RAIDZ2      2.29T  15.8T    227    449  8.48M  17.5M
RAIDZ2      2.29T  15.8T     39    498  4.55M  29.1M

#######################################

root at zfsbox01:~# top -b | grep CPU | head -n1
CPU states: 35.2% idle, 2.2% user, 62.6% kernel, 0.0% iowait, 0.0% swap

root at zfsbox01:~# mpstat
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   55   0 16969 18180  102  785   55  127 1779    4   242    1  69   0  30
  1   70   0 18005 16820    4  926   44  142 1889    6   159    1  65   0  35
  2   42   0 16659 18091  262  555   53  113 1757   11   250    2  68   0  31
  3   48   0 18221 17380  246  667   40  122 1929   12   132    1  66   0  33
  4   38   0 16547 19965 1766  517   48  107 1775   10   264    2  70   0  29
  5   42   0 18596 19113 1527  595   35  115 1987    6   156    1  69   0  31
  6   23   0 16284 17921   10 2066   54  109 1763    4   115    1  70   0  29
  7   32   0 17576 16665    3 2233   39  134 1847    5    90    0  64   0  35

top -b | grep Memory
Memory: 16G phys mem, 2181M free mem, 8187M total swap, 8187M free swap

Feb 18 11:42:36 zfsbox01 unix: [ID 378719 kern.info] NOTICE: cpu_acpi: _PSS package evaluation failed for with status 5 for CPU 2.
Feb 18 11:42:36 zfsbox01 unix: [ID 388705 kern.info] NOTICE: cpu_acpi: error parsing _PSS for CPU 2
Feb 18 11:43:12 zfsbox01 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe0: identify 82598 adapter
Feb 18 11:43:12 zfsbox01 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe0: Request 16 handles, 2 available
Feb 18 11:43:12 zfsbox01 pcplusmp: [ID 805372 kern.info] pcplusmp: pciex8086,10c7 (ixgbe) instance 0 irq 0x45 vector 0x66 ioapic 0xff intin 0xff is bound to cpu 3
Feb 18 11:43:12 zfsbox01 pcplusmp: [ID 805372 kern.info] pcplusmp: pciex8086,10c7 (ixgbe) instance 0 irq 0x46 vector 0x67 ioapic 0xff intin 0xff is bound to cpu 4
Feb 18 11:43:12 zfsbox01 mac: [ID 469746 kern.info] NOTICE: ixgbe0 registered
Feb 18 11:43:12 zfsbox01 ixgbe: [ID 611667 kern.info] NOTICE: ixgbe0: Intel 10Gb Ethernet, driver version 1.1.4
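To see where that kernel time actually goes while the copy runs, a quick first step (a minimal sketch; the interval is arbitrary) is per-thread microstate accounting:

# prstat -mL 5    # -m microstate columns, -L one line per LWP, 5-second samples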
On Tue, 23 Feb 2010, Bruno Sousa wrote:

> Could the fact of having a RAIDZ2 configuration be the cause for such a
> big load on the ZFS box, or maybe am I missing something?

Zfs can consume appreciable CPU if compression, sha256 checksums, and/or deduplication is enabled. Otherwise, substantial CPU consumption is unexpected.

Are compression, sha256 checksums, or deduplication enabled for the filesystem you are using?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
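For reference, those three properties can be checked in one go with zfs get (the dedup property only exists on builds that already have deduplication); the dataset name below is only a placeholder for whatever filesystem is being exported:

# zfs get compression,checksum,dedup RAIDZ2/export    # "RAIDZ2/export" is a placeholder dataset name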
On Tue, Feb 23, 2010 at 01:03:04PM -0600, Bob Friesenhahn wrote:

> Zfs can consume appreciable CPU if compression, sha256 checksums,
> and/or deduplication is enabled. Otherwise, substantial CPU
> consumption is unexpected.

In terms of scaling, does ZFS on OpenSolaris play well on multiple cores? How many disks (assuming 100 MByte/s throughput for each) would be considered pushing it for a current single-socket quadcore?

> Are compression, sha256 checksums, or deduplication enabled for the
> filesystem you are using?

--
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
Hi,

I don't have compression and deduplication enabled, but checksums are. However disabling checksums gives a 0.5 load reduction only...

Bruno

On 23-2-2010 20:27, Eugen Leitl wrote:
> On Tue, Feb 23, 2010 at 01:03:04PM -0600, Bob Friesenhahn wrote:
>
>> Zfs can consume appreciable CPU if compression, sha256 checksums,
>> and/or deduplication is enabled. Otherwise, substantial CPU
>> consumption is unexpected.
>
> In terms of scaling, does ZFS on OpenSolaris play well on multiple
> cores? How many disks (assuming 100 MByte/s throughput for each)
> would be considered pushing it for a current single-socket quadcore?
>
>> Are compression, sha256 checksums, or deduplication enabled for the
>> filesystem you are using?
Hi Bob,

I have neither deduplication nor compression enabled. Checksums are enabled, but if I try to disable them I gain around 0.5 less load on the box, so it still seems to be too much.

Bruno

On 23-2-2010 20:03, Bob Friesenhahn wrote:
> On Tue, 23 Feb 2010, Bruno Sousa wrote:
>> Could the fact of having a RAIDZ2 configuration be the cause for such a
>> big load on the ZFS box, or maybe am I missing something?
>
> Zfs can consume appreciable CPU if compression, sha256 checksums,
> and/or deduplication is enabled. Otherwise, substantial CPU
> consumption is unexpected.
>
> Are compression, sha256 checksums, or deduplication enabled for the
> filesystem you are using?
>
> Bob
> --
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Tue, 23 Feb 2010, Eugen Leitl wrote:

> In terms of scaling, does ZFS on OpenSolaris play well on multiple
> cores? How many disks (assuming 100 MByte/s throughput for each)
> would be considered pushing it for a current single-socket quadcore?

In any large storage system, most disks are relatively unused. It is not normal for all disks to be pumping through their rated throughput at one time. PCIe interfaces are only capable of a certain amount of bandwidth and this will place a hard limit on maximum throughput. There are also limits based on the raw memory bandwidth of the machine.

OpenSolaris is the king of multi-threading and excels on multiple cores. Without this fine level of threading, SPARC CMT hardware would be rendered useless. With this in mind, some older versions of OpenSolaris did experience a thread priority problem when compression was used.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Tue, 23 Feb 2010, Bruno Sousa wrote:

> I don't have compression and deduplication enabled, but checksums are.
> However disabling checksums gives a 0.5 load reduction only...

Since high CPU consumption is unusual, I would suspect a device driver issue. Perhaps there is an interrupt conflict such that two devices are using the same interrupt.

On my own system (12 disks), I can run a throughput benchmark and the system remains completely usable as an interactive desktop system, without any large use of CPU or high load factor. The bandwidth bottleneck in my case is the PCIe (4 lane) fiber channel card and its duplex connection to the storage array.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
The system becomes really slow during the data copy over the network, but if I copy data between 2 pools on the box I don't notice that issue, so I'm probably hitting some sort of interrupt conflict in the network cards... This system is configured with a lot of interfaces, namely:

4 internal Broadcom gigabit
1 PCIe 4x, Intel Dual Pro gigabit
1 PCIe 4x, Intel 10GbE card
2 PCIe 8x Sun non-RAID HBA

With all of this, is there any way to check if there is indeed an interrupt conflict or some other type of conflict that leads to this high load? I also noticed some messages about ACPI... can this ACPI also affect the performance of the system?

Regards,
Bruno

On 23-2-2010 20:47, Bob Friesenhahn wrote:
> On Tue, 23 Feb 2010, Bruno Sousa wrote:
>
>> I don't have compression and deduplication enabled, but checksums are.
>> However disabling checksums gives a 0.5 load reduction only...
>
> Since high CPU consumption is unusual, I would suspect a device driver
> issue. Perhaps there is an interrupt conflict such that two devices
> are using the same interrupt.
>
> On my own system (12 disks), I can run a throughput benchmark and the
> system remains completely usable as an interactive desktop system,
> without any large use of CPU or high load factor. The bandwidth
> bottleneck in my case is the PCIe (4 lane) fiber channel card and its
> duplex connection to the storage array.
>
> Bob
On 23 Feb 2010, at 19:53, Bruno Sousa wrote:

> The system becomes really slow during the data copy over the network, but if I copy data between 2 pools on the box I don't notice that issue, so I'm probably hitting some sort of interrupt conflict in the network cards... This system is configured with a lot of interfaces, namely:
>
> 4 internal Broadcom gigabit
> 1 PCIe 4x, Intel Dual Pro gigabit
> 1 PCIe 4x, Intel 10GbE card
> 2 PCIe 8x Sun non-RAID HBA
>
> With all of this, is there any way to check if there is indeed an interrupt conflict or some other type of conflict that leads to this high load? I also noticed some messages about ACPI... can this ACPI also affect the performance of the system?

To see what interrupts are being shared:

# echo "::interrupts -d" | mdb -k

Running intrstat might also be interesting.

Cheers,

Chris
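A minimal intrstat invocation, assuming it is run while the NFS copy is active (interval and sample count are arbitrary), would be something like:

# intrstat 5 6    # 5-second intervals, 6 samples; shows per-device interrupt counts and CPU time per CPU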
On Tue, February 23, 2010 17:20, Chris Ridd wrote:

> To see what interrupts are being shared:
>
> # echo "::interrupts -d" | mdb -k
>
> Running intrstat might also be interesting.

This just caught my attention. I'm not the original poster, but this sparked something I've been wanting to know about for a while.

I know from startup log messages that I've got several interrupts being shared. I've been wondering how serious this is. I don't have any particular performance problems, but then again my cpu and motherboard are from 2006 and I'd like to extend their service life, so using them more efficiently isn't a bad idea. Plus it's all a learning experience :-).

While I see the relevance to diagnosing performance problems, for my case, is there likely to be anything I can do about interrupt assignments? Or is this something that, if it's a problem, is an unfixable problem (short of changing hardware)? I think there's BIOS stuff to shuffle interrupt assignments some, but do changes at that level survive kernel startup, or get overwritten?

If there's nothing I can do, then no real point in my investigating further. However, if there's possibly something to do, what kinds of things should I look for as problems in the mdb or intrstat data?

mdb reports:

# echo "::interrupts -d" | mdb -k
IRQ  Vect IPL Bus  Trg Type   CPU Share APIC/INT# Driver Name(s)
1    0x42 5   ISA  Edg Fixed  1   1     0x0/0x1   i8042#0
4    0xb0 12  ISA  Edg Fixed  1   1     0x0/0x4   asy#0
6    0x44 5   ISA  Edg Fixed  0   1     0x0/0x6   fdc#0
9    0x81 9   PCI  Lvl Fixed  1   1     0x0/0x9   acpi_wrapper_isr
12   0x43 5   ISA  Edg Fixed  0   1     0x0/0xc   i8042#0
14   0x45 5   ISA  Edg Fixed  1   1     0x0/0xe   ata#0
16   0x83 9   PCI  Lvl Fixed  0   1     0x0/0x10  pci-ide#1
19   0x86 9   PCI  Lvl Fixed  1   1     0x0/0x13  hci1394#0
20   0x41 5   PCI  Lvl Fixed  0   2     0x0/0x14  nv_sata#1, nv_sata#0
21   0x84 9   PCI  Lvl Fixed  1   2     0x0/0x15  nv_sata#2, ehci#0
22   0x85 9   PCI  Lvl Fixed  0   2     0x0/0x16  audiohd#0, ohci#0
23   0x60 6   PCI  Lvl Fixed  1   2     0x0/0x17  nge#1, nge#0
24   0x82 7   PCI  Edg MSI    0   1     -         pcie_pci#0
25   0x40 5   PCI  Edg MSI    1   1     -         mpt#0
26   0x30 4   PCI  Edg MSI    1   1     -         pcie_pci#5
27   0x87 7   PCI  Edg MSI    0   1     -         pcie_pci#4
160  0xa0 0        Edg IPI    all 0     -         poke_cpu
192  0xc0 13       Edg IPI    all 1     -         xc_serv
208  0xd0 14       Edg IPI    all 1     -         kcpc_hw_overflow_intr
209  0xd1 14       Edg IPI    all 1     -         cbe_fire
210  0xd3 14       Edg IPI    all 1     -         cbe_fire
240  0xe0 15       Edg IPI    all 1     -         xc_serv
241  0xe1 15       Edg IPI    all 1     -         apic_error_intr

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
On 02/23/10 15:20, Chris Ridd wrote:
> On 23 Feb 2010, at 19:53, Bruno Sousa wrote:
>
>> The system becomes really slow during the data copy over the network, but if I copy data between 2 pools on the box I don't notice that issue, so I'm probably hitting some sort of interrupt conflict in the network cards... This system is configured with a lot of interfaces, namely:
>>
>> 4 internal Broadcom gigabit
>> 1 PCIe 4x, Intel Dual Pro gigabit
>> 1 PCIe 4x, Intel 10GbE card
>> 2 PCIe 8x Sun non-RAID HBA
>>
>> With all of this, is there any way to check if there is indeed an interrupt conflict or some other type of conflict that leads to this high load? I also noticed some messages about ACPI... can this ACPI also affect the performance of the system?
>
> To see what interrupts are being shared:
>
> # echo "::interrupts -d" | mdb -k
>
> Running intrstat might also be interesting.
>
> Cheers,
>
> Chris

Is this using the mpt driver? There's an issue w/ the fix for 6863127 that causes performance problems on larger memory machines, filed as 6908360.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com         http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
Hi Bart,
yep, I got Bruno to run a kernel profile lockstat... it does look like the mpt issue..

andy

-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
 2861   7%  55% 0.00     4889 cpu[1]+5               do_splx

      nsec ------ Time Distribution ------ count     Stack
      1024 |                               1         xc_common
      2048 |@@                             213       xc_call
      4096 |@@@@@@@@@@@                    1136      hat_tlb_inval
      8192 |@@@@@@@@@@@@                   1237      x86pte_inval
     16384 |@@                             256       hat_pte_unmap
     32768 |                               15        hat_unload_callback
     65536 |                               1         hat_unload
    131072 |                               2         segkmem_free_vn
                                                     segkmem_free
                                                     vmem_xfree
                                                     vmem_free
                                                     kfreea
                                                     i_ddi_mem_free
                                                     rootnex_teardown_copybuf
                                                     rootnex_coredma_unbindhdl
                                                     rootnex_dma_unbindhdl
                                                     ddi_dma_unbind_handle
                                                     scsi_dmafree_attr
                                                     scsi_free_cache_pkt
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
 1857   5%  59% 0.00     1907 cpu[0]+5               getctgsz

      nsec ------ Time Distribution ------ count     Stack
      1024 |@@@                            206       kfreea
      2048 |@@@@@@@@@@@@@@@@@@@            1203      i_ddi_mem_free
      4096 |@@@@@@                         387       rootnex_teardown_copybuf
      8192 |                               24        rootnex_coredma_unbindhdl
     16384 |                               25        rootnex_dma_unbindhdl
     32768 |                               12        ddi_dma_unbind_handle
                                                     scsi_dmafree_attr
                                                     scsi_free_cache_pkt
                                                     scsi_destroy_pkt
                                                     vhci_scsi_destroy_pkt
                                                     scsi_destroy_pkt
                                                     sd_destroypkt_for_buf
                                                     sd_return_command
                                                     sdintr
                                                     scsi_hba_pkt_comp
                                                     vhci_intr
                                                     scsi_hba_pkt_comp
                                                     mpt_doneq_empty
                                                     mpt_intr
-------------------------------------------------------------------------------

On 24 Feb 2010, at 10:31, Bart Smaalders wrote:

> Is this using the mpt driver? There's an issue w/ the fix for
> 6863127 that causes performance problems on larger memory
> machines, filed as 6908360.
>
> - Bart
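For anyone wanting to reproduce this kind of profile, the usual kernel-profiling lockstat run looks something along these lines (the 30-second sample window is arbitrary):

# lockstat -kIW -D 20 sleep 30    # -I sample the profiling interrupt, -k coalesce PCs into functions, -W group by caller, -D 20 show the top 20 entries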
Yes, I'm using the mpt driver. In total this system has 3 HBAs: 1 internal (Dell PERC) and 2 Sun non-RAID HBAs.
I'm also using multipath, but if I disable multipath I have pretty much the same results..

Bruno

On 24-2-2010 19:42, Andy Bowers wrote:
> Hi Bart,
> yep, I got Bruno to run a kernel profile lockstat... it does look like the mpt issue..
>
> andy
>
> On 24 Feb 2010, at 10:31, Bart Smaalders wrote:
>
>> Is this using the mpt driver? There's an issue w/ the fix for
>> 6863127 that causes performance problems on larger memory
>> machines, filed as 6908360.
>>
>> - Bart
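To double-check that the mpt module is the one handling the Sun HBAs, and which revision is loaded (the exact version string varies by build), something like this should be enough:

# modinfo | grep -w mpt    # lists the loaded mpt module together with its version/info string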
On 02/24/10 12:57, Bruno Sousa wrote:

> Yes, I'm using the mpt driver. In total this system has 3 HBAs: 1
> internal (Dell PERC) and 2 Sun non-RAID HBAs.
> I'm also using multipath, but if I disable multipath I have pretty much
> the same results..
>
> Bruno

From what I understand, the fix is expected "very soon"; your performance is getting killed by the over-aggressive use of bounce buffers...

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com         http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
Hi,

Until it's fixed, should the 132 build be used instead of the 133?

Bruno

On 25-2-2010 3:22, Bart Smaalders wrote:
> On 02/24/10 12:57, Bruno Sousa wrote:
>> Yes, I'm using the mpt driver. In total this system has 3 HBAs: 1
>> internal (Dell PERC) and 2 Sun non-RAID HBAs.
>> I'm also using multipath, but if I disable multipath I have pretty much
>> the same results..
>>
>> Bruno
>
> From what I understand, the fix is expected "very soon"; your
> performance is getting killed by the over-aggressive use of
> bounce buffers...
>
> - Bart
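To confirm which build a box is actually running before deciding, the release banner and kernel version string are enough:

# cat /etc/release    # shows the installed OpenSolaris build
# uname -v            # kernel version string, typically snv_<build>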
dd-b at dd-b.net said:

> I know from startup log messages that I've got several interrupts being
> shared. I've been wondering how serious this is. I don't have any
> particular performance problems, but then again my cpu and motherboard are
> from 2006 and I'd like to extend their service life, so using them more
> efficiently isn't a bad idea. Plus it's all a learning experience :-).

Mine's from 2004, and I've been going through the same adjustments here.

> While I see the relevance to diagnosing performance problems, for my case, is
> there likely to be anything I can do about interrupt assignments? Or is this
> something that, if it's a problem, is an unfixable problem (short of changing
> hardware)? I think there's BIOS stuff to shuffle interrupt assignments some,
> but do changes at that level survive kernel startup, or get overwritten?

Experience with my motherboard is that even when you switch the BIOS "Plug-n-Play OS" setting between "No" and "Yes", Solaris 10 doesn't seem to change where it maps any devices. Probably a removal of the /etc/path_to_inst file and a reconfiguration reboot would be required, but even that won't move devices required for booting. Also, the onboard devices (like your nv_sata, ehci, etc.) are not likely to move around at all. Only things that could be moved to different PCI/PCI-X/PCIe slots are likely to move.

Ran across this note:
  http://blogs.sun.com/sming56/entry/interrupts_output_in_mdb

I found it pretty time-consuming just mapping the OS's device instance numbers to the physical devices. Taking the device instance numbers from "intrstat" or "echo '::interrupts -d' | mdb -k" and digging through the output of "prtconf -Dv" and/or boot-up /var/adm/messages stuff was pretty tedious.

Check out what mine looks like, in particular the case where four devices share the same interrupt -- the two onboard SATA ports, onboard ethernet, and one slow-mode USB port (Intel ICH5 chipset). There doesn't appear to be a thing you can do about this sharing. The system's never seemed slow, though I do try to avoid using that particular USB port.

# echo '::interrupts -d' | mdb -k
IRQ  Vector IPL Bus Type  CPU Share APIC/INT# Driver Name(s)
1    0x41   5   ISA Fixed 0   1     0x0/0x1   i8042#0
6    0x43   5   ISA Fixed 0   1     0x0/0x6   fdc#0
9    0x81   9   PCI Fixed 0   1     0x0/0x9   acpi_wrapper_isr
12   0x42   5   ISA Fixed 0   1     0x0/0xc   i8042#0
15   0x44   5   ISA Fixed 0   1     0x0/0xf   ata#1
16   0x82   9   PCI Fixed 0   3     0x0/0x10  uhci#3, uhci#0, nvidia#0
17   0x86   9   PCI Fixed 0   1     0x0/0x11  audio810#0
18   0x85   9   PCI Fixed 0   4     0x0/0x12  pci-ide#1, e1000g#0, uhci#2, pci-ide#1
19   0x84   9   PCI Fixed 0   1     0x0/0x13  uhci#1
22   0x40   5   PCI Fixed 0   1     0x0/0x16  pci-ide#2
23   0x83   9   PCI Fixed 0   1     0x0/0x17  ehci#0
160  0xa0   0       IPI   ALL 0     -         poke_cpu
192  0xc0   13      IPI   ALL 1     -         xc_serv
208  0xd0   14      IPI   ALL 1     -         kcpc_hw_overflow_intr
209  0xd1   14      IPI   ALL 1     -         cbe_fire
210  0xd3   14      IPI   ALL 1     -         cbe_fire
240  0xe0   15      IPI   ALL 1     -         xc_serv
241  0xe1   15      IPI   ALL 1     -         apic_error_intr
#

Regards,

Marion
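A rough way to shorten that correlation step, once the driver#instance of interest is known from the interrupt listing, is to pull the matching node straight out of the device tree; nv_sata below is just an example driver name:

# echo '::interrupts -d' | mdb -k    # note the driver#instance sharing an IRQ
# prtconf -D | grep -i nv_sata       # "nv_sata" is an example; shows the device node(s) bound to that driver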