I performed a SPEC SFS97 benchmark on Solaris 10u2/SPARC with 4 x 64GB LUNs, connected via FC SAN.
The filesystems created on the LUNs were UFS, VxFS, and ZFS.
Unfortunately the ZFS test couldn't complete because the box hung under very moderate load (3000 IOPS).
Additional tests were done using UFS and VxFS built on ZFS raw devices (zvols).
Results can be seen here:
http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html

-- Leon
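[For orientation, a minimal sketch of how LUNs like these might be dressed for each of the three cases, including UFS/VxFS on top of a zvol. Device names, pool names, and the zvol size are placeholders, not the exact setup used in this test.]

    # ZFS directly on a LUN
    zpool create pool1 c4t001738010140000Bd0

    # UFS directly on a LUN slice
    newfs /dev/rdsk/c4t001738010140000Cd0s0

    # VxFS directly on a LUN slice
    mkfs -F vxfs /dev/rdsk/c4t001738010140001Cd0s0

    # UFS (or VxFS) on a ZFS raw device: create a zvol, then newfs it
    zfs create -V 60g pool1/vol1
    newfs /dev/zvol/rdsk/pool1/vol1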
William D. Hathaway
2006-Aug-07 12:18 UTC
[zfs-discuss] Re: SPEC SFS97 benchmark of ZFS,UFS,VxFS
If this is reproducible, can you force a panic so it can be analyzed?
On 8/7/06, William D. Hathaway <william.hathaway at versatile.com> wrote:
> If this is reproducible, can you force a panic so it can be analyzed?

The core files and explorer output are here:
http://napobo3.lk.net/vinc/
The core files were created after the box was hung... break to OBP... sync
George Wilson
2006-Aug-07 15:10 UTC
[zfs-discuss] Re: SPEC SFS97 benchmark of ZFS,UFS,VxFS
Leon,

Looking at the corefile doesn't really show much from the ZFS side. It looks like you were having problems with your SAN though:

/scsi_vhci/ssd@g001738010080001c (ssd5) offline
/scsi_vhci/ssd@g001738010080001c (ssd5) multipath status: failed, path /pci@1d,700000/SUNW,emlxs@1/fp@0,0 (fp2) to target address: w1000001738043811,3 is offline Load balancing: none
/scsi_vhci/ssd@g001738010080001f (ssd6) offline
/scsi_vhci/ssd@g001738010080001f (ssd6) multipath status: failed, path /pci@1d,700000/SUNW,emlxs@1/fp@0,0 (fp2) to target address: w1000001738043811,2 is offline Load balancing: none
/scsi_vhci/ssd@g001738010080001e (ssd7) offline
/scsi_vhci/ssd@g001738010080001e (ssd7) multipath status: failed, path /pci@1d,700000/SUNW,emlxs@1/fp@0,0 (fp2) to target address: w1000001738043811,1 is offline Load balancing: none
WARNING: /scsi_vhci/ssd@g001738010080001a (ssd8): transport rejected fatal error
WARNING: fp(0)::GPN_ID for D_ID=10400 failed
WARNING: fp(0)::N_x Port with D_ID=10400, PWWN=1000001738279c10 disappeared from fabric
/pci@1d,700000/SUNW,emlxs@1,1/fp@0,0 (fcp0): Lun=0 for target=10400 disappeared
WARNING: /pci@1d,700000/SUNW,emlxs@1,1/fp@0,0 (fcp0): FCP: target=10400 reported NO Luns
WARNING: fp(0)::GPN_ID for D_ID=10400 failed
WARNING: fp(0)::N_x Port with D_ID=10400, PWWN=1000001738279c10 disappeared from fabric
/pci@1d,700000/SUNW,emlxs@1,1/fp@0,0 (fcp0): Lun=0 for target=10400 disappeared
WARNING: /pci@1d,700000/SUNW,emlxs@1,1/fp@0,0 (fcp0): FCP: target=10400 reported NO Luns
/pci@1d,700000/SUNW,emlxs@1,1/fp@0,0 (fcp0): Lun=0 for target=10400 disappeared
WARNING: /pci@1d,700000/SUNW,emlxs@1,1/fp@0,0 (fcp0): FCP: target=10400 reported NO Luns
/scsi_vhci/ssd@g001738010080001c (ssd5) multipath status: failed, path /pci@1d,700000/SUNW,emlxs@1/fp@0,0 (fp2) to target address: w1000001738043811,3 is offline Load balancing: none
/scsi_vhci/ssd@g001738010080001c (ssd5) offline
/scsi_vhci/ssd@g001738010080001f (ssd6) multipath status: failed, path /pci@1d,700000/SUNW,emlxs@1/fp@0,0 (fp2) to target address: w1000001738043811,2 is offline Load balancing: none
/scsi_vhci/ssd@g001738010080001f (ssd6) offline
/scsi_vhci/ssd@g001738010080001e (ssd7) multipath status: failed, path /pci@1d,700000/SUNW,emlxs@1/fp@0,0 (fp2) to target address: w1000001738043811,1 is offline Load balancing: none
/scsi_vhci/ssd@g001738010080001e (ssd7) offline
/scsi_vhci/ssd@g001738010080001a (ssd8) multipath status: failed, path /pci@1d,700000/SUNW,emlxs@1/fp@0,0 (fp2) to target address: w1000001738043811,0 is offline Load balancing: none

panic[cpu0]/thread=2a10057dcc0:
BAD TRAP: type=31 rp=2a10057cee0 addr=0 mmu_fsr=0 occurred in module "unix" due to a NULL pointer dereference

Can you reproduce this hang?

Thanks,
George

Leon Koll wrote:
> On 8/7/06, William D. Hathaway <william.hathaway at versatile.com> wrote:
>> If this is reproducible, can you force a panic so it can be analyzed?
>
> The core files and explorer output are here:
> http://napobo3.lk.net/vinc/
> The core files were created after the box was hung... break to OBP... sync
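[For anyone else picking apart a dump like this, a minimal sketch of pulling the same information out of a saved crash dump with mdb. It assumes savecore(1M) already wrote unix.0/vmcore.0 under /var/crash/<hostname>; adjust the numeric suffix to match your dump.]

    cd /var/crash/`hostname`
    mdb unix.0 vmcore.0
    ::status      # panic string and dump summary
    ::msgbuf      # kernel message buffer (the SAN/multipath warnings above)
    ::panicinfo   # trap type and registers at panic time
    ::stack       # stack of the panicking thread
    $q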
On 8/7/06, George Wilson <George.Wilson at sun.com> wrote:
> Looking at the corefile doesn't really show much from the ZFS side. It
> looks like you were having problems with your SAN though:
> <...>
> Can you reproduce this hang?

George,
Doing it now.

Thanks,
-- Leon
On Mon, Leon Koll wrote:
> I performed a SPEC SFS97 benchmark on Solaris 10u2/SPARC with 4 x 64GB
> LUNs, connected via FC SAN.
> <...>
> Results can be seen here:
> http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html

Leon,

Might I suggest that you provide the details as specified in the SPEC SFS run and reporting rules? They can be buried in a link from your blog, but it would be helpful to have that information available to your readers.

Spencer
Leon Koll wrote:
> I performed a SPEC SFS97 benchmark on Solaris 10u2/SPARC with 4 x 64GB
> LUNs, connected via FC SAN.
> The filesystems created on the LUNs were UFS, VxFS, and ZFS.
> Unfortunately the ZFS test couldn't complete because the box hung
> under very moderate load (3000 IOPS).
> <...>

hiya leon,

Out of curiosity, how was the setup for each filesystem type done?

I wasn't sure what "4 ZFS'es" in "The bad news that the test on 4 ZFS'es couldn't run at all" meant... so something like 'zpool status' would be great.

Do you know what your limiting factor was for ZFS (CPU, memory, I/O...)?

eric
On 8/8/06, eric kustarz <eric.kustarz at sun.com> wrote:
> Out of curiosity, how was the setup for each filesystem type done?
>
> I wasn't sure what "4 ZFS'es" in "The bad news that the test on 4 ZFS'es
> couldn't run at all" meant... so something like 'zpool status' would be
> great.

Hi Eric,
here it is:

root@vinc ~ # zpool status
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        pool1                    ONLINE       0     0     0
          c4t001738010140000Bd0  ONLINE       0     0     0

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        pool2                    ONLINE       0     0     0
          c4t001738010140000Cd0  ONLINE       0     0     0

errors: No known data errors

  pool: pool3
 state: ONLINE
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        pool3                    ONLINE       0     0     0
          c4t001738010140001Cd0  ONLINE       0     0     0

errors: No known data errors

  pool: pool4
 state: ONLINE
 scrub: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        pool4                    ONLINE       0     0     0
          c4t0017380101400012d0  ONLINE       0     0     0

errors: No known data errors

> Do you know what your limiting factor was for ZFS (CPU, memory, I/O...)?

Thanks to George Wilson, who pointed me to the fact that the memory was fully consumed.
I removed the line
set ncsize = 0x100000
from /etc/system, and now the host isn't hung during the test anymore.
But performance is still an issue.

-- Leon
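[For context, a quick way to see what the DNLC tunable and kernel memory are doing on a live Solaris 10 box. This is a minimal sketch; the commands are standard mdb/kstat usage, run them as root.]

    # value of ncsize as the running kernel sees it
    echo "ncsize/D" | mdb -k

    # DNLC activity counters (hits, misses, entries purged, ...)
    kstat -n dnlcstats

    # breakdown of kernel vs. other memory consumers, useful when RAM is exhausted
    echo "::memstat" | mdb -k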
Leon Koll wrote:
> Hi Eric,
> here it is:
> <...>
> errors: No known data errors

So having 4 pools isn't a recommended config - i would destroy those 4 pools and just create 1 RAID-0 pool:
#zpool create sfsrocks c4t001738010140000Bd0 c4t001738010140000Cd0 c4t001738010140001Cd0 c4t0017380101400012d0

each of those devices is a 64GB lun, right?

> Thanks to George Wilson, who pointed me to the fact that the memory was
> fully consumed.
> I removed the line
> set ncsize = 0x100000
> from /etc/system, and now the host isn't hung during the test anymore.
> But performance is still an issue.

ah, you were limiting the # of dnlc entries... so you're still seeing ZFS max out at 2000 ops/s? Let us know what happens when you switch to 1 pool.

eric
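[To keep the comparison close to the 4-filesystem UFS/VxFS runs, the single pool can still present several filesystems. A minimal sketch; the pool and filesystem names are placeholders, and destroying the old pools erases their contents.]

    # tear down the four single-LUN pools
    zpool destroy pool1
    zpool destroy pool2
    zpool destroy pool3
    zpool destroy pool4

    # one dynamically striped pool across all four LUNs
    zpool create sfsrocks c4t001738010140000Bd0 c4t001738010140000Cd0 \
        c4t001738010140001Cd0 c4t0017380101400012d0

    # several filesystems for the SFS load points, all sharing the pool's bandwidth
    zfs create sfsrocks/fs1
    zfs create sfsrocks/fs2
    zfs create sfsrocks/fs3
    zfs create sfsrocks/fs4
    zfs list -r sfsrocks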
<...>
> So having 4 pools isn't a recommended config - i would destroy those 4
> pools and just create 1 RAID-0 pool:
> #zpool create sfsrocks c4t001738010140000Bd0 c4t001738010140000Cd0
> c4t001738010140001Cd0 c4t0017380101400012d0
>
> each of those devices is a 64GB lun, right?

I did it - created one pool, 4*64GB in size, and I am running the benchmark now.
I'll update you on the results, but one pool is definitely not what I need.
My target is SunCluster with HA ZFS, where I need 2 or 4 pools per node.

> ah, you were limiting the # of dnlc entries... so you're still seeing
> ZFS max out at 2000 ops/s? Let us know what happens when you switch to
> 1 pool.

I'd say "increasing" instead of "limiting".

TIA,
-- Leon
Leon Koll wrote:
> I did it - created one pool, 4*64GB in size, and I am running the benchmark now.
> I'll update you on the results, but one pool is definitely not what I need.
> My target is SunCluster with HA ZFS, where I need 2 or 4 pools per node.

Why do you need 2 or 4 pools per node?

If you're doing HA-ZFS (which is SunCluster 3.2 - only available in beta right now), then you should divide your storage up according to the number of *active* pools. So say you have 2 nodes and 4 luns (each lun being 64GB), and only need one active node - then you can create one pool of all 4 luns, and attach the 4 luns to both nodes.

The way HA-ZFS basically works is that when the "active" node fails, it does a 'zpool export', and the takeover node does a 'zpool import'. So both nodes are using the same storage, but they cannot use the same storage at the same time, see:
http://www.opensolaris.org/jive/thread.jspa?messageID=49617

If however, you have 2 nodes, 4 luns, and wish both nodes to be active, then you can divvy up the storage into two pools. So each node has one active pool of 2 luns. All 4 luns are doubly attached to both nodes, and when one node fails, the takeover node then has 2 active pools.

So how many nodes do you have? And how many do you wish to be "active" at a time?

And what was your configuration for VxFS and SVM/UFS?

eric
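[A minimal sketch of the export/import handoff described above, done by hand outside the cluster framework. Pool and node names are placeholders; in a real HA-ZFS setup the agent drives these steps.]

    # on the active node (nodeA), release the pool
    zpool export sfsrocks

    # on the takeover node (nodeB), which sees the same 4 LUNs
    zpool import sfsrocks

    # if nodeA went down hard and never exported, the takeover node
    # has to force the import
    zpool import -f sfsrocks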
Hello eric,

Friday, August 11, 2006, 3:04:38 AM, you wrote:

ek> Why do you need 2 or 4 pools per node?
ek> <...>
ek> So how many nodes do you have? And how many do you wish to be "active"
ek> at a time?

With 2-node NFS clusters I normally have one node active and one standby. However, with many disks I always configure things so that I have the possibility to split the workload (pools, filesystems, ...). I do it by creating two cluster groups, each with its own IP, disks, etc. That way, if I have a performance problem related to server performance and not to the array itself, I can solve it quickly and temporarily. So I think it is good to create at least two ZFS pools and two SC groups, and normally set the primary node for both groups to the same node.

--
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com
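[For illustration, a rough sketch of the two-group layout Robert describes, using scrgadm/scswitch-style commands from memory. The exact syntax and the HAStoragePlus Zpools extension property should be checked against the actual SC 3.2 release; group, resource, hostname, and pool names are placeholders.]

    # register the storage resource type once
    scrgadm -a -t SUNW.HAStoragePlus

    # group 1: its own logical host and its own ZFS pool
    scrgadm -a -g nfs-rg-1 -h node1,node2
    scrgadm -a -L -g nfs-rg-1 -l nfs-lh-1
    scrgadm -a -j hasp-rs-1 -g nfs-rg-1 -t SUNW.HAStoragePlus -x Zpools=pool1

    # group 2: second logical host and second pool
    scrgadm -a -g nfs-rg-2 -h node1,node2
    scrgadm -a -L -g nfs-rg-2 -l nfs-lh-2
    scrgadm -a -j hasp-rs-2 -g nfs-rg-2 -t SUNW.HAStoragePlus -x Zpools=pool2

    # bring both groups online; both start out mastered by the same node,
    # but either can be switched to the other node under load
    scswitch -Z -g nfs-rg-1
    scswitch -Z -g nfs-rg-2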
On 8/11/06, eric kustarz <eric.kustarz at sun.com> wrote:
> Why do you need 2 or 4 pools per node?
>
> If you're doing HA-ZFS (which is SunCluster 3.2 - only available in beta
> right now), then you should divide your storage up according to the number of

I know, I run the 3.2 now.

> *active* pools. So say you have 2 nodes and 4 luns (each lun being
> 64GB), and only need one active node - then you can create one pool of

To have one active node doesn't look smart to me. I want to distribute the load between 2 nodes, not have 1 active and 1 standby.
The LUN size in this test is 64GB, but in the real configuration it will be 6TB.

> all 4 luns, and attach the 4 luns to both nodes.
>
> The way HA-ZFS basically works is that when the "active" node fails, it
> does a 'zpool export', and the takeover node does a 'zpool import'. So
> both nodes are using the same storage, but they cannot use the same
> storage at the same time, see:
> http://www.opensolaris.org/jive/thread.jspa?messageID=49617

Yes, it works this way.

> If however, you have 2 nodes, 4 luns, and wish both nodes to be active,
> then you can divvy up the storage into two pools. So each node has one
> active pool of 2 luns. All 4 luns are doubly attached to both nodes,
> and when one node fails, the takeover node then has 2 active pools.

I agree with you - I can have 2 active pools, not 4, in the case of a dual-node cluster.

> So how many nodes do you have? And how many do you wish to be "active"
> at a time?

Currently - 2 nodes, both active. If I define 4 pools, I can easily expand the cluster to a 4-node configuration; that may be a good reason to have 4 pools.

> And what was your configuration for VxFS and SVM/UFS?

4 SVM concat volumes (I need a concatenation of 1TB LUNs since I am on SC3.1, which doesn't support EFI labels) with UFS or VxFS on top.

And now come the questions - my short test showed that the 1-pool config doesn't behave better than the 4-pool one: with the first, the box hung; with the second, it didn't.
Why do you think the 1-pool config is better?

TIA,
-- Leon
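[For reference, a rough sketch of one such SVM concat volume with UFS on top. The metadevice name and LUN slices are placeholders; Leon's real setup concatenates 1TB LUNs, and VxFS would use mkfs -F vxfs instead of newfs.]

    # concatenation of two LUN slices into metadevice d10
    # (2 stripes of 1 component each = a concat, not a stripe)
    metainit d10 2 1 c4t001738010140000Bd0s0 1 c4t001738010140000Cd0s0

    # put UFS on it and mount
    newfs /dev/md/rdsk/d10
    mount /dev/md/dsk/d10 /sfs/fs1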
Leon Koll wrote:
> Currently - 2 nodes, both active. If I define 4 pools, I can easily
> expand the cluster to a 4-node configuration; that may be a good
> reason to have 4 pools.

Ok, that makes sense.

> 4 SVM concat volumes (I need a concatenation of 1TB LUNs since I am on
> SC3.1, which doesn't support EFI labels) with UFS or VxFS on top.

So you have 2 nodes, with 2 file systems (of either UFS or VxFS) on each node?

I'm just trying to make sure it's a fair comparison between ZFS, UFS, and VxFS.

> And now come the questions - my short test showed that the 1-pool config
> doesn't behave better than the 4-pool one: with the first, the box hung;
> with the second, it didn't.
> Why do you think the 1-pool config is better?

I suggested the 1 pool config before i knew you were doing HA-ZFS :)
Purposely dividing up your storage (by creating separate pools) in a non-clustered environment usually doesn't make sense (root being one notable exception).

eric
On 8/11/06, eric kustarz <eric.kustarz at sun.com> wrote:
> So you have 2 nodes, with 2 file systems (of either UFS or VxFS) on each node?

I have 2 nodes, with 2 file systems per node. One share is served via bge0, the second one via bge1.

> I'm just trying to make sure it's a fair comparison between ZFS, UFS, and
> VxFS.

After I saw that ZFS performance (when the box isn't stuck) is about 3 times lower than UFS/VxFS, I understood that I should wait with ZFS until the official Solaris 11 release.
I don't believe that it's possible to do some magic with my setup and increase the ZFS performance 3 times. Correct me if I'm wrong.

> I suggested the 1 pool config before i knew you were doing HA-ZFS :)
> Purposely dividing up your storage (by creating separate pools) in a
> non-clustered environment usually doesn't make sense (root being one
> notable exception).

I see.

Thanks,
-- Leon
> After I saw that ZFS performance (when the box isn't stuck) is about 3
> times lower than UFS/VxFS, I understood that I should wait with ZFS until
> the official Solaris 11 release.
> I don't believe that it's possible to do some magic with my setup and
> increase the ZFS performance 3 times. Correct me if I'm wrong.

Yep, we're working on this right now, though you shouldn't have to wait until Solaris 11 - hopefully an s10 update will be out earlier with the proper perf fixes. U3 already has some improvements over U2 (which you were running).

I'm actually doing SPEC SFS benchmarking right now, and i'll keep the list updated.

eric
On August 10, 2006 6:04:38 PM -0700 eric kustarz <eric.kustarz at sun.com> wrote:
> If you're doing HA-ZFS (which is SunCluster 3.2 - only available in beta
> right now),

Is the 3.2 beta publicly available? I can only locate 3.1.

-frank
Frank,

The SC 3.2 beta may be closed, but I'm forwarding your request to Eric Redmond.

Thanks,
George

Frank Cusack wrote:
> Is the 3.2 beta publicly available? I can only locate 3.1.
>
> -frank
George Wilson wrote On 08/18/06 14:08:
> Frank,
>
> The SC 3.2 beta may be closed, but I'm forwarding your request to Eric
> Redmond.

The Sun Cluster 3.2 Beta program has been extended. You can apply for the Beta via this URL:
https://feedbackprograms.sun.com/callout/default.html?callid={11B4E37C-D608-433B-AF69-07F6CD714AA1}

------------------------------------------------------------------------

Sun Cluster 3.2: New Features

*Ease of Use*

_*New Sun Cluster Object Oriented Command Set*_
The new SC command line interface includes one command per cluster object type and consistent use of sub-command names and option letters. It also supports short and long command names. The command output has been greatly improved, with better help and error messages as well as more readable status and configuration reporting. In addition, some commands include export and import options using portable XML-based configuration files, allowing replication of part of, or entire, cluster configurations. This new interface is easier to learn and easier to use, thereby limiting human error during cluster administration. It also speeds up partial or full configuration cloning.

_*Oracle RAC 10g improved integration and administration*_
Sun Cluster RAC package installation and configuration are now integrated into the Sun Cluster procedures. New RAC-specific resource types and properties can be used for finer-grained control. Oracle RAC extended manageability leads to easier set-up of Oracle RAC within Sun Cluster as well as improved diagnosability and availability.

_*Agent configuration wizards*_
A new GUI-based wizard provides simplified configuration for popular applications via on-line help, automatic discovery of parameter choices, and immediate validation. Supported applications include Oracle RAC and HA, NFS, Apache, and SAP. Agent configuration is easier and less error-prone, enabling faster set-up of popular solutions.

_*Flexible IP address scheme*_
Sun Cluster now allows a reduced range of IP addresses for its private interconnect. In addition, it is now possible to customize the IP base address and its range during or after installation. These changes facilitate integration of Sun Cluster environments into existing networks with limited or regulated address spaces.

*Availability*

_*Cluster support for SMF services*_
Sun Cluster now integrates tightly with the Solaris 10 Service Management Facility (SMF) and enables the encapsulation of SMF-controlled applications in the Sun Cluster resource management model. Local service-level life-cycle management continues to be handled by SMF, while cluster-wide failure handling at the resource level (node, storage, ...) is carried out by Sun Cluster. Moving applications from a single-node Solaris 10 environment to a multi-node Sun Cluster environment increases availability while requiring limited to no effort.

_*Extended flexibility for fencing protocol*_
This new functionality allows customization of the default fencing protocol: choices include SCSI-3, SCSI-2, or per-device discovery. This flexibility enables the default use of SCSI-3, a more recent protocol, for better multipathing support, easier integration with non-Sun storage, and shorter recovery times on newer storage, while still supporting the SC 3.0/3.1 behavior and SCSI-2 for older devices.
_*Quorum Server*_
A new quorum device option is now available in Sun Cluster: instead of using a shared disk and SCSI reservation protocols, it is now possible to use a Solaris server outside of the cluster to run a quorum server module supporting an atomic reservation protocol over TCP/IP. This enables faster failover times and also lowers deployment costs: it removes the need for a shared quorum disk in any scenario where quorum is required (2-node) or desired.

_*Disk path failure handling*_
Sun Cluster can now be configured to automatically reboot a node if all of its paths to shared disks have failed. Faster reaction to severe disk path failure improves availability.

_*HAStoragePlus availability improvements*_
HAStoragePlus mount points are now created automatically in case of mount failure, eliminating failure-to-failover cases and thus improving availability of the environment.

*Flexibility*

_*Solaris Container expanded support*_
Any application of scalable or failover type and its associated Sun Cluster agent can now run unmodified within Solaris Containers (except Oracle RAC). This combines the application containment offered by Solaris Containers with the increased availability provided by Sun Cluster.
Note: Currently only the following Sun Cluster agents are supported in Solaris Containers:
* JES Application Server
* JES Web Server
* JES MQ Server
* DNS
* Apache
* Kerberos
* HA-Oracle

_*HA ZFS*_
ZFS is supported as a failover file system in Sun Cluster. Together, ZFS and Sun Cluster offer a best-in-class file system solution combining high availability, data integrity, performance, and scalability, covering the needs of the most demanding environments.

_*HDS TrueCopy campus cluster*_
Sun Cluster based campus clusters now support HDS TrueCopy controller-based replication, allowing automated management of TrueCopy configurations. Sun Cluster automatically and transparently handles the switch to the secondary campus site in case of failover, making this procedure less error-prone and improving the overall availability of the solution. This new remote data replication infrastructure allows Sun Cluster to support new configurations for customers who have standardized on a specific replication infrastructure such as TrueCopy, and for places where host-based replication is not viable because of distance or application incompatibility. This combination brings improved availability and less complexity while lowering cost. Sun Cluster can make use of an existing TrueCopy replication infrastructure, limiting the need for an additional replication solution.

_*Multi-terabyte disk and EFI label support*_
Sun Cluster configurations can now include disks with capacities over 1TB thanks to support for the new EFI disk label. This format is required for multi-terabyte disks but can also be used with smaller-capacity disks. This extends supported Sun Cluster configurations to environments with high-end storage requirements.

_*Extended support for Veritas software components*_
Veritas Volume Manager and File System, part of Veritas Storage Foundation 5.0, are now supported on SPARC platforms, as is Veritas Volume Manager 4.1 with Solaris 10 OS on x86/x64 platforms. Veritas Volume Replicator (VVR) and Veritas Fast Mirror Resynchronization (FMR), part of Veritas FlashSnap, can now be used in Sun Cluster environments on SPARC platforms.
By adding support for Veritas replication and synchronization technology, the x86/x64 version, and the latest release of the Veritas software, Sun Cluster provides more choice for customers and allows them to use Sun Cluster in environments where third-party storage management solutions such as Veritas Storage Foundation are the standard.

_*Quota support*_
Quota management can now be used with HAStoragePlus on local UFS file systems for better control of resource consumption.

_*Oracle Data Guard support*_
Customers are now able to operate Oracle Data Guard data replication configurations under Sun Cluster control. Sun Cluster now offers improved usability for Oracle deployments that include Data Guard data replication software.

*OAMP*

_*Dual-partition software swap*_
With this new software swap feature the upgrade process is greatly simplified: any components of the software stack, along with Sun Cluster, can be upgraded in one step - Solaris, Sun Cluster, file systems and volume managers, applications. This automation lowers the risk of human error during a cluster upgrade, a very complex procedure, and minimizes the service outage incurred by a classical cluster upgrade.

_*Live Upgrade*_
The Live Upgrade procedure can now be used with Sun Cluster. This procedure reduces node downtime during upgrade as well as unnecessary reboots, thereby lowering the required maintenance window during which the service is at risk.

_*Optional GUI installation*_
Sun Cluster Manager, the Sun Cluster management GUI, can be left out during installation. This removes web-based access to the cluster to comply with potential security rules.

_*SNMP event MIB*_
Sun Cluster includes a new SNMP event mechanism as well as a new SNMP MIB. They allow third-party SNMP management applications to register directly with Sun Cluster and receive timely notifications of cluster events. Fine-grained event notification and direct integration with third-party enterprise management frameworks through standard SNMP support allow proactive monitoring and increase availability.

_*Command logging*_
Commands can now be logged within Sun Cluster. This facilitates diagnosis of cluster failures and provides a history of administration actions for archiving or replication.

_*Workload system resource monitoring*_
Sun Cluster offers new system resource utilization measurement and visualization tools, including fine-grained measurement of consumption per node, resource, and resource group. These new tools provide historical data as well as threshold management and CPU reservation and control. This improved control allows for better management of service levels and capacity.

*Performance*

Several performance improvements have been introduced in this latest Sun Cluster release.
* Sun Cluster Manager, previously known as SunPlex Manager, has undergone several performance improvements, in particular when navigating to the different screens. Some operations have been sped up as much as four times.
* PxFS performance improvements on the order of 5-6 times are possible, depending on the workload, using the fastwrite option.
* Switchover times for HAStoragePlus are improved up to five times thanks to parallelized mounting of the file systems under HAStoragePlus control.
Eric Redmond
Solaris Enterprise System, Beta Program Manager
Sun Microsystems, Inc.
17 Network Circle
Menlo Park, CA 94025
Phone: x85550 / +1 650 786 5550
Fax: 650-786-5734
Email: Eric.Redmond at Sun.COM
On August 19, 2006 10:53:55 AM -0700 Eric Redmond <Eric.Redmond at Sun.COM> wrote:
> Sun Cluster 3.2: New Features

wow, this makes 3.1 sound like dog food.

-frank
> PxFS performance improvements on the order of 5-6 times are possible,
> depending on the workload, using the fastwrite option.

Fantastic! Has this been targeted at directory operations? We've had issues with large directories full of small files being very slow to handle over PxFS.

Are there plans for PxFS on ZFS any time soon :) ? Or any plans to release PxFS as part of OpenSolaris?

Cheers,
Alan
Alan Romeril wrote:
> Fantastic! Has this been targeted at directory operations? We've
> had issues with large directories full of small files being very slow
> to handle over PxFS.

The 'fastwrite option' speeds up write operations, so this doesn't do much for directory operations.

> Are there plans for PxFS on ZFS any time soon :) ?

PxFS on ZFS is unlikely to happen. A clusterized version of ZFS (as mentioned before on this alias) is being considered.

> Or any plans to release PxFS as part of OpenSolaris?

PxFS is tightly coupled to the cluster framework. Without open-sourcing the cluster, PxFS in its current form cannot be open-sourced, as it would not make sense. As for open-sourcing the cluster, my guess is as good as yours.

Regards,
Manoj

--
Sun Cluster Engineering