Hi all, just after sending a message to sunmanagers I realized that my question should rather have gone here, so sunmanagers please excuse the double post.

I have inherited an X4140 (8 SAS slots) and have just set up the system with Solaris 10 09. I first set up the system on a mirrored pool over the first two disks:

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

and then tried to add the second pair of disks to this pool, which did not work (the famous error message regarding the disk label / root-pool BIOS issue). I therefore simply created an additional pool, tank:

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0

errors: No known data errors

So far so good. I have now replaced the last two SAS disks with 32GB SSDs and am wondering how to add these to the system. I googled a lot for best practices but found nothing so far that made me any wiser. My current approach is still to simply do

  zpool add tank mirror c0t6d0 c0t7d0

as I would do with normal disks, but I am wondering whether that's the right approach to significantly increase system performance. Will ZFS automatically use these SSDs and optimize accesses to tank? Probably! But it won't optimize accesses to rpool, of course. I am not sure whether I need that or should look for it. Should I try to get all disks into rpool in spite of the BIOS label issue, so that the SSDs are used for all accesses to the disk system?

Hints (best practices) are greatly appreciated!

Thanks a lot,
Andreas
I don't think adding an SSD mirror to an existing pool will do much for performance. Some of your data will surely land on those SSDs, but Solaris will not know they are SSDs and will not move blocks in and out according to usage patterns to give you an all-around boost. They will just be used to store data, nothing more.

It will probably be more useful to add the SSDs as either an L2ARC (cache) device or a separate log for the ZIL, but that depends on your workload. If you serve NFS or iSCSI, putting the ZIL onto the SSD drive(s) will speed up synchronous writes; adding them as L2ARC will speed up reads.

Here is the ZFS best practices guide, which should help with this decision:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions.

Best,
Scott
--
This message posted from opensolaris.org
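For anyone following along, a minimal sketch of what Scott describes, assuming the new SSDs really are c0t6d0/c0t7d0 as in Andreas's message and that each has been sliced with format(1M) into a small s0 for the log and the remainder as s1 (the slice layout is an assumption, not something from the original post):

    # mirrored separate log (ZIL) on the small slices
    zpool add tank log mirror c0t6d0s0 c0t7d0s0

    # L2ARC on the large slices; cache devices are simply listed, not mirrored
    zpool add tank cache c0t6d0s1 c0t7d0s1

Using one whole SSD as log and the other as cache would also work; which split is better depends on how write-heavy the workload is.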
I have a similar question. I put together a cheap RAID with four 1TB WD Black (7200 rpm) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with slice 0 (5GB) for the ZIL and the rest of the SSD for cache:

# zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c0t0d1    ONLINE       0     0     0
            c0t0d2    ONLINE       0     0     0
            c0t0d3    ONLINE       0     0     0
        logs
          c0t0d4s0    ONLINE       0     0     0
        cache
          c0t0d4s1    ONLINE       0     0     0
        spares
          c0t0d6      AVAIL
          c0t0d7      AVAIL

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
----------    -----  -----  -----  -----  -----  -----
dpool         72.1G  3.55T    237     12  29.7M   597K
  raidz1      72.1G  3.55T    237      9  29.7M   469K
    c0t0d0        -      -    166      3  7.39M   157K
    c0t0d1        -      -    166      3  7.44M   157K
    c0t0d2        -      -    166      3  7.39M   157K
    c0t0d3        -      -    167      3  7.45M   157K
  c0t0d4s0       20K  4.97G      0      3      0   127K
cache             -      -      -      -      -      -
  c0t0d4s1    17.6G  36.4G      3      1   249K   119K
----------    -----  -----  -----  -----  -----  -----

I just don't seem to be getting the bang for the buck I should be. This was taken while rebuilding an Oracle index, with all files stored in this pool. The WD disks are at 100%, and nothing is coming from the cache. The cache does hold the entire DB (17.6G used), but hardly anything is read from it. I also am not seeing the spike of data flowing into the ZIL, although iostat shows there is just write traffic hitting the SSD:

                  extended device statistics                      cpu
device    r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
sd0     170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
sd1     168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
sd2     172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
sd3       0.0    0.0     0.0    0.0   0.0   0.0    0.0   0   0
sd4     170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
sd5       1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31   <-- the SSD

Since this SSD is in a RAID array and just presents as a regular disk LUN, is there a special incantation required to turn on the turbo mode? Doesn't it seem that all this traffic should be maxing out the SSD, with reads from the cache and writes to the ZIL? I have a second identical SSD I wanted to add as a mirror, but it seems pointless if there's no zip to be had...

help?

Thanks,
Tracey
--
This message posted from opensolaris.org
On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
> I have a similar question. I put together a cheap RAID with four 1TB WD
> Black (7200 rpm) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD,
> with slice 0 (5GB) for the ZIL and the rest of the SSD for cache:
> [...]
> I just don't seem to be getting the bang for the buck I should be. This was
> taken while rebuilding an Oracle index, with all files stored in this pool.
> The WD disks are at 100%, and nothing is coming from the cache. The cache
> does hold the entire DB (17.6G used), but hardly anything is read from it.
> [...]
> Doesn't it seem that all this traffic should be maxing out the SSD, with
> reads from the cache and writes to the ZIL? I have a second identical SSD
> I wanted to add as a mirror, but it seems pointless if there's no zip to
> be had...

The most likely reason is that this workload has been identified as streaming by ZFS, which is prefetching from disk instead of the L2ARC (l2arc_noprefetch=1).

It also looks like you've used a 128 Kbyte ZFS record size. Is Oracle doing 128 Kbyte random I/O? We usually tune that down before creating the database, which will use the L2ARC device more efficiently.

Brendan

--
Brendan Gregg, Fishworks                    http://blogs.sun.com/brendan
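A quick sketch of the record-size tuning Brendan describes; the dataset name dpool/oracle is hypothetical, and note that recordsize only affects files written after the change, so existing datafiles have to be recreated or copied to pick it up:

    # match the ZFS record size to Oracle's db_block_size (commonly 8K)
    zfs set recordsize=8k dpool/oracle

    # verify the property
    zfs get recordsize dpool/oracle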
Thanks, Brendan. I was going to move it over to an 8 KB record size once I got through this index rebuild. My thinking was that a disproportionate block size would show up as excessive I/O throughput, not a lack of it.

The question about the cache comes from the fact that the 18GB or so that it says is in the cache IS the database. This was why I was thinking the index rebuild should be CPU constrained, and I should see a spike in reading from the cache. If the entire file is cached, why would it go to the disks at all for the reads? The disks are delivering about 30MB/s of reads, but this SSD is rated for 70MB/s sustained, so there should be a chance to pick up a 100% gain.

I've seen lots of mention of kernel settings, but those only seem to apply to cache flushes on sync writes. Any idea where to look next? I've spent about a week tinkering with it. I'm trying to get a major customer to switch over to ZFS and an open storage solution, but I'm afraid that if I can't get it to work at the small scale, I can't convince them about the large scale.

Thanks,
Tracey

On Fri, Feb 12, 2010 at 4:43 PM, Brendan Gregg - Sun Microsystems <brendan at sun.com> wrote:
> The most likely reason is that this workload has been identified as
> streaming by ZFS, which is prefetching from disk instead of the L2ARC
> (l2arc_noprefetch=1).
>
> It also looks like you've used a 128 Kbyte ZFS record size. Is Oracle
> doing 128 Kbyte random I/O? We usually tune that down before creating
> the database, which will use the L2ARC device more efficiently.
>
> Brendan
--
Tracey Bernath
913-488-6284
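One way to check whether reads are actually being served from the ARC/L2ARC during the rebuild is to watch the arcstats kstats; a minimal sketch, with kstat names as found in OpenSolaris-era builds (verify they exist on your release):

    # overall ARC hit/miss counters and current size
    kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses zfs:0:arcstats:size

    # L2ARC counters: size held, hits, misses
    kstat -p zfs:0:arcstats | egrep 'l2_(hits|misses|size)'

If l2_hits stays flat while the pool disks sit at 100% busy, the L2ARC is being bypassed (as Brendan suggests, via the prefetch path) rather than simply being cold.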
comment below...

On Feb 12, 2010, at 2:25 PM, TMB wrote:
> I have a similar question. I put together a cheap RAID with four 1TB WD
> Black (7200 rpm) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD,
> with slice 0 (5GB) for the ZIL and the rest of the SSD for cache:
> [...]
>                   extended device statistics                      cpu
> device    r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
> sd0     170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
> sd1     168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
> sd2     172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
> sd3       0.0    0.0     0.0    0.0   0.0   0.0    0.0   0   0
> sd4     170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
> sd5       1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31

iostat has an "n" option, which is very useful for looking at device names :-)

The SSD here is performing well. The rest are clobbered; a 205 millisecond response time will be agonizingly slow.

By default, for this version of ZFS, up to 35 I/Os will be queued to each disk, which is why you see 35.0 in the "actv" column. The combination of actv=35 and svc_t>200 indicates that this is the place to start working. Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4. This will reduce the concurrent load on the disks, thus reducing svc_t.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

 -- richard
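For reference, a sketch of Richard's suggestion in the form the Evil Tuning Guide documents; the value 4 is just an example starting point to measure against, not a recommendation:

    # live, on a running system (takes effect immediately, lost at reboot)
    echo zfs_vdev_max_pending/W0t4 | mdb -kw

    # persistent: add this line to /etc/system (takes effect at next boot)
    set zfs:zfs_vdev_max_pending = 4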
OK, that was the magic incantation I was looking for:
- changing the noprefetch option opened the floodgates to the L2ARC
- changing the max queue depth relieved the wait time on the drives, although I may undo this again in the benchmarking since these drives all have NCQ

I went from all four disks of the array at 100%, doing about 170 read IOPS / 25MB/s, to all four disks of the array at 0%, with nearly 500 IOPS / 65MB/s coming off the cache drive (at only 50% load). This bodes well for adding a second mirrored cache drive to push for the 1K IOPS.

Now I am ready to insert the mirror for the ZIL and the CACHE, and we will be ready for some production benchmarking.

BEFORE:
device    r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
sd0     170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
sd1     168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
sd2     172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
sd4     170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
sd5       1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31

AFTER:
                    extended device statistics
  r/s    w/s      kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
  0.0    0.0       0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d0
  0.0    0.0       0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d1
  0.0    0.0       0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d2
  0.0    0.0       0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d3
285.2    0.8   36236.2   14.4   0.0   0.5     0.0     1.8   1  37  c0t0d4

And, keep in mind this was on less than $1000 of hardware.

Thanks for the pointers,
Tracey

On Sat, Feb 13, 2010 at 9:22 AM, Richard Elling <richard.elling at gmail.com> wrote:
> The SSD here is performing well. The rest are clobbered; a 205 millisecond
> response time will be agonizingly slow.
>
> By default, for this version of ZFS, up to 35 I/Os will be queued to each
> disk, which is why you see 35.0 in the "actv" column. [...] Begin by
> reducing zfs_vdev_max_pending from 35 to something like 1 to 4. This will
> reduce the concurrent load on the disks, thus reducing svc_t.
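Tracey doesn't show the exact command behind "changing the noprefetch option"; a likely form, based on the l2arc_noprefetch tunable Brendan named (treat the variable name and module prefix as assumptions to verify against your build):

    # persistent: add to /etc/system, effective at next boot;
    # lets the L2ARC cache prefetched/streaming reads as well
    set zfs:l2arc_noprefetch = 0

    # or live, via mdb (lost at reboot)
    echo l2arc_noprefetch/W0t0 | mdb -kw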
For those following the saga: with the prefetch problem fixed, and data coming off the L2ARC instead of the disks, the system switched from I/O bound to CPU bound. I opened up the throttles with some explicit PARALLEL hints in the Oracle commands, and we were finally able to max out the single SSD:

  r/s    w/s      kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
826.0    3.2  104361.8   35.2   0.0   9.9     0.0    12.0   3 100  c0t0d4

So, when we maxed out the SSD cache, it was delivering 100+ MB/s and 830 IOPS, with 3.4 TB behind it in a 4-disk SATA RAIDZ1. I still have to remap it to 8K blocks to get more efficiency, but for raw numbers, it's exactly what I was looking for.

Now, to add the second SSD ZIL/L2ARC for a mirror. I may even splurge for one more to get a three-way mirror. That will completely saturate the SCSI channel. Now I need a bigger server.... Did I mention it was <$1000 for the whole setup? Bah-ha-ha-ha.....

Tracey

On Sat, Feb 13, 2010 at 11:51 PM, Tracey Bernath <tbernath at ix.netcom.com> wrote:
> OK, that was the magic incantation I was looking for:
> - changing the noprefetch option opened the floodgates to the L2ARC
> - changing the max queue depth relieved the wait time on the drives
> [...]
--
Tracey Bernath
913-488-6284
On Sun, Feb 14, 2010 at 11:08:52PM -0600, Tracey Bernath wrote:
> Now, to add the second SSD ZIL/L2ARC for a mirror.

Just to be clear: mirror the ZIL by all means, but don't mirror the l2arc; just add more devices and let them load-balance. This is especially true if you're sharing SSD writes with the ZIL, as slices on the same devices.

> I may even splurge for one more to get a three way mirror.

With more devices, questions about selecting different devices appropriate for each purpose come into play.

> Now I need a bigger server....

See? :)

--
Dan.
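A sketch of the two operations Dan distinguishes, using Tracey's existing c0t0d4 slices and a hypothetical second SSD at c0t0d5 sliced the same way (the second device name and its slice layout are assumptions):

    # mirror the separate log: attach the new slog slice to the existing one
    zpool attach dpool c0t0d4s0 c0t0d5s0

    # do NOT mirror the cache; just add the second cache slice and let
    # ZFS load-balance reads and fills across both
    zpool add dpool cache c0t0d5s1

zpool attach on a log device turns it into a mirrored log vdev; cache devices cannot be mirrored and are simply used side by side.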
On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone <dan at geek.com.au> wrote:
> Just to be clear: mirror the ZIL by all means, but don't mirror the
> l2arc; just add more devices and let them load-balance. This is
> especially true if you're sharing SSD writes with the ZIL, as slices
> on the same devices.

Well, the problem I am trying to solve is: wouldn't it read 2x faster with the mirror? It seems that once I can drive the single device to 10 queued actions and 100% busy, it would be more useful to have two channels to the same data. Is ZFS not smart enough to understand that there are two identical mirror devices in the cache to split requests across? Or are you saying that ZFS is smart enough to cache it in two places, although not mirrored?

If the device itself were full, and items were falling off the L2ARC, then I could see having two separate cache devices, but since I am only at about 50% utilization of the available capacity, and maxing out the I/O, mirroring seemed smarter.

Am I missing something here?

Tracey
On Mon, 15 Feb 2010, Tracey Bernath wrote:
> If the device itself were full, and items were falling off the L2ARC,
> then I could see having two separate cache devices, but since I am only
> at about 50% utilization of the available capacity, and maxing out the
> I/O, mirroring seemed smarter.
>
> Am I missing something here?

I doubt it. The only way to know for sure is to test it, but it seems unlikely to me that the ZFS implementors would fail to load-share reads across mirrored L2ARC devices.

Richard's points about L2ARC bandwidth vs. pool disk bandwidth are still good ones. L2ARC is all about read latency, but it does not necessarily help with read bandwidth. It is also useful to keep in mind that L2ARC offers at least 40x less bandwidth than ARC in RAM, so always populate RAM first if you can afford it.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Mon, Feb 15, 2010 at 09:11:02PM -0600, Tracey Bernath wrote:
> Well, the problem I am trying to solve is: wouldn't it read 2x faster
> with the mirror? It seems that once I can drive the single device to 10
> queued actions and 100% busy, it would be more useful to have two
> channels to the same data. Is ZFS not smart enough to understand that
> there are two identical mirror devices in the cache to split requests
> across? Or are you saying that ZFS is smart enough to cache it in two
> places, although not mirrored?

First, Bob is right: measurement trumps speculation. Try it.

As for speculation, you're thinking only about reads. I expect reading from l2arc devices will be the same as reading from any other zfs mirror, and largely the same in both cases above: load-balanced across either device. In the rare case of a bad read from an unmirrored l2arc, the data will be fetched from the pool, so mirroring the l2arc doesn't add any resiliency benefit.

However, your cache needs to be populated and maintained as well, and this needs writes. Twice as many of them for the mirror as for the "stripe", and half of what is written never needs to be read again. These writes go to the same SSD devices you're using for the ZIL; on commodity SSDs, which are not well write-optimised, they may be hurting ZIL latency by making the SSD do more writing, stealing from the total IOPS count on the channel, and (as a lesser concern) adding wear cycles to the device.

When you're already maxing out the I/O, eliminating wasted cycles opens your bottleneck, even if only a little.

Once you reach steady state, I don't know how much turnover in l2arc contents you will have, and therefore how many extra writes we're talking about. It may not be many, but they are unnecessary ones.

Normally, we'd talk about measuring a potential benefit and then choosing based on the results. In this case, if I were you, I'd eliminate the unnecessary writes and measure the difference more as a matter of curiosity and research, since I was already set up to do so.

--
Dan.
On Feb 16, 2010, at 12:39 PM, Daniel Carosone wrote:
> However, your cache needs to be populated and maintained as well, and
> this needs writes. Twice as many of them for the mirror as for the
> "stripe", and half of what is written never needs to be read again.
> These writes go to the same SSD devices you're using for the ZIL; on
> commodity SSDs, which are not well write-optimised, they may be hurting
> ZIL latency by making the SSD do more writing, stealing from the total
> IOPS count on the channel, and (as a lesser concern) adding wear cycles
> to the device.

The L2ARC writes are throttled to 8 MB/sec, except during cold start, where the throttle is 16 MB/sec. This should not be noticeable on the channels.

> When you're already maxing out the I/O, eliminating wasted cycles opens
> your bottleneck, even if only a little.

+1
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)
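The throttle Richard describes corresponds to the l2arc_write_max and l2arc_write_boost variables in the ARC code (8 MB per fill interval, with the boost added on top during warm-up); a hedged sketch of how one might inspect or override them, since the variable names, defaults, and mdb format letters can differ between builds:

    # read the current fill throttle values (bytes), live
    echo l2arc_write_max/E | mdb -k
    echo l2arc_write_boost/E | mdb -k

    # persistent override in /etc/system, e.g. 16 MB (0x1000000), next boot
    set zfs:l2arc_write_max = 0x1000000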
On Sun, Feb 14, 2010 at 12:51 PM, Tracey Bernath <tbernath at ix.netcom.com> wrote:
> I went from all four disks of the array at 100%, doing about 170 read
> IOPS / 25MB/s, to all four disks of the array at 0%, with nearly 500
> IOPS / 65MB/s coming off the cache drive (at only 50% load).

> And, keep in mind this was on less than $1000 of hardware.

Really? The complete box and all, or just the disks? Because the four disks alone should cost about $400. Did you use ECC RAM?

--
Fajar