I had followed with interest the "turn off NV cache flushing" thread, in regard to doing ZFS-backed NFS on our low-end Hitachi array:

  http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg05000.html

In short, if you have non-volatile cache, you can configure the array to ignore the ZFS cache-flush requests. This is reported to improve the really terrible performance of ZFS-backed NFS systems. Feel free to correct me if I'm misremembering....

Anyway, I've also read that if ZFS notices it's using "slices" instead of whole disks, it will not enable/use the write cache. So I thought I'd be clever and configure a ZFS pool on our array with a slice of a LUN instead of the whole LUN, and "fool" ZFS into not issuing cache-flushes, rather than having to change config of the array itself.

Unfortunately, it didn't make a bit of difference in my little NFS benchmark, namely extracting a small 7.6MB tar file (C++ source code, 500 files/dirs).

I used three test zpools and a UFS filesystem (not all were in play at the same time):

  pool: bulk_sp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        bulk_sp1                                        ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0 ONLINE       0     0     0

errors: No known data errors

  pool: bulk_sp1s
 state: ONLINE
 scrub: none requested
config:

        NAME                                              STATE     READ WRITE CKSUM
        bulk_sp1s                                         ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s0 ONLINE       0     0     0

errors: No known data errors

  pool: int01
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        int01         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s5  ONLINE       0     0     0
            c0t1d0s5  ONLINE       0     0     0

errors: No known data errors

# prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0
*                            First       Sector        Last
* Partition  Tag  Flags      Sector       Count       Sector  Mount Directory
       0      4    00            34  4294879232  4294879265
       1      4    00    4294879266       67517  4294946782
       8     11    00    4294946783       16384  4294963166
#

Both NFS client and server are Sun T2000's, 16GB RAM, switched gigabit ethernet, Solaris-10U3 patched as of 12-Jan-2007, doing nothing else at the time of the tests. The "bulk_sp1*" pools were both on the same Hitachi 9520V RAID-5 SATA group that I ran my bonnie++ tests on yesterday. The "int01" pool is mirrored on two slice-5's of the server T2000's internal 2.5" SAS 73GB drives.

ZFS on whole-disk FC-SATA LUN via NFS:
    real  968.13   user  0.33   sys  0.04      7.9 KB/sec overall
ZFS on partial slice-0 of FC-SATA LUN via NFS:
    real  950.77   user  0.33   sys  0.04      8.0 KB/sec overall
ZFS on slice-5 mirror of internal SAS drives via NFS:
    real   17.48   user  0.32   sys  0.03    438.8 KB/sec overall
UFS on partial slice-0 of FC-SATA LUN via NFS:
    real    6.13   user  0.32   sys  0.03   1251.4 KB/sec overall

I'm not willing to disable the ZIL. I think I'd settle for the 400KB/sec range in this test from NFS on ZFS, if I could get that on our FC-SATA Hitachi array. As things are now, ZFS just won't work for us, and I'm not sure how to make it go faster.

Thoughts & suggestions are welcome....

Marion
Marion Hakanson
2007-Feb-02 01:50 UTC
[zfs-discuss] Re: ZFS vs NFS vs array caches, revisited
Adding to my own post, I said earlier:

> Anyway, I've also read that if ZFS notices it's using "slices" instead of
> whole disks, it will not enable/use the write cache.  So I thought I'd be
> clever and configure a ZFS pool on our array with a slice of a LUN instead of
> the whole LUN, and "fool" ZFS into not issuing cache-flushes, rather than
> having to change config of the array itself.
>
> Unfortunately, it didn't make a bit of difference in my little NFS benchmark,
> namely extracting a small 7.6MB tar file (C++ source code, 500 files/dirs).

I was checking the write-cache settings via the "cache" submenu of "format -e". All LUN's on this array appear (to "format") to have write cache disabled. Trying to enable it yields:

    Write cache setting is not changeable

Re-creating a zpool with whole-disk devices does not change the setting reported by format, either.

Given that format can't control the cache settings, can one assume that ZFS isn't trying to flush the cache either? My question here is, how can one tell if ZFS is trying to flush the write caches? Dtrace to the rescue?

Regards,

Marion
Marion, this is a common misinterpretation:

  "Anyway, I've also read that if ZFS notices it's using "slices" instead
   of whole disks, it will not enable/use the write cache."

The reality is that ZFS turns on the write cache when it owns the whole disk. _Independently_ of that, ZFS flushes the write cache when ZFS needs to ensure that data reaches stable storage.

The point is that the flushes occur whether or not ZFS turned the caches on (caches might be turned on by some other means outside the visibility of ZFS).

The problem is that the flush-cache command means 2 different things to the 2 components:

    To ZFS:      "put on stable storage"
    To Storage:  "flush the cache"

Until we get this house in order, storage needs to ignore the requests.

-r
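To make the "put on stable storage" side concrete: on the host it is synchronous operations (fsync(2), O_DSYNC writes, an NFS COMMIT arriving at the server) that force a ZIL commit, and that commit is what ultimately reaches the array as a flush-cache request. Below is a minimal C sketch of the application-side contract; the file path is made up and this is only an illustration, not code from the thread:

    /*
     * Illustration only: the caller asks for durability; whether that
     * turns into a SCSI SYNCHRONIZE CACHE at the array is up to ZFS
     * and the storage.  The file path is hypothetical.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
            const char *path = "/zp1/testfile";   /* hypothetical ZFS-backed file */
            const char buf[] = "must survive a power loss\n";
            int fd;

            if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
                    perror("open");
                    return (1);
            }
            if (write(fd, buf, sizeof (buf) - 1) < 0) {
                    perror("write");
                    return (1);
            }
            /*
             * fsync() is the application-level "put on stable storage"
             * request.  On ZFS it commits the intent log (ZIL), and that
             * commit is what issues the cache flush the array sees; an
             * NFS COMMIT from a client has the same effect on the server.
             */
            if (fsync(fd) < 0) {
                    perror("fsync");
                    return (1);
            }
            (void) close(fd);
            return (0);
    }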
Hi All,

In my test setup, I have one zpool of size 1000 MB. On this zpool, my application writes 100 files, each of size 10 MB.

The first 96 files were written successfully without any problem. But the 97th file was not written successfully; only 5 MB were written (the return value of the write() call). Since it was a short write, my application tried to truncate the file to 5 MB, but ftruncate failed with an error message saying there is no space on the device.

Have you ever seen this kind of error message?

After the ftruncate failure I checked the size of the 97th file, and it is strange: the size is 7 MB, but the expected size is only 5 MB.

Your help is appreciated.

Thanks & Regards
Mastan
Roch.Bourbonnais at Sun.Com said:
> The reality is that ZFS turns on the write cache when it owns the
> whole disk.  _Independently_ of that, ZFS flushes the write cache
> when ZFS needs to ensure that data reaches stable storage.
>
> The point is that the flushes occur whether or not ZFS turned the caches on
> (caches might be turned on by some other means outside the visibility
> of ZFS).

Thanks for taking the time to clear this up for us (assuming others than just me had this misunderstanding :-). Yet today I measured something that leaves me puzzled again. How can we explain the following results?

# zpool status -v
  pool: bulk_zp1
 state: ONLINE
 scrub: none requested
config:

        NAME                                                STATE     READ WRITE CKSUM
        bulk_zp1                                            ONLINE       0     0     0
          raidz1                                            ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s0 ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s1 ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s2 ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s3 ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s4 ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s5 ONLINE       0     0     0
            c6t4849544143484920443630303133323230303230d0s6 ONLINE       0     0     0

errors: No known data errors

# prtvtoc -s /dev/rdsk/c6t4849544143484920443630303133323230303230d0
*                            First       Sector        Last
* Partition  Tag  Flags      Sector       Count       Sector  Mount Directory
       0      4    00            34   613563821   613563854
       1      4    00     613563855   613563821  1227127675
       2      4    00    1227127676   613563821  1840691496
       3      4    00    1840691497   613563821  2454255317
       4      4    00    2454255318   613563821  3067819138
       5      4    00    3067819139   613563821  3681382959
       6      4    00    3681382960   613563821  4294946780
       8     11    00    4294946783       16384  4294963166
#

And, at a later time:

# zpool status -v bulk_sp1s
  pool: bulk_sp1s
 state: ONLINE
 scrub: none requested
config:

        NAME                                                STATE     READ WRITE CKSUM
        bulk_sp1s                                           ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s0   ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s1   ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s2   ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s3   ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s4   ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s5   ONLINE       0     0     0
          c6t4849544143484920443630303133323230303230d0s6   ONLINE       0     0     0

errors: No known data errors
#

The storage is that same "single 2TB LUN" I used yesterday, except I've used "format" to slice it up into 7 equal chunks, and made a raidz (and later a simple striped) pool across all of them. My "tar over NFS" benchmark on these goes pretty fast. If ZFS is making the flush-cache call, it sure works faster than in the whole-LUN case:

ZFS on whole-disk FC-SATA LUN via NFS, yesterday:
    real  968.13   user  0.33   sys  0.04     7.9 KB/sec overall
ZFS on whole-disk FC-SATA LUN via NFS, ssd_max_throttle=32 today:
    real  664.78   user  0.33   sys  0.04    11.4 KB/sec overall
ZFS raidz on 7 slices of FC-SATA LUN via NFS today:
    real   12.32   user  0.32   sys  0.03   620.2 KB/sec overall
ZFS striped on 7 slices of FC-SATA LUN via NFS today:
    real    6.51   user  0.32   sys  0.03  1178.3 KB/sec overall

Not that I'm complaining, mind you. I appear to have stumbled across a way to get NFS over ZFS to work at a reasonable speed, without making changes to the array (nor resorting to giving ZFS SVM soft partitions instead of "real" devices). Suboptimal, mind you, but it's workable if our Hitachi folks don't turn up a way to tweak the array.

Guess I should go read the ZFS source code (though my 10U3 surely lags the OpenSolaris stuff).

Thanks and regards,

Marion
Hi All,

No one has any idea on this?

-Masthan

dudekula mastan <d_mastan at yahoo.com> wrote:
> Hi All,
>
> In my test setup, I have one zpool of size 1000 MB. On this zpool, my
> application writes 100 files, each of size 10 MB.
>
> The first 96 files were written successfully without any problem. But the
> 97th file was not written successfully; only 5 MB were written (the return
> value of the write() call). Since it was a short write, my application
> tried to truncate the file to 5 MB, but ftruncate failed with an error
> message saying there is no space on the device.
>
> Have you ever seen this kind of error message?
>
> After the ftruncate failure I checked the size of the 97th file, and it is
> strange: the size is 7 MB, but the expected size is only 5 MB.
>
> Your help is appreciated.
>
> Thanks & Regards
> Mastan
Masthan,

dudekula mastan <d_mastan at yahoo.com> wrote:
> In my test setup, I have one zpool of size 1000 MB.

Is this the size given by zfs list? Or is it the amount of disk space that you had? The reason I ask is that ZFS/zpool takes up some amount of space for its housekeeping. So, if you add 1G worth of disk space to the pool, the effective space available is a little less (a few MB) than 1G.

> On this zpool, my application writes 100 files, each of size 10 MB.
>
> The first 96 files were written successfully without any problem.

Here you are filling the filesystem to the brim. This is a border case, and the copy-on-write nature of ZFS could lead to the behaviour that you are seeing.

> But the 97th file was not written successfully; only 5 MB were written
> (the return value of the write() call).
>
> Since it was a short write, my application tried to truncate the file to
> 5 MB, but ftruncate failed with an error message saying there is no space
> on the device.

This is expected because of the copy-on-write nature of ZFS. During the truncate it is trying to allocate new disk blocks, probably to write the new metadata, and fails to find them.

> Have you ever seen this kind of error message?

Yes, there are others who have seen these errors.

> After the ftruncate failure I checked the size of the 97th file, and it is
> strange: the size is 7 MB, but the expected size is only 5 MB.

Is there any particular reason that you are pushing the filesystem to the brim? Is this part of some test? Please help us understand what you are trying to test.

Thanks and regards,
Sanjeev.

-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
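For what it's worth, the application-side pattern being discussed looks roughly like the sketch below: check every write() return value, and expect that on a nearly full copy-on-write filesystem even the cleanup step (ftruncate, here) can fail with ENOSPC, in which case removing the partial file is the fallback. The helper name and path handling are invented for illustration; this is not Masthan's application:

    /*
     * Sketch: write a buffer to a file on a nearly full filesystem,
     * handling short writes and a possible ENOSPC from ftruncate.
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    int
    write_file(const char *path, const char *buf, size_t len)
    {
            size_t done = 0;
            int fd;

            if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
                    return (-1);

            while (done < len) {
                    ssize_t n = write(fd, buf + done, len - done);

                    if (n > 0) {                    /* full or short write */
                            done += (size_t)n;
                            continue;
                    }
                    if (n < 0 && errno == EINTR)    /* interrupted: retry */
                            continue;
                    break;                          /* ENOSPC or other error */
            }

            if (done < len) {
                    /*
                     * Trim the file to the bytes known to be written.  On a
                     * COW filesystem this can itself fail with ENOSPC, so
                     * fall back to removing the partial file.
                     */
                    if (ftruncate(fd, (off_t)done) < 0) {
                            fprintf(stderr, "ftruncate: %s; removing partial file\n",
                                strerror(errno));
                            (void) close(fd);
                            (void) unlink(path);
                            return (-1);
                    }
            }
            (void) close(fd);
            return (done == len ? 0 : -1);
    }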
Matthew Ahrens
2007-Feb-09 21:21 UTC
[zfs-discuss] ENOSPC on full FS (was: Meta data corruptions on ZFS.)
dudekula mastan wrote:
> Hi All,
>
> In my test setup, I have one zpool of size 1000 MB.
>
> On this zpool, my application writes 100 files, each of size 10 MB.
>
> The first 96 files were written successfully without any problem.
>
> But the 97th file was not written successfully; only 5 MB were written
> (the return value of the write() call).
>
> Since it was a short write, my application tried to truncate the file to
> 5 MB, but ftruncate failed with an error message saying there is no space
> on the device.

Try removing one of the larger files.

Alternatively, upgrade to a more recent version of Solaris Express / Nevada / OpenSolaris, where this problem is much less severe.

--matt

ps. subject changed, not sure what this had to do with corruption.
Marion asked the community:

> How can we explain the following results?

and nobody replied, so I ask this question again because it's very important to me:

How did ZFS striped across 7 slices of an FC-SATA LUN via NFS work 146 times faster than ZFS on 1 slice of the same LUN via NFS?

I'll appreciate your inputs.

-- leon
Jeff Bonwick
2007-Feb-11 08:03 UTC
[zfs-discuss] Re: ZFS vs NFS vs array caches, revisited
> How did ZFS striped across 7 slices of an FC-SATA LUN via NFS work 146 times
> faster than ZFS on 1 slice of the same LUN via NFS?

Without knowing more I can only guess, but most likely it's a simple matter of working set. Suppose the benchmark in question has a 4G working set, and suppose that each LUN is fronted by a 1G cache. With a single LUN, only 1/4 of your working set fits in cache, so you're doing a fair amount of actual disk I/O. With 7 LUNs, you've got 7G of cache, so the entire benchmark fits in cache -- no disk I/O.

The factor of >100x is what tells me this is almost certainly a working-set effect.

Jeff
Leon Koll
2007-Feb-11 16:53 UTC
[zfs-discuss] Re: Re: ZFS vs NFS vs array caches, revisited
Jeff,

Thank you for the explanation, but it's hard for me to accept it because:

1. You described a different configuration: 7 LUNs. Marion's post was about 7 slices of the same LUN.
2. I have never seen a storage controller with a cache-per-LUN setting. Cache size doesn't depend on the number of LUNs IMHO; it's a fixed size per controller or per FC port. SAN experts, please fix me if I'm wrong.

-- leon
Robert Milkowski
2007-Feb-11 23:20 UTC
[zfs-discuss] Re: Re: ZFS vs NFS vs array caches, revisited
Hello Leon,

Sunday, February 11, 2007, 5:53:48 PM, you wrote:

LK> Jeff,
LK> Thank you for the explanation, but it's hard for me to accept it because:
LK> 1. You described a different configuration: 7 LUNs. Marion's post
LK> was about 7 slices of the same LUN.
LK> 2. I have never seen a storage controller with a cache-per-LUN setting.
LK> Cache size doesn't depend on the number of LUNs IMHO; it's a fixed
LK> size per controller or per FC port.
LK> SAN experts, please fix me if I'm wrong.

IIRC, Symmetrix boxes used to reserve at least a minimum amount of cache on a per-LUN basis. However, it's not relevant to your case, as you are comparing an entire LUN vs. 7 slices of that LUN in a striped pool.

-- 
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com
Marion Hakanson
2007-Feb-13 00:36 UTC
[zfs-discuss] Re: Re: ZFS vs NFS vs array caches, revisited
leon.is.here at gmail.com said:
> How did ZFS striped across 7 slices of an FC-SATA LUN via NFS work 146 times
> faster than ZFS on 1 slice of the same LUN via NFS?

Well, I do have more info to share on this issue, though how it worked faster in that test still remains a mystery. Folks may recall that I said:

> Not that I'm complaining, mind you.  I appear to have stumbled across a
> way to get NFS over ZFS to work at a reasonable speed, without making changes
> to the array (nor resorting to giving ZFS SVM soft partitions instead of
> "real" devices).  Suboptimal, mind you, but it's workable if our Hitachi
> folks don't turn up a way to tweak the array.

Unfortunately, I was wrong. I _don't_ know how to make it go fast. While I _have_ been able to reproduce the result on a couple different LUN/slice configurations, I don't know what triggers the "fast" behavior. All I can say for sure is that a little dtrace one-liner that counts sync-cache calls turns up no such calls (for both local ZFS and remote NFS extracts) when things are going fast on a particular filesystem.

By comparison, a local ZFS tar-extraction triggers 12 sync-cache calls, and one hits 288 such calls during an NFS extraction before interrupting the run after 30 seconds (est. 1/100th of the way through) when things are working in the "slow" mode. Oh yeah, here's the one-liner (type in the command, run your test in another session, then hit ^C on this one):

    dtrace -n fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry'{@a[probefunc] = count()}'

This is my first ever use of dtrace, so please be gentle with me (:-).

hakansom at ohsu.edu said:
> Guess I should go read the ZFS source code (though my 10U3 surely lags the
> OpenSolaris stuff).

I did go read the source code, for my own edification. To reiterate what was said earlier:

Roch.Bourbonnais at Sun.Com said:
> The point is that the flushes occur whether or not ZFS turned the caches on
> (caches might be turned on by some other means outside the visibility
> of ZFS).

My limited reading of the ZFS code (on the opensolaris.org site) so far has turned up no obvious way to make ZFS skip the sync-cache call. However my dtrace test, unless it's flawed, shows that on some filesystems the call is made, and on other filesystems the call is not made.

leon.is.here at gmail.com said:
> 2. I have never seen a storage controller with a cache-per-LUN setting.
> Cache size doesn't depend on the number of LUNs IMHO; it's a fixed size per
> controller or per FC port.  SAN experts, please fix me if I'm wrong.

Robert has already mentioned array cache being reserved on a per-LUN basis in Symmetrix boxes. Our low-end HDS unit also has cache pre-fetch settings on a per-LUN basis (defaults according to the number of disks in the RAID group).

Regards,

Marion
Roch - PAE
2007-Feb-13 10:58 UTC
[zfs-discuss] Re: Re: ZFS vs NFS vs array caches, revisited
The only obvious thing would be if the exported ZFS filesystems were initially mounted at a point in time when zil_disable was non-null.

The stack trace that is relevant is:

    sd_send_scsi_SYNCHRONIZE_CACHE
    sd`sdioctl+0x1770
    zfs`vdev_disk_io_start+0xa0
    zfs`zil_flush_vdevs+0x108
    zfs`zil_commit_writer+0x2b8
    ...

You might want to try in turn:

    dtrace -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}'
    dtrace -n 'sdioctl:entry{@a[stack(20)]=count()}'
    dtrace -n 'zil_flush_vdevs:entry{@a[stack(20)]=count()}'
    dtrace -n 'zil_commit_writer:entry{@a[stack(20)]=count()}'

And see if you lose your footing along the way.

-r

Marion Hakanson writes:
 > Unfortunately, I was wrong.  I _don't_ know how to make it go fast.  While
 > I _have_ been able to reproduce the result on a couple different LUN/slice
 > configurations, I don't know what triggers the "fast" behavior.  All I can
 > say for sure is that a little dtrace one-liner that counts sync-cache calls
 > turns up no such calls (for both local ZFS and remote NFS extracts) when
 > things are going fast on a particular filesystem.
 > [...]
 > My limited reading of the ZFS code (on the opensolaris.org site) so far has
 > turned up no obvious way to make ZFS skip the sync-cache call.  However my
 > dtrace test, unless it's flawed, shows that on some filesystems the call is
 > made, and on other filesystems the call is not made.
Leon Koll
2007-Feb-13 14:35 UTC
[zfs-discuss] Re: Re: Re: ZFS vs NFS vs array caches, revisited
Hi Marion,

Your one-liner works only on SPARC and doesn't work on x86:

# dtrace -n fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry'{@a[probefunc] = count()}'
dtrace: invalid probe specifier fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[probefunc] = count()}: probe description fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry does not match any probes

What's wrong with it?

Thanks,
-- leon
Roch - PAE
2007-Feb-13 14:51 UTC
[zfs-discuss] Re: Re: Re: ZFS vs NFS vs array caches, revisited
On x86 try with

    sd_send_scsi_SYNCHRONIZE_CACHE

Leon Koll writes:
 > Hi Marion,
 > Your one-liner works only on SPARC and doesn't work on x86:
 > # dtrace -n fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry'{@a[probefunc] = count()}'
 > dtrace: invalid probe specifier fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[probefunc] = count()}: probe description fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry does not match any probes
 >
 > What's wrong with it?
> This is expected because of the copy-on-write nature of ZFS.  During
> truncate it is trying to allocate new disk blocks, probably to write the
> new metadata, and fails to find them.

I realize there is a fundamental issue with copy-on-write, but does this mean ZFS does not maintain some kind of reservation to guarantee you can always remove data? If so, I would consider this a major issue for general-purpose use, and if nothing else it should most definitely be clearly documented.

Accidentally filling up space is not at *all* uncommon in many situations, be it home use or medium-sized business use. Yes, you should avoid it, but shit (always) happens.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Roch.Bourbonnais at Sun.Com said:
> The only obvious thing would be if the exported ZFS filesystems were
> initially mounted at a point in time when zil_disable was non-null.

No changes have been made to zil_disable. It's 0 now, and we've never changed the setting. Export/import doesn't appear to change the behavior.

Roch.Bourbonnais at Sun.Com said:
> You might want to try in turn:
>     dtrace -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}'
>     dtrace -n 'sdioctl:entry{@a[stack(20)]=count()}'
>     dtrace -n 'zil_flush_vdevs:entry{@a[stack(20)]=count()}'
>     dtrace -n 'zil_commit_writer:entry{@a[stack(20)]=count()}'
> And see if you lose your footing along the way.

I've included below the complete list of dtrace output. This system has two zpools, one that goes "fast" for NFS and one that goes "slow". You can see the details of the pools' configs below. Let me re-state that at times in the past, the "fast" pool has gone "slow", and I don't know what made it start going "fast" again.

To summarize, the first dtrace above gives no output on the fast pool, and lists 6, 7, 12, or 14 calls for the slow pool. The second dtrace above counts 6 or 7 calls on both pools. The third dtrace above gives no output for either pool, but zil_flush_vdevs isn't in the stack trace for the earlier trace on my machine (SPARC, Sol-10U3). The last dtrace doesn't find a matching probe here.

================================================================
# echo "zil_disable/D" | mdb -k
zil_disable:
zil_disable:    0
# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
bulk_zp1               2.14T    160K   2.14T     0%  ONLINE     -
bulk_zp2               2.14T    346K   2.14T     0%  ONLINE     -
int01                  48.2G   1.94G   46.3G     4%  ONLINE     -
# cd
# zpool export bulk_zp1
# zpool export bulk_zp2
# zpool import
  pool: bulk_zp2
    id: 803252704584693135
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        bulk_zp2                                              ONLINE
          raidz1                                              ONLINE
            c6t4849544143484920443630303133323230303330d0s0   ONLINE
            c6t4849544143484920443630303133323230303330d0s1   ONLINE
            c6t4849544143484920443630303133323230303331d0s0   ONLINE
            c6t4849544143484920443630303133323230303331d0s1   ONLINE
            c6t4849544143484920443630303133323230303332d0s0   ONLINE
            c6t4849544143484920443630303133323230303332d0s1   ONLINE

  pool: bulk_zp1
    id: 14914295292657419291
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        bulk_zp1                                              ONLINE
          raidz1                                              ONLINE
            c6t4849544143484920443630303133323230303230d0s0   ONLINE
            c6t4849544143484920443630303133323230303230d0s1   ONLINE
            c6t4849544143484920443630303133323230303231d0s0   ONLINE
            c6t4849544143484920443630303133323230303231d0s1   ONLINE
            c6t4849544143484920443630303133323230303232d0s0   ONLINE
            c6t4849544143484920443630303133323230303232d0s1   ONLINE
            c6t4849544143484920443630303133323230303232d0s2   ONLINE
# zpool import bulk_zp1
# zpool import bulk_zp2
# zfs list bulk_zp1
NAME       USED  AVAIL  REFER  MOUNTPOINT
bulk_zp1   123K  1.79T  53.6K  /zp1
# zfs list bulk_zp2
NAME       USED  AVAIL  REFER  MOUNTPOINT
bulk_zp2   193K  1.75T  63.9K  /zp2

# dtrace -n 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}' \
> -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}'
dtrace: description 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
dtrace: description 'sd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
^C
#
: no output from zp1 test.

# dtrace -n 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}' \
> -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}'
dtrace: description 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
dtrace: description 'sd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
^C

              ssd`ssdioctl+0x17a8
              zfs`vdev_disk_io_start+0xa0
              zfs`zio_ioctl+0xec
              zfs`vdev_config_sync+0xe0
              zfs`spa_sync+0x2ec
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
               12

              ssd`ssdioctl+0x17a8
              zfs`vdev_disk_io_start+0xa0
              zfs`zio_ioctl+0xec
              zfs`vdev_config_sync+0x258
              zfs`spa_sync+0x2ec
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
               12
#
: above output from zp2 test.

# dtrace -n 'ssdioctl:entry{@a[stack(20)]=count()}' -n 'sdioctl:entry{@a[stack(20)]=count()}'
dtrace: description 'ssdioctl:entry' matched 1 probe
dtrace: description 'sdioctl:entry' matched 1 probe
^C

              zfs`vdev_disk_io_start+0xa0
              zfs`zio_ioctl+0xec
              zfs`vdev_config_sync+0xe0
              zfs`spa_sync+0x2ec
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
                6
#
: above is from zp2 test.

# dtrace -n 'vdev_config_sync:entry{@a[stack(20)]=count()}'
dtrace: description 'vdev_config_sync:entry' matched 1 probe
^C

              zfs`spa_sync+0x2ec
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
               12
#
: above is from zp2 test.

# dtrace -n 'vdev_config_sync:entry{@a[stack(20)]=count()}'
dtrace: description 'vdev_config_sync:entry' matched 1 probe
^C

              zfs`spa_sync+0x2ec
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
                6
#
: above is from zp1 test.

# dtrace -n 'ssdioctl:entry{@a[stack(20)]=count()}' -n 'sdioctl:entry{@a[stack(20)]=count()}'
dtrace: description 'ssdioctl:entry' matched 1 probe
dtrace: description 'sdioctl:entry' matched 1 probe
^C

              zfs`vdev_disk_io_start+0xa0
              zfs`zio_ioctl+0xec
              zfs`vdev_config_sync+0xe0
              zfs`spa_sync+0x2ec
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
               14

              zfs`vdev_disk_io_start+0xa0
              zfs`zio_ioctl+0xec
              zfs`vdev_config_sync+0x258
              zfs`spa_sync+0x2ec
              zfs`txg_sync_thread+0x134
              unix`thread_start+0x4
               14
#
: above is from zp1 test.

# dtrace -n 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}' \
> -n 'sd_send_scsi_SYNCHRONIZE_CACHE:entry{@a[stack(20)]=count()}'
dtrace: description 'ssd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
dtrace: description 'sd_send_scsi_SYNCHRONIZE_CACHE:entry' matched 1 probe
^C
#
: above is from zp1 test, i.e. no sync-cache calls happened.
================================================================

Regards,

Marion
dudekula mastan
2007-Feb-15 10:08 UTC
[zfs-discuss] Is ZFS file system supports short writes ?
Hi all,

Please let me know whether ZFS supports short writes.

Thanks & Regards
Masthan
Hello dudekula,

Thursday, February 15, 2007, 11:08:26 AM, you wrote:

> Hi all,
>
> Please let me know whether ZFS supports short writes.

And what are short writes?

-- 
Best regards,
Robert                          mailto:rmilkowski@task.gda.pl
                                http://milek.blogspot.com
Torrey McMahon
2007-Feb-15 18:36 UTC
[zfs-discuss] Is ZFS file system supports short writes ?
Robert Milkowski wrote:
> Hello dudekula,
>
> Thursday, February 15, 2007, 11:08:26 AM, you wrote:
>
>> Hi all,
>>
>> Please let me know whether ZFS supports short writes.
>
> And what are short writes?

http://www.pittstate.edu/wac/newwlassignments.html#ShortWrites

:-P
dudekula mastan
2007-Feb-17 09:42 UTC
[zfs-discuss] Is ZFS file system supports short writes ?
If a write call attempts to write X bytes of data but actually writes only x bytes (where x < X), then we call that write a short write.

-Masthan

Torrey McMahon <tmcmahon2 at yahoo.com> wrote:
> Robert Milkowski wrote:
>> And what are short writes?
>
> http://www.pittstate.edu/wac/newwlassignments.html#ShortWrites
>
> :-P
dudekula mastan writes:
 > If a write call attempts to write X bytes of data but actually writes only
 > x bytes (where x < X), then we call that write a short write.
 >
 > -Masthan

What kind of support do you want/need?

-r
So, that would be an "error", and, other than reporting it accurately, what would you want ZFS to do to "support" it?

dudekula mastan wrote:
> If a write call attempts to write X bytes of data but actually writes only
> x bytes (where x < X), then we call that write a short write.
>
> -Masthan

>> Please let me know whether ZFS supports short writes.
Frank Hofmann
2007-Feb-23 14:13 UTC
[zfs-discuss] Is ZFS file system supports short writes ?
On Fri, 23 Feb 2007, Dan Mick wrote:

> So, that would be an "error", and, other than reporting it accurately, what
> would you want ZFS to do to "support" it?

It's not an error for write(2) to return with fewer bytes written than requested. In some situations that's pretty much expected, for example when writing to network sockets.

But filesystems may also decide to do short writes, e.g. when the write would extend the file but the filesystem runs out of space before all of the write completes; it's up to the implementation whether it returns ENOSPC for the whole write or returns the number of bytes successfully written. The same applies if you exceed the rlimits or quota allocations, or if the write is interrupted before completion.

> dudekula mastan wrote:
>> If a write call attempts to write X bytes of data but actually writes only
>> x bytes (where x < X), then we call that write a short write.
>> -Masthan
>
>>> Please let me know whether ZFS supports short writes.

In the sense that it does them? Well, it's UNIX/POSIX standard to do them; the write(2) manpage puts it like this:

     If a write() requests that more bytes be written than there
     is room for - for example, if the write would exceed the
     process file size limit (see getrlimit(2) and ulimit(2)),
     the system file size limit, or the free space on the device -
     only as many bytes as there is room for will be written.
     For example, suppose there is space for 20 bytes more in a
     file before reaching a limit. A write() of 512 bytes returns
     20. The next write() of a non-zero number of bytes gives a
     failure return (except as noted for pipes and FIFO below).

I.e., you get a partial write before a failing write. ZFS behaves like this (on quota, definitely; "filesystem full" on ZFS is a bit different due to the space needs for COW), just as other filesystems do.

Where have you encountered a filesystem _NOT_ supporting this behaviour?

FrankH.
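Given those semantics, portable code simply loops until everything is written or a real error comes back. Here is a minimal C sketch of that retry loop; the helper name is made up, and this is generic POSIX usage rather than anything ZFS-specific:

    /*
     * Sketch of a write loop that tolerates short writes and EINTR,
     * per the write(2) semantics quoted above.
     */
    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    ssize_t
    write_all(int fd, const void *buf, size_t len)
    {
            const char *p = buf;
            size_t left = len;

            while (left > 0) {
                    ssize_t n = write(fd, p, left);

                    if (n < 0) {
                            if (errno == EINTR)
                                    continue;       /* interrupted: retry */
                            return (-1);            /* real error (ENOSPC, EDQUOT, ...) */
                    }
                    if (n == 0)
                            break;                  /* no progress; give up */
                    p += n;                         /* short write: advance and retry */
                    left -= (size_t)n;
            }
            return ((ssize_t)(len - left));         /* bytes actually written */
    }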