Guys,

What is the best way to ask for a feature enhancement to ZFS?

To allow ZFS to be useful for DR disk replication, we need to be able to set an option against the pool, the file system, or both, called closesync: when a program closes a file, any outstanding writes are flushed to disk before the close returns to the program. So when a program ends, you are guaranteed that any state information has been saved to disk. (exit() also results in close() being called.)

open(xxx, O_DSYNC) is only good if you can alter the source code. Shell scripts that use awk, head, tail, echo, etc. to create output files do not use O_DSYNC; when the shell script returns 0, you want to know that all the data is on the disk, so that if the system crashes the data is still there.

PS: it would be nice if UFS had closesync as well, instead of using forcedirectio.

Cheers
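PPS: a concrete illustration of the gap. A hypothetical script like this exits 0 while its output may still be sitting only in the OS cache:

    #!/bin/sh
    # none of these tools open their output files with O_DSYNC
    awk -F: '{ print $1 }' /etc/passwd > users.out
    echo "report complete" >> users.out
    exit 0  # the caller assumes users.out is safely on disk; after a
            # crash it may be missing or truncated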
On Thu, 26 Jul 2007, Damon Atkins wrote:

> Guys,
>
> What is the best way to ask for a feature enhancement to ZFS?
>
> To allow ZFS to be useful for DR disk replication, we need to be able to
> set an option against the pool, the file system, or both, called
> closesync: when a program closes a file, any outstanding writes are
> flushed to disk before the close returns to the program. So when a
> program ends, you are guaranteed that any state information has been
> saved to disk. (exit() also results in close() being called.)
>
> open(xxx, O_DSYNC) is only good if you can alter the source code. Shell
> scripts that use awk, head, tail, echo, etc. to create output files do
> not use O_DSYNC; when the shell script returns 0, you want to know that
> all the data is on the disk, so that if the system crashes the data is
> still there.
>
> PS: it would be nice if UFS had closesync as well, instead of using
> forcedirectio.

I'd implement this via an LD_PRELOAD library: implement your own 'close' so that it not only dispatches to libc`close but also does an fsync() call on that file descriptor first.

Or, if you really want to make source-code changes, change it in libc`close() instead and make it depend on an environment variable: if DO_CLOSE_SYNC is set, perform fsync(); close() instead of just the latter.

There's a problem with sync-on-close anyway: mmap for file I/O. Who guarantees you that no file contents are being modified after the close()?

FrankH.
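P.S. A rough, untested sketch of such an interposer, combining the environment-variable idea with the preload. It assumes a runtime linker that supports RTLD_NEXT; the build line below is a guess, adjust for your toolchain:

    /* closesync.c - LD_PRELOAD close() interposer (rough sketch, untested).
     *
     * Hypothetical build/usage:
     *   cc -Kpic -G -o closesync.so closesync.c
     *   LD_PRELOAD=./closesync.so DO_CLOSE_SYNC=1 your_script.sh
     */
    #define _GNU_SOURCE     /* for RTLD_NEXT on Linux; harmless on Solaris */
    #include <dlfcn.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    close(int fd)
    {
        static int (*real_close)(int);

        if (real_close == NULL)
            real_close = (int (*)(int))dlsym(RTLD_NEXT, "close");

        /*
         * Flush file data to stable storage before the close returns.
         * fsync() will fail on pipes, sockets, etc.; ignore that and
         * close anyway, just as a plain close() would.
         */
        if (getenv("DO_CLOSE_SYNC") != NULL)
            (void) fsync(fd);

        return (real_close(fd));
    }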
> I'd implement this via an LD_PRELOAD library [ ... ]
>
> There's a problem with sync-on-close anyway: mmap for file I/O. Who
> guarantees you that no file contents are being modified after the
> close()?

The latter is actually a good argument for doing this (if it is necessary) in the file system, rather than via a preload library. The file system knows when the file is no longer accessible by a user process (neither opened nor mapped).

That said, I'm not sure exactly what this buys you for disk replication. What's special about files which have been closed? Is the point that applications might close a file and then notify some other process of the file's availability for use?
> Date: Thu, 26 Jul 2007 20:39:09 PDT
> From: "Anton B. Rang"
>
> That said, I'm not sure exactly what this buys you for disk replication.
> What's special about files which have been closed? Is the point that
> applications might close a file and then notify some other process of
> the file's availability for use?

Yes.

E.g. 1: A program starts an output job and completes the job in the OS cache on Server A. Server A tells the batch scheduling software on Server B that the job is complete. Server A crashes; the file no longer exists, or is truncated, because of what was left in the OS cache. Server B schedules the next job on the assumption that the file created on Server A is OK.

E.g. 2: A program starts an output job and completes the job in the OS cache on Server A. A DB on Server A, running in a different ZFS pool, updates a DB record to record the fact that the output is complete (the DB uses O_DSYNC). Server A crashes; the file no longer exists, or is truncated, because of what was left in the OS cache. The DB on Server A contains information saying that the file is complete.

I believe that sync-on-close should be the default. File system integrity should be more than just being able to read a file which has been truncated due to a system crash, power failure, etc.

E.g. 3 (a bit cheeky :-): You vi a file, save it, and the system crashes. You look back at the screen and say "thank god, I saved the file in time", because on your screen is the prompt again:

$ vi xxxxx
$ connection lost

But this was all happening in the OS cache; when the system comes back, the file does not exist. (I am ignoring vi -r.) Therefore users should do:

$ vi xxxxx
$ sleep 5 ; echo file xxxxx now on disk :-)
$ echo add a line > xxxxx
$ sleep 5 ; echo update to xxxxx complete

UFS forcedirectio and VxFS closesync ensure that, whatever happens, your files will always exist if the program completes. Therefore with (sync) disk replication the file exists at the other site at its finished size. When you introduce DR with disk replication, it generally means you cannot afford to lose any saved data. UFS forcedirectio has a larger performance hit than VxFS closesync.

Cheers
On Fri, 3 Aug 2007, Damon Atkins wrote:

[ ... ]

> UFS forcedirectio and VxFS closesync ensure that, whatever happens, your
> files will always exist if the program completes. Therefore with (sync)
> disk replication the file exists at the other site at its finished size.
> When you introduce DR with disk replication, it generally means you
> cannot afford to lose any saved data. UFS forcedirectio has a larger
> performance hit than VxFS closesync.

Hmm, not quite. forcedirectio, at least on UFS, is contingent on the I/O operations meeting certain criteria. These are explained in directio(3C):

     DIRECTIO_ON     The system behaves as though the application is
                     not going to reuse the file data in the near
                     future. In other words, the file data is not
                     cached in the system's memory pages.

                     When possible, data is read or written directly
                     between the application's memory and the device
                     when the data is accessed with read(2) and
                     write(2) operations. When such transfers are not
                     possible, the system switches back to the default
                     behavior, but just for that operation. In general,
                     the transfer is possible when the application's
                     buffer is aligned on a two-byte (short) boundary,
                     the offset into the file is on a device sector
                     boundary, and the size of the operation is a
                     multiple of device sectors.

                     This advisory is ignored while the file associated
                     with fildes is mapped (see mmap(2)).

So it all depends on what exactly your workload looks like. If you're doing non-block-sized writes or writes to non-aligned offsets, and/or mmap access, directio is not being done, the advisory AND (!) the mount option notwithstanding.

As far as hot-backup consistency goes: do a "lockfs -w", then start the BCV copy, then (once that has started) do a "lockfs -u". A write-locked file system is "clean" and does not need to be fsck'ed before it can be mounted.

The disadvantage is that write ops to the fs in question will block while the lockfs -w is active. But then, you don't need to wait until the BCV has finished; you only need the consistent state to start with, and can unlock immediately once the copy has started.

Note that fssnap also write-locks temporarily. So if you have used UFS snapshots in the past, "lockfs -w"; <BCV start>; "lockfs -u" is not going to cause you more impact.

"lockfs -f" is only a best-try-if-I-cannot-writelock. It is no guarantee of consistency, because by the time the command returns, something else can already be writing again.

FrankH.
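P.S. The sequence spelled out; the mount point is a placeholder and the BCV kick-off command is site-specific:

    # lockfs -w /export/home    <- write lock: on-disk state is now consistent
    # <start the BCV copy>      <- your site-specific command
    # lockfs -u /export/home    <- unlock as soon as the copy has started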
> Do a "lockfs -w", then start the BCV copy, then (once that has started)
> do a "lockfs -u". A write-locked file system is "clean" and does not
> need to be fsck'ed before it can be mounted.
>
> The disadvantage is that write ops to the fs in question will block
> while the lockfs -w is active. But then, you don't need to wait until
> the BCV has finished; you only need the consistent state to start with,
> and can unlock immediately once the copy has started.

Unless I'm misunderstanding the terminology, consistency isn't required for the BCV copy (start or otherwise); it's required during the split.

-- 
Darren Dunham                                           ddunham at taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >