Hi,

After working for a month with ZFS on two external USB drives, I have found the all-new ZFS to be the most unreliable filesystem I have ever seen.

Since starting with ZFS, I have lost data from:

1 x 80 GB external drive
1 x 1 TB external drive

It is a shame that ZFS has no filesystem repair tools, e.g. nothing able to repair errors like these:

        NAME        STATE     READ WRITE CKSUM
        usbhdd1     ONLINE       0     0     8
          c3t0d0s0  ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        usbhdd1:<0x0>

It is indeed very disappointing that moving USB zpools between computers ends, in 90% of cases, with a massive loss of data.

This is due to the unreliable command zfs umount <poolname>: even when the output of mount shows that the pool is no longer mounted and has been removed from mnttab, it only works once or twice. Moving the device back to the other machine, the pool is either not recognized at all or the error above occurs.

Or suddenly you find this message in your logs: "Fault tolerance of the pool may be compromised."

However, I just want to state a warning that ZFS is far from what it promises, and from my experience so far I cannot recommend using ZFS on a professional system at all.

Regards,

Dave.
On 09 February, 2009 - D. Eckert sent me these 1,5K bytes:

> This is due to the unreliable command zfs umount <poolname>: even when
> the output of mount shows that the pool is no longer mounted and has
> been removed from mnttab [...]

You don't move a pool with 'zfs umount'; that only unmounts a single ZFS
filesystem within the pool, but the pool is still active. 'zpool export'
releases the pool from the OS, then 'zpool import' brings it up on the
other machine.

> It only works once or twice. Moving the device back to the other
> machine, the pool is either not recognized at all or the error above
> occurs.
>
> However, I just want to state a warning that ZFS is far from what it
> promises, and from my experience so far I cannot recommend using ZFS
> on a professional system at all.

You're basically yanking disks from a live filesystem. If you don't do
that, filesystems are happier.

/Tomas
--
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Casper.Dik at Sun.COM
2009-Feb-09 09:56 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> However, I just want to state a warning that ZFS is far from what it
> promises, and from my experience so far I cannot recommend using ZFS
> on a professional system at all.

Or, perhaps, you've given ZFS disks which are so broken that they are
really unusable; it is USB, after all.

And certainly, on Solaris you'd get the same errors with UFS or PCFS,
but you would not be able to detect any corruption. You may have seen
Al's post about moving a spinning 1 TB hard disk.

Before we can judge what went wrong, we would need a bit more
information, such as:

	- the motherboard and the USB controller
	- the USB enclosure which holds the disk(s)
	- the type of the disks themselves
	- any messages recorded in /var/adm/messages (for the time you
	  used the disks)
	- and how did you remove the disks from the system?

Unfortunately, you cannot be sure that when the USB enclosure says that
all the data is safe, it is actually written to the disk.

Casper
Hi Casper,

thanks for your reply.

I completely disagree with your opinion that it is USB. And it seems I am not the only one with this opinion of ZFS.

However, the hardware used is:

1 Sun Fire 280R, Solaris 10 10/08, latest updates
1 Lenovo T61 notebook running Solaris 10 10/08, latest updates
1 Sony VGN-NR38Z

Hard drives in use: Trekstore 1 TB, Seagate Momentus 7,200 rpm 2.5" 80 GB.

The hard drives used are brand new, as is the Sony notebook.

Even when I did zfs umount poolname, waited 30 seconds and then unplugged, data corruption occurred.

For testing purposes, on a Sun Fire 280R completely set up with ZFS I tried hot-swapping an HDD. The same thing happened. It is a big administrative burden to get such a ZFS drive back to life.

So how can I get my zpools back to life?

Regards,

Dave.
D. Eckert wrote:
> Even when I did zfs umount poolname, waited 30 seconds and then
> unplugged, data corruption occurred.

You don't zfs umount poolname, you zpool export it.

--
Ian.
Casper.Dik at Sun.COM
2009-Feb-09 10:25 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> However, the hardware used is:
>
> 1 Sun Fire 280R, Solaris 10 10/08, latest updates
> 1 Lenovo T61 notebook running Solaris 10 10/08, latest updates
> 1 Sony VGN-NR38Z
>
> Hard drives in use: Trekstore 1 TB, Seagate Momentus 7,200 rpm 2.5" 80 GB.

(Is that the Trekstore with 2 x 500 GB?)

> The hard drives used are brand new, as is the Sony notebook.
>
> Even when I did zfs umount poolname, waited 30 seconds and then
> unplugged, data corruption occurred.

Did you EXPORT the pool? "Unmount" is not sufficient. You need to use:

	zpool export poolname

How exactly did you remove the disk from the 280R? And what exact problem
did you get? You need to "off-line" the disk before actually removing it
physically.

Casper
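For reference, a minimal sketch of the export/import sequence Casper and Tomas describe, using the pool name from this thread; the exact device paths and any need for -f depend on the state of the pool:

	# On the machine the disk is currently attached to: release the pool.
	# 'zpool export' unmounts every filesystem in the pool and marks the
	# pool as no longer in use by this host.
	zpool export usbhdd1

	# Physically disconnect the USB disk only after the export returns.

	# On the other machine: scan attached devices for importable pools,
	# then import the one we want (add -f only if the pool was never
	# exported cleanly, e.g. after a crash).
	zpool import
	zpool import usbhdd1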
> > "Unmount" is not sufficient. >Well, umount is not the "right" way to do it, so he''d be simulating a power-loss/system-crash. That still doesn''t explain why massive data loss would occur ? I would understand the last txg being lost, but 90% according to OP ?! -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090209/2f4a658a/attachment.html>
Casper.Dik at Sun.COM
2009-Feb-09 11:00 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> Well, umount is not the "right" way to do it, so he'd be simulating a
> power loss / system crash. That still doesn't explain why massive data
> loss would occur. I would understand the last txg being lost, but 90%
> according to the OP?!

On USB or?

I think he was trying to properly unmount the USB devices.

One of the known issues with USB devices is that they may not work
properly. A typical disk will properly "flush write cache" when it is
instructed to do so. However, when you connect the device through a USB
controller and a USB enclosure, we're less certain that "flush write
cache" will make it to the drive, because:

	- was the command sent to the enclosure (e.g., if you needed to
	  configure the device with "reduced-cmd-support=true", then all
	  bets are off)
	- when the enclosure responds, did it send a "flush write cache"
	  to the disk?
	- and when it responds, did it wait until the disk completed the
	  command?

It is one of the reasons why I'd recommend against USB for disks.
Too many variables.

Casper
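As a side note on write caches: on a directly attached SATA or SCSI disk, Solaris lets you inspect and toggle the drive's volatile write cache from format(1M) in expert mode. Whether that menu is reachable at all through a given USB bridge is exactly the uncertainty Casper describes, so treat the following as a sketch of what to look for, not a guarantee (exact prompts may differ by driver):

	# format -e              (expert mode; then select the disk)
	format> cache
	cache> write_cache
	write_cache> display     # shows whether the drive's write cache is enabled
	write_cache> disable     # optionally disable it while testing (slower, safer)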
OK, so far so good, but how can I get my pool up and running?

Following output (originally in German; messages translated):

bash-3.00# zfs get all usbhdd1
NAME     PROPERTY        VALUE                  SOURCE
usbhdd1  type            filesystem             -
usbhdd1  creation        Thu Dec 25 23:36 2008  -
usbhdd1  used            34,3G                  -
usbhdd1  available       39,0G                  -
usbhdd1  referenced      34,3G                  -
usbhdd1  compressratio   1.00x                  -
usbhdd1  mounted         no                     -
usbhdd1  quota           none                   default
usbhdd1  reservation     none                   default
usbhdd1  recordsize      128K                   default
usbhdd1  mountpoint      /usbhdd1               default
usbhdd1  sharenfs        off                    default
usbhdd1  checksum        on                     local
usbhdd1  compression     off                    default
usbhdd1  atime           on                     default
usbhdd1  devices         on                     default
usbhdd1  exec            on                     default
usbhdd1  setuid          on                     default
usbhdd1  readonly        off                    default
usbhdd1  zoned           off                    default
usbhdd1  snapdir         hidden                 default
usbhdd1  aclmode         groupmask              default
usbhdd1  aclinherit      restricted             default
usbhdd1  canmount        on                     default
usbhdd1  shareiscsi      off                    default
usbhdd1  xattr           on                     default
usbhdd1  copies          1                      default
internal error: unable to get version property
internal error: unable to get utf8only property
internal error: unable to get normalization property
internal error: unable to get casesensitivity property
usbhdd1  vscan           off                    default
usbhdd1  nbmand          off                    default
usbhdd1  sharesmb        off                    default
usbhdd1  refquota        none                   default
usbhdd1  refreservation  none                   default

bash-3.00# zpool status -xv usbhdd1
  pool: usbhdd1
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        usbhdd1     ONLINE       0     0    16
          c3t0d0s0  ONLINE       0     0    16

errors: Permanent errors have been detected in the following files:

        usbhdd1:<0x0>

bash-3.00# zpool list
NAME      SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
storage  48,8G  16,3G  32,4G   33%  ONLINE  -
usbdrv1   484M  2,79M   481M    0%  ONLINE  -
usbhdd1  74,5G  34,3G  40,2G   46%  ONLINE  -

I don't understand that I get status information about the pool, e.g. cap, size, health, but I cannot mount it on the system:

bash-3.00# zfs mount usbhdd1
cannot mount 'usbhdd1': I/O error
bash-3.00#

Any suggestion for help?

Thanks and regards,

dave.
James C. McPherson
2009-Feb-09 11:59 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Mon, 09 Feb 2009 03:10:21 -0800 (PST)
"D. Eckert" <contact at desystems.cc> wrote:

> OK, so far so good, but how can I get my pool up and running?

I can't help you with this bit ....

> bash-3.00# zpool status -xv usbhdd1
>   pool: usbhdd1
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         usbhdd1     ONLINE       0     0    16
>           c3t0d0s0  ONLINE       0     0    16
>
> errors: Permanent errors have been detected in the following files:
>
>         usbhdd1:<0x0>
>
> I don't understand that I get status information about the pool, e.g.
> cap, size, health, but I cannot mount it on the system:
>
> bash-3.00# zfs mount usbhdd1
> cannot mount 'usbhdd1': I/O error
> bash-3.00#

You have checksum errors on a non-replicated pool. This is not something
that can be ignored.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp	http://www.jmcp.homeunix.com/blog
James,

on UFS or reiserfs such errors could be corrected.

It is grossly negligent to develop a filesystem without proper repair tools. More and more it becomes clear that it was just a marketing slogan by Sun to state that ZFS needs no repair tools because it heals itself.

In this particular case we are talking about a loss of at least 35 GB (!!!!) of data. As long as the ZFS developers are more focused on marketing claims that have been proven wrong, I can't recommend ZFS at all in a professional setting, and I am thinking about making this issue clear at the Sun conference we have in Germany in March this year.

It is not good practice to tell someone who just lost that much data "I am sorry, but I can't help you," even if you don't understand why it happened.

Good practice would be to care first about proper documentation. Nothing in the man pages states that, if USB zpools are used, zfs mount/unmount is NOT recommended and zpool export should be used instead.

Having a facility to override checksumming, to get even a pool tagged as corrupted mounted so the data can be rescued, shouldn't just be a dream. It should be a MUST HAVE.

I agree that it is always good - regardless of the filesystem type - to have a proper backup facility in place. But given the use cases ZFS was designed for - very big pools - that also becomes a cost question.

And it would be good practice for Sun, given internet boards full of people complaining about losing data just because they used ZFS, to CARE ABOUT THAT.

Regards,

DE
> on UFS or reiserfs such errors could be corrected.

I think some of these people are assuming your hard drive is broken. I'm not sure what you're assuming, but if the hard drive is broken, I don't think ANY filesystem can do anything about that.

At best, if the disk were in a RAID 5 array and the other disks worked, then the parity from the working disks could correct the broken data on the broken drive... but you only have a single disk, not a mirror or a RAID 5, so this fix can't be done.

I think this might be a case of ZFS reporting errors that other filesystems don't notice. Your hard drive might have been broken for months without you knowing it until now. In that case the errors aren't the fault of ZFS. They are the fault of the broken drive, and the fault of the other filesystems for not knowing when data is corrupted. See what I mean?
> bash-3.00# zfs mount usbhdd1
> cannot mount 'usbhdd1': I/O error
> bash-3.00#

Why is there an I/O error? Is there any information logged to /var/adm/messages when this I/O error is reported? E.g. timeout errors for the USB storage device?
Casper.Dik at Sun.COM
2009-Feb-09 13:58 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> James,
>
> on UFS or reiserfs such errors could be corrected.

That's not true. That depends on the nature of the error. I've seen quite
a few problems on UFS with corrupted file contents; such filesystems are
always "clean". Yet the filesystems are corrupted. And no tool can fix
those filesystems.

> It is grossly negligent to develop a filesystem without proper repair
> tools.

Repairing to what state?

One of the reasons why there's a "ufs fsck" is because its on-disk state
is nearly always "corrupted". The log only allows you to repair the
metadata, NEVER the data. And I've seen the corrupted files many times.
(Specifically, when you upgrade a driver and it's buggy, you would
typically end up with a broken driver_aliases, name_to_major, etc.,
though I added a few fsyncs in update_drv and ilk and it is better.)

Fsck does not "fix" UFS filesystems. Fsck can only repair known faults:
known discrepancies in the metadata. Since ZFS doesn't have such known
discrepancies, there's nothing to repair.

> More and more it becomes clear that it was just a marketing slogan by
> Sun to state that ZFS needs no repair tools because it heals itself.

If it can repair, then it does. But if you only have one copy of the
data, then you cannot repair the missing data.

> In this particular case we are talking about a loss of at least 35 GB
> (!!!!) of data.

> Good practice would be to care first about proper documentation.
> Nothing in the man pages states that, if USB zpools are used, zfs
> mount/unmount is NOT recommended and zpool export should be used
> instead.

You have a live pool and you yank it out of the system? Where does it
say that you can do that?

> Having a facility to override checksumming, to get even a pool tagged
> as corrupted mounted so the data can be rescued, shouldn't just be a
> dream. It should be a MUST HAVE.

Depends on how much of the data is corrupted and which parts they are.

> I agree that it is always good - regardless of the filesystem type -
> to have a proper backup facility in place. But given the use cases ZFS
> was designed for - very big pools - that also becomes a cost question.
>
> And it would be good practice for Sun, given internet boards full of
> people complaining about losing data just because they used ZFS, to
> CARE ABOUT THAT.

I've not seen a lot of people complaining; or perhaps I don't look
carefully (I'm not in ZFS development).

What I have seen are some issues with weird BIOSes (taking part of a
disk), and connecting a zpool to different systems at the same time,
including what you may have done by having the zpool "imported" on both
systems.

Casper
Too many words wasted, but not a single word on how to restore the data.

I have read the man pages carefully. But again: nothing there says that on USB drives zfs umount pool is not allowed. So how on earth should a simple user know that, if he knows that filesystems are properly unmounted using the umount command??

And again: why should a 2-week-old Seagate HDD suddenly be damaged, if there was no shock, hit or any other event like that?

It is of course easier to blame the stupid user than to have proper documentation and emergency tools to handle this.

The list of malfunctions of SNV builds gets longer with every version released. E.g. on SNV 107:

- the installation script is unable to write the boot blocks for grub properly
- you choose the German locale, but get an American keyboard layout in GNOME (since SNV 103)
- in SNV 107, adding these lines to xorg.conf:

  Option "XkbRules" "xorg"
  Option "XkbModel" "pc105"
  Option "XkbLayout" "de"

  (which worked in SNV 103) crashes the X server
- the latest Nvidia driver (version 180) for the GeForce 8400M doesn't work with OpenSolaris SNV 107
- nwam and iwk0: not solved, no DHCP responses

It seems better to stay focused on having a colourful GUI with hundreds of functions no one needs instead of providing a stable core.

I am looking forward to the day I boot OpenSolaris and see a greeting Windows XP logo surrounded by the blue bubbles of OpenSolaris.....

Cheers,

D.
Hi Dave,

Having read through the whole thread, I think there are several things that could all be adding to your problems, at least some of which are not related to ZFS at all.

You mentioned the ZFS docs not warning you about this, and yet I know the docs explicitly tell you that:

1. While a ZFS pool that has no redundancy (mirroring or parity), like yours, can still *detect* errors in the data read from the drive, it can't *repair* those errors. Repairing errors requires that ZFS be performing (at least the top-most level of) the mirroring or parity functions. Since you have no mirroring or parity, ZFS cannot automatically recover this data.

2. As others have said, a zpool can contain many filesystems. 'zfs umount' only unmounts a single filesystem. Removing a full pool from a machine requires a 'zpool export', no matter what disk technology is being used (USB, SCSI, SATA, FC, etc.). On the new system you would use 'zpool import' to bring the pool into the new system.

I'm sure this next one is documented by Sun also, though not in the ZFS docs; probably in some other part of the documentation dealing with removable devices:

3. In addition, according to Casper's message you need to 'off-line' USB (and probably other types of) storage in Solaris (just like in Windows) before pulling the plug. This has nothing to do with ZFS. It would have corrupted most other filesystems also, possibly even past the point of repair.

Still, I had an idea on something you might try. I don't know how long it's been since you pulled the drive, or what else you've done since. Which machine is reporting the errors you've shown us: the machine you pulled the drives from, or the machine you moved them to? Were you successful in running 'zpool import' on the other machine? This idea might work either way, but if you haven't successfully imported the pool into another machine there's probably more of a chance.

If the output is from the machine you pulled them out of, then basically that machine still thinks the pool is connected to it, and it thinks the one and only disk in the pool is now not responding. In this case the errors you see in the tables are the errors from trying to contact a drive that no longer exists.

Have you reconnected the disk to the original machine yet? If not, I'd attempt a 'zpool export' now (though that may not work), then shut the machine down fully and connect the disk. Then boot it all up.

Depending on what you've tried to do with this disk to fix the problem since it happened, I have no idea exactly how the machine will come up. If you couldn't do the 'zpool export', then the machine will try to mount the filesystems in the pool on boot. This may or may not work. If you were successful in doing the export with the disk disconnected, then it won't try, and you'll need to 'zpool import' the pool after the machine is booted.

Depending on how the import goes, you might still see errors in the 'zpool status' output. If so, I know a 'zpool clear' will clear those errors, and I doubt it can make the situation any worse than it is now. You'd have to tell us what the machine says after this before I can advise you further. But (and the experts can correct me if I'm wrong) this might 'just work(tm)'.

My theory here is that ZFS may have been successful in keeping the state of the (meta)data on the disk consistent after all. The checksum and I/O errors listed may be from ZFS trying to access the non-existent drive after you removed it.
Which (in theory) are all bogus errors that don't really point to errors in the data on the drive.

Of course there are many things that all have to be true for this theory to hold. Depending on what has happened to the machines and the disks since they were originally unplugged from each other, all bets might be off. And then there's the possibility that my idea could never work at all. People much more expert than I am can chime in on that.

-Kyle
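Putting Kyle's suggestion into commands, a rough sketch of that recovery sequence (pool name from the thread; the final scrub is an extra verification step, and none of this is guaranteed to succeed):

	# With the disk still disconnected, tell the original machine to
	# release the pool (this step may fail if the device is missing):
	zpool export usbhdd1

	# Shut down, reconnect the disk, boot, then try to bring it back:
	zpool import usbhdd1      # add -f if it complains the pool is in use

	# If the import works but stale READ/WRITE/CKSUM counters remain,
	# clear them and re-verify every block that is still readable:
	zpool clear usbhdd1
	zpool scrub usbhdd1
	zpool status -v usbhdd1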
Christian Wolff
2009-Feb-09 15:25 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
First: it sucks to lose data. That's very uncool... BUT I don't know how ZFS should be able to recover data with no mirror to copy from. If you have some kind of RAID level you can easily recover your data. I have seen that several times, without any problems and even with nearly no performance impact on a production machine.

No offense, but you must admit that you are flaming a filesystem without even knowing the right commands, and then blame us for not recovering your data?! C'mon!

Regards,
Chris
Casper.Dik at Sun.COM
2009-Feb-09 15:27 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> Too many words wasted, but not a single word on how to restore the
> data.
>
> I have read the man pages carefully. But again: nothing there says
> that on USB drives zfs umount pool is not allowed.

You cannot unmount a pool. You can only unmount a filesystem. That the
default name of the pool's filesystem is the same as the name of the
pool is an artifact of the implementation.

Surely, you can unmount the filesystem. That is not illegal. But you've
removed a live pool WITHOUT exporting it. I can understand that you make
that mistake because you take what you know from other filesystems and
apply it to ZFS.

> So how on earth should a simple user know that, if he knows that
> filesystems are properly unmounted using the umount command??

By reading the documentation. The zpool and zfs commands are easy to use
and perhaps this stops you and others from reading, e.g.,

	http://docs.sun.com/app/docs/doc/819-5461/gavwn?l=en&a=view

And before you use ZFS you must understand some of the basic concepts;
rather than having a device which you can mount with "mount", you have a
"pool", and that "pool" is owned by the system.

> And again: why should a 2-week-old Seagate HDD suddenly be damaged, if
> there was no shock, hit or any other event like that?

If you removed the device from a live pool, moved it to another system
and then moved it back, then yes, you could have problems. Though I'd
suppose that the pool shouldn't go online without requiring an import
(-f).

> It is of course easier to blame the stupid user than to have proper
> documentation and emergency tools to handle this.

The documentation explains that you must use export in order to move
pools from one system to another. I'm not sure how the system can
prevent this; there's no "lock" on your USB slots.

As for the other problems with nv107: in each build we change a lot of
software, and sometimes we change important parts of the software; e.g.,
in nv107 we changed to a newer version of Xorg. The cutting-edge builds
vary in quality.

Casper
Full of sympathy, I still feel you might as well relax a bit. It is the XkbVariant that starts X without any chance to return. But look at the many "boot stops after the third line" reports, and, from my side, the non-working network settings, even without nwam. The worst part was a so-called engineer stating that one simply can't expect a host to connect to the same gateway through two different paths properly. But it would be wrong to admonish the individuals, and my apologies to those I treated with contempt. The problem cannot be solved in this forum. The issue needs to be addressed elsewhere.

When adoption (migration) is the objective, in the first place the kernel needs to boot, whatever the hardware, even if graceful degradation is unavoidable. Second, a network setting must be possible, and not simply do nothing, or require a dead NIC to be added just to boot. As much as I was grateful to be helped, of course an X server needs to fall back to sane behaviour at all times. And sendmail loses mail. All this is sick. But priorities need to come from managers, or the community, not from the coders. In OpenSolaris Sun insists on calling the shots, so it will be managers in this case.

I myself am very unhappy with ZFS; not because it has failed me, but because, for a third-party, cold-eyed review, the man pages and the concepts and (arcane) commands by now far surpass the sequence of logical steps needed to partition (fdisk) and format (newfs) a drive. Pools, tanks, scrubs, imports, exports and whatnot; I don't think this was the original intention. And just as bad as the network engineer further up is the statement about 'USB hard disks not suitable for ZFS' or similar.

Do not get me wrong, OpenSolaris is still my preferred desktop. I love its stability, and - laugh at me - it is the only one that always allows me to kill an application gone sour (Ubuntu usually fails here). I consider it elegant and helpful in my daily work. *If* it is up, *if* it boots. Alas, this is by far the more difficult part.

And here I agree with you: USB hard disks need a proper, clear way to be attached and removed, no more involved than the old mount/umount sequence. Try running a hard disk test. Let us also compare: I never lost an ext3 drive that would pass the hardware test. On the contrary, at times I could recover data from one that failed. So let us take that as the measure: as long as the drive is not flagged 'corrupt' by the disk test utility, the filesystem surely must not lose any data (aside from 'rm'). My honest and curious question: does ZFS pass this test?

Uwe
D. Eckert wrote:
> Too many words wasted, but not a single word on how to restore the
> data.
>
> I have read the man pages carefully. But again: nothing there says
> that on USB drives zfs umount pool is not allowed.

It is allowed. But it's not enough. You need to read both the 'zpool' and 'zfs' manpages. The 'zpool' manpage will tell you that the way to move the whole pool to another machine is to run 'zpool export <poolname>'. The 'zpool export' will actually run the 'zfs umount' for you, though it's not a problem if that's already been done.

Note, this isn't USB specific; you won't see anything in the docs about USB. This condition applies to SCSI and others too. You need to export the pool to move it to another machine. If the machine crashed before you could export it, 'zpool import -f' on the new machine can help import it anyway.

With USB, there are probably other commands you'll also need to use to notify Solaris that you are going to unplug the drive, just like the 'Safely remove hardware' tool on Windows. Or you need to remove it only when the system is shut down. These commands will be documented somewhere else, not in the ZFS docs, because they don't apply to just ZFS.

> So how on earth should a simple user know that, if he knows that
> filesystems are properly unmounted using the umount command??

You need to understand that the filesystems are all contained in a 'pool' (more than one filesystem can share the disk space in the same pool). Unmounting a filesystem *does not* prepare the *pool* to be moved from one machine to another.

> And again: why should a 2-week-old Seagate HDD suddenly be damaged, if
> there was no shock, hit or any other event like that?

Who knows? Some hard drives are manufactured with problems. Remember that ZFS is designed to catch problems that even the ECC on the drive doesn't catch. So it's not impossible for it to catch errors even the manufacturer's QA tests missed.

> It is of course easier to blame the stupid user than to have proper
> documentation and emergency tools to handle this.

I believe that between the man pages, the administration docs on the web, the best-practices pages, and all the other blogs and web pages, ZFS is documented well enough. It's not like other filesystems, so there is more to learn, and you need to review all the docs, not just the ones that cover the operations (like unmount) that you're familiar with. Understanding pools (and the commands that manage pools) is also important. Man pages and command references are good when you understand the architecture and need to learn the details of a command you know you need to use. It's the other documentation that fills you in on how the parts of the system work together, and advises you on the best way to set up or do what you want.

As I said in my other email, ZFS can't repair errors without a way to reconstruct the data. It needs mirroring, parity (or the copies=x setting) to be able to repair the data. By setting up a pool with no redundancy, you gave it no way to do that. So your email subject line is a little backwards, since any 'professional' usage would incorporate redundancy (mirroring, parity, etc.). What you're trying to do is more 'home/hobbyist' usage, though most home/hobbyist users decide to incorporate redundancy for any data they really care about.

> The list of malfunctions of SNV builds gets longer with every version
> released.

I'm sure new things are added every release, but many are also fixed. sNV is pre-release software, after all.
Overall the problems found aren't around long, and I believe the list gets shorter as often as it gets longer. If you want production-level Solaris, ZFS is available in Solaris 10.

> E.g. on SNV 107:
>
> - the installation script is unable to write the boot blocks for grub
>   properly
> - you choose the German locale, but get an American keyboard layout in
>   GNOME (since SNV 103)
> - in SNV 107, adding these lines to xorg.conf:
>
>   Option "XkbRules" "xorg"
>   Option "XkbModel" "pc105"
>   Option "XkbLayout" "de"
>
>   (which worked in SNV 103) crashes the X server
> - the latest Nvidia driver (version 180) for the GeForce 8400M doesn't
>   work with OpenSolaris SNV 107
> - nwam and iwk0: not solved, no DHCP responses

Yes, there was a major update of the X server sources to catch up to the latest(?) X.org release. Workarounds are known, and I bet this will be working again in b108 (or not long after).

> It seems better to stay focused on having a colourful GUI with
> hundreds of functions no one needs instead of providing a stable core.

The core of Solaris is much more stable than anything else I've used. The windowing system is not part of the core of an operating system in my book.

> I am looking forward to the day I boot OpenSolaris and see a greeting
> Windows XP logo surrounded by the blue bubbles of OpenSolaris.....

<roll-eyes>

Note that sNV (aka SXCE - Solaris eXpress Community Edition) isn't really OpenSolaris, though they are related. OpenSolaris is based off specific snapshots of sNV (the last one being b101, I think) and is updated much less often than sNV. sNV is mainly targeted at those who want to develop Solaris itself, and those who want to try out the latest builds.

-Kyle
David Champion
2009-Feb-09 16:00 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> Too many words wasted, but not a single word on how to restore the
> data.
>
> I have read the man pages carefully. But again: nothing there says
> that on USB drives zfs umount pool is not allowed.

You misunderstand. This particular point has nothing to do with USB; it's the same for any ZFS environment. You're allowed to do a zfs umount on a filesystem; there's no problem with that. But remember that ZFS is not just a filesystem in the way that reiserfs and UFS are filesystems. It's an integrated storage pooling system and filesystem. When you umount a filesystem, you're not taking any storage offline, you're just removing the filesystem's presence in the VFS hierarchy.

You umounted a ZFS filesystem, not touching the pool, then removed the device. This is analogous to preparing an external hardware RAID, creating one or more filesystems on it, using them a while, umounting one of them, and powering down the RAID. You did nothing to protect the other filesystems or the RAID's r/w cache. Everything on the RAID is now inconsistent and suspect. But since your "RAID" was a single striped volume, there's no mirror or parity information with which to reconstruct the data.

ZFS is capable of detecting these problems where other filesystems often are not. But no filesystem can tell what the data should have been when the only copy of the data is damaged.

This is documented for ZFS. It's not about USB; it's just that USB devices can be more vulnerable to this kind of treatment than other kinds of storage are.

> And again: why should a 2-week-old Seagate HDD suddenly be damaged, if
> there was no shock, hit or any other event like that?

It happens all the time. We just don't always know about it.

--
-D.	dgc at uchicago.edu	NSIT	University of Chicago
Andrew Gabriel
2009-Feb-09 16:01 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Kyle McDonald wrote:
> With USB, there are probably other commands you'll also need to use to
> notify Solaris that you are going to unplug the drive, just like the
> 'Safely remove hardware' tool on Windows. Or you need to remove it
> only when the system is shut down. These commands will be documented
> somewhere else, not in the ZFS docs, because they don't apply to just
> ZFS.

That would be cfgadm(1M). It's also used for hot-swappable SATA drives (and probably other things).

--
Andrew
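A hedged sketch of what that looks like for a USB disk; the attachment point ID (usb0/3 here) is made up, so check the output of cfgadm -al on your own system first:

	cfgadm -al                      # list attachment points; find the USB disk's Ap_Id
	zpool export usbhdd1            # release the pool before touching the device
	cfgadm -c unconfigure usb0/3    # off-line the device (Ap_Id is hypothetical)
	# now it is safe to unplug the drive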
Bob Friesenhahn
2009-Feb-09 17:05 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Mon, 9 Feb 2009, D. Eckert wrote:
> Good practice would be to care first about proper documentation.
> Nothing in the man pages states that, if USB zpools are used, zfs
> mount/unmount is NOT recommended and zpool export should be used
> instead.

I have been using USB mirrored disks for backup purposes for about eight months now. No data loss, or even any reported uncorrectable read failures. These disks have been shared between two different systems (x86 and SPARC). The documentation said that I should use zpool export/import, and so that is what I have done, with no problems.

While these USB disks seem to be working reliably, it is certainly possible to construct a USB arrangement which does not work reliably, since most USB hardware is cheap junk. My USB disks are direct-attached and don't go through a USB bridge.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
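For completeness, a minimal sketch of the kind of redundant USB setup Bob describes; the pool and device names here are placeholders, not his actual configuration:

	# Create a mirrored pool across two USB disks; with redundancy ZFS
	# can repair checksum errors, not merely detect them.
	zpool create usbbackup mirror c3t0d0 c4t0d0

	# Periodically verify every block against its checksum:
	zpool scrub usbbackup
	zpool status -v usbbackup

	# Always release the pool before moving the disks to another system:
	zpool export usbbackup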
Seagate7,

You are not using ZFS correctly; you have misunderstood how it is used. If you don't follow the manual (which you haven't), then any filesystem will cause problems and corruption, even ZFS or NTFS or FAT32. You must use ZFS correctly. Start by reading the manual.

For ZFS to be able to repair errors, you must use two drives or more. This is clearly written in the manual. If you only use one drive, ZFS can only detect errors, not repair them. This is also clearly written in the manual.

And when you pull out a disk, you must use the "zpool export" command. This is also clearly written in the manual. If you pull out a drive without announcing that you will do so (via zpool export), then ZFS will not work.

If you don't follow the manual, any software will cause problems, even Windows. You are not using ZFS as it is intended to be used. I suggest that, in the future, you stay with Windows, which you know. If you use Unix without knowing it or without reading the manual, you will have problems. You know Windows; stay with Windows.
* Orvar Korvar (knatte_fnatte_tjatte at yahoo.com) wrote:
> For ZFS to be able to repair errors, you must use two drives or more.
> This is clearly written in the manual. If you only use one drive, ZFS
> can only detect errors, not repair them. This is also clearly written
> in the manual.

Or, you can set copies > 1 on your ZFS filesystems. This at least protects you in cases of data corruption on a single drive, but not if the entire drive goes belly up.

Cheers,

--
Glenn
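A small sketch of that knob; the child filesystem name is hypothetical, and note that copies only applies to data written after the property is set and does not help if the whole device dies:

	# Keep two copies of every data block, even on a single USB disk:
	zfs set copies=2 usbhdd1

	# Or create a new filesystem with it from the start:
	zfs create -o copies=2 usbhdd1/photos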
>>>>> "ok" == Orvar Korvar <knatte_fnatte_tjatte at yahoo.com> writes:ok> You are not using ZFS correctly. ok> You have misunderstood how it is used. If you dont follow the ok> manual (which you havent) then any filesystem will cause ok> problems and corruption, even ZFS or ntfs or FAT32, etc. You ok> must use ZFS correctly. Start by reading the manual. Before writing a reply dripping with condescention, why don''t you start by reading the part of the ``manual'''' where it says ``always consistent on disk''''? Please, lay off the kool-aid, or else drink more of it: Unclean dismounts are *SUPPORTED*. This is a great supposed ZFS feature BUT cord-yanking is not supposed to cause loss of the entire filesystem, not on _any_ modern filesystem such as: UFS, FFS, ext3, xfs, hfs+. There is a real problem here. Maybe not all of the problem is in ZFS, but some of it is. If ZFS is going to be vastly more sensitive to discarded SYNCHRONIZE CACHE commands than competing filesystems to the point that it trashes entire pools on an unclean dismount, then it will have to include a storage stack qualification tool, not just a row of defensive pundits ready to point their fingers at hard drives which are guilty until proven innocent, and lack an innocence-proving tool. And I''m not convinced that''s the only problem. Even if it is, the write barrier problem is pervasive. Linux LVM2 throws them away, and many OS''s that _do_ implement fdatasync() for the userland including Linux-without-LVM2 only sync part way down, don''t propogate it all the way down the storage stack to the drive, so file-backed pools (as you might use for testing, backup, or virtual guests) are not completely safe. Aside from these examples, note that, AIUI, Sun''s sun4v I/O virtualizer, VirtualBox software, and iSCSI initiator and target were all caught guilty of this write barrier problem, too, so it''s not only, or even mostly, a consumer-grade problem or an other-tent problem. If this is really the problem trashing everyone''s pools, it doesn''t make me feel better because the problem is pretty hard to escape once you do the slightest meagerly-creative thing with your storage. Even if the ultimate problem turns out not to be in ZFS, the ZFS camp will probably have to persecute the many fixes since they''re the ones so unusually vulnerable to it. also there are worse problems with some USB NAND FLASH sticks according to Linux MTD/UBI folks: http://www.linux-mtd.infradead.org/doc/ubifs.html#L_raw_vs_ftl We have heard reports that MMC and SD cards corrupt and loose data if power is cut during writing. Even the data which was there long time before may corrupt or disappear. This means that they have bad FTL which does not do things properly. But again, this does not have to be true for all MMCs and SDs - there are many different vendors. But again, you should be careful. Of course this doesn''t apply to any spinning hard drives nor to all sticks, only to some sticks. The ubifs camp did an end-to-end test for their filesystem''s integrity using a networked power strip to do automated cord-yanking. I think ZFS needs an easier, faster test though, something everyone can do before loading data into a pool. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090209/56cc52cb/attachment.bin>
On 9-Feb-09, at 6:17 PM, Miles Nordin wrote:
> Aside from these examples, note that, AIUI, Sun's sun4v I/O
> virtualizer, VirtualBox software, and iSCSI initiator and target were
> all caught guilty of this write barrier problem too, so it's not only,
> or even mostly, a consumer-grade problem or an other-tent problem.

YES! I recently discovered that VirtualBox apparently defaults to ignoring flushes, which would, if true, introduce a failure mode generally absent from real hardware (and eventually result in consistency problems quite unexpected by the user who carefully configured her journaled filesystem or transactional RDBMS!). It seems as though I'll have to dive into the source code to prove it, though:

  http://forums.virtualbox.org/viewtopic.php?p=59123#59123

There is no substitute for cord-yank tests - many and often. The weird part is, the ZFS design team simulated millions of them. So the full explanation remains to be uncovered?

--Toby
> There is no substitute for cord-yank tests - many and often. The
> weird part is, the ZFS design team simulated millions of them.
> So the full explanation remains to be uncovered?

We simulated power failure; we did not simulate disks that simply blow off write ordering. Any disk that you'd ever deploy in an enterprise or storage appliance context gets this right.

The good news is that ZFS is getting popular enough on consumer-grade hardware. The bad news is that said hardware has a different set of failure modes, so it takes a bit of work to become resilient to them. This is pretty high on my short list.

Jeff
> The good news is that ZFS is getting popular enough on consumer-grade
> hardware. The bad news is that said hardware has a different set of
> failure modes, so it takes a bit of work to become resilient to them.
> This is pretty high on my short list.

So does this basically mean ZFS rolls back to the latest on-disk consistent state before any failure, even if it means (minor) data loss?

Is there any bug report I can follow so I would know when the fix for this is committed?

Regards
> We simulated power failure; we did not simulate disks that simply
> blow off write ordering. Any disk that you'd ever deploy in an
> enterprise or storage appliance context gets this right.
>
> The good news is that ZFS is getting popular enough on consumer-grade
> hardware. The bad news is that said hardware has a different set of
> failure modes, so it takes a bit of work to become resilient to them.
> This is pretty high on my short list.

Jeff, we lost many zpools with multimillion-dollar EMC, NetApp and HDS arrays just by simulating FC switch power failures. The problem is that ZFS can't properly recover itself.

How can we even think of adopting ZFS with >100 TB pools if a simple FC switch failure can make a pool totally inaccessible? I know UFS fsck can only repair metadata, but that is much better than losing all your data! We all know how long it would take to restore 100 TB of data from backup.

ZFS should at least be able to recover pools by discarding the last txg, as you suggested months ago. Any news about that?

thanks
gino
dick hoogendijk
2009-Feb-10 16:28 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Mon, 09 Feb 2009 01:46:01 PST
"D. Eckert" <contact at desystems.cc> wrote:

> After working for a month with ZFS on two external USB drives, I have
> found the all-new ZFS to be the most unreliable filesystem I have ever
> seen.
>
> Since starting with ZFS, I have lost data from:
>
> 1 x 80 GB external drive
> 1 x 1 TB external drive
>
> It is a shame that ZFS has no filesystem repair tools, e.g. nothing
> able to repair errors like these:
>
>         NAME        STATE     READ WRITE CKSUM
>         usbhdd1     ONLINE       0     0     8
>           c3t0d0s0  ONLINE       0     0     8
>
> errors: Permanent errors have been detected in the following files:
>
>         usbhdd1:<0x0>

What filesystem likes it when disks are pulled out from under a LIVE filesystem? Try that on UFS and you're f** up too.

Your problem is that you have not read the manual well! Using the wrong command gets you into trouble. So be it.

Maybe zpool export/import does what you want?

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv105 ++
+ All that's really worth doing is what we do for others (Lewis Carrol)
> What filesystem likes it when disks are pulled out from under a LIVE
> filesystem? Try that on UFS and you're f** up too.
>
> Your problem is that you have not read the manual well!
> Using the wrong command gets you into trouble.
>
> Maybe zpool export/import does what you want?

Dick, Dave made a mistake pulling out the drives without exporting them first. For sure UFS/XFS/ext4/... don't like that kind of operation either, but only with ZFS do you risk losing ALL your data. That's the point!

gino
Mattias Pantzare
2009-Feb-10 16:55 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> What filesystem likes it when disks are pulled out from under a LIVE
> filesystem? Try that on UFS and you're f** up too.

Pulling a disk from a live filesystem is the same as pulling the power from the computer. All modern filesystems can handle that just fine. UFS with logging on does not even need fsck.

Now, if you have a disk that lies and doesn't write to the disk when it should, all bets are off.
Peter Schuller
2009-Feb-10 18:00 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> > However, I just want to state a warning that ZFS is far from what it
> > promises, and from my experience so far I cannot recommend using ZFS
> > on a professional system at all.
>
> Or, perhaps, you've given ZFS disks which are so broken that they are
> really unusable; it is USB, after all.

I had a cheap-o USB enclosure that definitely did ignore such commands. On every txg commit I'd get a warning in dmesg (this was on FreeBSD) about the device not implementing the relevant SCSI command. This of course would affect filesystems other than ZFS as well.

What is worse, I was unable to completely disable write caching either, because that, too, did not actually propagate to the underlying device when attempted. (I could not say for certain whether this was fundamental to the device or in combination with a FreeBSD issue.)

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Charles Binford
2009-Feb-10 18:03 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Jeff, what do you mean by "disks that simply blow off write ordering"? My experience is that most enterprise disks are some flavor of SCSI, and host SCSI drivers almost ALWAYS use simple queue tags, implying the target is free to re-order the commands for performance. Are you talking about something else, or does ZFS request ordered queue tags on certain commands?

Charles

Jeff Bonwick wrote:
> We simulated power failure; we did not simulate disks that simply
> blow off write ordering. Any disk that you'd ever deploy in an
> enterprise or storage appliance context gets this right.
Peter Schuller
2009-Feb-10 18:05 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> YES! I recently discovered that VirtualBox apparently defaults to
> ignoring flushes, which would, if true, introduce a failure mode
> generally absent from real hardware (and eventually result in
> consistency problems quite unexpected by the user who carefully
> configured her journaled filesystem or transactional RDBMS!)

I recommend everyone be extremely hesitant to assume that any particular storage setup actually honors write barriers and cache flushes. This is a recommendation I would give even when you purchase non-cheap battery-backed hardware RAID controllers (I won't mention any names or details, to avoid bashing, as I'm sure it's not specific to the particular vendor I had problems with most recently).

You need the underlying device to do the right thing, the driver to do the right thing, and the operating system in general to do the right thing (which includes the filesystem, the block device layer if any, etc. - for example, if you use md on Linux with RAID5/6 you're toast).

So again, I cannot stress this enough: do not assume things behave in a non-broken fashion with respect to write barriers and flushes.

I can't speak to expensive integrated hardware solutions; I HOPE, though at this point my level of paranoia does not allow me to assume, that if you buy boxed systems from companies like Sun/HP/etc. you get decent stuff. But I can definitely say that paying non-trivial amounts of money for hardware is no guarantee that you won't get completely broken behavior.

<speculation> I think it boils down to the fact that 99% of customers that aren't doing integration of the individual components into overall packages probably don't care/understand/bother with it, so as long as the benchmarks say it's "fast", it sells. </speculation>

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
Peter Schuller
2009-Feb-10 18:07 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> And again: Why should a 2 weeks old Seagate HDD suddenly be damaged, if there was no shock, hit or any other event like that?I have no information about your particular situation, but you have to remember that ZFS uncovers problems that otherwise go unnoticed. Just personally on my private hardware (meaning a very limited set), I have seen silent corruption issues several times. The most recent one I discovered almost immediately because of ZFS. If it weren''t for ZFS, I would have been highly likely to have transferred my entire system without noticing and suffer weird problems a couple of weeks later. While I don''t know what is going on in your case, blaming some problem on the introduction of a piece of software/hardware/procedure, without identifying a causal relationship, is a common mistake to make. -- / Peter Schuller PGP userID: 0xE9758B7D or ''Peter Schuller <peter.schuller at infidyne.com>'' Key retrieval: Send an E-Mail to getpgpkey at scode.org E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/7b2c24db/attachment-0013.bin>
Peter Schuller
2009-Feb-10 18:13 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> on a UFS or reiserfs such errors could be corrected.In general, UFS has zero capability to actually fix real corruption in any reliable way. What you normally do with fsck is repairing *expected* inconsistencies that the file system was *designed* to produce in the event of e.g. a sudden reboot or a crash. This is entirely different from repairing arbitrary corruption. If ZFS says that a file has a checksum error, that can very well be because there is a bug in ZFS. But it can also be the case that there *is* actual on-disk (or in-transit) corruption that ZFS has detected, and given I/O errors back to an application instead of producing bad data. Now it is probably entirely true that once you *do* have broken hardware or there is some other reason for corruption beyond that which you can design for, ZFS is probably less mature than traditional file systems in terms of the availability of tools and procedures to salvage whatever might actually be salvageable. That is a valid criticism. But you *have* to realize the distinction between "repairing" inconsistencies that are fully expected as part of regular operation in the event of a crash/power outage, and problems arising from misbehaving hardware or bugs in software. ZFS cannot magically overcome such problems, nor can UFS/reiserfs/xfs/whatever else. -- / Peter Schuller PGP userID: 0xE9758B7D or ''Peter Schuller <peter.schuller at infidyne.com>'' Key retrieval: Send an E-Mail to getpgpkey at scode.org E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/9bfd7b95/attachment-0013.bin>
>>>>> "jb" == Jeff Bonwick <Jeff.Bonwick at sun.com> writes:jb> We simulated power failure; we did not simulate disks that jb> simply blow off write ordering. Any disk that you''d ever jb> deploy in an enterprise or storage appliance context gets this jb> right. Did you simulate power failure of iSCSI/FC target shelves without power failure of the head node? How about power failure of iscsitadm-style iSCSI targets? How about rebooting the master domain in sun4v---what is it called? I''ve not had any sun4v but heard the I/O domain, the kernel that contains all the disk drivers, can be rebooted without rebooting the guest-domain kernels which have virtual-disk-drivers, and that sounds like a great opportunity to lose a batch of writes. Do you consider sun4v virtual I/O or iscsitadm as well-fitted to an ``enterprise'''' context, or are they not ready for deploying in the Enterprise yet? :) IMHO it''d really be fantastic if almost all the lost ZFS pools turned out to be just this one write cache problem, and ZFS the canary---not in terms of a checksum canary this time, but in terms of shitting itself when write barriers are violated. Then it''ll be almost a blessing that ZFS is so vulnerable to it, because maybe there will be enough awareness and pressure that it''ll finally become practical to build an end-to-end system without this problem. Suddenly having a database-friendly filesystem everywhere, including trees mounted over NFS/cifs/lustre/whatevers-next, might change some of our assumptions about which MUA''s have fragile message stores and what programs need to store things on ``a local drive''''. I''m ordering a big batch of crappy peecee hardware tomorrow so I can finally start testing and quit ranting. I''ll see if this old post can serve as the qualification tool I keep wanting: http://code.sixapart.com/svn/tools/trunk/diskchecker.pl He used the tool on Linux, I think, and he used it end-to-end, to check fsync() from user-level. which is odd, because I thought I remember reading Linux does _not_ propogate fsync() all the way to the disk, and they''re trying to fix it. In its internal storage stack, Linux has separate ideas of ``cache flush'''' and ``write barrier'''' while my impression is that physical disks have only the latter, so they sort of rely sometimes on things happening ``soon'''', but this guy is saying whether fsync() works or not, on Linux ext3, is determined almost entirely by the disk. possibly the tool can be improved---someone on this list had the interesting idea to write backwards, to provoke the drive into wanting to reorder writes across a barrier since even the dumbest drive will want to write in the direction the platter''s spinning. I''m not sure that backwards-writing will provoke misbehavior inside iSCSI stacks though. In the end the obvious mtd/ubi-style test of writing to a zpool and trying to destroy it by yanking cords might be the best test. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/0cd8e479/attachment-0013.bin>
On 10-Feb-09, at 1:03 PM, Charles Binford wrote:> Jeff, what do you mean by "disks that simply blow off write > ordering."? > My experience is that most enterprise disks are some flavor of > SCSI, and > host SCSI drivers almost ALWAYS use simple queue tags, implying the > target is free to re-order the commands for performance.That''s right; I/O is reordered in many unpredictable ways on the way to the disk. So a flush or barrier enforces ordering at certain critical points. Transactional and journaling systems normally *require* a *functioning* flush/barrier for integrity. --Toby> Are talking > about something else, or does ZFS request Order Queue Tags on certain > commands? > > Charles > > Jeff Bonwick wrote: >>> There is no substitute for cord-yank tests - many and often. The >>> weird part is, the ZFS design team simulated millions of them. >>> So the full explanation remains to be uncovered? >>> >> >> We simulated power failure; we did not simulate disks that simply >> blow off write ordering. Any disk that you''d ever deploy in an >> enterprise or storage appliance context gets this right. >> >> The good news is that ZFS is getting popular enough on consumer-grade >> hardware. The bad news is that said hardware has a different set of >> failure modes, so it takes a bit of work to become resilient to them. >> This is pretty high on my short list. >> >> Jeff >> _______________________________________________ >> zfs-discuss mailing list >> zfs-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >> > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>>>> "g" == Gino <dandr.ch at gmail.com> writes:g> we lost many zpools with multimillion$ EMC, Netapp and g> HDS arrays just simulating fc switches power fails. g> The problem is that ZFS can''t properly recover itself. I don''t like what you call ``the problem''''---I think it assumes too much. You mistake *A* fix for *THE* problem, before we can even agree for sure on, what is the problem. The problem may be in the solaris FC initiator, in a corner case of the FC protocol itself, or in ZFS''s exception handling when a ``SYNCHRONIZE CACHE'''' command returns failure. It''s likely other filesystems are affected by ``the problem'''' as I define it, just much less so. If that''s the case, it''d be much better IMHO to fix the real problem once and for all, and find it so that it stays fixed, than to make ZFS work around it by losing a tiny bit of data instead of the whole pool. I don''t think ZFS should feel entitled to brag about protection from Silent Corruption, if it were at the same time willing to silently boot without a slog, or silently rollback to an earlier ueberblock, or if it acts like a cheap USB stick when an FC switch reboots (by quietly losing things that were written long ago). That''s something else to think of: if what''s happening is what we think is happening, then you may be having ``the problem'''' at other times when you do not lose pools! I''m a fan of availability and not of ZFS''s lazy panics and peppering of assertions, but I''m starting to come around a little bit: I don''t want to miss an opportunity to raise everyone''s expectations of their storage stacks, to finally hold cheating disk vendors, cheating virtualization software vendors, and lazy iSCSI programmers accountable, and to make the exception handling in ZFS actually capable of dealing with modern storage instead of hanging status commands, hanging NFS stacks, and inability to replay writes. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/3064e127/attachment-0013.bin>
>>>>> "ps" == Peter Schuller <peter.schuller at infidyne.com> writes:ps> This is a recommendation I would give even when you purchase ps> non-cheap battery backed hardware RAID controllers (I won''t ps> mention any names or details to avoid bashing as I''m sure it''s ps> not specific to the particular vendor I had problems with most ps> recently). This again? If you''re sure the device is broken, then I think others would like to know it, even if all devices are broken. but, fine. Anyway, how did you determine the device was broken? At least you can tell us that much without fear of retaliation (whether baseless or founded), and maybe others can use the same test to independently discover what you did which would be both fair and safe for you. This is the real problem as I see it---a bunch of FUD, without any actual resolution beyond ``it''s working, I _think_, and in any case the random beatings have stopped so D''OH-NT TOUCH *ANY*THING! THAR BE DEMONZ IN THE BOWELS O DIS DISK SHELF!'''' If anyone asks questions, they get no actual information, but a huge amount of blame heaped on the sysadmin. Your post is a great example of the typical way this problem is handled because it does both: deny information and blame the sysadmin. Though I''m really picking on you way too much here. Hopefully everyone''s starting to agree, though, we do need a real way out of this mess! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/eb7c64c4/attachment-0013.bin>
(..) Dave made a mistake pulling out the drives with out exporting them first. For sure also UFS/XFS/EXT4/.. doesn''t like that kind of operations but only with ZFS you risk to loose ALL your data. that''s the point! (...) I did that many times after performing the umount cmd with ufs/reiserfs filesystems on USB external drives. And they never complainted or got corrupted. -- This message posted from opensolaris.org
I disagree, see posting above. ZFS just accepts it 2 or 3 times. after that, your data are passed away to nirvana for no reason. And it should be legal, to have an external USB drive with a ZFS. with all respect, why should a user always care for redundancy, e. g. setup a mirror on a single HDD between the slices?? This reduces half your available space you have on your drive. -- This message posted from opensolaris.org
On 2/10/2009 2:50 PM, D. Eckert wrote:> (..) > Dave made a mistake pulling out the drives with out exporting them first. > For sure also UFS/XFS/EXT4/.. doesn''t like that kind of operations but only with ZFS you risk to loose ALL your data. > that''s the point! > (...) > > I did that many times after performing the umount cmd with ufs/reiserfs filesystems on USB external drives. And they never complainted or got corrupted. >Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0 spanning removable drives, you probably wouldn''t have been so lucky. Just because you only create a single ZFS filesystem inside your zpool doesn''t mean that when that single filesystem is unmounted it is safe to remove the drive. When you consider the extra layer of the zpool (like LVM or sw RAID) it''s not surprising there are other things you have to do before you remove the disk. -Kyle
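In concrete terms, the procedure being described for moving a single-disk pool between machines is a two-step affair. A minimal sketch, using the pool name that appears earlier in the thread (adapt it to your own setup):

   # on the first machine: quiesce the pool and release it from the OS
   zpool export usbhdd1
   # only now is it safe to disconnect the USB drive

   # on the second machine
   zpool import              # lists pools available for import
   zpool import usbhdd1      # imports the pool and mounts its filesystems

Running only ''zfs umount usbhdd1'' removes the filesystem from the mount table, but the pool itself stays imported and the kernel keeps using the device.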
(...) If anyone asks questions, they get no actual information, but a huge amount of blame heaped on the sysadmin. Your post is a great example of the typical way this problem is handled because it does both: deny information and blame the sysadmin. Though I''m really picking on you way too much here. Hopefully everyone''s starting to agree, though, we do need a real way out of this mess! (...) THANK YOU! It''s precisely walking in my shoes. or with a different expression: THE STUPID USER. -- This message posted from opensolaris.org
Roman Shaposhnik
2009-Feb-10 20:00 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Feb 9, 2009, at 7:06 PM, Jeff Bonwick wrote:>>There is no substitute for cord-yank tests - many and often. The >>weird part is, the ZFS design team simulated millions of them. >>So the full explanation remains to be uncovered? > >We simulated power failure; we did not simulate disks that simply >blow off write ordering. Any disk that you''d ever deploy in an >enterprise or storage appliance context gets this right. > >The good news is that ZFS is getting popular enough on consumer-grade >hardware. The bad news is that said hardware has a different set of >failure modes, so it takes a bit of work to become resilient to them. >This is pretty high on my short list.Speaking of "modes of failure": historically fsck has been used for slightly different (although related) purposes: 0. as a tool capable of restoring consistency in a FS that didn''t guarantee an always consistent on-disk state 1. as a forensics tool that would let you retrieve as much information as possible from a physically ill device Thank goodness, ZFS doesn''t need fsck for #0. That still leaves #1. So far all we have in that department is zdb/mdb. These two can do wonders when used by professionals, yet still fall into the "don''t try that at home" category for everybody else. Does such a tool - a supported forensics/recovery utility that ordinary administrators could safely use - sound reasonable? Does it have a chance of ever showing up on your list? Thanks, Roman. -- This message posted from opensolaris.org
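For what it is worth, zdb can already do a fair amount of read-only inspection today, even though it is nowhere near a friendly recovery tool. A rough sketch, with the device and pool names from the thread as placeholders (available options vary between builds, as noted later in the thread):

   zdb -l /dev/rdsk/c3t0d0s0    # dump the vdev labels, including the uberblock arrays
   zdb -uuu usbhdd1             # print the active uberblock of an imported pool
   zdb -e -bb usbhdd1           # traverse an exported pool and summarize its blocks

None of these modify the pool, which makes them reasonable first steps before any attempt at salvage.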
On 2/10/2009 2:54 PM, D. Eckert wrote:> I disagree, see posting above. > > ZFS just accepts it 2 or 3 times. after that, your data are passed away to nirvana for no reason. > > And it should be legal, to have an external USB drive with a ZFS. with all respect, why should a user always care for redundancy, e. g. setup a mirror on a single HDD between the slices?? > >You don''t have to have redundancy. But if you don''t then I don''t know how you can expect the ''repair'' features of ZFS to bail you out when something bad happens.> This reduces half your available space you have on your drive. >Mirroring between slices does more than that: it will ruin your performance as well. It''d be much better to set ''copies=2'', though that will still reduce your space by half. -Kyle
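For completeness, enabling that is a single property change, and it only protects blocks written after the property is set; existing data keeps one copy. A sketch, with the dataset name as a placeholder:

   zfs set copies=2 usbhdd1     # store two copies of each newly written block
   zfs get copies usbhdd1       # verify the setting

On a single physical disk this guards against isolated bad sectors, not against losing the whole device.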
Carsten Aulbert
2009-Feb-10 20:02 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Hi, I''ve followed this thread a bit and I think there are some correct points on all sides of the discussion, but here I see a misconception (at least I think it is one): D. Eckert schrieb:> (..) > Dave made a mistake pulling out the drives with out exporting them first. > For sure also UFS/XFS/EXT4/.. doesn''t like that kind of operations but only with ZFS you risk to loose ALL your data. > that''s the point! > (...) > > I did that many times after performing the umount cmd with ufs/reiserfs filesystems on USB external drives. And they never complainted or got corrupted.Think of ZFS as an entity which cannot live without the underlying ZPOOL. You can have reiserfs, jfs, ext?, xfs - you name it - on any logical device as it will only live on this one and when you umount it, it''s safe to power it off, yank the disk out, whatever, since there is no other layer between the file system and the logical disk partition/slice/... However, as soon as you add another layer (say RAID, which in this analogy is somehow the ZPOOL) you might also lose data when you have a RAID0 setup and umount reiserfs/ufs/whatever and take a disc out of the RAID and destroy it or change a few sectors on it. When you then mount the file system again, it''s utterly broken and lost. Or - which might be worse - you might end up with a "silent" data corruption you will never notice unless you try to open the data block which is damaged. However, in your case you have some checksum error in the file system on a single hard disk which might have been caused by some accident. ZFS is good in the respect that it can tell you that something''s broken, but without a mirror or parity device it won''t be able to fix the data out of thin air. I cannot claim to fully understand what happened to your devices, so please take my written stuff with a grain of salt. Cheers Carsten
On Tue, Feb 10, 2009 at 12:46 PM, Miles Nordin <carton at ivy.net> wrote:> > It''s likely other filesystems are affected by ``the problem'''' as I > define it, just much less so. If that''s the case, it''d be much better > IMHO to fix the real problem once and for all, and find it so that it > stays fixed, than to make ZFS work around it by losing a tiny bit of > data instead of the whole pool. I don''t think ZFS should feel > entitled to brag about protection from Silent Corruption, if it were > at the same time willing to silently boot without a slog, or silently > rollback to an earlier ueberblock, or if it acts like a cheap USB > stick when an FC switch reboots (by quietly losing things that were > written long ago). >I agree, silently rolling back would be a *BAD THING*. HOWEVER, not giving you the option to easily roll back *AT ALL* is a *WORSE THING*. I don''t think zfs should brag about anything if my pool can be down for hours or days because I''m not given the option to roll back to a consistent state when I *KNOW* it''s what I want to do. Of course, making that easy wouldn''t sell support contracts, would it? --Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/9f6d8fbe/attachment-0012.html>
>>>>> "rs" == Roman Shaposhnik <rvs at sun.com> writes:rs> 1. as a forensics tool that would let you retrieve as much rs> information as possible from a physically ill device a nit, but I''ve never foudn fsck alone useful for this. Maybe for ``a filesystem trashed by bad RAM/CPU/bugs'''' it is useful, but for a physically bad disk I''ve always had to use dd_rescue or ''dd bs=512 conv=noerror,sync'' onto a good disk before pulling out the fsck. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/c2d82a6b/attachment-0010.bin>
(...) You don''t move a pool with ''zfs umount'', that only unmounts a single zfs filesystem within a pool, but the pool is still active.. ''zpool export'' releases the pool from the OS, then ''zpool import'' on the other machine. (...) with all respect: I never read such a non logic ridiculous . I have a single zpool set up over the entire available disk space on an external USB drive without any other filesystems inside this particular pool. so how on earth should I be sure, that the pool is still a live pool inside the operating system if the output of ''mount'' cmd tells me, the pool is no longer attached to the root FS???? this doesn''t make sense at all and it is a vulnerability of ZFS. so if the output of the mount cmd tells you the FS / ZPOOL is not mounted I can''t face any reason why the filesystem should be still up and running, because I just unmounted the only one available ZPOOL. And by the way: After performing: ''zpool umount usbhdd1'' I can NOT access any single file inside /usbhdd1. What else should be released from the OS FS than a single zpool containing no other sub Filesystems? Why? The answer is quite simple: The pool is unmounted and no longer hooked up to the system''s filesystem. so what should me prevent from unplugging the usb wire? Regards, DE -- This message posted from opensolaris.org
(...) Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0 spanning removable drives, you probably wouldn''t have been so lucky. (...) we are not talking about a RAID 5 array or an LVM. We are talking about a single FS setup as a zpool over the entire available disk space on an external USB HDD. I decided to do so due to the read/write speed performance of zfs comparing to UFS/ReiserFS. Regards, DE. -- This message posted from opensolaris.org
Nicolas Williams
2009-Feb-10 20:38 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Tue, Feb 10, 2009 at 12:31:05PM -0800, D. Eckert wrote:> (...) > You don''t move a pool with ''zfs umount'', that only unmounts a single zfs > filesystem within a pool, but the pool is still active.. ''zpool export'' > releases the pool from the OS, then ''zpool import'' on the other machine. > (...) > > with all respect: I never read such a non logic ridiculous .It''s not "logic" -- it''s what ZFS does. It lets you have N filesystems in one pool. The price you pay is that unmounting one such filesystem is insufficient to quiesce the pool in which that filesystem lives: you must export the pool in order to quiesce it. Perhaps what you want to argue is that unmounting the root filesystem of a pool should cause the pool to be exported. Nico --
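The distinction is easy to see on a live system. A short sketch, with the pool name from the thread:

   zfs umount usbhdd1      # the filesystem disappears from ''mount'' and mnttab...
   zpool list usbhdd1      # ...but the pool is still imported
   zpool status usbhdd1    # and the kernel still holds the underlying device open
   zpool export usbhdd1    # only after this is it safe to pull the cable

In other words, ''mount'' only reports on filesystems; it says nothing about the state of the pool.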
D. Eckert wrote:> (...) > Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0 > spanning removable drives, you probably wouldn''t have been so lucky. > (...) > > we are not talking about a RAID 5 array or an LVM. We are talking about a single FS setup as a zpool over the entire available disk space on an external USB HDD. > >You are missing the point. A ZFS filesystem is not the same as a UFS filesystem on a device, the extra layer of the pool makes it closer to a RAID volume. You have to halt the pool before removing the device. These posts do sound like someone who is blaming their parents after breaking a new toy before reading the instructions. -- Ian.
D. Eckert wrote:> (...) > You don''t move a pool with ''zfs umount'', that only unmounts a single zfs > filesystem within a pool, but the pool is still active.. ''zpool export'' > releases the pool from the OS, then ''zpool import'' on the other machine. > (...) > > with all respect: I never read such a non logic ridiculous .You are not listening and you are not learning. You do not seem to understand the fundamentals of ZFS.> > I have a single zpool set up over the entire available disk space on an external USB drive without any other filesystems inside this particular pool. > > so how on earth should I be sure, that the pool is still a live pool inside the operating system if the output of ''mount'' cmd tells me, the pool is no longer attached to the root FS???? > > this doesn''t make sense at all and it is a vulnerability of ZFS.''mount'' is not designed to know anything about the storage *pools*. Yes, you unmounted the filesystem and mount shows it is not mounted. This does not mean the zpool is not still imported and active.> > so if the output of the mount cmd tells you the FS / ZPOOL is not mounted I can''t face any reason why the filesystem should be still up and running, because I just unmounted the only one available ZPOOL.No, you did not unmount the zpool.> And by the way: After performing: ''zpool umount usbhdd1'' I can NOT access any single file inside /usbhdd1.There is no ''zpool unmount'' command.> > What else should be released from the OS FS than a single zpool containing no other sub Filesystems?Again, you have not ''released'' the zpool.> > Why? The answer is quite simple: The pool is unmounted and no longer hooked up to the system''s filesystem. so what should me prevent from unplugging the usb wire? >Again, you are not understanding the fundamentals of ZFS. You may have unmounted the *filesystem*, but not the zpool. You yanked a disk containing a live, imported zpool. Since the advice and information offered to you in this thread has been completely disregarded, the only thing left to say is: RTFM.
On 10-Feb-09, at 1:05 PM, Peter Schuller wrote:>> YES! I recently discovered that VirtualBox apparently defaults to >> ignoring flushes, which would, if true, introduce a failure mode >> generally absent from real hardware (and eventually resulting in >> consistency problems quite unexpected to the user who carefully >> configured her journaled filesystem or transactional RDBMS!) > > I recommend everyone to be extremely hesitant to assume that any > particular storage setup actually honors write barriers and cache > flushes. ...+1.> > You need the underlying device to do the right thing, the driver to do > the right thing, the operating system in general to do the right thing > (which includes the file system, block device layer if any etc - for > example, if use md on Linux with RAID5/6 you''re toast).Absolutely.> > So again I cannot stress enough - do not assume things behave in a > non-broken fashion with respect to write barriers and flushes.That''s why I believe there is no substitute for pull-plug tests, and I would perform quite a few on a loaded system before being confident about it. The last time I did that in anger was against a Sun X2200 + LVM mirror + Ubuntu + reiser3fs + MySQL InnoDB, and it performed flawlessly (although I agree there may be a weak link in LVM; not my choice. I''d have chosen Solaris+ZFS).> I can''t > speak to expensive integrated hardware solutions; I HOPE, though at > this point my level of paranoid does not allow me to assume, that if > you buy boxed systems from companies like Sun/HP/etc you get decent > stuff. But I can definitely say that paying non-trivial amounts of > money for hardware is not a guarantee that you won''t get completely > broken behavior.+1. --Toby> ... > > -- > / Peter Schuller > > PGP userID: 0xE9758B7D or ''Peter Schuller > <peter.schuller at infidyne.com>'' > Key retrieval: Send an E-Mail to getpgpkey at scode.org > E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org >
Mario Goebbels
2009-Feb-10 20:57 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> The good news is that ZFS is getting popular enough on consumer-grade > hardware. The bad news is that said hardware has a different set of > failure modes, so it takes a bit of work to become resilient to them. > This is pretty high on my short list.One thing I''d like to see is an _easy_ option to fall back onto older uberblocks when the zpool went belly up for a silly reason. Something that doesn''t involve esoteric parameters supplied to zdb. -mg -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 225 bytes Desc: OpenPGP digital signature URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/ca942bc6/attachment-0007.bin>
Charles Binford
2009-Feb-10 21:13 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
DE - could you please post the output of your ''zpool umount usbhdd1'' command? I believe the output will prove useful to the point being discussed below. Charles D. Eckert wrote:> (...) > You don''t move a pool with ''zfs umount'', that only unmounts a single zfs > filesystem within a pool, but the pool is still active.. ''zpool export'' > releases the pool from the OS, then ''zpool import'' on the other machine. > (...) > > with all respect: I never read such a non logic ridiculous . > > I have a single zpool set up over the entire available disk space on an external USB drive without any other filesystems inside this particular pool. > > so how on earth should I be sure, that the pool is still a live pool inside the operating system if the output of ''mount'' cmd tells me, the pool is no longer attached to the root FS???? > > this doesn''t make sense at all and it is a vulnerability of ZFS. > > so if the output of the mount cmd tells you the FS / ZPOOL is not mounted I can''t face any reason why the filesystem should be still up and running, because I just unmounted the only one available ZPOOL. > > And by the way: After performing: ''zpool umount usbhdd1'' I can NOT access any single file inside /usbhdd1. > > What else should be released from the OS FS than a single zpool containing no other sub Filesystems? > > Why? The answer is quite simple: The pool is unmounted and no longer hooked up to the system''s filesystem. so what should me prevent from unplugging the usb wire? > > Regards, > DE >
I think you are not reading carefully enough, and I can trace from your reply a typically American arrogant behavior. WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE a mistake. It is just the stupid user who did not read the fucking manual carefully enough. ???? Hello? Did you already recognized the sound of the shot?? No, you didn''t. If you would, than you''d know, that we are not talking about HOW TO PREVENT SUCH EVENTS IN FUTURE but of recovering the data. I learned my lesson well, and in future this won''t happen again, because we will no longer use zfs, but we have a legal interest, to get back our data we stored in trust on a non reap Filesystem developed and introduced by Sun. And that Sun has a big problem regarding version numbers and supported options is not a secret. e. g.: On Solaris 10 generic 10-2008, latest updates, running zfs Version 10 the ''t'' option in zdb is missing. But on SNV 107, same zfs Version 10 ''t'' option of zdb is available. AND: it is not acceptable, that having on 2 systems the same zfs version running, that the output of zdb -u <pool> differs. Even if a UFS/ReiserFS is corrupted, you have chances to access even a part of the date. On ZFS you can''t. You are lost inside the castle someone has the key just thrown away. And the key just seems to held by the developers of Sun. If you have any idea of IT Security, you should know well the expression and meaning of "The key of the kingdom". And as more postings we have to read in the sound of yours as more we are thinking to raise a court trail against Sun just to stop that american arrogance and to withhold technologies and methods to recover a filesystem. However, just tell me how to get the data back from the hard drive zfs just messed up with, and you are the king, and we are happy, and this issue us closed. I hope I''ve made myself very clear. Regards from Germany. DE. -- This message posted from opensolaris.org
Peter Schuller
2009-Feb-10 21:26 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> ps> This is a recommendation I would give even when you purchase > ps> non-cheap battery backed hardware RAID controllers (I won''t > ps> mention any names or details to avoid bashing as I''m sure it''s > ps> not specific to the particular vendor I had problems with most > ps> recently). > > This again? If you''re sure the device is broken, then I think others > would like to know it, even if all devices are broken.The problem is that I even had help from the vendor in question, and it was not for me personally but for a company, and I don''t want to use information obtained that way to do any public bashing. But I have no particular indication that there is any problem with the vendor in general; it was a combination of choices made by Linux kernel developers and the behavior of the RAID controller. My interpretation was that no one was there looking at the big picture, and the end result was that if you followed the instructions specifically given by the vendor, you would have a setup whereby you would loose correctness whenever the BBU was overheated/broken/disabled. The alternative was to get completely piss-poor performance by not being able to take advantage of the battery backed nature of the cache at all (which defeats most of the purpose of having the controller, if you use it in any kind of transactional database environment or similar).> but, fine. Anyway, how did you determine the device was broken?By performing timing tests as mentioned in the other post that you answered separately, and after detecting the problem confirming the status with respect to caching at the different levels as claimed by the administrative tool for the controller. While timing tests cannot conclusively prove correct behavior, it can definitely proove incorrect behavior in cases where your timings are simply theoretically impossible given the physical nature of the underlying drives.> At > least you can tell us that much without fear of retaliation (whether > baseless or founded), and maybe others can use the same test to > independently discover what you did which would be both fair and safe > for you.The test was trivial; in my case a ~10 line Python script or something along those lines. Perhaps I should just go ahead and release something which non-programmers can easily run and draw conclusions from.> This is the real problem as I see it---a bunch of FUD, without any > actual resolution beyond ``it''s working, I _think_, and in any case > the random beatings have stopped so D''OH-NT TOUCH *ANY*THING! THAR BE > DEMONZ IN THE BOWELS O DIS DISK SHELF!''''I''d love to go on a public rant, because I think the whole situation was a perfect example of a case where a single competent person who actually cares about correctness could have pinpointed this problem trivially. But instead you have different camps doing their own stuff and not considering the big picture.> If anyone asks questions, they get no actual information, but a huge > amount of blame heaped on the sysadmin. Your post is a great example > of the typical way this problem is handled because it does both: deny > information and blame the sysadmin. Though I''m really picking on you > way too much here. Hopefully everyone''s starting to agree, though, we > do need a real way out of this mess!I''m not quite sure what you''re referring to here. I''m not blaming any sysadmin. I was trying to point out *TO* sysadmins, to help them, that I recommend being paranoid about correctness. 
If you mean the original poster in the thread having issues, I am not blaming him *at all* in the post you responded to. It was strictly meant as a comment in response to the poster who noted that he discovered, to his surprise, the problems with VirtualBox. I wanted to make the point that while I completely understand his surprise, I have come to expect that these things are broken by default (regardless of whether you''re using virtualbox or not, or vendor X or Y etc), and that care should be taken if you do want to have correctness when it comes to write barriers and/or honoring fsync(). However, that said, as I stated in another post I wouldn''t be surprised if it turns out the USB device was ignoring sync commands. But I have no idea what the case was for the original poster, nor have I even followed the thread in detail enough to know if that would even be a possible explanation for his problems. -- / Peter Schuller PGP userID: 0xE9758B7D or ''Peter Schuller <peter.schuller at infidyne.com>'' Key retrieval: Send an E-Mail to getpgpkey at scode.org E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 196 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/21e73ccf/attachment-0007.bin>
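For those who want to try this themselves, a crude version of such a timing test needs nothing more than dd. The sketch below assumes a GNU dd (as on Linux) for its oflag=dsync option and a scratch file on the storage being tested; the path and counts are arbitrary:

   # rewrite the same 4 KB block 1000 times, each write synchronous
   time sh -c 'i=0; while [ $i -lt 1000 ]; do dd if=/dev/zero of=/mnt/test/probe bs=4k count=1 oflag=dsync conv=notrunc 2>/dev/null; i=$((i+1)); done'
   # a bare 7200 rpm disk that truly waits for the platter manages on the
   # order of a hundred such writes per second; much higher rates mean a
   # cache is absorbing the syncs, and then the question is whether that
   # cache is non-volatile

Plausible numbers prove nothing, but implausibly high numbers on plain disks with no non-volatile cache are exactly the kind of conclusive evidence of broken behavior referred to above.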
Marcelo H Majczak
2009-Feb-10 21:29 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
I''ll make a meta comment on the thread itself, not on the ZFS issue. There is more bashing and broad accusations than it would normally happen on a "professional usage" situation. Maybe a board admin can run a script on the ip addresses logged and find a more subtle meaning... I don''t know, I''m just a bit skeptical by nature. -- This message posted from opensolaris.org
if you are interested in my IP address: no problem: 83.236.164.80 It just confirms my assumption that it''s best and easier for someone - if he''s in the right position - to stick a big plaster over someone''s mouth to avoid hearing legitimate criticism, instead of discussing the problem openly to find a proper solution. My honest congratulations! -- This message posted from opensolaris.org
>>>>> "de" == D Eckert <contact at desystems.cc> writes:de> from your reply a typically American arrogant behavior. de> WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE a de> mistake. Maybe I should speak up since I defended you at the start. To my view: REASONABLE: * expect that ZFS lose almost nothing when yanking the power cord, or when uncleanly dismounting. * expect that ``always consistent on disk'''' mean something in practice, even given the real hardware and the non-ZFS parts of the storage stack which exist right now. * where the first two are impossible have both a real answer as to why, and a workable way forward, rather than obstructionist FUD. Especially when cord-yanking/unclean-dismount causes ZFS to lose more than other filesystems. not point to dragons and FUD and blame whatever is difficult to exhonerate, especially hindsight surrounding inexpensive devices, and the sysadmin himself. UNREASONABLE: * say ``any filesystem will lose arbitrary amounts of data when uncleanly dismounted because filesystems do not `like'' that. you were `asking'' for it.'''' This is flatly untrue of every non-Microsoft filesystem, even very old ones. Also it directly contradicts the most central claims made by the ZFS kool-aid pushers. * say that the central claims don''t apply to single-vdev pools. * belief in ''copies=2'' * be outraged that ZFS maintenance commands differ from other filesystems. Refuse to listen when the reasonable, easily-described, and documented differences in this mount/umount import/export interface are repeatedly explained to you. You seem to think the mere fact that new commands exist at all means they are badly-designed, and this is way too conservative. This is lazy and boring and unconvincing, especially to me who feels maybe too much has already been sacrificed to the mantra of keeping the user interface simple. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090210/09e9d055/attachment-0007.bin>
Roman V. Shaposhnik
2009-Feb-10 21:48 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 2009-02-11 at 09:49 +1300, Ian Collins wrote:> These posts do sound like someone who is blaming their parents after > breaking a new toy before reading the instructions.It looks like there''s a serious denial of the fact that "bad things do happen to even the best of people" on this thread. Thanks, Roman.
Richard Elling
2009-Feb-10 21:53 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Mario Goebbels wrote:>> The good news is that ZFS is getting popular enough on consumer-grade >> hardware. The bad news is that said hardware has a different set of >> failure modes, so it takes a bit of work to become resilient to them. >> This is pretty high on my short list. >> > > One thing I''d like to see is an _easy_ option to fall back onto older > uberblocks when the zpool went belly up for a silly reason. Something > that doesn''t involve esoteric parameters supplied to zdb. >This is CR 6667683 http://bugs.opensolaris.org/view_bug.do?bug_id=6667683 -- richard
dick hoogendijk
2009-Feb-10 21:59 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Tue, 10 Feb 2009 13:14:57 PST "D. Eckert" <contact at desystems.cc> wrote:> Hello? Did you already recognized the sound of the shot??> I learned my lesson well, and in future this won''t happen > again, because we will no longer use zfs, but we have a legal > interest, to get back our data we stored in trust on a non reap > Filesystem developed and introduced by Sun. > > And that Sun has a big problem regarding version numbers and > supported options is not a secret.It''s time we learn ours too. I can understand that you want your data back. You can''t. You made a big mistake. Soi. Also, your messages are full of anti-SUN, anti-ZFS, anti-ALL (but you). I''m convinced you won''t learn. You just did what you intended to do. Kick some (sun)ash. If you don''t like SUN/ZFS, then don''t use it. If you -DO-, learn to use it right. You sound too much like a troll at times. If you are, just say so. If you''re not, then read the advice you''ve been given more carefully. Otherwise, you''re just wasting people''s time and energy. -- Dick Hoogendijk -- PGP/GnuPG key: 01D2433D + http://nagual.nl/ | SunOS sxce snv107 ++ + All that''s really worth doing is what we do for others (Lewis Carroll)
Roman V. Shaposhnik wrote:> On Wed, 2009-02-11 at 09:49 +1300, Ian Collins wrote: > >> These posts do sound like someone who is blaming their parents after >> breaking a new toy before reading the instructions. >> > > It looks like there''s a serious denial of the fact that "bad things > do happen to even the best of people" on this thread. > >Sure. I think most here would agree that some form of recovery tool for ZFS is long overdue. I''ve rebuilt a UFS filesystem after it was damaged by an exploding power supply and it was a strangely rewarding experience. I''m not sure how ZFS would survive this type of failure and I doubt I''d be able to recover a broken pool without help. It''s also clear that the OP has failed to grasp the principles of ZFS and he appears reluctant to acknowledge this. USB removable devices are not the most reliable storage media. I have been using USB sticks as a high speed data link between home and office for over a year now and I''ve never had any corruption, but I have had two sticks fail. If I''m travelling, I always back up to a two stick ZFS mirror. -- Ian.
David Champion
2009-Feb-10 22:29 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
DE: I think that a big part of the reason you''re getting the responses you do is not arrogance from Sun or us kool-aid drinkers, but your own tone and attitude. You didn''t ask for help in your initial message at all. The entire post was a diatribe against Sun and ZFS which was based on your experience of using ZFS in a way that the ZFS documentation tells you not to use it. You have some legitimate concerns, but you began by insulting a lot of people''s work instead of by asking questions. Since then you''ve asked for help, but your tone has only gotten angrier. In your most recent post you even threatened legal action against Sun. Where I work, as soon as someone makes a legal threat, we move a support case from technical staff directly to our lawyers. If Sun is like us, that means you can expect no more free, voluntary support from Sun''s engineering team; it will be mediated by counsel, if at all. This is not good for you. I apologize if I seem arrogant, but I think you need to reconsider your approach. All in all I think the people in this forum who work at Sun have treated you very well. -- -D. dgc at uchicago.edu NSIT University of Chicago
On February 10, 2009 1:14:57 PM -0800 "D. Eckert" <contact at desystems.cc> wrote:> > I hope I''ve made myself very clear. >Very. Rarely has the adage "what one says reveals more about the speaker than the subject" been more evident.> And as more postings we have to read in the sound of yours as more we are > thinking to raise a court trail against Sun just to stop that > american arrogance and to withhold technologies and methods to recover > a filesystem.Comments like this are especially laughable (and revealing). In spite of your arrogant tone (perhaps amplified by translation, but still clearly present), many here have tried to be helpful. However you have already made your decision and aren''t listening. The validity (or not) of your problem is overshadowed by the presentation. Are you sure D Eckert isn''t a pseudonym for Al Viro? From your original post:> after working for 1 month with ZFS on 2 external USB drives I have > experienced, that the all new zfs filesystem is the most unreliable FS I > have ever seen.To the contrary, after working with ZFS for a few years (since it has been publicly available), I have found that it is the most reliable FS ever known. Well, who am I anyway. Just my 0.02. Of course it has some warts -- all complex software does -- and you have revealed a big one. But you would choose to throw the baby out with the bathwater. The problem you have experienced is mitigated in the real world by the fact that data you actually care about requires replication. -frank
We have seen some unfortunate miscommunication and misinterpretation here. This extends into differences of culture. One of the vocal people in here is surely not ''Anti-xyz''; rather I sense his intense desire to further progress by pointing his finger at some potential wounds. May I repeat my request to run a hardware diagnosis on the drives concerned (being aware of the ambiguities involved). If the hardware passes with flying colours, we need to look deeper into the underlying matter. Many in here administrate professional systems, with SCSI, RAID and whatnot. If ZFS does a great service to them, we are happy. On the other hand, though, and, again, management decisions come into perspective, OpenSolaris tries to appeal to the mass market and enter the end-user scene. Then remarks that one had to RTFM the man pages of zfs and zpool, up and down, are out of place. USB disk drives are common, ubiquitous even. To discourage their use is out of the question. To add another layer to ''mount'' likewise. Now we are in heavy seas: ZFS might lose all data irrecoverably? Not fine, but what''s the alternative? UFS is sparsely supported elsewhere (and probably considered ''legacy'' by SUN), extn is supported read-only. The only and last other file system is vfat/pcfs. Alas, when I wrote in about finding it failing on a larger drive, I was told (search the archives) that it was a ''hack'' built into the kernel only. Now what? vfat is the crappiest of all, UFS is obsolete and not widely available, and ZFS is currently being discussed as losing all data irreversibly on USB drives. I repeat that I have never lost a single drive - despite usually using cheapo crap outside of my production boxes - in the last 10 years, aside from complete hardware failure. All my other drives, ext2, ext3, ffs, have always allowed me to salvage some stuff and recover the larger part of the data, despite some of my users yanking out drives at the most inconvenient moments. Back to where I started from, with some questions: 1. Can the relevant people confirm that drives might turn dead when leaving a pool at unfortunate moments, despite complete physical integrity? [I''d really appreciate an answer here, because this is what I am starting to implement here: ZFS on USB drives.] 2. Are those drives in an unrecoverable state passing their integrity/diagnosis tests (r/w)? 3. If, as has been mentioned, a pool is an entity like RAID in between, and hurting the pool might also destroy data - if this is the case, can this destruction of a pool not also happen within the confines of a server, without any physical yanking of the drive, by a dying controller? Thanks, Uwe -- This message posted from opensolaris.org
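On the Solaris side, a first pass at the hardware diagnosis asked for above could look roughly like this, with the device and pool names from the thread as placeholders (smartmontools, where installed, can dig deeper into the drive itself):

   iostat -En c3t0d0         # cumulative soft/hard/transport error counters per device
   zpool scrub usbhdd1       # re-read every block in the pool and verify its checksum
   zpool status -v usbhdd1   # after the scrub: error counts plus any damaged files

A drive that passes its own diagnostics can still sit behind a USB bridge that ignores cache flushes, so a clean result here does not by itself settle question 1.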
Fredrich Maney
2009-Feb-11 05:44 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Tue, Feb 10, 2009 at 4:14 PM, D. Eckert <contact at desystems.cc> wrote:> I think you are not reading carefully enough, and I > can trace from your reply a typically American > arrogant behavior. > > WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE > a mistake. It is just the stupid user who did not read the > fucking manual carefully enough. > > ????Ah... an illiterate AND idiotic bigot. Have you even read the manual or *ANY* of the replies to your posts? *YOU* caused the situation that resulted in your data being corrupted. Not Sun, not OpenSolaris, not ZFS and not anyone on this list. Yet you feel the need to blame ZFS and insult the people that have been trying to help you understand what happened and why you shouldn''t do what you did. ZFS is not a filesystem like UFS or Reiserfs, nor is it an LVM like SVM or VxVM. It is both a filesystem and a logical volume manager. As such, like all LVM solutions, there are two steps that you must perform to safely remove a disk: unmount the filesystem and quiesce the volume. That means you *MUST*, in the case of ZFS, issue ''umount filesystem'' *AND* ''zpool export'' before you yank the USB stick out of the machine. Effectively what you did was create a one-sided mirrored volume with one filesystem on it, then put your very important (but not important enough to bother mirroring or backing up) data on it. Then you unmounted the filesystem and ripped the active volume out of the machine. You got away with it a couple of times because just how good of a job the ZFS developers did at idiot proofing it, but when it finally got to the point where you lost your data, you came here to bitch and point fingers at everyone but the responsible party (hint, it''s you). When your ignorance (and fault) was pointed out to you, you then resorted to personal attacks and slurs. Nice. Very professional. Welcome to the bit-bucket. fpsm
Fredrich Maney
2009-Feb-11 05:56 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Good. It looks like this thread can finally die. I received the following in response to my message below: This is an automatically generated Delivery Status Notification Delivery to the following recipient failed permanently: contact at desystems.cc Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 553 553 5.3.0 <contact at desystems.cc>... Your spam was rejected! (state 14). On Wed, Feb 11, 2009 at 12:44 AM, Fredrich Maney <fredrichmaney at gmail.com> wrote:> On Tue, Feb 10, 2009 at 4:14 PM, D. Eckert <contact at desystems.cc> wrote: >> I think you are not reading carefully enough, and I >> can trace from your reply a typically American >> arrogant behavior. >> >> WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE >> a mistake. It is just the stupid user who did not read the >> fucking manual carefully enough. >> >> ???? > > Ah... an illiterate AND idiotic bigot. Have you even read the manual > or *ANY* of the replies to your posts? *YOU* caused the situation that > resulted in your data being corrupted. Not Sun, not OpenSolaris, not > ZFS and not anyone on this list. Yet you feel the need to blame ZFS > and insult the people that have been trying to help you understand > what happened and why you shouldn''t do what you did. > > ZFS is not a filesystem like UFS or Reiserfs, nor is it an LVM like > SVM or VxVM. It is both a filesystem and a logical volume manager. As > such, like all LVM solutions, there are two steps that you must > perform to safely remove a disk: unmount the filesystem and quiesce > the volume. That means you *MUST*, in the case of ZFS, issue ''umount > filesystem'' *AND* ''zpool export'' before you yank the USB stick out of > the machine. > > Effectively what you did was create a one-sided mirrored volume with > one filesystem on it, then put your very important (but not important > enough to bother mirroring or backing up) data on it. Then you > unmounted the filesystem and ripped the active volume out of the > machine. You got away with it a couple of times because just how good > of a job the ZFS developers did at idiot proofing it, but when it > finally got to the point where you lost your data, you came here to > bitch and point fingers at everyone but the responsible party (hint, > it''s you). When your ignorance (and fault) was pointed out to you, you > then resorted to personal attacks and slurs. Nice. Very professional. > Welcome to the bit-bucket. > > fpsm >
Jan.Dreyer at bertelsmann.de
2009-Feb-11 07:35 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
In other words: Dont feed the troll. Greets Jan Dreyer zfs-discuss-bounces at opensolaris.org <> wrote :> Good. It looks like this thread can finally die. I received the > following in response to my message below: > > > > > This is an automatically generated Delivery Status Notification > > Delivery to the following recipient failed permanently: > > contact at desystems.cc > > Technical details of permanent failure: > Google tried to deliver your message, but it was rejected by the > recipient domain. We recommend contacting the other email provider for > further information about the cause of this error. The error that the > other server returned was: 553 553 5.3.0 <contact at desystems.cc>... > Your spam was rejected! (state 14). > > > > > On Wed, Feb 11, 2009 at 12:44 AM, Fredrich Maney > <fredrichmaney at gmail.com> wrote: >> On Tue, Feb 10, 2009 at 4:14 PM, D. Eckert > <contact at desystems.cc> wrote: >>> I think you are not reading carefully enough, and I >>> can trace from your reply a typically American >>> arrogant behavior. >>> >>> WE, THE PROUDEST AND infallibles on earth DID NEVER MAKE >>> a mistake. It is just the stupid user who did not read the >>> fucking manual carefully enough. >>> >>> ???? >> >> Ah... an illiterate AND idiotic bigot. Have you even read the manual >> or *ANY* of the replies to your posts? *YOU* caused the situation >> that resulted in your data being corrupted. Not Sun, not >> OpenSolaris, not ZFS and not anyone on this list. Yet you feel the >> need to blame ZFS and insult the people that have been trying to >> help you understand what happened and why you shouldn''t do what you >> did. >> >> ZFS is not a filesystem like UFS or Reiserfs, nor is it an LVM like >> SVM or VxVM. It is both a filesystem and a logical volume manager. As >> such, like all LVM solutions, there are two steps that you must >> perform to safely remove a disk: unmount the filesystem and quiesce >> the volume. That means you *MUST*, in the case of ZFS, issue ''umount >> filesystem'' *AND* ''zpool export'' before you yank the USB stick out >> of the machine. >> >> Effectively what you did was create a one-sided mirrored volume with >> one filesystem on it, then put your very important (but not important >> enough to bother mirroring or backing up) data on it. Then you >> unmounted the filesystem and ripped the active volume out of the >> machine. You got away with it a couple of times because just how good >> of a job the ZFS developers did at idiot proofing it, but when it >> finally got to the point where you lost your data, you came here to >> bitch and point fingers at everyone but the responsible party (hint, >> it''s you). When your ignorance (and fault) was pointed out to you, >> you then resorted to personal attacks and slurs. Nice. Very >> professional. Welcome to the bit-bucket. >> >> fpsm >> > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> Fsck can only repair known faults; known discrepancies in the meta data.
> Since ZFS doesn't have such known discrepancies, there's nothing to repair.

I'm rather tired of hearing this mantra. If ZFS detects an error in part of its data structures, then there is clearly something to repair. The choice ZFS presently makes is effectively to prune the entire pool hierarchy from the point of error downward. If the error found is near the root of the pool, this renders all files inaccessible. This is rather as if fsck, when finding a corrupted UFS directory, removed all of the files within it instead of either (a) trying to repair the directory, or (b) placing them in lost+found; or, when it found a doubly-allocated block, chose to reformat the filesystem. ZFS could do *much* better here both in on-line and off-line operation.

It's misdirection to say that, because ZFS is intended to keep its pool always consistent, there are no inconsistencies possible, and no way to repair them. Almost every file system has adopted journaling for at least its metadata, which is a time-honored way to keep consistency; but almost every file system has a repair utility for when the journal is damaged or the file system is damaged in some other way. I haven't heard of a NetApp box (with its tree-structured WAFL system) suddenly making all of its data permanently inaccessible because of a disk error or software bug, but I have heard of them requiring file system repair on rare occasions.

I've described before a number of checks which ZFS could perform, and the repair operations possible. I'll add a couple more. ZFS could keep track of where its internal nodes are stored, perhaps using a bitmap journaled in a traditional way or perhaps using the ZIL; this would make recovery of individual files much easier in the event of total file system loss. ZFS could segregate data and metadata sufficiently to make it easy to identify its metadata, or use self-checksums in additional areas, which would allow much of a filesystem to be reconstructed even if top-level metadata were corrupted.

Every file system needs a repair utility, even if the only expected use case is for the elephant tripping over the fibre cables.
-- This message posted from opensolaris.org
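For readers following along, the online checking that does exist today is driven entirely from zpool(1M); this is only the detection and self-healing path the poster above finds insufficient, not the offline repair utility being argued for. The pool name usbhdd1 is just the example used earlier in the thread:

    zpool scrub usbhdd1        # walk every block and verify checksums, repairing from redundancy where possible
    zpool status -v usbhdd1    # list any files with permanent (unrepairable) errors
    zpool clear usbhdd1        # reset the error counters once the cause has been dealt with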
Uwe Dippel wrote:
> We have seen some unfortunate miscommunication here, and misinterpretation. This extends into differences of culture. One of the vocal persons in here is surely not 'Anti-xyz'; rather I sense his intense desire to further the progress by pointing his finger to some potential wounds.

I really don't have a dog in this fight, but I think what we've seen here is the behavior of a person who is too lazy to read the manual, unable to understand the technology they are working with, and unwilling to face the consequences of their own behavior. As the Solaris user base increases, though, the number of people like this will increase. The general population do not read the manuals, nor do they care how the magic box works; they just want it to work. This is entirely appropriate for a business user who is using the computer as a means to an end. They have their area of expertise, which isn't computers. Of course, it really isn't appropriate for a system administrator, so I can't generate a lot of sympathy for DE personally, especially after the manner in which he has behaved in this thread.

Turning Solaris into something that can be used with the same amount of thought as a toaster is one of the challenges facing Sun and the community in the future. Designing guards to prevent the ignorant from harming themselves is a challenge (see quote below).

"There are 2 things that are infinite in this world, the universe and human stupidity. I'm not sure about the first one" - Albert Einstein

Regards,
Greg
> I'm rather tired of hearing this mantra.
> [...]
> Every file system needs a repair utility

Hey, wait a minute -- that's a mantra too!

I don't think there's actually any substantive disagreement here -- stating that one doesn't need a separate program called /usr/sbin/fsck is not the same as saying that filesystems don't need error detection and recovery. There's quite a bit of that in the current code, and more in the works. Like performance, it is never really "done" -- we can always do better.

> I've described before a number of checks which ZFS could perform [...]

Well, ZFS is open source. I would love to see your passion for this topic directed at the source code. Seriously.

Jeff
> Mario Goebbels wrote:
> >> The good news is that ZFS is getting popular enough on consumer-grade
> >> hardware. The bad news is that said hardware has a different set of
> >> failure modes, so it takes a bit of work to become resilient to them.
> >> This is pretty high on my short list.
> >
> > One thing I'd like to see is an _easy_ option to fall back onto older
> > uberblocks when the zpool went belly up for a silly reason. Something
> > that doesn't involve esoteric parameters supplied to zdb.
>
> This is CR 6667683
> http://bugs.opensolaris.org/view_bug.do?bug_id=6667683

I think that would solve 99% of ZFS corruption problems! Is there any ETA for this patch?

tnx
gino
-- This message posted from opensolaris.org
> >>>>> "g" == Gino <dandr.ch at gmail.com> writes:
>
> g> we lost many zpools with multimillion$ EMC, Netapp and
> g> HDS arrays just simulating fc switches power fails.
> g> The problem is that ZFS can't properly recover itself.
>
> I don't like what you call ``the problem''---I think it assumes too
> much. You mistake *A* fix for *THE* problem, before we can even agree
> for sure on, what is the problem. The problem may be in the solaris
> FC initiator, in a corner case of the FC protocol itself, or in ZFS's
> exception handling when a ``SYNCHRONIZE CACHE'' command returns
> failure.
>
> It's likely other filesystems are affected by ``the problem'' as I
> define it, just much less so. If that's the case, it'd be much better
> IMHO to fix the real problem once and for all, and find it so that it
> stays fixed, than to make ZFS work around it by losing a tiny bit of
> data instead of the whole pool. I don't think ZFS should feel
> entitled to brag about protection from Silent Corruption, if it were
> at the same time willing to silently boot without a slog, or silently
> rollback to an earlier ueberblock, or if it acts like a cheap USB
> stick when an FC switch reboots (by quietly losing things that were
> written long ago).

I agree, but I'd like to point out that the MAIN problem with ZFS is that because of a corruption you'll lose ALL your data and there is no way to recover it. Consider an example where you have 100TB of data and an FC switch fails or some other hardware problem happens during I/O on a single file. With UFS you'll probably get corruption on that single file. With ZFS you'll lose all your data. I totally agree that ZFS is theoretically much much much much much better than UFS, but in real world applications having the risk of losing access to an entire pool is not acceptable.

gino
-- This message posted from opensolaris.org
> > This is CR 6667683
> > http://bugs.opensolaris.org/view_bug.do?bug_id=6667683
>
> I think that would solve 99% of ZFS corruption problems!

Based on the reports I've seen to date, I think you're right.

> Is there any ETA for this patch?

Well, because of this thread, this has gone from "on my list" to "I'm currently working on it." And I'd like to take a moment to thank everyone who's weighed in, because it really does make a difference in setting priorities.

As for a date, I would estimate "weeks, not months".

Jeff
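The work tracked by CR 6667683 eventually surfaced, in builds later than the ones discussed here, as a recovery mode on 'zpool import'. A rough sketch of that interface, assuming a build that includes it; the pool name is again just the example from this thread:

    zpool import -F -n usbhdd1   # dry run: report how far the pool would be rewound, change nothing
    zpool import -F usbhdd1      # discard the last few transactions and import from an older, consistent uberblock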
[Still waiting for answers on my earlier questions]

So I take it that ZFS solves one problem perfectly well: integrity of data blocks. It uses checksums and atomic writes for this purpose, and as far as I can follow this list, nobody has ever had any problems in this respect.

However, it also - at least to me - looks like there is a chance that you end up holding a disk with 100% correct data blocks but no way to retrieve a single one, under the unfortunate circumstance that the semantics of these blocks is lost. From what I can gather here, and correct me if I am wrong, the problem lies not so much with the individual file system to which these 100% correct blocks belong as with the overall structure of those filesystems.

If this is the case, a copy/mirror like the one used in FAT32 might be one solution, though maybe not the most elegant one. Could another approach be to provide each file system with a (virtual) self-contained, basic pool to which it belongs, and from which it could be recovered? A pool that is over-ruled by the existence of a consistent higher-level pool (the one that the user has created and interacts with)? I concede that these might be impossible one way or another, but conceptually at least, a fall-back pool is thinkable.

Nobody expects consistency of a file that sees the drive yanked while writing is going on. But an 'atomic' update before and after could be useful; one that propagates through to the upper level, so that the state of the pool is consistent at any moment, with or without the changes of the underlying file system.
-- This message posted from opensolaris.org
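Worth noting alongside the idea above: ZFS already keeps multiple "ditto" copies of its own metadata, and the copies property extends that idea to file data on a single disk. This is only a partial answer -- it guards against localized block damage, not against losing the whole device or the top-level pool metadata -- and the dataset name below is just an example:

    zfs create -o copies=2 usbhdd1/important   # keep two copies of every data block for this dataset
    zfs get copies usbhdd1/important           # confirm the setting
    zpool scrub usbhdd1                        # verification pass that can heal from the extra copies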
> > > This is CR 6667683
> > > http://bugs.opensolaris.org/view_bug.do?bug_id=6667683
> >
> > I think that would solve 99% of ZFS corruption problems!
>
> Based on the reports I've seen to date, I think you're right.
>
> > Is there any ETA for this patch?
>
> Well, because of this thread, this has gone from "on my list" to
> "I'm currently working on it." And I'd like to take a moment to
> thank everyone who's weighed in, because it really does make a
> difference in setting priorities.
>
> As for a date, I would estimate "weeks, not months".

Excellent news!
-- This message posted from opensolaris.org
On 2/10/2009 3:37 PM, D. Eckert wrote:
> (...)
> Possibly so. But if you had that ufs/reiserfs on a LVM or on a RAID0
> spanning removable drives, you probably wouldn't have been so lucky.
> (...)
>
> we are not talking about a RAID 5 array or an LVM. We are talking about a single FS setup as a zpool over the entire available disk space on an external USB HDD.

Ok, then the parallel on Linux would still be something like running reiserfs on a single-disk LVM (which I think Red Hat still installs with by default?). And my real point is that with ZFS, even though you only want a single FS on a single disk, you can't treat it like the LVM/RAID level of software isn't there just because you only have one disk. It is still there, and you need to understand its commands and how to use them when you want to disconnect the disk.

> I decided to do so due to the read/write speed performance of zfs compared to UFS/ReiserFS.

That's fine. If you have reasons to use a single disk that option is still available. Again, that doesn't mean you can treat it like a FS on a raw device.

-Kyle

> Regards,
>
> DE.
On 2/10/2009 4:48 PM, Roman V. Shaposhnik wrote:
> On Wed, 2009-02-11 at 09:49 +1300, Ian Collins wrote:
>> These posts do sound like someone who is blaming their parents after
>> breaking a new toy before reading the instructions.
>
> It looks like there's a serious denial of the fact that "bad things
> do happen to even the best of people" on this thread.

No one is denying that that can happen. However, there are many things that were done here that increased the chance (or things that weren't done that could have decreased the chance) of this happening. I'm not saying the OP should have known better. Everyone learns from mistakes. I'm just trying to explain to him both why what happened might have happened, and what he could have done that might have avoided it.

Is it still possible that something like this could have happened? Sure. Should there be a better way to handle it when it does? You bet!

-Kyle

> Thanks,
> Roman.
dick hoogendijk
2009-Feb-11 14:42 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Tue, 10 Feb 2009 21:43:00 PST Uwe Dippel <udippel at gmail.com> wrote:
> Back to where I started from, with some questions:
> 1. Can the relevant people confirm that drives might turn dead when
> leaving a pool at unfortunate moments? Despite of complete physical
> integrity?

I have not experienced this. I -DID- experience a dead UFS-formatted (USB) drive once when I unplugged it without unmounting it first. (Shit can happen.) The filesystem was beyond repair. I had to reformat the drive. Never complained though. It -was- my fault ;-) With ZFS I mirror all my drives.

-- Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv107 ++
+ All that's really worth doing is what we do for others (Lewis Carroll)
David Dyer-Bennet
2009-Feb-11 15:08 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Tue, February 10, 2009 23:43, Uwe Dippel wrote:
> 1. Can the relevant people confirm that drives might turn dead when
> leaving a pool at unfortunate moments? Despite of complete physical
> integrity? [I'd really appreciate an answer here, because this is what I
> am starting to implement here: ZFS on USB drives.]
> 2. Are those drives in unrecoverable state passing their
> integrity/diagnosis tests (r/w)?
> 3. If what has been mentioned, that a pool is an entity like RAID in
> between and hurting the pool might as well destruct data, if this is the
> case, can this destruction of a pool not also happen within the confines
> of a server, without any physical yanking of the drive, by a dying
> controller?

Seems like a power failure, controller failure, or processor failure could all produce the equivalent of yanking a USB cable. As could a cat knocking an external drive off the desk :-). All of those things are real-world issues that we must contend with. The two hardware failures are entirely possible even in a top-end commercial machine-room installation. (For that matter, the power failure is, too; I've seen places where the UPS came on fine when the power failed, and the generator cut in fine before the UPS failed... and then the automatic fail-BACK failed, and everything went dark when the generator ran out of fuel).

I confess to not being adequately reassured right now that my external USB backup disks are reasonably secure.

This all-or-nothing behavior of ZFS pools is kinda scary. Turns out I'd rather have 99% of my data than 0% -- who knew? :-) I'd much rather have 100.00% than either of course, and I'm running ZFS with mirroring, and doing regular backups, because of that.

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
On 11-Feb-09, at 10:08 AM, David Dyer-Bennet wrote:
> On Tue, February 10, 2009 23:43, Uwe Dippel wrote:
>> 1. Can the relevant people confirm that drives might turn dead when
>> leaving a pool at unfortunate moments? Despite of complete physical
>> integrity? [I'd really appreciate an answer here, because this is what I
>> am starting to implement here: ZFS on USB drives.]
>> 2. Are those drives in unrecoverable state passing their
>> integrity/diagnosis tests (r/w)?
>> 3. If what has been mentioned, that a pool is an entity like RAID in
>> between and hurting the pool might as well destruct data, if this is the
>> case, can this destruction of a pool not also happen within the confines
>> of a server, without any physical yanking of the drive, by a dying
>> controller?
>
> Seems like a power failure, controller failure, or processor failure could
> all produce the equivalent of yanking a USB cable. As could a cat
> knocking an external drive off the desk :-). All of those things are
> real-world issues that we must contend with.

And journaled/transactional systems are designed to deal with that just fine. The exception was clearly noted by Jeff.

> The two hardware failures
> are entirely possible even in a top-end commercial machine-room
> installation. (For that matter, the power failure is, too; I've seen
> places where the UPS came on fine when the power failed, and the generator
> cut in fine before the UPS failed... and then the automatic fail-BACK
> failed, and everything went dark when the generator ran out of fuel).

Yes, this happens in *every* data centre eventually. Data centres are also subject to many of the usual human errors.

--Toby
On Tue, Feb 10, 2009 at 11:44 PM, Fredrich Maney <fredrichmaney at gmail.com> wrote:
> Ah... an illiterate AND idiotic bigot. Have you even read the manual
> or *ANY* of the replies to your posts? *YOU* caused the situation that
> resulted in your data being corrupted. Not Sun, not OpenSolaris, not
> ZFS and not anyone on this list. Yet you feel the need to blame ZFS
> and insult the people that have been trying to help you understand
> what happened and why you shouldn't do what you did.

#1 English is clearly not his native tongue. Calling someone idiotic and illiterate when they're doing as well as he is in a second language is not only inaccurate, it's "idiotic".

> ZFS is not a filesystem like UFS or Reiserfs, nor is it an LVM like
> SVM or VxVM. It is both a filesystem and a logical volume manager. As
> such, like all LVM solutions, there are two steps that you must
> perform to safely remove a disk: unmount the filesystem and quiesce
> the volume. That means you *MUST*, in the case of ZFS, issue 'umount
> filesystem' *AND* 'zpool export' before you yank the USB stick out of
> the machine.
>
> Effectively what you did was create a one-sided mirrored volume with
> one filesystem on it, then put your very important (but not important
> enough to bother mirroring or backing up) data on it. Then you
> unmounted the filesystem and ripped the active volume out of the
> machine. You got away with it a couple of times because of just how good
> a job the ZFS developers did at idiot proofing it, but when it
> finally got to the point where you lost your data, you came here to
> bitch and point fingers at everyone but the responsible party (hint,
> it's you). When your ignorance (and fault) was pointed out to you, you
> then resorted to personal attacks and slurs. Nice. Very professional.
> Welcome to the bit-bucket.

All that and yet the fact remains: I've never "ejected" a USB drive from OS X or Windows, I simply pull it and go, and I've never once lost data, or had it become unrecoverable or even corrupted.

And yes, I do keep checksums of all the data sitting on them and periodically check it. So, for all of your ranting and raving, the fact remains even a *crappy* filesystem like fat32 manages to handle a hot unplug without any prior notice without going belly up.

--Tim
Tim;

The proper procedure for ejecting a USB drive in Windows is to right-click the device icon and eject the appropriate listed device. I've done this before without ejecting and lost data.

My personal experience with ZFS is that it is a very reliable FS. I've not lost data on it yet, even after several hardware upgrades, abrupt failures and recently an unofficial, unsanctioned expansion technique.

The folks at Sun who developed this earnestly believe in their product. Sometimes, these beliefs can translate to an uneven reply. For my own reasons, I too believe wholeheartedly in ZFS. (I don't work at Sun nor do I own any shares in Sun.)

Perhaps we can all work together and find the proper solution here. Logic dictates that ZFS can survive an abrupt failure far better than a traditional VM/FS combination. The end-to-end checksumming simply does not exist in traditional methodologies.

Could you describe in detail the kind of IO access you were generating prior to pulling out the USB?

Warmest Regards
Steven Sim

Tim wrote:
[...]
On Wed, Feb 11, 2009 at 10:33 AM, Steven Sim <unixandme at gmail.com> wrote:
> Tim;
>
> The proper procedure for ejecting a USB drive in Windows is to right-click
> the device icon and eject the appropriate listed device.

I'm well aware of what the proper procedure is. My point is, I've done it for years without, for various reasons, and never lost data.

> I've done this before without ejecting and lost data.

Congratulations? You're honestly the first person I've *EVER* heard of losing data from it. Now if we're talking Windows 98 with its beta support of USB, that's another story entirely. But anything from XP on... that takes an awful lot of work.

> My personal experience with ZFS is that it is a very reliable FS. I've not
> lost data on it yet, even after several hardware upgrades, abrupt failures
> and recently an unofficial, unsanctioned expansion technique.
>
> The folks at Sun who developed this earnestly believe in their product.
> Sometimes, these beliefs can translate to an uneven reply.
>
> For my own reasons, I too believe wholeheartedly in ZFS. (I don't work at
> Sun nor do I own any shares in Sun.)
>
> Perhaps we can all work together and find the proper solution here.
>
> Logic dictates that ZFS can survive an abrupt failure far better than a
> traditional VM/FS combination. The end-to-end checksumming simply does not
> exist in traditional methodologies.

But it doesn't, and that's the problem.

> Could you describe in detail the kind of IO access you were generating
> prior to pulling out the USB?

I personally wouldn't even think of putting ZFS on a USB drive. There's someone posting here weekly about losing data to ZFS on a USB solution, no thanks. Not only that, the complete lack of cross-platform support makes it essentially useless in my world. I would like to believe it has more to do with Solaris's support of USB than ZFS, but the fact remains it's a pretty glaring deficiency in 2009, no matter which part of the stack is at fault.

--Tim
(...) Good. It looks like this thread can finally die. I received the following in response to my message below: (...)

I apologize that your eMail could not be delivered. This is because either the mail server you use is considered to be part of a dynamic IP pool, or your mail server is blacklisted somewhere on official lists. Please check the IP of your mail server, e.g. with SpamCop or Spamhaus.

Regards.
-- This message posted from opensolaris.org
Bob Friesenhahn
2009-Feb-11 16:49 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, David Dyer-Bennet wrote:
> This all-or-nothing behavior of ZFS pools is kinda scary. Turns out I'd
> rather have 99% of my data than 0% -- who knew? :-) I'd much rather have
> 100.00% than either of course, and I'm running ZFS with mirroring, and
> doing regular backups, because of that.

It seems to me that this level of terror is getting out of hand. I am glad to see that you made it to work today since statistics show that you might have gotten into a deadly automobile accident on the way to the office and would no longer care about your data. In fact, quite a lot of people get in serious automobile accidents, yet we rarely hear such levels of terror regarding taking a drive in an automobile.

Most people are far more afraid of taking a plane flight than taking a drive in their car, even though taking a drive in their car is far more risky.

It is best to put risks in perspective. People are notoriously poor at evaluating risks and paranoia is often the result.

Bob
=====================================
Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
(...) Ah... an illiterate AND idiotic bigot. (...)

I apologize for my poor English. Yes, it's not my mother tongue, but I have no doubt at all that this discussion could be continued in German as well.

But just to make it clear: in the end I understood very well where I went wrong. But it wasn't something I expected. Because I was using a single zpool with no other filesystems inside, I thought that unmounting it with the command 'zfs umount usbhdd1' and checking that usbhdd1 was no longer shown in the output of 'mount' (it wasn't) meant the pool was cleanly unmounted and there was no risk in yanking the USB wire.

Even from a logical point of view: if 'zpool export usbhdd1' releases the entire pool from the system, then 'zfs umount usbhdd1' should do the same when no other filesystem exists inside this particular pool. If the output of the mount command no longer shows your zfs pool, what else should be left to unmount?

This is just what caused confusion on my side, and that's human, but I have learned for the future.

Regards.
-- This message posted from opensolaris.org
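For anyone who lands on this thread with the same expectation, a minimal sketch of moving a single-disk pool between machines (pool name as in the earlier posts):

    # On the machine the disk is leaving:
    zpool export usbhdd1      # unmounts every dataset in the pool AND releases the pool itself
    # ... only now is it safe to unplug the USB cable ...

    # On the machine the disk arrives at:
    zpool import              # list pools that are visible but not yet imported
    zpool import usbhdd1      # import (and mount) the pool
    zpool status -v usbhdd1   # sanity check after the move

'zfs umount' on its own only removes the mount point; the pool stays imported and owned by the first host.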
Bob Friesenhahn
2009-Feb-11 17:21 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, Tim wrote:
> All that and yet the fact remains: I've never "ejected" a USB drive from OS
> X or Windows, I simply pull it and go, and I've never once lost data, or had
> it become unrecoverable or even corrupted.
>
> And yes, I do keep checksums of all the data sitting on them and
> periodically check it. So, for all of your ranting and raving, the fact
> remains even a *crappy* filesystem like fat32 manages to handle a hot unplug
> without any prior notice without going belly up.

This seems like another one of your trolls. Any one of us who has used USB drives under OS-X or Windows knows that the OS complains quite a lot if you just unplug the drive, so we all learn how to do things properly.

You must have very special data if you compute independent checksums for each one of your files, and it leaves me wondering why you think that data is correct due to being checksummed. Checksumming incorrect data does not make that data correct.

Bob
=====================================
Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On 11-Feb-09, at 11:19 AM, Tim wrote:
> ...
> And yes, I do keep checksums of all the data sitting on them and
> periodically check it. So, for all of your ranting and raving, the
> fact remains even a *crappy* filesystem like fat32 manages to
> handle a hot unplug without any prior notice without going belly up.

By chance, certainly not design.

--Toby
On 2/11/2009 12:35 PM, Toby Thain wrote:
> On 11-Feb-09, at 11:19 AM, Tim wrote:
>> ...
>> And yes, I do keep checksums of all the data sitting on them and
>> periodically check it. So, for all of your ranting and raving, the
>> fact remains even a *crappy* filesystem like fat32 manages to handle
>> a hot unplug without any prior notice without going belly up.
>
> By chance, certainly not design.

Yep. I've never unplugged a USB drive on purpose, but I have left a drive plugged into the docking station, hibernated Windows XP Professional, undocked the laptop, and then woken it up later undocked. It routinely would pop up windows saying that a 'delayed write' was not successful on the now missing drive.

I've always counted myself lucky that any new data written to that drive was written long, long before I hibernated, because I have yet to find any problems with that data (but I don't read it very often, if at all). But it is luck only!

-Kyle
David Dyer-Bennet
2009-Feb-11 18:11 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 11:21, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, Tim wrote:
>> All that and yet the fact remains: I've never "ejected" a USB drive from OS
>> X or Windows, I simply pull it and go, and I've never once lost data, or had
>> it become unrecoverable or even corrupted.
>>
>> And yes, I do keep checksums of all the data sitting on them and
>> periodically check it. So, for all of your ranting and raving, the fact
>> remains even a *crappy* filesystem like fat32 manages to handle a hot unplug
>> without any prior notice without going belly up.
>
> This seems like another one of your trolls. Any one of us who has
> used USB drives under OS-X or Windows knows that the OS complains
> quite a lot if you just unplug the drive, so we all learn how to do
> things properly.

Then again, I've never lost data during the learning period, nor on the rare occasions where I just get it wrong. This is good; not quite remembering to eject a USB memory stick is *so* easy.

We do all know why violating protocols here works so much of the time, right? It's because Windows is using very simple, old-fashioned strategies to write to the USB devices. Write caching is nonexistent, or of very short duration, for example. So if IO to the device has quiesced, and it's been several seconds since the last IO, it's nearly certain to be safe to just pull it. Nearly.

ZFS is applying much more modern, much more aggressive, optimizing strategies. This is entirely good; ZFS is intended for a space where that's important a lot of the time. But one tradeoff is that those rules become more important.

> You must have very special data if you compute independent checksums
> for each one of your files, and it leaves me wondering why you think
> that data is correct due to being checksummed. Checksumming incorrect
> data does not make that data correct.

Can't speak for him, but I have par2 checksums and redundant data for lots of my old photos on disk. I created them before writing archival optical disks of the data, to give me some additional hope of recovering the data in the long run.

I don't, in fact, know that most of those photos are actually valid data; only the ones I've viewed after creating the par2 checksums (and I can't rule out weird errors that don't corrupt the whole rest of the image even then). Still, once I've got the checksum on file, I can at least determine that I've had a disk error in many cases (not quite identical to determining that the data is still valid; after all, the data and the checksum could have been corrupted in such a way that I get a false positive on the checksum).

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
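For the curious, a rough sketch of the kind of par2 workflow described above, assuming the stock par2cmdline tool; the file names and the 10% redundancy figure are just examples, not what David actually used:

    par2 create -r10 album.par2 *.jpg   # store checksums plus ~10% recovery blocks next to the photos
    par2 verify album.par2              # later: detect silent corruption in any of the files
    par2 repair album.par2              # attempt reconstruction from the recovery blocks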
David Dyer-Bennet
2009-Feb-11 18:12 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 11:35, Toby Thain wrote:
> On 11-Feb-09, at 11:19 AM, Tim wrote:
>> ...
>> And yes, I do keep checksums of all the data sitting on them and
>> periodically check it. So, for all of your ranting and raving, the
>> fact remains even a *crappy* filesystem like fat32 manages to
>> handle a hot unplug without any prior notice without going belly up.
>
> By chance, certainly not design.

No, I do think it's by design -- it's because the design isn't aggressively exploiting possible performance.

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
David Dyer-Bennet
2009-Feb-11 18:21 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 10:49, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, David Dyer-Bennet wrote:
>> This all-or-nothing behavior of ZFS pools is kinda scary. Turns out I'd
>> rather have 99% of my data than 0% -- who knew? :-) I'd much rather have
>> 100.00% than either of course, and I'm running ZFS with mirroring, and
>> doing regular backups, because of that.
>
> It seems to me that this level of terror is getting out of hand. I am
> glad to see that you made it to work today since statistics show that
> you might have gotten into a deadly automobile accident on the way to
> the office and would no longer care about your data. In fact, quite a
> lot of people get in serious automobile accidents yet we rarely hear
> such levels of terror regarding taking a drive in an automobile.
>
> Most people are far more afraid of taking a plane flight than taking a
> drive in their car, even though taking a drive in their car is far
> more risky.
>
> It is best to put risks in perspective. People are notoriously poor
> at evaluating risks and paranoia is often the result.

All true (and I'm certainly glad I made it to work myself; I did drive, which is one of the most dangerous things most people do). I think you're overstating my terror level, though; I'd say I'm at yellow; not even orange.

I've spent $2000 on hardware and, by now, hundreds of hours of my time trying to get and keep a ZFS-based home NAS working. Because it's the only affordable modern practice, my backups are on external drives (USB drives because that's "the" standard for consumer external drives; they were much cheaper when I bought them than any that supported Firewire at the 1TB size). So hearing how easy it is to muck up a ZFS pool on USB is leading me, again, to doubt this entire enterprise.

Am I really better off than I would be with an Infrant ReadyNAS, or a Drobo? I'm certainly far behind financially and with my time.

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
Bob Friesenhahn
2009-Feb-11 18:23 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, David Dyer-Bennet wrote:
> Then again, I've never lost data during the learning period, nor on the
> rare occasions where I just get it wrong. This is good; not quite
> remembering to eject a USB memory stick is *so* easy.

With Windows and OS-X, it is up to the *user* to determine if they have lost data. This is because they are designed to be user-friendly operating systems. If the disk can be loaded at all, Windows and OS-X will just go with what is left. If Windows and OS-X started to tell users that they lost some data, then those users would be in a panic (just like we see here).

The whole notion of "journaling" is to intentionally lose data by rolling back to a known good point. More data might be lost than if the task was left to a tool like 'fsck', but the journaling approach is much faster. Windows and OS-X are highly unlikely to inform you that some data was lost due to the filesystem being rolled back.

Your comments about write caching being a factor seem reasonable.

Bob
=====================================
Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
David Dyer-Bennet
2009-Feb-11 18:38 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 12:23, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, David Dyer-Bennet wrote:
>> Then again, I've never lost data during the learning period, nor on the
>> rare occasions where I just get it wrong. This is good; not quite
>> remembering to eject a USB memory stick is *so* easy.
>
> With Windows and OS-X, it is up to the *user* to determine if they
> have lost data. This is because they are designed to be user-friendly
> operating systems. If the disk can be loaded at all, Windows and OS-X
> will just go with what is left. If Windows and OS-X started to tell
> users that they lost some data, then those users would be in a panic
> (just like we see here).

I don't carry much on my memory stick -- mostly stuff in transit from one place to another. Two things that live there constantly are my encrypted password database, and some private keys (encrypted under passphrases). So the stuff on the memory stick tends to get looked at, and the stuff that lives there is in a format where corruption is very likely to get noticed.

So while I can't absolutely swear that I never lost data I didn't notice losing, I'm fairly confident that no data was lost. And I'm absolutely sure no data THAT I CARED ABOUT was lost, which is all that really matters.

> The whole notion of "journaling" is to intentionally lose data by
> rolling back to a known good point. More data might be lost than if
> the task was left to a tool like 'fsck', but the journaling approach is
> much faster. Windows and OS-X are highly unlikely to inform you that
> some data was lost due to the filesystem being rolled back.

True about journaling. This applies to NTFS disks for Windows, but not to FAT systems (which aren't journaled); and memory sticks for me are always FAT systems.

Databases have something of an all-or-nothing problem as well, for that matter, and for something of the same reasons.

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
On February 11, 2009 12:21:03 PM -0600 David Dyer-Bennet <dd-b at dd-b.net> wrote:
> I've spent $2000 on hardware and, by now, hundreds of hours of my time
> trying to get and keep a ZFS-based home NAS working. Because it's the
> only affordable modern practice, my backups are on external drives (USB
> drives because that's "the" standard for consumer external drives, they
> were much cheaper when I bought them than any that supported Firewire at
> the 1TB size). So hearing how easy it is to muck up a ZFS pool on USB is
> leading me, again, to doubt this entire enterprise.

Same here, except I have no doubts. As I only use the USB for backup, I'm quite happy with it. I have a 4-disk enclosure that accepts SATA drives. My main storage is a 12-bay SAS/SATA enclosure.

After my own experience with USB (I still have the problem that I cannot create new pools while another USB drive is present with a zpool on it, whether or not that zpool is active ... no response on that thread yet and I expect never), I'm not thrilled with it and suspect some of the problem lies in the way that USB is handled differently than other physical connections (can't use 'format', e.g.). Anyway, to get back to the point, I wouldn't want to use it for primary storage, even if it were only 2 drives. That's unfortunate, but in line with Solaris' hardware support, historically.

-frank
Fredrich Maney
2009-Feb-11 19:32 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, Feb 11, 2009 at 11:19 AM, Tim <tim at tcsac.net> wrote:
> On Tue, Feb 10, 2009 at 11:44 PM, Fredrich Maney <fredrichmaney at gmail.com> wrote:
>> Ah... an illiterate AND idiotic bigot. Have you even read the manual
>> or *ANY* of the replies to your posts? *YOU* caused the situation that
>> resulted in your data being corrupted. Not Sun, not OpenSolaris, not
>> ZFS and not anyone on this list. Yet you feel the need to blame ZFS
>> and insult the people that have been trying to help you understand
>> what happened and why you shouldn't do what you did.
>
> #1 English is clearly not his native tongue. Calling someone idiotic and
> illiterate when they're doing as well as he is in a second language is not
> only inaccurate, it's "idiotic".

I have a great deal of respect for his command of more than one language. What I don't have any respect for is his complete unwillingness to actually read the dozens of responses that have all said the same thing, namely that his problems are self-inflicted due to his refusal to read the documentation. I refrained from calling him an idiot until after he proved himself one by spewing his blind bigotry against the US. All in all, I'd say he got far better treatment than he gave and infinitely better than he deserved.

>> ZFS is not a filesystem like UFS or Reiserfs, nor is it an LVM like
>> SVM or VxVM. It is both a filesystem and a logical volume manager. As
>> such, like all LVM solutions, there are two steps that you must
>> perform to safely remove a disk: unmount the filesystem and quiesce
>> the volume. That means you *MUST*, in the case of ZFS, issue 'umount
>> filesystem' *AND* 'zpool export' before you yank the USB stick out of
>> the machine.
>>
>> Effectively what you did was create a one-sided mirrored volume with
>> one filesystem on it, then put your very important (but not important
>> enough to bother mirroring or backing up) data on it. Then you
>> unmounted the filesystem and ripped the active volume out of the
>> machine. You got away with it a couple of times because of just how good
>> a job the ZFS developers did at idiot proofing it, but when it
>> finally got to the point where you lost your data, you came here to
>> bitch and point fingers at everyone but the responsible party (hint,
>> it's you). When your ignorance (and fault) was pointed out to you, you
>> then resorted to personal attacks and slurs. Nice. Very professional.
>> Welcome to the bit-bucket.
>
> All that and yet the fact remains: I've never "ejected" a USB drive from OS
> X or Windows, I simply pull it and go, and I've never once lost data, or had
> it become unrecoverable or even corrupted.

You've been lucky then. I've lost data and had corrupted filesystems on USB sticks on both of those OSes, as well as several Linux and BSD variants, from doing just that.

[...]

fpsm
On February 11, 2009 2:07:47 AM -0800 Gino <dandr.ch at gmail.com> wrote:
> I agree but I'd like to point out that the MAIN problem with ZFS is that
> because of a corruption you'll lose ALL your data and there is no way to
> recover it. Consider an example where you have 100TB of data and an FC
> switch fails or some other hardware problem happens during I/O on a single
> file. With UFS you'll probably get corruption on that single file. With
> ZFS you'll lose all your data. I totally agree that ZFS is theoretically
> much much much much much better than UFS but in real world applications
> having the risk of losing access to an entire pool is not acceptable.

if you have 100TB of data, wouldn't you have a completely redundant storage network -- dual FC switches on different electrical supplies, etc. i've never designed or implemented a storage network before but such designs seem common in the literature and well supported by Solaris. i have done such designs with data networks and such redundancy is quite common.

i mean, that's a lot of data to go missing due to a single device failing -- which it will.

not to say it's not a problem with zfs, just that in the real world, it should be mitigated since your storage network design would overcome a single failure *anyway* -- regardless of zfs.

-frank
David Dyer-Bennet wrote:
> I've spent $2000 on hardware and, by now, hundreds of hours of my time
> trying to get and keep a ZFS-based home NAS working.

Hundreds of hours doing what? I just plugged in the drives, built the pool and left the box in a corner for the past couple of years. It's been upgraded twice, from build 62 to 72 to get the SATA framework and then to b101 for CIFS.

-- Ian.
Thommy M. Malmström
2009-Feb-11 20:01 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> after working for 1 month with ZFS on 2 external USB
> drives I have experienced, that the all new zfs
> filesystem is the most unreliable FS I have ever
> seen.

Troll.
-- This message posted from opensolaris.org
David Dyer-Bennet
2009-Feb-11 20:01 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 13:45, Ian Collins wrote:
> David Dyer-Bennet wrote:
>> I've spent $2000 on hardware and, by now, hundreds of hours of my time
>> trying to get and keep a ZFS-based home NAS working.
>
> Hundreds of hours doing what? I just plugged in the drives, built the
> pool and left the box in a corner for the past couple of years. It's
> been upgraded twice, from build 62 to 72 to get the SATA framework and
> then to b101 for CIFS.

Well, good for you. It took me a lot of work to get it working in the first place (and then with only 4 of my 8 hot-swap bays and 4 of my 6 eSATA connections on the motherboard working). Before that, I'd spent quite a lot of time trying to get VMWare to run Solaris, which it wouldn't back then. I did manage to get Parallels, I think it was, to let me create a Solaris system and then a ZFS pool to play with (this was back before OpenSolaris and before any sort of LiveCD I could find).

Then I had a series of events starting in December of last year that, in hindsight, I think were mainly or entirely one memory SIMM going bad, which caused me to upgrade to 2008.11 and also to restore my main pool from backup. Oh, and I converted from using Samba to using CIFS. I'm just now getting close to having things up and working again usably and stably, and am still working on backup. I do still have some problems with file access permissions, I know, due to the new different handling of ACLs I guess.

And I wasn't a Solaris admin to begin with. I guess SunOS back when was the first Unix I had root on, but since then I've mostly worked with Linux (including my time as news admin for a local ISP, and my years as an engineer with Sun, where I was in the streaming video server group). In some ways a completely UNfamiliar system might have been easier :-).

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
On Wed, Feb 11, 2009 at 11:46 AM, Kyle McDonald <KMcDonald at egenera.com> wrote:
> Yep. I've never unplugged a USB drive on purpose, but I have left a drive
> plugged into the docking station, hibernated Windows XP Professional,
> undocked the laptop, and then woken it up later undocked. It routinely would
> pop up windows saying that a 'delayed write' was not successful on the now
> missing drive.
>
> I've always counted myself lucky that any new data written to that drive
> was written long, long before I hibernated, because I have yet to find any
> problems with that data (but I don't read it very often, if at all). But it
> is luck only!
>
> -Kyle

Right, except the OP stated he unmounted the filesystem in question, and it was the *ONLY* one on the drive, meaning there is absolutely 0 chance of there being pending writes. There's nothing to write to.

I don't know what exactly it is you put on your USB drives, but I'm certainly aware of whether or not things on mine are in use before pulling the drive out. If a picture is open and in an editor, I'm obviously not going to save it then pull the drive mid-save.

--Tim
On Wed, Feb 11, 2009 at 1:36 PM, Frank Cusack <fcusack at fcusack.com> wrote:
> if you have 100TB of data, wouldn't you have a completely redundant
> storage network -- dual FC switches on different electrical supplies,
> etc. i've never designed or implemented a storage network before but
> such designs seem common in the literature and well supported by
> Solaris. i have done such designs with data networks and such
> redundancy is quite common.
>
> i mean, that's a lot of data to go missing due to a single device
> failing -- which it will.
>
> not to say it's not a problem with zfs, just that in the real world,
> it should be mitigated since your storage network design would overcome
> a single failure *anyway* -- regardless of zfs.

It's hardly uncommon for an entire datacenter to go down, redundant power or not. When it does, if it means I have to restore hundreds of terabytes if not petabytes from tape instead of just restoring the files that were corrupted or running an fsck, we've got issues.

--Tim
On February 11, 2009 3:02:48 PM -0600 Tim <tim at tcsac.net> wrote:
> On Wed, Feb 11, 2009 at 1:36 PM, Frank Cusack <fcusack at fcusack.com> wrote:
>> if you have 100TB of data, wouldn't you have a completely redundant
>> storage network -- dual FC switches on different electrical supplies,
>> etc. i've never designed or implemented a storage network before but
>> such designs seem common in the literature and well supported by
>> Solaris. i have done such designs with data networks and such
>> redundancy is quite common.
>>
>> i mean, that's a lot of data to go missing due to a single device
>> failing -- which it will.
>>
>> not to say it's not a problem with zfs, just that in the real world,
>> it should be mitigated since your storage network design would overcome
>> a single failure *anyway* -- regardless of zfs.
>
> It's hardly uncommon for an entire datacenter to go down, redundant power
> or not. When it does, if it means I have to restore hundreds of
> terabytes if not petabytes from tape instead of just restoring the files
> that were corrupted or running an fsck, we've got issues.

Isn't this easily worked around by having UPS power in addition to whatever the data center supplies? I've been there with entire data center shutdown (or partial, but entire as far as my gear is concerned), but for really critical stuff we've had our own UPS.

I don't know if that really works for 100TB and up though. That's a lot of disk == a lot of UPS capacity. And again, I'm not trying to take away from the fact that this is a significant zfs problem.

-frank
Bob Friesenhahn
2009-Feb-11 21:52 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, Tim wrote:
> Right, except the OP stated he unmounted the filesystem in question, and it
> was the *ONLY* one on the drive, meaning there is absolutely 0 chance of
> there being pending writes. There's nothing to write to.

This is an interesting assumption leading to a wrong conclusion. If the file is updated and the filesystem is "unmounted", it is still possible for there to be uncommitted data in the pool. If you pay closer attention you will see that "mounting" the filesystem basically just adds a logical path mapping since the filesystem is already available under /poolname/filesystemname regardless. So doing the mount makes /poolname/filesystemname available as /filesystemname, or whatever mount path you specify.

Bob
=====================================
Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
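The distinction Bob is drawing is easy to see from the CLI; a small sketch, again using the pool name from earlier in the thread:

    zfs umount usbhdd1      # the dataset disappears from 'mount' output...
    zpool list usbhdd1      # ...but the pool is still imported and active
    zpool status usbhdd1    # and still owned by this host, possibly with uncommitted state
    zpool export usbhdd1    # only this flushes, unmounts and releases the pool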
David Dyer-Bennet
2009-Feb-11 22:44 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 15:51, Frank Cusack wrote:
> On February 11, 2009 3:02:48 PM -0600 Tim <tim at tcsac.net> wrote:
>> It's hardly uncommon for an entire datacenter to go down, redundant power
>> or not. When it does, if it means I have to restore hundreds of
>> terabytes if not petabytes from tape instead of just restoring the files
>> that were corrupted or running an fsck, we've got issues.
>
> Isn't this easily worked around by having UPS power in addition to
> whatever the data center supplies?

Well, that covers some of the cases (it does take a fairly hefty UPS to deal with 100TB levels of redundant disk).

> I've been there with entire data center shutdown (or partial, but entire
> as far as my gear is concerned), but for really critical stuff we've had
> our own UPS.

I knew people once who had pretty careful power support; UPS where needed, then backup generator that would cut in automatically, and cut back when power was restored. Unfortunately, the cut back failed to happen automatically. On a weekend. So things sailed along fine until the generator ran out of fuel, and then shut down MOST uncleanly. Best laid plans of mice and men gang aft agley, or some such (from memory, and the spelling seems unlikely).

Sure, human error was a factor. But human error is a MAJOR factor in the real world, and one of the things we're trying to protect our data from. Certainly, if a short power glitch on the normal mains feed (to lapse into Brit for a second) brings down your data server in an uncontrolled fashion, you didn't do a very good job of protecting it. My home NAS is protected to the point of one UPS, anyway. But real-world problems a few steps more severe can produce the same power cut, practically anywhere, just not as often.

> I don't know if that really works for 100TB and up though. That's a lot
> of disk == a lot of UPS capacity. And again, I'm not trying to take away
> from the fact that this is a significant zfs problem.

We've got this UPS in our server room that's about, oh, 4 washing machines in size. It's wired into building power, and powers the outlets the servers are plugged into, and the floor outlets out here the development PCs are plugged into also.

I never got the tour, but I heard about the battery backup system at the old data center Northwest Airlines had back when they ran their own reservations system. Enough lead-acid batteries to keep an IBM mainframe running for three hours.

One can certainly do it if one wants to badly enough, which one should if the data is important. I can't imagine anybody investing in 100TB of enterprise-grade storage if the data WASN'T important!

-- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
David Dyer-Bennet
2009-Feb-11 22:52 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 15:52, Bob Friesenhahn wrote:
> On Wed, 11 Feb 2009, Tim wrote:
>> Right, except the OP stated he unmounted the filesystem in question, and
>> it was the *ONLY* one on the drive, meaning there is absolutely 0 chance
>> of there being pending writes. There's nothing to write to.
>
> This is an interesting assumption leading to a wrong conclusion. If
> the file is updated and the filesystem is "unmounted", it is still
> possible for there to be uncommitted data in the pool. If you pay
> closer attention you will see that "mounting" the filesystem basically
> just adds a logical path mapping, since the filesystem is already
> available under /poolname/filesystemname regardless. So doing the
> mount makes /poolname/filesystemname available as /filesystemname, or
> whatever mount path you specify.

As a practical matter, it seems unreasonable to me that there would be uncommitted data in the pool after some quite short period of time when there's no new IO activity to the pool (not just the filesystem). 5 or 10 seconds, maybe? (Possibly excepting if there was a HUGE spike of IO for a while just before this; there could be considerable stuff in the ZIL not yet committed then, I would think.)

That is, if I plug in a memory stick with ZFS on it, read and write for a while, then when I'm done and IO appears to have quiesced, observe that the IO light on the drive is inactive for several seconds, I'd be kinda disappointed if I got actual corruption if I pulled it. Complaints about not being exported next time I tried to import it, sure. Maybe other complaints. I wouldn't do this deliberately (other than for testing).

But it seems wrong to leave things uncommitted significantly longer than necessary (seconds are huge time units to a computer, after all), and if the device is sitting there not doing IO, there's no reason it shouldn't already have written anything uncommitted.

Conversely, anybody who is pulling disks / memory sticks off while IO is visibly incomplete really SHOULD expect to lose everything on them, even if sometimes they'll be luckier than that. I suppose we're dealing with people who didn't work with floppies here, where that lesson got pretty solidly beaten into people :-.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Bob Friesenhahn
2009-Feb-11 23:25 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, 11 Feb 2009, David Dyer-Bennet wrote:
> As a practical matter, it seems unreasonable to me that there would be
> uncommitted data in the pool after some quite short period of time when
> there's no new IO activity to the pool (not just the filesystem). 5 or 10
> seconds, maybe? (Possibly excepting if there was a HUGE spike of IO for a
> while just before this; there could be considerable stuff in the ZIL not
> yet committed then, I would think.)

I agree. ZFS apparently syncs uncommitted writes every 5 seconds. If there has been no filesystem I/O (including read I/O due to atime) for at least 10 seconds, and there has not been more data burst-written into RAM than can be written to disk in 10 seconds, then there should be nothing remaining to write.

Regardless, it seems that the ZFS problems with crummy hardware are primarily due to the crummy hardware writing the data to the disk in a different order than expected. ZFS expects that after a sync all pending writes are committed.

The lesson is that unprofessional hardware may prove to be unreliable for professional usage.

Bob

======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
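If you want more than an activity LED to go on, one low-tech check is to watch the pool's own I/O counters settle before touching the hardware. A sketch (the pool name is an example):

    zpool iostat usbhdd1 1    # per-second read/write activity for the pool
    # wait until the write columns sit at 0 for a good number of intervals
    sync                      # then ask the OS to push anything still dirty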
I need to disappoint you here: an LED that is inactive for a few seconds is a very bad indicator of pending writes. I used to see this with a stick on Ubuntu, which was silent until the 'umount' and then started writing for some 10 seconds.

On the other hand, you are spot-on w.r.t. 'umount'. Once the command has returned, no more writes are to be expected, and if there were, it would be a serious bug. So this 'umount'ed filesystem needs to be in a perfectly consistent state. (Which is why I wrote further up that the structure above the file system, that is the pool, is probably the culprit for all this misery.)

"Conversely, anybody who is pulling disks / memory sticks off while IO is visibly incomplete really SHOULD expect to lose everything on them"

I hope you don't mean this. Not in a filesystem so hyped and so advanced. Of course we expect corruption of any files whose 'write' was rudely interrupted. But I, for one, expect the metadata of all other files to remain readily available. Something like, at the next use: "You idiot removed the plug while files were still being written. Don't expect those to be available now. Here is the list of all other files: [list of all files not being written at the time]"

Uwe
--
This message posted from opensolaris.org
On 11-Feb-09, at 5:52 PM, David Dyer-Bennet wrote:> > On Wed, February 11, 2009 15:52, Bob Friesenhahn wrote: >> On Wed, 11 Feb 2009, Tim wrote: >>> >>> Right, except the OP stated he unmounted the filesystem in >>> question, and >>> it >>> was the *ONLY* one on the drive, meaning there is absolutely 0 >>> chance of >>> their being pending writes. There''s nothing to write to. >> >> This is an interesting assumption leading to a wrong conclusion. If >> the file is updated and the filesystem is "unmounted", it is still >> possible for there to be uncommitted data in the pool. ... > > As a practical matter, it seems unreasonable to me that there would be > uncommitted data in the pool after some quite short period of time ... > > That is, if I plug in a memory stick with ZFS on it, read and write > for a > while, then when I''m done and IO appears to have quiesced, observe > that > the IO light on the drive is inactive for several seconds, I''d be > kinda > disappointed if I got actual corrution if I pulled it.Absolutely. You should never get "actual corruption" (inconsistency) at any time *except* in the case Jeff Bonwick explained: i.e. faulty/ misbehaving hardware! (That''s one meaning of "always consistent on disk".) I think this is well understood, is it not? Write barriers are not a new concept, and nor is the necessity. For example, they are a clearly described feature of DEC''s MSCP protocol*, long before ATA or SCSI - presumably so that transactional systems could actually be built at all. Devices were held to a high standard of conformance since DEC''s customers (like Sun''s) were traditionally those whose data was of very high value. Storage engineers across the industry were certainly implementing them long before MSCP. --Toby * - The related patent that I am looking at is #4,449,182, filed 5 Oct, 1981. "Interface between a pair of processors, such as host and peripheral- controlling processors in data processing systems." Also the MSCP document released with the UDA50 mass storage subsystem, dated April 1982: "4.5 Command Categories and Execution Order ... Sequential commands are those commands that, for the same unit, must be executed in precise order. ... All sequential commands for a particular unit that are received on the same connection must be executed in the exact order that the MSCP server receives them. The execution of a sequential command may not be interleaved with the execution of any other sequential or non-sequential commands for the same unit. Furthermore, any non-sequential commands received before and on the same connection as a particular sequential command must be completed before execution of that sequential command begins, and any non-sequential commands received after and on the same conection as a particular sequential command must not begin execution until after that sequential command is completed. Sequential commands are, in effect, a barrier than non-sequential commands cannot pass or penetrate. Non-sequential commands are those commands that controllers may re-order so as to optimize performance. Controllers may furthermore interleave the execution of several non-sequential commands among themselves, ..."> Complaints about > not being exported next time I tried to import it, sure. Maybe other > complaints. I wouldn''t do this deliberately (other than for testing). > ... 
> > -- > David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ > Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ > Photos: http://dd-b.net/photography/gallery/ > Dragaera: http://dragaera.info > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On 11-Feb-09, at 7:16 PM, Uwe Dippel wrote:> I need to disappoint you here, LED inactive for a few seconds is a > very bad indicator of pending writes. Used to experience this on a > stick on Ubuntu, which was silent until the ''umount'' and then it > started to write for some 10 seconds. > > On the other hand, you are spot-on w.r.t. ''umount''. Once the > command is through, there is no more write to be expected. And if > there was, it would be a serious bug.Yes; though at the risk of repetition - the bug here can be in the drive...> So this ''umount''ed system needs to be in perfectly consistent > states. (Which is why I wrote further up that the structure above > the file system, that is the pool, is probably the culprit for all > this misery.) > > [i]Conversely, anybody who is pulling disks / memory sticks off > while IO is > visibly incomplete really SHOULD expect to lose everything on them[/i] > I hope you don''t mean this. Not in a filesystem much hyped and much > advanced. Of course, we expect corruption of all files whose > ''write'' has been boldly interrupted. But I for one, expect the > metadata of all other files to be readily available. Kind of, at > the next use, telling me:"You idiot removed the plug last, while > files were still in the process of writing. Don''t expect them to be > available now. Here is the list of all other files: [list of all > files not being written then]"That hope is a little naive. AIUI, it cannot be known, thanks to the many indeterminacies of the I/O path, which ''files'' were partially written (since a whole slew of copy-on-writes to many objects could have been in flight, and absent a barrier it cannot be known post facto which succeeded). What is known, is the last checkpoint. Hence the feasible recovery mode is a partial, automatic rollback to a past consistent state. Somebody correct me if I am wrong. --Toby> > Uwe > -- > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Toby, sad that you fall for the last resort of the marketing droids here. All manufactures (and there are only a few left) will sue the hell out of you if you state that their drives don''t ''sync''. And each and every drive I have ever used did. So the talk about a distinct borderline between ''enterprise'' and ''home'' is just cheap and not sustainable. Also, if you were correct, and ZFS allowed for compromising the metadata of dormant files (folders) by writing metadata for other files (folders), we would not have advanced beyond FAT, and ZFS would be but a short episode in the history of file systems. Or am I the last to notice that atomic writes have been dropped? Especially with atomic writes you either have the last consistent state of the file structure, or the updated one. So what would be the meaning of ''always consistent on the drive'' if metadata were allowed to hang in between; in an inconsistent state? You write "What is known, is the last checkpoint." Exactly, and here a contradiction shows: the last checkpoint of all untouched files (plus those read only) does contain exactly all untouched files. How could one allow to compromise the last checkpoint by writing a new one? You are correct with "the feasible recovery mode is a partial". Though here we have heard some stories of total loss. Nobody has questioned that the recovery of an interrupted ''write'' must necessarily be partial. What is questioned is the complete loss of semantics. Uwe -- This message posted from opensolaris.org
David Dyer-Bennet
2009-Feb-12 02:49 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 17:25, Bob Friesenhahn wrote:> Regardless, it seems that the ZFS problems with crummy hardware are > primarily due to the crummy hardware writting the data to the disk in > a different order than expected. ZFS expects that after a sync that > all pending writes are committed.Which is something Unix has been claiming (or pretending) to provide for some time now, yes.> The lesson is that unprofessional hardware may prove to be unreliable > for professional usage.Or any other usage. And the question is how can we tell them apart? -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
David Dyer-Bennet
2009-Feb-12 02:53 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 18:25, Toby Thain wrote:> > Absolutely. You should never get "actual corruption" (inconsistency) > at any time *except* in the case Jeff Bonwick explained: i.e. faulty/ > misbehaving hardware! (That''s one meaning of "always consistent on > disk".) > > I think this is well understood, is it not?Perhaps. I think the consensus seems to be settling down this direction (as I filter for reliability of people posting, not by raw count :-)). The shocker is how much hardware that doesn''t behave to spec in this area seems to be out there -- or so people claim; the other problem is that we can''t sort out which is which.> Write barriers are not a new concept, and nor is the necessity. For > example, they are a clearly described feature of DEC''s MSCP > protocol*, long before ATA or SCSI - presumably so that transactional > systems could actually be built at all. Devices were held to a high > standard of conformance since DEC''s customers (like Sun''s) were > traditionally those whose data was of very high value. Storage > engineers across the industry were certainly implementing them long > before MSCP. > > --Toby > > > * - The related patent that I am looking at is #4,449,182, filed 5 > Oct, 1981. > "Interface between a pair of processors, such as host and peripheral- > controlling processors in data processing systems."While I was working for LCG in Marlboro, in fact. (Not on hardware, nowhere near that work.) -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
On 11-Feb-09, at 9:30 PM, Uwe Dippel wrote:> Toby, > > sad that you fall for the last resort of the marketing droids here. > All manufactures (and there are only a few left) will sue the hell > out of you if you state that their drives don''t ''sync''. And each > and every drive I have ever used did. So the talk about a distinct > borderline between ''enterprise'' and ''home'' is just cheap and not > sustainable.They have existed. This thread has shown a motive to verify COTS drives for this property, if the data is valuable.> > Also, if you were correct, and ZFS allowed for compromising the > metadata of dormant files (folders) by writing metadata for other > files (folders), we would not have advanced beyond FAT, and ZFS > would be but a short episode in the history of file systems. Or am > I the last to notice that atomic writes have been dropped? > Especially with atomic writes you either have the last consistent > state of the file structure, or the updated one. So what would be > the meaning of ''always consistent on the drive'' if metadata were > allowed to hang in between; in an inconsistent state? You write > "What is known, is the last checkpoint." Exactly, and here a > contradiction shows: the last checkpoint of all untouched files > (plus those read only) does contain exactly all untouched files. > How could one allow to compromise the last checkpoint by writing a > new one?ZFS claims that the last checkpoint (my term, sorry, not an official one) is fully consistent (metadata *and* data! Unlike other filesystems). Since consistency is achievable by thousands of other transactional systems I have no reason to doubt that it is achieved by ZFS.> You are correct with "the feasible recovery mode is a partial". > Though here we have heard some stories of total loss. Nobody has > questioned that the recovery of an interrupted ''write'' must > necessarily be partial. What is questioned is the complete loss of > semantics.Only an incomplete transaction would be lost, AIUI. That is the ''atomic'' property of all journaled and transactional systems. (All of it, or none of it.) --Toby> > Uwe > -- > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
May I doubt that there are drives that don't 'sync'? That would mean a good chance of corrupted data at every normal 'reboot', or just at a 'umount' (leaving ZFS aside here). And may I doubt the marketing line that you need to buy USCSI or whatnot to get a functional 'sync' at shutdown or umount? There are millions if not billions of drives out there that come up with consistent data structures after a clean shutdown. That means a proper 'umount' flushes everything on those drives, so we should expect neither corrupted data nor further writes. That was the point further up that I tried to answer, as well as the notion that a file system which encounters interrupted writes may well, and legitimately, be completely unreadable. That is what I refuted, nothing else.

Uwe
--
This message posted from opensolaris.org
David Dyer-Bennet
2009-Feb-12 15:16 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Wed, February 11, 2009 18:16, Uwe Dippel wrote:> I need to disappoint you here, LED inactive for a few seconds is a very > bad indicator of pending writes. Used to experience this on a stick on > Ubuntu, which was silent until the ''umount'' and then it started to write > for some 10 seconds.Yikes, that''s bizarre.> On the other hand, you are spot-on w.r.t. ''umount''. Once the command is > through, there is no more write to be expected. And if there was, it would > be a serious bug. So this ''umount''ed system needs to be in perfectly > consistent states. (Which is why I wrote further up that the structure > above the file system, that is the pool, is probably the culprit for all > this misery.)Yeah, once it''s unmounted it really REALLY should be in a consistent state.> [i]Conversely, anybody who is pulling disks / memory sticks off while IO > is > visibly incomplete really SHOULD expect to lose everything on them[/i] > I hope you don''t mean this. Not in a filesystem much hyped and much > advanced. Of course, we expect corruption of all files whose ''write'' has > been boldly interrupted. But I for one, expect the metadata of all other > files to be readily available. Kind of, at the next use, telling me:"You > idiot removed the plug last, while files were still in the process of > writing. Don''t expect them to be available now. Here is the list of all > other files: [list of all files not being written then]"It''s good to have hopes, certainly. I''m just kinda cynical. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
After all the statements read here, I just want to highlight another issue regarding ZFS. It has been recommended many times here to set copies=2. Installing Solaris 10 10/2008 or snv_107, you can choose either UFS or ZFS. If you choose ZFS, the rpool is created by default with 'copies=1'. If nobody mentions this, and you have a hung system with no way to access it or shut it down properly, and no choice but to hold down the power button of your notebook, couldn't the same thing happen there that happened to my external USB drive? It is the same sudden power-off event that seems to have damaged my pool, and it would be nice if ZFS could handle it.

Another issue I miss in this thread: ZFS is a layer on top of an EFI label. What about that in the case of a sudden power-off event?

Regards,

Dave.
--
This message posted from opensolaris.org
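For reference, the copies setting can be inspected and raised per dataset. A quick sketch (the dataset names are only examples; note that copies=2 affects only blocks written after the change, and it is no substitute for a mirrored pool):

    zfs get copies rpool
    zfs set copies=2 rpool/export/home   # only newly written blocks get the extra copy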
> All that and yet the fact remains: I've never "ejected" a USB drive from
> OS X or Windows, I simply pull it and go, and I've never once lost data,
> or had it become unrecoverable or even corrupted.
>
> And yes, I do keep checksums of all the data sitting on them and
> periodically check it. So, for all of your ranting and raving, the fact
> remains even a *crappy* filesystem like fat32 manages to handle a hot
> unplug without any prior notice without going belly up.
>
> --Tim

Just wanted to chime in with my 2c here. I've also *never* unmounted a USB drive from Windows, and have been using them regularly since memory sticks became available. So that's 2-3 years of experience, and I've never lost work on a memory stick, nor had a file corrupted.

I can also state with confidence that very, very few of the 100 staff working here will even be aware that it's possible to unmount a USB volume in Windows. They will all just pull the plug when their work is saved, and since they all come to me when they have problems, I think I can safely say that pulling USB devices really doesn't tend to corrupt filesystems in Windows. Everybody I know just waits for the light on the device to go out.

And while this isn't really what ZFS is designed for, I do think it should be able to cope. First of all, some kind of ZFS recovery tool is needed. There's going to be an awful lot of good data on that disk; making all of it inaccessible just because the last write failed isn't really on. It's a copy-on-write filesystem, and "zpool import" really should be able to take advantage of that when recovering pools! I don't know the technicalities of how it works on disk, but my feeling is that the last successful mount point should be saved, and the last few uberblocks should also be available, so barring complete hardware failure, some kind of pool should be available for mounting.

Also, if a drive is removed while writes are pending, some kind of error or warning is needed, either on the console or in the GUI. It should be possible to prompt the user to re-insert the device so that the remaining writes can be completed. Recovering the pool in that situation should be easy: you can keep the location of the uberblock you're using in memory, and just re-write everything.

Of course, that does assume that devices are being truthful when they say that data has been committed, but a little data loss from badly designed hardware is, I feel, acceptable, so long as ZFS can have a go at recovering corrupted pools when it does happen, instead of giving up completely like it does now. Yes, these problems happen more often with consumer-level hardware, but recovery tools like this are going to be very much appreciated by anybody who encounters problems like this on a server!
--
This message posted from opensolaris.org
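The builds discussed in this thread have no supported rewind or recovery option (a rewind option for "zpool import" did appear in later OpenSolaris builds). What can be done today is to at least inspect the on-disk labels. A sketch, with a made-up device path:

    zdb -l /dev/dsk/c3t0d0s0    # dump the four vdev labels ZFS keeps on the device
    zpool import                 # list whatever pools the system can still see on attached devices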
Robert Milkowski
2009-Feb-12 16:44 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Hello Bob,

Wednesday, February 11, 2009, 11:25:12 PM, you wrote:

BF> I agree. ZFS apparently syncs uncommitted writes every 5 seconds.
BF> If there has been no filesystem I/O (including read I/O due to atime)
BF> for at least 10 seconds, and there has not been more data
BF> burst-written into RAM than can be written to disk in 10 seconds, then
BF> there should be nothing remaining to write.

That's not entirely true. After recent changes, writes can be delayed by up to 30s by default.

--
Best regards,
Robert Milkowski
http://milek.blogspot.com
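The effective transaction-group interval can be checked on a live system; assuming the kernel tunable in your build is still named zfs_txg_timeout (a name not confirmed anywhere in this thread), something like:

    echo zfs_txg_timeout/D | mdb -k    # print the txg timeout, in seconds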
Ross wrote:> I can also state with confidence that very, very few of the 100 staff working here will even be aware that it''s possible to unmount a USB volume in windows. They will all just pull the plug when their work is saved, and since they all come to me when they have problems, I think I can safely say that pulling USB devices really doesn''t tend to corrupt filesystems in Windows. Everybody I know just waits for the light on the device to go out. >The key here is that Windows does not cache writes to the USB drive unless you go in and specifically enable them. It caches reads but not writes. If you enable them you will lose data if you pull the stick out before all the data is written. This is the type of safety measure that needs to be implemented in ZFS if it is to support the average user instead of just the IT professionals. Regards, Greg
David Dyer-Bennet
2009-Feb-12 17:31 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Thu, February 12, 2009 10:10, Ross wrote:> Of course, that does assume that devices are being truthful when they say > that data has been committed, but a little data loss from badly designed > hardware is I feel acceptable, so long as ZFS can have a go at recovering > corrupted pools when it does happen, instead of giving up completely like > it does now.Well; not "acceptable" as such. But I''d agree it''s outside ZFS''s purview. The blame for data lost due to hardware actively lying and not working to spec goes to the hardware vendor, not to ZFS. If ZFS could easily and reliably warn about such hardware I''d want it to, but the consensus seems to be that we don''t have a reliable qualification procedure. In terms of upselling people to a Sun storage solution, having ZFS diagnose problems with their cheap hardware early is clearly desirable :-). -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> On Thu, February 12, 2009 10:10, Ross wrote:
>
> > Of course, that does assume that devices are being truthful when they say
> > that data has been committed, but a little data loss from badly designed
> > hardware is I feel acceptable, so long as ZFS can have a go at recovering
> > corrupted pools when it does happen, instead of giving up completely like
> > it does now.
>
> Well; not "acceptable" as such. But I'd agree it's outside ZFS's purview.
> The blame for data lost due to hardware actively lying and not working to
> spec goes to the hardware vendor, not to ZFS.
>
> If ZFS could easily and reliably warn about such hardware I'd want it to,
> but the consensus seems to be that we don't have a reliable qualification
> procedure. In terms of upselling people to a Sun storage solution, having
> ZFS diagnose problems with their cheap hardware early is clearly desirable
> :-).

Right, well I can't imagine it's impossible to write a small app that can test whether or not drives are honoring cache flushes correctly, by issuing a commit and immediately reading back to see if it was indeed committed or not. Like a "zfs test cXtX". Of course, then you can't just blame the hardware every time something in zfs breaks ;)

--Tim
On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote:> Ross wrote: > >I can also state with confidence that very, very few of the 100 staff > >working here will even be aware that it''s possible to unmount a USB volume > >in windows. They will all just pull the plug when their work is saved, > >and since they all come to me when they have problems, I think I can > >safely say that pulling USB devices really doesn''t tend to corrupt > >filesystems in Windows. Everybody I know just waits for the light on the > >device to go out. > > > The key here is that Windows does not cache writes to the USB drive > unless you go in and specifically enable them. It caches reads but not > writes. If you enable them you will lose data if you pull the stick out > before all the data is written. This is the type of safety measure that > needs to be implemented in ZFS if it is to support the average user > instead of just the IT professionals.That implies that ZFS will have to detect removable devices and treat them differently than fixed devices. It might have to be an option that can be enabled for higher performance with reduced data security. -- -Gary Mills- -Unix Support- -U of M Academic Computing and Networking-
Mattias Pantzare
2009-Feb-12 20:45 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
> Right, well I can't imagine it's impossible to write a small app that can
> test whether or not drives are honoring cache flushes correctly, by issuing
> a commit and immediately reading back to see if it was indeed committed or
> not. Like a "zfs test cXtX". Of course, then you can't just blame the
> hardware every time something in zfs breaks ;)

A read of data that is in the disk cache will be served from the disk cache. You can't tell the disk to ignore its cache and read directly from the platter.

The only way to test this is to write and then remove the power from the disk. Not easy in software.
That would be the ideal, but really I''d settle for just improved error handling and recovery for now. In the longer term, disabling write caching by default for USB or Firewire drives might be nice. On Thu, Feb 12, 2009 at 8:35 PM, Gary Mills <mills at cc.umanitoba.ca> wrote:> On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote: >> Ross wrote: >> >I can also state with confidence that very, very few of the 100 staff >> >working here will even be aware that it''s possible to unmount a USB volume >> >in windows. They will all just pull the plug when their work is saved, >> >and since they all come to me when they have problems, I think I can >> >safely say that pulling USB devices really doesn''t tend to corrupt >> >filesystems in Windows. Everybody I know just waits for the light on the >> >device to go out. >> > >> The key here is that Windows does not cache writes to the USB drive >> unless you go in and specifically enable them. It caches reads but not >> writes. If you enable them you will lose data if you pull the stick out >> before all the data is written. This is the type of safety measure that >> needs to be implemented in ZFS if it is to support the average user >> instead of just the IT professionals. > > That implies that ZFS will have to detect removable devices and treat > them differently than fixed devices. It might have to be an option > that can be enabled for higher performance with reduced data security. > > -- > -Gary Mills- -Unix Support- -U of M Academic Computing and Networking- >
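If the sd driver exposes it for a given device, the drive's write cache can already be toggled by hand from format's expert mode. A rough sketch (whether the cache menu appears at all depends on the device and driver, so treat this as something to verify, not a given):

    format -e
    # at the "format>" prompt:      cache
    # at the "cache>" prompt:       write_cache
    # at the "write_cache>" prompt: disable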
bdebelius at intelesyscorp.com
2009-Feb-12 21:44 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Is this the crux of the problem?

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510

"For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. This can cause catastrophic data corruption in the event of power loss, even for filesystems like ZFS that are designed to survive it. Dropping a flush-cache command is just as bad as dropping a write. It violates the interface that software relies on to use the device."
--
This message posted from opensolaris.org
That does look like the issue being discussed. It''s a little alarming that the bug was reported against snv54 and is still not fixed :( Does anyone know how to push for resolution on this? USB is pretty common, like it or not for storage purposes - especially amongst the laptop-using dev crowd that OpenSolaris apparently targets. On Thu, Feb 12, 2009 at 4:44 PM, bdebelius at intelesyscorp.com <bdebelius at intelesyscorp.com> wrote:> Is this the crux of the problem? > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 > > ''For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. > This can cause catastrophic data corruption in the event of power loss, > even for filesystems like ZFS that are designed to survive it. > Dropping a flush-cache command is just as bad as dropping a write. > It violates the interface that software relies on to use the device.'' > -- > This message posted from opensolaris.org > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
David Dyer-Bennet
2009-Feb-12 22:38 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Thu, February 12, 2009 14:02, Tim wrote:> > Right, well I can''t imagine it''s impossible to write a small app that can > test whether or not drives are honoring correctly by issuing a commit and > immediately reading back to see if it was indeed committed or not. Like a > "zfs test cXtX". Of course, then you can''t just blame the hardware > everytime something in zfs breaks ;) >I can imagine it fairly easily. All you''ve got to work with is what the drive says about itself, and how fast, and the what we''re trying to test is whether it lies. It''s often very hard to catch it out on this sort of thing. We need somebody who really understands the command sets available to send to modern drives (which is not me) to provide a test they think would work, and people can argue or try it. My impression, though, is that the people with the expertise are so far consistently saying it''s not possible. I think at this point somebody who thinks it''s possible needs to do the work to at least propose a specific test, or else we have to give up on the idea. I''m still hoping for at least some kind of qualification procedure involving manual intervention (hence not something that could be embodied in a simple command you just typed), but we''re not seeing even this so far. Of course, the other side of this is that, if people "know" that drives have these problems, there must in fact be some way to demonstrate it, or they wouldn''t know. -- David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
bdebelius at intelesyscorp.com
2009-Feb-12 22:47 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
I just tried putting a pool on a USB flash drive, writing a file to it, and then yanking it. I did not lose any data or the pool, but I had to reboot before I could get any zpool command to complete without freezing. I also had the OS reboot once on its own when I tried to issue a zpool command to the pool. The OS did not notice the disk had been yanked until I tried to status it.
--
This message posted from opensolaris.org
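For anyone wanting to repeat the experiment, the sequence was roughly the following (the device and pool names here are invented, and the yank itself is obviously a physical step):

    zpool create -f usbtest c5t0d0p0      # your flash drive's device node will differ
    cp /usr/dict/words /usbtest/
    sync
    # physically pull the stick, then:
    zpool status usbtest                  # in the report above, this is where things hung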
Bill Sommerfeld
2009-Feb-12 22:57 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Thu, 2009-02-12 at 17:35 -0500, Blake wrote:> That does look like the issue being discussed. > > It''s a little alarming that the bug was reported against snv54 and is > still not fixed :(bugs.opensolaris.org''s information about this bug is out of date. It was fixed in snv_54: changeset: 3169:1dea14abfe17 user: phitran date: Sat Nov 25 11:05:17 2006 -0800 files: usr/src/uts/common/io/scsi/targets/sd.c 6424510 usb ignores DKIOCFLUSHWRITECACHE - Bill
On 12-Feb-09, at 3:02 PM, Tim wrote:> > > On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet <dd-b at dd-b.net> > wrote: > > On Thu, February 12, 2009 10:10, Ross wrote: > > > Of course, that does assume that devices are being truthful when > they say > > that data has been committed, but a little data loss from badly > designed > > hardware is I feel acceptable, so long as ZFS can have a go at > recovering > > corrupted pools when it does happen, instead of giving up > completely like > > it does now. > > Well; not "acceptable" as such. But I''d agree it''s outside ZFS''s > purview. > The blame for data lost due to hardware actively lying and not > working to > spec goes to the hardware vendor, not to ZFS. > > If ZFS could easily and reliably warn about such hardware I''d want > it to, > but the consensus seems to be that we don''t have a reliable > qualification > procedure. In terms of upselling people to a Sun storage solution, > having > ZFS diagnose problems with their cheap hardware early is clearly > desirable > :-). > > > > Right, well I can''t imagine it''s impossible to write a small app > that can test whether or not drives are honoring correctly by > issuing a commit and immediately reading back to see if it was > indeed committed or not.You do realise that this is not as easy as it looks? :) For one thing, the drive will simply serve the read from cache. It''s hard to imagine a test that doesn''t involve literally pulling plugs; even better, a purpose built hardware test harness. Nonetheless I hope that someone comes up with a brilliant test. But if the ZFS team hasn''t found one yet... it looks grim :) --Toby> Like a "zfs test cXtX". Of course, then you can''t just blame the > hardware everytime something in zfs breaks ;) > > --Tim > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090212/78b32510/attachment.html>
I''m sure it''s very hard to write good error handling code for hardware events like this. I think, after skimming this thread (a pretty wild ride), we can at least decide that there is an RFE for a recovery tool for zfs - something to allow us to try to pull data from a failed pool. That seems like a reasonable tool to request/work on, no? On Thu, Feb 12, 2009 at 6:03 PM, Toby Thain <toby at telegraphics.com.au> wrote:> > On 12-Feb-09, at 3:02 PM, Tim wrote: > > > On Thu, Feb 12, 2009 at 11:31 AM, David Dyer-Bennet <dd-b at dd-b.net> wrote: >> >> On Thu, February 12, 2009 10:10, Ross wrote: >> >> > Of course, that does assume that devices are being truthful when they >> > say >> > that data has been committed, but a little data loss from badly designed >> > hardware is I feel acceptable, so long as ZFS can have a go at >> > recovering >> > corrupted pools when it does happen, instead of giving up completely >> > like >> > it does now. >> >> Well; not "acceptable" as such. But I''d agree it''s outside ZFS''s purview. >> The blame for data lost due to hardware actively lying and not working to >> spec goes to the hardware vendor, not to ZFS. >> >> If ZFS could easily and reliably warn about such hardware I''d want it to, >> but the consensus seems to be that we don''t have a reliable qualification >> procedure. In terms of upselling people to a Sun storage solution, having >> ZFS diagnose problems with their cheap hardware early is clearly desirable >> :-). >> > > > Right, well I can''t imagine it''s impossible to write a small app that can > test whether or not drives are honoring correctly by issuing a commit and > immediately reading back to see if it was indeed committed or not. > > You do realise that this is not as easy as it looks? :) For one thing, the > drive will simply serve the read from cache. > It''s hard to imagine a test that doesn''t involve literally pulling plugs; > even better, a purpose built hardware test harness. > Nonetheless I hope that someone comes up with a brilliant test. But if the > ZFS team hasn''t found one yet... it looks grim :) > --Toby > > Like a "zfs test cXtX". Of course, then you can''t just blame the hardware > everytime something in zfs breaks ;) > > --Tim > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > >
Eric D. Mudama
2009-Feb-13 00:02 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12 at 21:45, Mattias Pantzare wrote:>A read of data in the disk cache will be read from the disk cache. You >can''t tell the disk to ignore its cache and read directly from the >plater. > > The only way to test this is to write and the remove the power from >the disk. Not easy in software.Not true with modern SATA drives that support NCQ, as there is a FUA bit that can be set by the driver on NCQ reads. If the device implements the spec, any overlapped write cache data will be flushed, invalidated, and a fresh read done from the non-volatile media for the FUA read command. --eric -- Eric D. Mudama edmudama at mail.bounceswoosh.org
Blake wrote:> I''m sure it''s very hard to write good error handling code for hardware > events like this. > > I think, after skimming this thread (a pretty wild ride), we can at > least decide that there is an RFE for a recovery tool for zfs - > something to allow us to try to pull data from a failed pool. That > seems like a reasonable tool to request/work on, no? >The ability to force a roll back to an older uberblock in order to be able to access the pool (in the case of corrupt current uberblock) should be ZFS developer''s very top priority, IMO. I''d offer to do it myself, but I have nowhere near the ability to do so. -- Dave
On 12-Feb-09, at 7:02 PM, Eric D. Mudama wrote:> On Thu, Feb 12 at 21:45, Mattias Pantzare wrote: >> A read of data in the disk cache will be read from the disk cache. >> You >> can''t tell the disk to ignore its cache and read directly from the >> plater. >> >> The only way to test this is to write and the remove the power from >> the disk. Not easy in software. > > Not true with modern SATA drives that support NCQ, as there is a FUA > bit that can be set by the driver on NCQ reads. If the device > implements the spec,^^ Spec compliance is what we''re testing for... We wouldn''t know if this special variant is working correctly either. :) --T> any overlapped write cache data will be flushed, > invalidated, and a fresh read done from the non-volatile media for the > FUA read command. > > --eric > > > > -- > Eric D. Mudama > edmudama at mail.bounceswoosh.org > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Blake, On Thu, Feb 12, 2009 at 05:35:14PM -0500, Blake wrote:> That does look like the issue being discussed. > > It''s a little alarming that the bug was reported against snv54 and is > still not fixed :(Looks like the bug-report is out of sync. I see that the bug has been fixed in B54. Here is the link to source gate which shows that the fix is in the gate : http://src.opensolaris.org/source/search?q=&defs=&refs=&path=&hist=6424510&project=%2Fonnv And here are the diffs : http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/uts/common/io/scsi/targets/sd.c?r2=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403169&r1=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Fio%2Fscsi%2Ftargets%2Fsd.c%403138 Thanks and regards, Sanjeev.> > Does anyone know how to push for resolution on this? USB is pretty > common, like it or not for storage purposes - especially amongst the > laptop-using dev crowd that OpenSolaris apparently targets. > > > > On Thu, Feb 12, 2009 at 4:44 PM, bdebelius at intelesyscorp.com > <bdebelius at intelesyscorp.com> wrote: > > Is this the crux of the problem? > > > > http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510 > > > > ''For usb devices, the driver currently ignores DKIOCFLUSHWRITECACHE. > > This can cause catastrophic data corruption in the event of power loss, > > even for filesystems like ZFS that are designed to survive it. > > Dropping a flush-cache command is just as bad as dropping a write. > > It violates the interface that software relies on to use the device.'' > > -- > > This message posted from opensolaris.org > > _______________________________________________ > > zfs-discuss mailing list > > zfs-discuss at opensolaris.org > > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss-- ---------------- Sanjeev Bagewadi Solaris RPE Bangalore, India
bcirvin, you proposed "something to allow us to try to pull data from a failed pool". Yes and no. 'Yes' as a pragmatic solution; 'no' in terms of what ZFS was 'sold' as: the last filesystem mankind would need. It was conceived as a filesystem that does not need recovery, thanks to its guaranteed consistent state on the/any drive, or better: at any moment. If that were truly the case, a recovery program would not be needed, and I don't think Sun would like one either. It is also far from optimal to prevent caching, as others have proposed; that is nothing but a very ugly hack.

Again, and I have yet to receive comments on this: the original poster claimed to have done a proper flush/sync and to have left a 100% consistent file system behind on his drive. At reboot, the pool, the higher-level entity, failed miserably. Of course, one can now conceive of a program that scans the whole drive, like in the good old days on ancient file systems, to recover all those 100% correct file system(s). Or one could, as proposed, add an uberblock copy, like the FAT mirror we had in the last millennium. The alternative, and the much better solution engineering-wise, would be to fix the weakness at the contextual or semantic level: the situation where 100% consistent file systems cannot be reached by the operating system. This, so it seems, is (still) a shortcoming of the ZFS concept. It might be solved by yesterday's means, I agree. Or by putting more work into the volume-management level, the pools.

Without claiming to have the solution, conceptually I would propose doing away with the static, look-up-table-like structure of the pool as stored in a mirror or uberblock. Could pools be associated dynamically? Could the filesystems in a pool create a (new) handle each time they are updated to a consistent state, so that when the drive is plugged in or powered on, the software simply collects the handles of all file systems on that drive? Then export/import would still be possible, but no longer required, since the filesystems would form their own entities. They could still have associated contextual/semantic (stored) structures into which they are 'plugged' once the drive is up, if one wanted that ('logical volume'). But with or without those, the pool would self-configure when the drive starts, by picking up all the file system handles.

Uwe
--
This message posted from opensolaris.org
On February 12, 2009 1:44:34 PM -0800 bdebelius at intelesyscorp.com wrote:> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424510...> Dropping a flush-cache command is just as bad as dropping a write.Not that it matters, but it seems obvious that this is wrong or anyway an exaggeration. Dropping a flush-cache just means that you have to wait until the device is quiesced before the data is consistent. Dropping a write is much much worse. -frank
I am wondering if the usb storage device is not reliable for ZFS usage, can the situation be improved if I put the intent log on internal sata disk to avoid corruption and utilize the convenience of usb storage at the same time? -- This message posted from opensolaris.org
Huh? But that loses the convenience of USB.

I've used USB drives without any problems at all; just remember to "zpool export" them before you unplug.
--
This message posted from opensolaris.org
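For completeness, the safe shuffle between machines looks like this (the pool name is an example):

    zpool export usbpool     # flushes, unmounts and marks the pool as exported
    # unplug the drive, attach it to the other machine, then:
    zpool import             # shows pools found on the attached devices
    zpool import usbpool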
While mobility would be lost, USB storage still has the advantage of being cheap and easy to install compared to adding internal disks to a PC. So if I just want to use it to provide ZFS storage space for a home file server, can a small intent log located on an internal SATA disk prevent the pool corruption caused by a power cut?
--
This message posted from opensolaris.org
On 2/13/2009 5:58 AM, Ross wrote:
> Huh? But that loses the convenience of USB.
>
> I've used USB drives without any problems at all; just remember to
> "zpool export" them before you unplug.

I think there is a subcommand of cfgadm you should run to notify Solaris that you intend to unplug the device. I don't use USB, and my familiarity with cfgadm (for FC and SCSI) is limited.

-Kyle
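Roughly like this (the attachment-point id below is only an example; check your own listing):

    cfgadm -l                       # list attachment points; USB ones appear as usbN/M
    cfgadm -c unconfigure usb0/1    # tell Solaris the device at that ap_id is going away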
Having a separate intent log on good hardware will not prevent corruption on a pool with bad hardware. By "good" I mean hardware that correctly flushes its write caches when requested.

Note, a pool is always consistent (again, when using good hardware). The function of the intent log is not to provide consistency (like a journal), but to speed up synchronous requests like fsync and O_DSYNC.

Neil.

On 02/13/09 06:29, Jiawei Zhao wrote:
> While mobility would be lost, USB storage still has the advantage of being
> cheap and easy to install compared to adding internal disks to a PC. So if
> I just want to use it to provide ZFS storage space for a home file server,
> can a small intent log located on an internal SATA disk prevent the pool
> corruption caused by a power cut?
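For anyone who still wants a slog for the performance reason Neil describes, adding one is a one-liner (the pool and device names below are examples only, and the caveat above stands: it will not save a pool on hardware that drops cache flushes):

    zpool add tank log c1t2d0s0    # dedicate a slice/disk as the separate intent log
    zpool status tank              # the "logs" section now lists the slog device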
Eric D. Mudama
2009-Feb-13 16:53 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13 at 9:14, Neil Perrin wrote:> Having a separate intent log on good hardware will not prevent corruption > on a pool with bad hardware. By "good" I mean hardware that correctly > flush their write caches when requested.Can someone please name a specific piece of bad hardware? --eric -- Eric D. Mudama edmudama at mail.bounceswoosh.org
Eric D. Mudama
2009-Feb-13 17:09 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Thu, Feb 12 at 19:43, Toby Thain wrote:> ^^ Spec compliance is what we''re testing for... We wouldn''t know if this > special variant is working correctly either. :)Time the difference between NCQ reads with and without FUA in the presence of overlapped cached write data. That should have a significant performance penalty, compared to a device servicing the reads from a volatile buffer cache. FYI, there are semi-commonly-available power control units that take serial port or USB as an input, and have a whole bunch of SATA power connectors on them. These are the sorts of things that drive vendors use to bounce power unexpectedly in their testing, if you need to perform that same validation, it makes sense to invest in that bit of infrastructure. Something like this: http://www.ulinktech.com/products/hw_power_hub.html or just roll your own in a few days like this guy did for his printer: http://chezphil.org/slugpower/ It should be pretty trivial to perform a few thousand cached writes, issue a flush cache ext, and turn off power immediately after that command completes. Then go back and figure out how many of those writes were successfully written as the device claimed. -- Eric D. Mudama edmudama at mail.bounceswoosh.org
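In ZFS terms, the loop Eric describes might look something like the sketch below. "powerctl" is a stand-in for whatever serial/USB power switch you actually own (there is no such standard utility), and the pool/port names are invented; the rest is stock Solaris userland.

    i=1
    while [ $i -le 100 ]; do
        dd if=/dev/urandom of=/testpool/f.$i bs=128k count=64 2>/dev/null
        i=`expr $i + 1`
    done
    sync                      # ask for everything to reach stable storage
    powerctl off port3        # hypothetical helper: cut drive power right after sync returns
    sleep 10
    powerctl on port3
    zpool clear testpool      # let ZFS reopen the device
    zpool scrub testpool      # if the drive honoured the flush, every block should verify
    zpool status -v testpool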
>>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:gm> That implies that ZFS will have to detect removable devices gm> and treat them differently than fixed devices. please, no more of this garbage, no more hidden unchangeable automatic condescending behavior. The whole format vs rmformat mess is just ridiculous. And software and hardware developers alike have both proven themselves incapable of settling on a definition of ``removeable'''' that fits with actual use-cases like: FC/iSCSI; hot-swappable SATA; adapters that have removeable sockets on both ends like USB-to-SD, firewire CD-ROM''s, SATA/SAS port multipliers, and so on. As we''ve said many times, if the devices are working properly, then they can be unplugged uncleanly without corrupting the pool, and without corrupting any other non-Microsoft filesystem. This is an old, SOLVED, problem. It''s ridiculous hypocricy to make whole filesystems DSYNC, to even _invent the possibility for the filesystem to be DSYNC_, just because it is possible to remove something. Will you do the same thing because it is possible for your laptop''s battery to run out? just, STOP! If the devices are broken, the problem is that they''re broken, not that they''re removeable. personally, I think everything with a broken write cache should be black-listed in the kernel and attach read-only by default, whether it''s a USB bridge or a SATA disk. This will not be perfect because USB bridges, RAID layers and iSCSI targets, will often hide the identity of the SATA drive behind them, and of course people will demand a way to disable it. but if you want to be ``safe'''', then for the sake of making the point, THIS is the right way to do it, not muck around with these overloaded notions of ``removeable''''. Also, the so-far unacknowledged ``iSCSI/FC Write Hole'''' should be fixed so that a copy of all written data is held in the initiator''s buffer cache until it''s verified as *on the physical platter/NVRAM* so that it can be replayed if necessary, and SYNC CACHE commands are allowed to fail far enough that even *things which USE the initiator, like ZFS* will understand what it means when SYNC CACHE fails, and bounced connections are handled correctly---otherwise, when connections bounce or SYNC CACHE returns failure, correctness requires that the initiator pretend like its plug was pulled and panic. Short of that the initiator system must forcibly unmount all filesystems using that device and kill all processes that had files open on those filesystems. And sysadmins should have and know how to cleverly use a tool that tests for both functioning barriers and working SYNC CACHE, end-to-end. NO more ``removeable'''' attributes, please! You are just pretending to solve a much bigger problem, and making things clumsy and disgusting in the process. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090213/92ad0204/attachment.bin>
>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes:

 >> Dropping a flush-cache command is just as bad as dropping a
 >> write.

    fc> Not that it matters, but it seems obvious that this is wrong
    fc> or anyway an exaggeration. Dropping a flush-cache just means
    fc> that you have to wait until the device is quiesced before the
    fc> data is consistent.

    fc> Dropping a write is much much worse.

Backwards, I think. Dropping a flush-cache is WORSE than dropping the flush-cache plus all writes after the flush-cache. The problem that causes loss of whole pools, rather than loss of recently-written data, isn't that you're writing too little. It's that you're dropping the barrier and misordering the writes. Consequently you lose *everything you've ever written*, which is much worse than losing some recent writes, even a lot of them.
>>>>> "t" == Tim <tim at tcsac.net> writes:t> I would like to believe it has more to do with Solaris''s t> support of USB than ZFS, but the fact remains it''s a pretty t> glaring deficiency in 2009, no matter which part of the stack t> is at fault. maybe, but for this job I don''t much mind glaring deficiencies, as long as it''s possible to assemble a working system without resorting to trial-and-error, and possible to know it''s working before loading data on it. Right now, by following the ``best practices'''', you don''t know what to buy, and after you receive the hardware you don''t know if it works until you lose a pool, at which time someone will tell you ``i guess it wasn''t ever working.'''' Even if you order sun4v or an expensive FC disk shelf, you still don''t know if it works. (though, I''m starting to suspect, ni the case of FC or iSCSI the answer is always ``it does not work'''') The only thing you know for sure is, if you lose a pool, someone will blame it on hardware bugs surroudning cache flushes, or else try to conflate the issue with a bunch of inapplicable garbage about checksums and wire corruption. This is unworkable. I''m not saying glaring 2009 deficiencies are irrelevant---on my laptop I do mind because I got out of a multi-year abusive relationship with NetBSD/hpcmips, and now want all parts of my laptop to have drivers. And I guess it applies to that neat timeslider / home-base--USB-disk case we were talking about a month ago. but for what I''m doing I will actually accept the advice ``do not ever put ZFS on USB because ZFS is a canary in the mine of USB bugs''''---it''s just, that advice is not really good enough to settle the whole issue. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090213/12016a5b/attachment.bin>
>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes:fc> if you have 100TB of data, wouldn''t you have a completely fc> redundant storage network If you work for a ponderous leaf-eating brontosorous maybe. If your company is modern I think having such an oddly large amount of data in one pool means you''d more likely have 70 whitebox peecees using motherboard ethernet/sata only, connected to a mesh of unmanaged L2 switches (of some peculiar brand that happens to work well.) There will always be one or two peecees switched off, and constantly something will be resilvering. The home user case is not really just for home users. I think a lot of people are tired of paying quadruple for stuff that still breaks, even serious people. fc> Isn''t this easily worked around by having UPS power in fc> addition to whatever the data center supplies? In NYC over the last five years the power has been more reliable going into my UPS than coming out of it. The main reason for having a UPS is wiring maintenance. And the most important part of the UPS is the externally-mounted bypass switch because the UPS also needs maintenance. UPS has never _solved_ anything, it always just helps. so in the end we have to count on the software''s graceful behavior, not on absolutes. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090213/3a36c742/attachment.bin>
Miles Nordin wrote:
>     gm> That implies that ZFS will have to detect removable devices
>     gm> and treat them differently than fixed devices.
>
> please, no more of this garbage, no more hidden unchangeable automatic
> condescending behavior. The whole format vs rmformat mess is just
> ridiculous. And software and hardware developers alike have both
> proven themselves incapable of settling on a definition of
> ``removable'' that fits with actual use-cases like: FC/iSCSI;
> hot-swappable SATA; adapters that have removable sockets on both ends
> like USB-to-SD; firewire CD-ROMs; SATA/SAS port multipliers; and so
> on.
>
Since this discussion is taking place in the context of someone removing a USB stick, I think you're confusing the issue by dragging in other technologies. Let's keep this in the context of the posts preceding it, which is how USB devices are treated.

I would argue that one of the first design goals in an environment where you can expect people who are not computer professionals to be interfacing with computers is to make sure that the appropriate safeties are in place and that the system does not behave in a manner which a reasonable person might find unexpected. This is common practice for any sort of professional engineering effort. As an example, you aren't going to go out there and find yourself a chainsaw being sold new without a guard. It might be removable, but the default is to include it. Why? Well, because there is a considerable chance of damage to the user without it.

Likewise with a file system on a device which might cache a data write for as long as thirty seconds while being easily removable. In this case, the user may write the file and seconds later remove the device. Many folks out there behave in this manner. It really doesn't matter to them that they have a copy of the last save they did two hours ago; what they want and expect is that the most recent data they saved actually be on the USB stick for them to retrieve. What you are suggesting is that it is better to lose that data when it could have been avoided.

I would personally suggest that it is better to have default behavior which is not surprising, along with more advanced behavior for those who have bothered to read the manual. In Windows' case, the write cache can be turned on, it is not "unchangeable", and those who have educated themselves use it. I seldom turn it on unless I'm doing heavy I/O to a USB hard drive, otherwise the performance difference is just not that great.

Regards,
Greg
On February 13, 2009 12:20:21 PM -0500 Miles Nordin <carton at Ivy.NET> wrote:>>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes: > > >> Dropping a flush-cache command is just as bad as dropping a > >> write. > > fc> Not that it matters, but it seems obvious that this is wrong > fc> or anyway an exaggeration. Dropping a flush-cache just means > fc> that you have to wait until the device is quiesced before the > fc> data is consistent. > > fc> Dropping a write is much much worse. > > backwards i think. Dropping a flush-cache is WORSE than dropping the > flush-cache plus all writes after the flush-cache. The problem that > causes loss of whole pools rather than loss of recently-written data > isn''t that you''re writing too little. It''s that you''re dropping the > barrier and misordering the writes. consequently you lose *everything > you''ve ever written,* which is much worse than losing some recent > writes, even a lot of them.Who said dropping a flush-cache means dropping any subsequent writes, or misordering writes? If you''re misordering writes isn''t that a completely different problem? Even then, I don''t see how it''s worse than DROPPING a write. The data eventually gets to disk, and at that point in time, the disk is consistent. When dropping a write, the data never makes it to disk, ever. In the face of a power loss, of course these result in the same problem, but even without a power loss the drop of a write is "catastrophic". -frank
On February 13, 2009 12:10:08 PM -0500 Miles Nordin <carton at Ivy.NET> wrote:> please, no more of this garbage, no more hidden unchangeable automatic > condescending behavior. The whole format vs rmformat mess is just > ridiculous.thank you.
On February 13, 2009 12:41:12 PM -0500 Miles Nordin <carton at Ivy.NET> wrote:>>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes: > > fc> if you have 100TB of data, wouldn''t you have a completely > fc> redundant storage network > > If you work for a ponderous leaf-eating brontosorous maybe. If your > company is modern I think having such an oddly large amount of data in > one pool means you''d more likely have 70 whitebox peecees using > motherboard ethernet/sata only, connected to a mesh of unmanaged L2 > switches (of some peculiar brand that happens to work well.) There > will always be one or two peecees switched off, and constantly > something will be resilvering. The home user case is not really just > for home users. I think a lot of people are tired of paying quadruple > for stuff that still breaks, even serious people.oh i dunno. i recently worked for a company that practically defines modern and we had multiples of 100TB of data. Like you said, not all in one place, but any given piece was fully redundant (well, if you count RAID-5 as "fully" ... but I''m really referring to the infrastructure). I can''t imagine it any other way ... the cost of not having redundancy in the face of a failure is so much higher compared to the cost of building in that redundancy. Also I''m not sure how you get 1 pool with more than 1 peecee as zfs is not a cluster fs. So what you are talking about is multiple pools, and in that case if you do lose one (not redundant for whatever reason) you only have to restore a fraction of the 100TB from backup.> fc> Isn''t this easily worked around by having UPS power in > fc> addition to whatever the data center supplies? > > In NYC over the last five years the power has been more reliable going > into my UPS than coming out of it. The main reason for having a UPS > is wiring maintenance. And the most important part of the UPS is the > externally-mounted bypass switch because the UPS also needs > maintenance. UPS has never _solved_ anything, it always just helps. > so in the end we have to count on the software''s graceful behavior, > not on absolutes.I can''t say I agree about the UPS, however I''ve already been pretty forthright that UPS, etc. isn''t the answer to the problem, just a mitigating factor to the root problem. -frank
Dick Hoogendijk
2009-Feb-13 18:09 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009 17:53:00 +0100, Eric D. Mudama <edmudama at bounceswoosh.org> wrote:
> On Fri, Feb 13 at 9:14, Neil Perrin wrote:
>> Having a separate intent log on good hardware will not prevent
>> corruption on a pool with bad hardware. By "good" I mean hardware
>> that correctly flushes its write caches when requested.
>
> Can someone please name a specific piece of bad hardware?

Or better still, name a few -GOOD- ones.

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv107++
+ All that's really worth doing is what we do for others (Lewis Carroll)
>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes:fc> If you''re misordering writes fc> isn''t that a completely different problem? no. ignoring the flush cache command causes writes to be misordered. fc> Even then, I don''t see how it''s worse than DROPPING a write. fc> The data eventually gets to disk, and at that point in time, fc> the disk is consistent. When dropping a write, the data never fc> makes it to disk, ever. If you drop the flush cache command and every write after the flush cache command, yeah yeah it''s bad, but in THAT case, the disk is still always consistent because no writes have been misordered. fc> In the face of a power loss, of course these result in the fc> same problem, no, it''s completely different in a power loss, which is exactly the point. If you pull the cord while the disk is inconsistent, you may lose the entire pool. If the disk is never inconsistent because you''ve never misordered writes, you will only lose recent write activity. Losing everything you''ve ever written is usually much worse than losing what you''ve written recently. yeah yeah some devil''s advocate will toss in, ``i *need* some consistency promises or else it''s better that the pool its hand and say `broken, restore backup please'' even if the hand-raising comes in the form of losing the entire pool,'''' well in that case neither one is acceptable. But if your requirements are looser, then dropping a flush cache command plus every write after the flush cache command is much better than just ignoring the flush cache command. of course, that is a weird kind of failure that never happens. I described it just to make a point, to argue against this overly-simple idea ``every write is precious. let''s do them as soon as possible because there could be Valuable Business Data inside the writes! we don''t want to lose anything Valuable!'''' The part of SYNC CACHE that''s causing people to lose entire pools isn''t the ``hurry up! write faster!'''' part of the command, such that without it you still get your precious writes, just a little slower. NO. It''s the ``control the order of writes'''' part that''s important for integrity on a single-device vdev. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090213/d9427a4c/attachment.bin>
On February 13, 2009 1:10:55 PM -0500 Miles Nordin <carton at Ivy.NET> wrote:>>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes: > > fc> If you''re misordering writes > fc> isn''t that a completely different problem? > > no. ignoring the flush cache command causes writes to be misordered.oh. can you supply a reference or if you have the time, some more explanation? (or can someone else confirm this.) my understanding (weak, admittedly) is that drives will reorder writes on their own, and this is generally considered normal behavior. so to guarantee consistency *in the face of some kind of failure like a power loss*, we have write barriers. flush-cache is a stronger kind of write barrier. now that i think more, i suppose yes if you ignore the flush cache, then writes before and after the flush cache could be misordered, however it''s the same as if there were no flush cache at all, and again as long as the drive has power and you can quiesce it then the data makes it to disk, and all is consistent and well. yes? whereas if you drop a write, well it''s gone off into a black hole.> fc> Even then, I don''t see how it''s worse than DROPPING a write. > fc> The data eventually gets to disk, and at that point in time, > fc> the disk is consistent. When dropping a write, the data never > fc> makes it to disk, ever. > > If you drop the flush cache command and every write after the flush > cache command, yeah yeah it''s bad, but in THAT case, the disk is still > always consistent because no writes have been misordered.why would dropping a flush cache imply dropping every write after the flush cache?> fc> In the face of a power loss, of course these result in the > fc> same problem, > > no, it''s completely different in a power loss, which is exactly the point. > > If you pull the cord while the disk is inconsistent, you may lose the > entire pool. If the disk is never inconsistent because you''ve never > misordered writes, you will only lose recent write activity. Losing > everything you''ve ever written is usually much worse than losing what > you''ve written recently.yeah, as soon as i wrote that i realized my error, so thank you and i agree on that point. *in the event of a power loss* being inconsistent is a worse problem. -frank
On February 13, 2009 10:29:05 AM -0800 Frank Cusack <fcusack at fcusack.com> wrote:> On February 13, 2009 1:10:55 PM -0500 Miles Nordin <carton at Ivy.NET> wrote: >>>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes: >> >> fc> If you''re misordering writes >> fc> isn''t that a completely different problem? >> >> no. ignoring the flush cache command causes writes to be misordered. > > oh. can you supply a reference or if you have the time, some more > explanation? (or can someone else confirm this.)uhh ... that question can be ignored as i answered it myself below. sorry if i''m must being noisy now.> my understanding (weak, admittedly) is that drives will reorder writes > on their own, and this is generally considered normal behavior. so > to guarantee consistency *in the face of some kind of failure like a > power loss*, we have write barriers. flush-cache is a stronger kind > of write barrier. > > now that i think more, i suppose yes if you ignore the flush cache, > then writes before and after the flush cache could be misordered, > however it''s the same as if there were no flush cache at all, and > again as long as the drive has power and you can quiesce it then > the data makes it to disk, and all is consistent and well. yes?-frank
>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes:fc> why would dropping a flush cache imply dropping every write fc> after the flush cache? it wouldn''t and probably never does. It was an imaginary scenario invented to argue with you and to agree with the guy in the USB bug who said ``dropping a cache flush command is as bad as dropping a write.'''' fc> oh. can you supply a reference or if you have the time, some fc> more explanation? (or can someone else confirm this.) I posted something long a few days ago that I need to revisit. The problem is, I don''t actually understand how the disk commands work, so I was talking out my ass. Although I kept saying, ``I''m not sure it actually works this way,'''' my saying so doesn''t help anyone who spends the time to read it and then gets a bunch of mistaken garbage stuck in his head, which people who actually recognize as garbage are too busy to correct. It''d be better for everyone if I didn''t do that. On the other hand, I think there''s some worth to dreaming up several possibilities of what I fantisize the various commands might mean or do, rather than simply reading one of the specs to get the one right answer, because from what people in here say it soudns as though implementors of actual systems based on the SCSI commandset live in this same imaginary world of fantastic and multiple realities without any meaningful review or accountability that I do. (disks, bridges, iSCSI targets and initiators, VMWare/VBox storage, ...) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090213/c0f56fd4/attachment.bin>
Superb news, thanks Jeff. Having that will really raise ZFS up a notch, and align it much better with people's expectations. I assume it'll work via zpool import, and let the user know what's gone wrong?

If you think back to this case, imagine how different the user's response would have been if, instead of being unable to mount the pool, ZFS had turned around and said: "This pool was not unmounted cleanly, and data has been lost. Do you want to restore your pool to the last viable state: (timestamp goes here)?"

Something like that will have people praising ZFS' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. You could even have a "common causes of this are..." message, or a link to an online help article if you wanted people to be really impressed.
-- 
This message posted from opensolaris.org
Bob Friesenhahn
2009-Feb-13 19:41 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross wrote:> > Something like that will have people praising ZFS'' ability to > safeguard their data, and the way it recovers even after system > crashes or when hardware has gone wrong. You could even have a > "common causes of this are..." message, or a link to an online help > article if you wanted people to be really impressed.I see a career in politics for you. Barring an operating system implementation bug, the type of problem you are talking about is due to improperly working hardware. Irreversibly reverting to a previous checkpoint may or may not obtain the correct data. Perhaps it will produce a bunch of checksum errors. There are already people praising ZFS'' ability to safeguard their data, and the way it recovers even after system crashes or when hardware has gone wrong. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Nicolas Williams
2009-Feb-13 20:00 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 10:29:05AM -0800, Frank Cusack wrote:
> On February 13, 2009 1:10:55 PM -0500 Miles Nordin <carton at Ivy.NET> wrote:
> >>>>>> "fc" == Frank Cusack <fcusack at fcusack.com> writes:
> >
> >  fc> If you're misordering writes
> >  fc> isn't that a completely different problem?
> >
> > no. ignoring the flush cache command causes writes to be misordered.
>
> oh. can you supply a reference or if you have the time, some more
> explanation? (or can someone else confirm this.)

Ordering matters for atomic operations, and filesystems are full of those. Now, if ordering is broken but the writes all eventually hit the disk, then no one will notice. But if power fails, or a partition occurs (cables get pulled, network partitions affect an iSCSI connection, ...), then bad things happen.

For ZFS the easiest way to ameliorate this is the txg fallback fix that Jeff Bonwick has said is now a priority. And if ZFS guarantees no block re-use until N txgs pass after a block is freed, then the fallback can be of up to N txgs, which gives you a decent chance that you'll recover your pool in the face of buggy devices; but for each discarded txg you lose that transaction's writes, so you lose data incrementally. (The larger N is, the better your chance that the oldest of the last N txgs' writes will all have hit the disk in spite of the disk's lousy cache behaviors.)

The next question is how to do the fallback, UI-wise. Should it ever be automatic? A pool option for that would be nice (I'd use it on all-USB pools). If/when not automatic, how should the user/admin be informed of the failure to open the pool and of the option to fall back on an older txg (with data loss)? (For non-removable pools imported at boot time the answer is that the service will fail, causing sulogin to be invoked so you can fix the problem on console. For removable pools there should be a GUI.)

Nico
-- 
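(As a thought experiment, here is a minimal sketch of what that fallback could look like. It is my own toy code under invented names -- not Jeff's fix and not how the real import path is written: try the newest uberblock, verify the tree it references by checksum, and step back one txg at a time, accepting that each step discards that txg's writes.)

import zlib

def cksum(data):
    return zlib.crc32(data)

def tree_verifies(uberblock, blocks):
    # an uberblock is usable only if every block it references is present
    # and matches the checksum recorded for it
    for addr, want in uberblock["refs"].items():
        data = blocks.get(addr)
        if data is None or cksum(data) != want:
            return False
    return True

def open_pool(uberblocks, blocks, max_fallback=3):
    # newest txg first; each step backwards discards that txg's writes
    candidates = sorted(uberblocks, key=lambda u: u["txg"], reverse=True)
    for ub in candidates[:max_fallback + 1]:
        if tree_verifies(ub, blocks):
            return ub
    raise IOError("no usable uberblock within the fallback window")

# example: txg 102's metadata never left the drive's cache, txg 101's did
blocks = {"m1": b"metadata v101", "d1": b"data v101"}
uberblocks = [
    {"txg": 101, "refs": {"m1": cksum(b"metadata v101"), "d1": cksum(b"data v101")}},
    {"txg": 102, "refs": {"m2": cksum(b"metadata v102")}},   # m2 was lost
]
print("pool opened at txg", open_pool(uberblocks, blocks)["txg"])   # -> 101

Note that this only works if the blocks txg 101 references have not been recycled in the meantime, which is exactly where the no-reuse-for-N-txgs guarantee comes in.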
On Fri, Feb 13, 2009 at 7:41 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:> On Fri, 13 Feb 2009, Ross wrote: >> >> Something like that will have people praising ZFS'' ability to safeguard >> their data, and the way it recovers even after system crashes or when >> hardware has gone wrong. You could even have a "common causes of this >> are..." message, or a link to an online help article if you wanted people to >> be really impressed. > > I see a career in politics for you. Barring an operating system > implementation bug, the type of problem you are talking about is due to > improperly working hardware. Irreversibly reverting to a previous > checkpoint may or may not obtain the correct data. Perhaps it will produce > a bunch of checksum errors.Yes, the root cause is improperly working hardware (or an OS bug like 6424510), but with ZFS being a copy on write system, when errors occur with a recent write, for the vast majority of the pools out there you still have huge amounts of data that is still perfectly valid and should be accessible. Unless I''m misunderstanding something, reverting to a previous checkpoint gets you back to a state where ZFS knows it''s good (or at least where ZFS can verify whether it''s good or not). You have to consider that even with improperly working hardware, ZFS has been checksumming data, so if that hardware has been working for any length of time, you *know* that the data on it is good. Yes, if you have databases or files there that were mid-write, they will almost certainly be corrupted. But at least your filesystem is back, and it''s in as good a state as it''s going to be given that in order for your pool to be in this position, your hardware went wrong mid-write. And as an added bonus, if you''re using ZFS snapshots, now your pool is accessible, you have a bunch of backups available so you can probably roll corrupted files back to working versions. For me, that is about as good as you can get in terms of handling a sudden hardware failure. Everything that is known to be saved to disk is there, you can verify (with absolute certainty) whether data is ok or not, and you have backup copies of damaged files. In the old days you''d need to be reverting to tape backups for both of these, with potentially hours of downtime before you even know where you are. Achieving that in a few seconds (or minutes) is a massive step forwards.> There are already people praising ZFS'' ability to safeguard their data, and > the way it recovers even after system crashes or when hardware has gone > wrong.Yes there are, but the majority of these are praising the ability of ZFS checksums to detect bad data, and to repair it when you have redundancy in your pool. I''ve not seen that many cases of people praising ZFS'' recovery ability - uberblock problems seem to have a nasty habit of leaving you with tons of good, checksummed data on a pool that you can''t get to, and while many hardware problems are dealt with, others can hang your entire pool.> > Bob > =====================================> Bob Friesenhahn > bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ > >
David Collier-Brown
2009-Feb-13 20:23 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Bob Friesenhahn wrote:> On Fri, 13 Feb 2009, Ross wrote: >> >> Something like that will have people praising ZFS'' ability to >> safeguard their data, and the way it recovers even after system >> crashes or when hardware has gone wrong. You could even have a >> "common causes of this are..." message, or a link to an online help >> article if you wanted people to be really impressed. > > I see a career in politics for you. Barring an operating system > implementation bug, the type of problem you are talking about is due to > improperly working hardware. Irreversibly reverting to a previous > checkpoint may or may not obtain the correct data. Perhaps it will > produce a bunch of checksum errors.Actually that''s a lot like FMA replies when it sees a problem, telling the person what happened and pointing them to a web page which can be updated with the newest information on the problem. That''s a good spot for "This pool was not unmounted cleanly due to a hardware fault and data has been lost. The "<name of timestamp>" line contains the date which can be recovered to. Use the command # zfs reframbulocate <this> <that> -t <timestamp> to revert to <timestamp> --dave -- David Collier-Brown | Always do right. This will gratify Sun Microsystems, Toronto | some people and astonish the rest davecb at sun.com | -- Mark Twain cell: (647) 833-9377, bridge: (877) 385-4099 code: 506 9191#
Bob Friesenhahn
2009-Feb-13 20:24 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote:
>
> You have to consider that even with improperly working hardware, ZFS
> has been checksumming data, so if that hardware has been working for
> any length of time, you *know* that the data on it is good.

You only know this if the data has previously been read.

Assume that the device temporarily stops physically writing, but otherwise responds normally to ZFS. Then the device starts writing again (including a recent uberblock), but with a large gap in the writes. Then the system loses power, or crashes. What happens then?

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:> On Fri, 13 Feb 2009, Ross Smith wrote: >> >> You have to consider that even with improperly working hardware, ZFS >> has been checksumming data, so if that hardware has been working for >> any length of time, you *know* that the data on it is good. > > You only know this if the data has previously been read. > > Assume that the device temporarily stops pysically writing, but otherwise > responds normally to ZFS. Then the device starts writing again (including a > recent uberblock), but with a large gap in the writes. Then the system > loses power, or crashes. What happens then?Well in that case you''re screwed, but if ZFS is known to handle even corrupted pools automatically, when that happens the immediate response on the forums is going to be "something really bad has happened to your hardware", followed by troubleshooting to find out what. Instead of the response now, where we all know there''s every chance the data is ok, and just can''t be gotten to without zdb. Also, that''s a pretty extreme situation since you''d need a device that is being written to but not read from to fail in this exact way. It also needs to have no scrubbing being run, so the problem has remained undetected. However, even in that situation, if we assume that it happened and that these recovery tools are available, ZFS will either report that your pool is seriously corrupted, indicating a major hardware problem (and ZFS can now state this with some confidence), or ZFS will be able to open a previous uberblock, mount your pool and begin a scrub, at which point all your missing writes will be found too and reported. And then you can go back to your snapshots. :-D> > Bob > =====================================> Bob Friesenhahn > bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ > >
Richard Elling
2009-Feb-13 20:47 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Greg Palmer wrote:> Miles Nordin wrote: >> gm> That implies that ZFS will have to detect removable devices >> gm> and treat them differently than fixed devices. >> >> please, no more of this garbage, no more hidden unchangeable automatic >> condescending behavior. The whole format vs rmformat mess is just >> ridiculous. And software and hardware developers alike have both >> proven themselves incapable of settling on a definition of >> ``removeable'''' that fits with actual use-cases like: FC/iSCSI; >> hot-swappable SATA; adapters that have removeable sockets on both ends >> like USB-to-SD, firewire CD-ROM''s, SATA/SAS port multipliers, and so >> on. > Since this discussion is taking place in the context of someone > removing a USB stick I think you''re confusing the issue by dragging in > other technologies. Let''s keep this in the context of the posts > preceding it which is how USB devices are treated. I would argue that > one of the first design goals in an environment where you can expect > people who are not computer professionals to be interfacing with > computers is to make sure that the appropriate safeties are in place > and that the system does not behave in a manner which a reasonable > person might find unexpected.It has been my experience that USB sticks use FAT, which is an ancient file system which contains few of the features you expect from modern file systems. As such, it really doesn''t do any write caching. Hence, it seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs, nor many of the other, high performance file systems are used by default for USB devices. Could it be that anyone not using FAT for USB devices is straining against architectural limits? -- richard
On Fri, Feb 13, 2009 at 8:24 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Fri, 13 Feb 2009, Ross Smith wrote:
>> You have to consider that even with improperly working hardware, ZFS
>> has been checksumming data, so if that hardware has been working for
>> any length of time, you *know* that the data on it is good.
>
> You only know this if the data has previously been read.
>
> Assume that the device temporarily stops physically writing, but otherwise
> responds normally to ZFS. Then the device starts writing again (including a
> recent uberblock), but with a large gap in the writes. Then the system
> loses power, or crashes. What happens then?

Hey Bob,

Thinking about this a bit more, you've given me an idea: would it be worth ZFS occasionally reading previous uberblocks from the pool, just to check they are there and working ok?

I wonder if you could do this after a few uberblocks have been written. It would seem to be a good way of catching devices that aren't writing correctly early on, as well as a way of guaranteeing that previous uberblocks are available to roll back to should a write go wrong.

I also wonder what the upper limit for this kind of write failure is going to be. I've seen 30 second delays mentioned in this thread. How often are uberblocks written? Is there any guarantee that we'll always have more than 30 seconds' worth of uberblocks on a drive? Should ZFS be set so that it keeps either a given number of uberblocks, or 5 minutes' worth of uberblocks, whichever is the larger?

Ross
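(A rough, back-of-envelope answer to the retention question, under two assumptions of my own: the labels hold a 128-slot uberblock ring, as Richard Elling describes in a later message, and a txg -- hence a new uberblock -- is written somewhere between every 5 and every 30 seconds depending on load and tunables.)

SLOTS = 128                      # assumed uberblock ring size per label
for txg_interval_s in (5, 30):   # assumed range of txg sync intervals
    history_min = SLOTS * txg_interval_s / 60.0
    print("txg every %2ds -> ring covers roughly %3.0f minutes"
          % (txg_interval_s, history_min))
# txg every  5s -> ring covers roughly  11 minutes
# txg every 30s -> ring covers roughly  64 minutes

If those assumptions hold, the ring already spans well beyond the 5-minute target; the open question is whether the recent entries point at trees that actually made it to the platter.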
Bob Friesenhahn
2009-Feb-13 20:57 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote:> > Also, that''s a pretty extreme situation since you''d need a device that > is being written to but not read from to fail in this exact way. It > also needs to have no scrubbing being run, so the problem has remained > undetected.On systems with a lot of RAM, 100% write is a pretty common situation since reads are often against data which are already cached in RAM. This is common when doing bulk data copies from one device to another (e.g. a backup from an "internal" pool to a USB-based pool) since the necessary filesystem information for the destination filesystem can be cached in memory for quick access rather than going to disk. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Bob Friesenhahn
2009-Feb-13 20:59 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote:> Thinking about this a bit more, you''ve given me an idea: Would it be > worth ZFS occasionally reading previous uberblocks from the pool, just > to check they are there and working ok?That sounds like a good idea. However, how do you know for sure that the data returned is not returned from a volatile cache? If the hardware is ignoring cache flush requests, then any data returned may be from a volatile cache. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Nicolas Williams
2009-Feb-13 21:09 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, Feb 13, 2009 at 02:00:28PM -0600, Nicolas Williams wrote:> Ordering matters for atomic operations, and filesystems are full of > those.Also, note that ignoring barriers is effectively as bad as dropping writes if there''s any chance that some writes will never hit the disk because of, say, power failures. Imagine 100 txgs, but some writes from the first txg never hitting the disk because the drive keeps them in the cache without flushing them for too long, then you pull out the disk, or power fails -- in that case not even fallback to older txgs will help you, there''d be nothing that ZFS could do to help you. Of course, presumably even with most lousy drives you''d still have to be quite unlucky to lose writes written more than N txgs ago, for some value of N. But the point stands; what you lose will be a matter of chance (and it could well be whole datasets) given the kinds of devices we''ve been discussing. Nico --
Richard Elling wrote:> Greg Palmer wrote: >> Miles Nordin wrote: >>> gm> That implies that ZFS will have to detect removable devices >>> gm> and treat them differently than fixed devices. >>> >>> please, no more of this garbage, no more hidden unchangeable automatic >>> condescending behavior. The whole format vs rmformat mess is just >>> ridiculous. And software and hardware developers alike have both >>> proven themselves incapable of settling on a definition of >>> ``removeable'''' that fits with actual use-cases like: FC/iSCSI; >>> hot-swappable SATA; adapters that have removeable sockets on both ends >>> like USB-to-SD, firewire CD-ROM''s, SATA/SAS port multipliers, and so >>> on. >> Since this discussion is taking place in the context of someone >> removing a USB stick I think you''re confusing the issue by dragging >> in other technologies. Let''s keep this in the context of the posts >> preceding it which is how USB devices are treated. I would argue that >> one of the first design goals in an environment where you can expect >> people who are not computer professionals to be interfacing with >> computers is to make sure that the appropriate safeties are in place >> and that the system does not behave in a manner which a reasonable >> person might find unexpected. > > It has been my experience that USB sticks use FAT, which is an ancient > file system which contains few of the features you expect from modern > file systems. As such, it really doesn''t do any write caching. Hence, it > seems to work ok for casual users. I note that neither NTFS, ZFS, > reiserfs, > nor many of the other, high performance file systems are used by default > for USB devices. Could it be that anyone not using FAT for USB devices > is straining against architectural limits?I''d follow that up by saying that those of us who do use something other that FAT with USB devices have a reasonable understanding of the limitations of those devices. Using ZFS is non-trivial from a typical user''s perspective. The device has to be identified and the pool created. When a USB device is connected, the pool has to be manually imported before it can be used. Import/export could be fully integrated with gnome. Once that is in place, using a ZFS formatted USB stick should be just as "safe" as a FAT formatted one. -- Ian.
You don''t, but that''s why I was wondering about time limits. You have to have a cut off somewhere, but if you''re checking the last few minutes of uberblocks that really should cope with a lot. It seems like a simple enough thing to implement, and if a pool still gets corrupted with these checks in place, you can absolutely, positively blame it on the hardware. :D However, I''ve just had another idea. Since the uberblocks are pretty vital in recovering a pool, and I believe it''s a fair bit of work to search the disk to find them. Might it be a good idea to allow ZFS to store uberblock locations elsewhere for recovery purposes? This could be as simple as a USB stick plugged into the server, a separate drive, or a network server. I guess even the ZIL device would work if it''s separate hardware. But knowing the locations of the uberblocks would save yet more time should recovery be needed. On Fri, Feb 13, 2009 at 8:59 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:> On Fri, 13 Feb 2009, Ross Smith wrote: > >> Thinking about this a bit more, you''ve given me an idea: Would it be >> worth ZFS occasionally reading previous uberblocks from the pool, just >> to check they are there and working ok? > > That sounds like a good idea. However, how do you know for sure that the > data returned is not returned from a volatile cache? If the hardware is > ignoring cache flush requests, then any data returned may be from a volatile > cache. > > Bob > =====================================> Bob Friesenhahn > bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ > >
Richard Elling wrote:> Greg Palmer wrote: >> Miles Nordin wrote: >>> gm> That implies that ZFS will have to detect removable devices >>> gm> and treat them differently than fixed devices. >>> >>> please, no more of this garbage, no more hidden unchangeable automatic >>> condescending behavior. The whole format vs rmformat mess is just >>> ridiculous. And software and hardware developers alike have both >>> proven themselves incapable of settling on a definition of >>> ``removeable'''' that fits with actual use-cases like: FC/iSCSI; >>> hot-swappable SATA; adapters that have removeable sockets on both ends >>> like USB-to-SD, firewire CD-ROM''s, SATA/SAS port multipliers, and so >>> on. >> Since this discussion is taking place in the context of someone >> removing a USB stick I think you''re confusing the issue by dragging >> in other technologies. Let''s keep this in the context of the posts >> preceding it which is how USB devices are treated. I would argue that >> one of the first design goals in an environment where you can expect >> people who are not computer professionals to be interfacing with >> computers is to make sure that the appropriate safeties are in place >> and that the system does not behave in a manner which a reasonable >> person might find unexpected. > > It has been my experience that USB sticks use FAT, which is an ancient > file system which contains few of the features you expect from modern > file systems. As such, it really doesn''t do any write caching. Hence, it > seems to work ok for casual users. I note that neither NTFS, ZFS, > reiserfs, > nor many of the other, high performance file systems are used by default > for USB devices. Could it be that anyone not using FAT for USB devices > is straining against architectural limits? > -- richardThe default disabling of caching with Windows I mentioned is the same for either FAT or NTFS file systems. My personal guess would be that it''s purely an effort to prevent software errors in the interface between the chair and keyboard. :-) I think a lot of users got trained in how to use a floppy disc and once they were trained, when they encountered the USB stick, they continued to treat it as an instance of the floppy class. This rubbed off on those around them. I can''t tell you how many users have given me a blank stare and told me "But the light was out" when I saw them yank a USB stick out and mentioned it was a bad idea. Regards, Greg
Bob Friesenhahn
2009-Feb-13 22:21 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Ross Smith wrote:> However, I''ve just had another idea. Since the uberblocks are pretty > vital in recovering a pool, and I believe it''s a fair bit of work to > search the disk to find them. Might it be a good idea to allow ZFS to > store uberblock locations elsewhere for recovery purposes?Perhaps it is best to leave decisions on these issues to the ZFS designers who know how things work. Previous descriptions from people who do know how things work didn''t make it sound very difficult to find the last 20 uberblocks. It sounded like they were at known points for any given pool. Those folks have surely tired of this discussion by now and are working on actual code rather than reading idle discussion between several people who don''t know the details of how things work. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> On Fri, 13 Feb 2009, Ross Smith wrote:
>> However, I've just had another idea. Since the uberblocks are pretty
>> vital in recovering a pool, and I believe it's a fair bit of work to
>> search the disk to find them. Might it be a good idea to allow ZFS to
>> store uberblock locations elsewhere for recovery purposes?
>
> Perhaps it is best to leave decisions on these issues to the ZFS designers
> who know how things work.
>
> Previous descriptions from people who do know how things work didn't make
> it sound very difficult to find the last 20 uberblocks. It sounded like
> they were at known points for any given pool.
>
> Those folks have surely tired of this discussion by now and are working on
> actual code rather than reading idle discussion between several people who
> don't know the details of how things work.

People who "don't know how things work" often aren't tied down by the baggage of knowing how things work, which leads to creative solutions those who are weighed down didn't think of. I don't think it hurts in the least to throw out some ideas. If they aren't valid, it's not hard to ignore them and move on. It surely isn't a waste of anyone's time to spend 5 minutes reading a response and weighing whether the idea is valid or not.

--Tim
Richard Elling
2009-Feb-13 23:09 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Tim wrote:
> On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn
> <bfriesen at simple.dallas.tx.us> wrote:
>
>     On Fri, 13 Feb 2009, Ross Smith wrote:
>
>         However, I've just had another idea. Since the uberblocks are pretty
>         vital in recovering a pool, and I believe it's a fair bit of work to
>         search the disk to find them. Might it be a good idea to allow ZFS to
>         store uberblock locations elsewhere for recovery purposes?
>
>     Perhaps it is best to leave decisions on these issues to the ZFS
>     designers who know how things work.
>
>     Previous descriptions from people who do know how things work
>     didn't make it sound very difficult to find the last 20
>     uberblocks. It sounded like they were at known points for any
>     given pool.
>
>     Those folks have surely tired of this discussion by now and are
>     working on actual code rather than reading idle discussion between
>     several people who don't know the details of how things work.
>
> People who "don't know how things work" often aren't tied down by the
> baggage of knowing how things work. Which leads to creative solutions
> those who are weighed down didn't think of. I don't think it hurts in
> the least to throw out some ideas. If they aren't valid, it's not
> hard to ignore them and move on. It surely isn't a waste of anyone's
> time to spend 5 minutes reading a response and weighing if the idea is
> valid or not.

OTOH, anyone who has followed this discussion the last few times, looked at the on-disk format documents, or reviewed the source code would know that the uberblocks are kept in a 128-entry circular queue which is 4x redundant, with 2 copies each at the beginning and end of the vdev. Other metadata, by default, is 2x redundant and spatially diverse.

Clearly, the failure mode being hashed out here has resulted in the defeat of those protections. The only real question is how fast Jeff can roll out the feature to allow reverting to previous uberblocks. The procedure for doing this by hand has long been known, and was posted on this forum -- though it is tedious.
-- richard
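(For anyone trying to match that description against the on-disk format document, the sketch below reflects my reading of the spec, so treat the exact constants as assumptions rather than gospel: four 256 KiB labels, two at the front of each leaf vdev and two at the end, each finishing with a 128 KiB array of 1 KiB uberblock slots on 512-byte-sector devices -- that is the 128-entry ring, and the active uberblock is simply the slot with the highest txg that still checksums correctly.)

KiB = 1024
LABEL_SIZE   = 256 * KiB   # L0..L3, per my reading of the on-disk spec
UB_ARRAY_OFF = 128 * KiB   # after blank space, boot header and nvlist pairs
UB_SLOT_SIZE = 1 * KiB     # assuming 512-byte sectors
UB_SLOTS     = 128

def label_offsets(vdev_size):
    # two labels at the front of the device, two at the very end
    return [0, LABEL_SIZE, vdev_size - 2 * LABEL_SIZE, vdev_size - LABEL_SIZE]

def uberblock_offsets(vdev_size, txg):
    # byte offsets of the 4 copies of the uberblock written for this txg
    slot = txg % UB_SLOTS
    return [off + UB_ARRAY_OFF + slot * UB_SLOT_SIZE
            for off in label_offsets(vdev_size)]

vdev = 1024 ** 4                                             # a 1 TiB leaf vdev
print([hex(o) for o in label_offsets(vdev)])
print([hex(o) for o in uberblock_offsets(vdev, txg=4242)])   # slot 4242 % 128 == 18

So a recovery tool never has to hunt across the whole disk: every recent uberblock has four copies at well-known offsets, which is why the hand-recovery procedure Richard mentions is tedious but mechanical.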
Bob Friesenhahn
2009-Feb-14 01:58 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Tim wrote:> I don''t think it hurts in the least to throw out some ideas. If > they aren''t valid, it''s not hard to ignore them and move on. It > surely isn''t a waste of anyone''s time to spend 5 minutes reading a > response and weighing if the idea is valid or not.Today I sat down at 9:00 AM to read the new mail for the day and did not catch up until five hours later. Quite a lot of the reading was this (now) useless discussion thread. It is now useless since after five hours of reading, there were no ideas expressed that had not been expressed before. With this level of overhead, I am surprise that there is any remaining development motion on ZFS at all. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On February 13, 2009 7:58:51 PM -0600 Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:> With this level of overhead, I am surprise that there is any remaining > development motion on ZFS at all.come on now. with all due respect, you are attempting to stifle relevant discussion and that is, well, bordering on ridiculous. i sure have learned a lot from this thread. now of course that is meaningless because i don''t and almost certainly never will contribute to zfs, but i assume there are others who have learned from this thread. that''s definitely a good thing. this thread also appears to be the impetus to change priorities on zfs development.> Today I sat down at 9:00 AM to read the new mail for the day and did not > catch up until five hours later. Quite a lot of the reading was this > (now) useless discussion thread. It is now useless since after five > hours of reading, there were no ideas expressed that had not been > expressed before.lastly, WOW! if this thread is worthless to you, learn to use the delete button. especially if you read that slowly. i know i certainly couldn''t keep up with all my incoming mail if i read everything. i''m sorry to berate you, as you do make very valuable contributions to the discussion here, but i take offense at your attempts to limit discussion simply because you know everything there is to know about the subject. great, now i am guilty of being "overhead". -frank
James C. McPherson
2009-Feb-14 05:27 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Hi Bob, On Fri, 13 Feb 2009 19:58:51 -0600 (CST) Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:> On Fri, 13 Feb 2009, Tim wrote: > > > I don''t think it hurts in the least to throw out some ideas. If > > they aren''t valid, it''s not hard to ignore them and move on. It > > surely isn''t a waste of anyone''s time to spend 5 minutes reading a > > response and weighing if the idea is valid or not. > > Today I sat down at 9:00 AM to read the new mail for the day and did > not catch up until five hours later. Quite a lot of the reading was > this (now) useless discussion thread. It is now useless since after > five hours of reading, there were no ideas expressed that had not > been expressed before.I''ve found this thread to be like watching a car accident, and also really frustrating due to the inability to use search engines on the part of many posters.> With this level of overhead, I am surprise that there is any > remaining development motion on ZFS at all.Good thing the ZFS developers have mail filters :-) cheers, James -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Bob Friesenhahn
2009-Feb-14 18:00 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
On Fri, 13 Feb 2009, Frank Cusack wrote:> > i''m sorry to berate you, as you do make very valuable contributions to > the discussion here, but i take offense at your attempts to limit > discussion simply because you know everything there is to know about > the subject.The point is that those of us in the chattering class (i.e. people like you and me) clearly know very little about the subject, and continuting to chatter among ourselves is soon no longer rewarding. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Hey guys, I''ll let this die in a sec, but I just wanted to say that I''ve gone and read the on disk document again this morning, and to be honest Richard, without the description you just wrote, I really wouldn''t have known that uberblocks are in a 128 entry circular queue that''s 4x redundant. Please understand that I''m not asking for answers to these notes, this post is purely to illustrate to you ZFS guys that much as I appreciate having the ZFS docs available, they are very tough going for anybody who isn''t a ZFS developer. I consider myself well above average in IT ability, and I''ve really spent quite a lot of time in the past year reading around ZFS, but even so I would definitely have come to the wrong conclusion regarding uberblocks. Richard''s post I can understand really easily, but in the on disk format docs, that information is spread over 7 pages of really quite technical detail, and to be honest, for a user like myself raises as many questions as it answers: On page 6 I learn that labels are stored on each vdev, as well as each disk. So there will be a label on the pool, mirror (or raid group), and disk. I know the disk ones are at the start and end of the disk, and it sounds like the mirror vdev is in the same place, but where is the root vdev label? The example given doesn''t mention its location at all. Then, on page 7 it sounds like the entire label is overwriten whenever on-disk data is updated - "any time on-disk data is overwritten, there is potential for error". To me, it sounds like it''s not a 128 entry queue, but just a group of 4 labels, all of which are overwritten as data goes to disk. Then finally, on page 12 the uberblock is mentioned (although as an aside, the first time I read these docs I had no idea what the uberblock actually was). It does say that only one uberblock is active at a time, but with it being part of the label I''d just assume these were overwritten as a group.. And that''s why I''ll often throw ideas out - I can either rely on my own limited knowledge of ZFS to say if it will work, or I can take advantage of the excellent community we have here, and post the idea for all to see. It''s a quick way for good ideas to be improved upon, and bad ideas consigned to the bin. I''ve done it before in my rather lengthly ''zfs availability'' thread. My thoughts there were thrashed out nicely, with some quite superb additions (namely the concept of lop sided mirrors which I think are a great idea). Ross PS. I''ve also found why I thought you had to search for these blocks, it was after reading this thread where somebody used mdb to search a corrupt pool to try to recover data: http://opensolaris.org/jive/message.jspa?messageID=318009 On Fri, Feb 13, 2009 at 11:09 PM, Richard Elling <richard.elling at gmail.com> wrote:> Tim wrote: >> >> >> On Fri, Feb 13, 2009 at 4:21 PM, Bob Friesenhahn >> <bfriesen at simple.dallas.tx.us <mailto:bfriesen at simple.dallas.tx.us>> wrote: >> >> On Fri, 13 Feb 2009, Ross Smith wrote: >> >> However, I''ve just had another idea. Since the uberblocks are >> pretty >> vital in recovering a pool, and I believe it''s a fair bit of >> work to >> search the disk to find them. Might it be a good idea to >> allow ZFS to >> store uberblock locations elsewhere for recovery purposes? >> >> >> Perhaps it is best to leave decisions on these issues to the ZFS >> designers who know how things work. 
>> >> Previous descriptions from people who do know how things work >> didn''t make it sound very difficult to find the last 20 >> uberblocks. It sounded like they were at known points for any >> given pool. >> >> Those folks have surely tired of this discussion by now and are >> working on actual code rather than reading idle discussion between >> several people who don''t know the details of how things work. >> >> >> >> People who "don''t know how things work" often aren''t tied down by the >> baggage of knowing how things work. Which leads to creative solutions those >> who are weighed down didn''t think of. I don''t think it hurts in the least >> to throw out some ideas. If they aren''t valid, it''s not hard to ignore them >> and move on. It surely isn''t a waste of anyone''s time to spend 5 minutes >> reading a response and weighing if the idea is valid or not. > > OTOH, anyone who followed this discussion the last few times, has looked > at the on-disk format documents, or reviewed the source code would know > that the uberblocks are kept in an 128-entry circular queue which is 4x > redundant with 2 copies each at the beginning and end of the vdev. > Other metadata, by default, is 2x redundant and spatially diverse. > > Clearly, the failure mode being hashed out here has resulted in the defeat > of those protections. The only real question is how fast Jeff can roll out > the > feature to allow reverting to previous uberblocks. The procedure for doing > this by hand has long been known, and was posted on this forum -- though > it is tedious. > -- richard > >
On Fri, Feb 13, 2009 at 9:47 PM, Richard Elling <richard.elling at gmail.com> wrote:
> It has been my experience that USB sticks use FAT, which is an ancient
> file system which contains few of the features you expect from modern
> file systems. As such, it really doesn't do any write caching. Hence, it
> seems to work ok for casual users. I note that neither NTFS, ZFS, reiserfs,
> nor many of the other, high performance file systems are used by default
> for USB devices. Could it be that anyone not using FAT for USB devices
> is straining against architectural limits?

There are no architectural limits. USB sticks can be used with whatever you throw at them. On sticks I use to interchange data with Windows machines I have NTFS; on others, different filesystems: ZFS, ext4, btrfs, often encrypted at the block level. USB sticks are generally very simple -- no discard commands or other fancy stuff, but overall they are block devices just like discs, arrays, SSDs...

-- 
Tomasz Torcz
xmpp: zdzichubg at chrome.pl
Mario Goebbels wrote:
> One thing I'd like to see is an _easy_ option to fall back onto older
> uberblocks when the zpool went belly up for a silly reason. Something
> that doesn't involve esoteric parameters supplied to zdb.

Between uberblock updates, there may be many write operations to a data file, each requiring a copy-on-write operation. Some of those operations may reuse blocks that were metadata blocks pointed to by the previous uberblock, in which case the old uberblock points to a metadata tree full of garbage.

Jeff, you must have some idea of how to overcome this in your bugfix; would you care to share?

--Joe
Robert Milkowski
2009-Feb-24 19:41 UTC
[zfs-discuss] ZFS: unreliable for professional usage?
Hello Joe,

Monday, February 23, 2009, 7:23:39 PM, you wrote:

MJ> Mario Goebbels wrote:
>> One thing I'd like to see is an _easy_ option to fall back onto older
>> uberblocks when the zpool went belly up for a silly reason. Something
>> that doesn't involve esoteric parameters supplied to zdb.

MJ> Between uberblock updates, there may be many write operations to
MJ> a data file, each requiring a copy on write operation. Some of
MJ> those operations may reuse blocks that were metadata blocks
MJ> pointed to by the previous uberblock.
MJ> In which case the old uberblock points to a metadata tree full of garbage.

MJ> Jeff, you must have some idea on how to overcome this in your bugfix, would you care to share?

As was suggested on the list before, ZFS could keep a list of the blocks freed in the last N txgs, and as long as other blocks are still available it would not allocate any of those from the last N transactions.

-- 
Best regards,
Robert Milkowski
http://milek.blogspot.com
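(A minimal sketch of that idea, as a toy allocator of my own invention rather than anything in ZFS's metaslab code: blocks freed in txg T only become allocatable again at txg T + N, so every uberblock from the last N txgs still points at intact metadata and remains a valid rollback target.)

from collections import deque

class DeferredFreeAllocator:
    def __init__(self, nblocks, defer_txgs=3):
        self.free = set(range(nblocks))   # immediately allocatable blocks
        self.deferred = deque()           # (txg_freed, block)
        self.defer_txgs = defer_txgs

    def sync(self, txg):
        # called at the end of each txg: release frees that are old enough
        while self.deferred and txg - self.deferred[0][0] >= self.defer_txgs:
            _, blk = self.deferred.popleft()
            self.free.add(blk)

    def alloc(self):
        if not self.free:
            raise RuntimeError("out of space (deferred blocks not yet reusable)")
        return self.free.pop()

    def free_block(self, blk, txg):
        # not reusable until defer_txgs more txgs have synced
        self.deferred.append((txg, blk))

alloc = DeferredFreeAllocator(nblocks=8, defer_txgs=3)
b = alloc.alloc()
alloc.free_block(b, txg=100)     # an old metadata block freed in txg 100
alloc.sync(101); alloc.sync(102)
print(b in alloc.free)           # False: txgs 100-102 can still be rolled back to
alloc.sync(103)
print(b in alloc.free)           # True: the block is fair game again

With something along these lines, a txg fallback like the one sketched earlier in the thread can always step back up to defer_txgs transactions without landing on recycled metadata.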