Yusuf Goolamabbas
2006-May-09 05:56 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Hi,

I am using Solaris Express 04/06 (snv_36) and tried to do the same tasks as in Dan Price's Self Healing screencast:

bash-3.00# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0

errors: No known data errors
bash-3.00# wget http://dlc.sun.com/osol/on/downloads/current/on-src-20060501.tar.bz2
bash-3.00# dd if=/dev/urandom of=/dev/dsk/c1t10d0 bs=1024 count=20480
20480+0 records in
20480+0 records out
bash-3.00# digest -a md5 on-src-20060501.tar.bz2
2f68527e830540fd746feb03c4a825cd
bash-3.00# cd /
bash-3.00# zpool export tank
bash-3.00# zpool import tank
bash-3.00# cd /export/home/yusufg/
bash-3.00# digest -a md5 on-src-20060501.tar.bz2
2f68527e830540fd746feb03c4a825cd
bash-3.00# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t13d0  ONLINE       0     0     0

errors: No known data errors

Did I miss anything obvious?

This message posted from opensolaris.org
Jeff Bonwick
2006-May-09 09:13 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
> bash-3.00# dd if=/dev/urandom of=/dev/dsk/c1t10d0 bs=1024 count=20480

A couple of things:

(1) When you write to /dev/dsk, rather than /dev/rdsk, the results are cached in memory. So the on-disk state may have been unaltered.

(2) When you write to /dev/rdsk/c-t-d, without specifying a slice, that actually refers to the entire disk *including* its EFI label. That was probably not your intent. When you give ZFS a c-t-d name with no slice, we format it with an EFI label and put all of the content (everything but the label) in s0.

I personally hate this device naming semantic (/dev/rdsk/c-t-d not meaning what you'd logically expect it to). (It's a generic Solaris bug, not a ZFS thing.) I'll see if I can get it changed. Because almost everyone gets bitten by this.

Jeff
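Taken together, the two points above suggest what a corrected corruption step might look like: write through the raw device (/dev/rdsk) and name slice 0 explicitly, so that the EFI label is left alone. What follows is a minimal dry-run sketch under those assumptions, not the commands used in the screencast. `raw_slice0_path` is a hypothetical helper introduced only for illustration, and the script prints the commands rather than running them because the dd step is destructive.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): map a whole-disk name such as
# "c1t10d0" or "/dev/dsk/c1t10d0" to the raw device node for slice 0, which
# is where ZFS is said to place its content on an EFI-labelled whole disk.
# Assumption: the argument is a whole-disk name without a slice suffix.
raw_slice0_path() {
    name=${1##*/}                      # drop any leading /dev/dsk/ or /dev/rdsk/
    printf '/dev/rdsk/%ss0\n' "$name"
}

# Dry run: print the commands instead of executing them, since dd overwrites data.
target=$(raw_slice0_path c1t10d0)
echo "dd if=/dev/urandom of=$target bs=1024 count=20480"
echo "zpool scrub tank"        # a scrub reads and checksums every block in the pool
echo "zpool status -v tank"    # inspect the CKSUM column for the damaged device
```

Whether this reproduces the demo exactly is not confirmed anywhere in the thread. The scrub is included on the assumption that simply re-reading one file may not touch the blocks that were overwritten.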
Darren J Moffat
2006-May-09 09:28 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Jeff Bonwick wrote:
> I personally hate this device naming semantic (/dev/rdsk/c-t-d
> not meaning what you'd logically expect it to). (It's a generic
> Solaris bug, not a ZFS thing.) I'll see if I can get it changed.
> Because almost everyone gets bitten by this.

I've heard lots of people complain about this over the years. Some claim the SunOS model (or the slightly altered Linux one) was better; others hate what we have but don't know what to do to fix it.

So what's your proposal?

I just booted up Minix 3.1.1 today in Qemu and noticed to my surprise that it has a disk naming scheme similar to what Solaris uses. It has c?d?p?s?; note that both p (PC FDISK, I assume) and s are used, something I always wished we had on Solaris when running on x86. I'm not suggesting they have a better system, just another data point.

-- 
Darren J Moffat
Joerg Schilling
2006-May-09 09:35 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Darren J Moffat <Darren.Moffat at Sun.COM> wrote:
> Jeff Bonwick wrote:
> >
> > I personally hate this device naming semantic (/dev/rdsk/c-t-d
> > not meaning what you'd logically expect it to). (It's a generic
> > Solaris bug, not a ZFS thing.) I'll see if I can get it changed.
> > Because almost everyone gets bitten by this.
>
> I've heard lots of people complain about this over the years. Some
> claim the SunOS model (or the slightly altered Linux one) was better,
> others hate what we have but don't know what to do to fix it.
>
> So what's your proposal?
>
> I just booted up Minix 3.1.1 today in Qemu and noticed to my surprise
> that it has a disk naming scheme similar to what Solaris uses.
> It has c?d?p?s? note that both p (PC FDISK I assume) and s are used,

HP-UX uses the same scheme.

Jörg

-- 
 EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Paul van der Zwan
2006-May-09 11:21 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
On 9 May 2006, at 11:35, Joerg Schilling wrote:
> Darren J Moffat <Darren.Moffat at Sun.COM> wrote:
>
>> Jeff Bonwick wrote:
>>>
>>> I personally hate this device naming semantic (/dev/rdsk/c-t-d
>>> not meaning what you'd logically expect it to). (It's a generic
>>> Solaris bug, not a ZFS thing.) I'll see if I can get it changed.
>>> Because almost everyone gets bitten by this.
>>
>> I've heard lots of people complain about this over the years. Some
>> claim the SunOS model (or the slightly altered Linux one) was better,
>> others hate what we have but don't know what to do to fix it.
>>
>> So what's your proposal?
>>
>> I just booted up Minix 3.1.1 today in Qemu and noticed to my surprise
>> that it has a disk naming scheme similar to what Solaris uses.
>> It has c?d?p?s? note that both p (PC FDISK I assume) and s are used,
>
> HP-UX uses the same scheme.

I think any system descending from the old SysV branch has the c?t?d?s? naming convention. I don't remember which version first used it, but as far as I remember it was already in use in the mid-80s.

Paul
Darren J Moffat
2006-May-09 11:34 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Paul van der Zwan wrote:
>>> I just booted up Minix 3.1.1 today in Qemu and noticed to my surprise
>>> that it has a disk naming scheme similar to what Solaris uses.
>>> It has c?d?p?s? note that both p (PC FDISK I assume) and s are used,
>>
>> HP-UX uses the same scheme.
>
> I think any system descending from the old SysV branch has the c?t?d?s?
> naming convention.
> I don't remember which version first used it but as far as I remember it
> was already used in the mid 80's.

There is a difference though, as far as I can tell. Sometimes on Solaris we have p? for fdisk partitioning included and sometimes we don't; similarly, we sometimes don't have t? for target. Personally I'd prefer us to be consistent always, even if it leads to names like:

    /dev/dsk/c0t0d0p0s0  if we are talking about the first Solaris VTOC slice
    c0t0d0               for the whole disk
    c0t0d0p0             for the whole Solaris VTOC

Then there is the issue of referencing FAT filesystems inside Windows Extended partitions, which would give rise to stuff like this /dev/dsk/c0t0d0p0:1 at the moment :-) which is only really understood by pcfs.

-- 
Darren J Moffat
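The fully qualified form described above can be made concrete with a small parser. This is an illustrative sketch only: `parse_disk_name` is a hypothetical function, and its pattern (a required c and d field, with optional t, p and s fields) is an assumption drawn from the names mentioned in this thread, not a definition of what Solaris accepts. It does not handle the pcfs `:1` suffix.

```shell
#!/bin/sh
# Hypothetical helper: split a name like c0t0d0p0s0 into its fields.
# Fields missing from the name are printed empty; a name that does not
# match the assumed pattern produces no output at all.
parse_disk_name() {
    name=${1##*/}   # accept a bare name or a /dev/dsk/... path
    printf '%s\n' "$name" | sed -n -E \
        's/^c([0-9]+)(t([0-9]+))?d([0-9]+)(p([0-9]+))?(s([0-9]+))?$/controller=\1 target=\3 disk=\4 partition=\6 slice=\8/p'
}

parse_disk_name /dev/dsk/c0t0d0p0s0   # every field present
parse_disk_name c1t10d0               # whole-disk name, no p or s field
parse_disk_name c0d0s2                # no target field
```

Printing empty fields rather than omitting them makes the inconsistency visible: the same disk can yield different sets of populated fields depending on which optional parts a given platform includes in the name.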
Joerg Schilling
2006-May-09 11:34 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Paul van der Zwan <Paul.Vanderzwan at Sun.COM> wrote:
> > HP-UX uses the same scheme.
>
> I think any system descending from the old SysV branch has the c?t?d?s?
> naming convention.
> I don't remember which version first used it but as far as I remember
> it was already used in the mid 80's.

I believe it first appeared in SVr2, around 1984.

Jörg
Frank Hofmann
2006-May-09 12:06 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
On Tue, 9 May 2006, Darren J Moffat wrote:
> Paul van der Zwan wrote:
>>>> I just booted up Minix 3.1.1 today in Qemu and noticed to my surprise
>>>> that it has a disk naming scheme similar to what Solaris uses.
>>>> It has c?d?p?s? note that both p (PC FDISK I assume) and s are used,
>>>
>>> HP-UX uses the same scheme.
>>
>> I think any system descending from the old SysV branch has the c?t?d?s?
>> naming convention.
>> I don't remember which version first used it but as far as I remember it
>> was already used in the mid 80's.
>
> There is a difference though as far as I can tell. Sometimes on Solaris we
> have p? for fdisk partitioning included and sometimes we don't; similarly we
> sometimes don't have t? for target. Personally I'd prefer us to be
> consistent always even if it leads to names like /dev/dsk/c0t0d0p0s0 if we
> are talking about the first Solaris VTOC slice c0t0d0 for the whole disk
> c0t0d0p0 for the whole Solaris VTOC.

I second the call for consistency, but think that this means dumping partitions/slices from the actual device name. A disk is a disk: one unit of storage. How it is subdivided, and how or whether the subdivisions are made available as device nodes, should not be the worry of the disk driver but rather that of an independent layer. The way it is now may have a history, but that doesn't make it less confusing to me :(

The problem with the 'p' and 's' nodes is that they're _not_ used in consistent fashions. You already noticed that Solaris/SPARC doesn't have 'p' nodes, and if, for example, you take a Solaris/SPARC disk and attach it to a Solaris/x86 machine, you won't see its 's' nodes either, and vice versa. How clean is that? Why on earth do we use different names for 'whole disk' on SPARC and x86?

In short, why is it inevitable to deal with disks _only_ if they have labels? Why no separate labeling layer?

> Then there is the issue of referencing FAT filesystems inside Windows
> Extended partitions which would give rise to stuff like this
> /dev/dsk/c0t0d0p0:1 at the moment :-) which is only really understood by
> pcfs.

"Understood" gives too much credit. PCFS acts on seeing this syntax, like a well-trained animal. That it actually understands what it does (and worse, why it does so) is a bit far-fetched.

And of course on SPARC, you'd rather use /dev/dsk/c0t0d0s2:1 ... if you know ...
Richard Elling
2006-May-09 17:43 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
On Tue, 2006-05-09 at 14:06 +0200, Frank Hofmann wrote:
> I second the call for consistency, but think that this means dumping
> partitions/slices from the actual device name. A disk is a disk - one unit
> of storage. How it is subdivided and how/whether the subdivisions are made
> available as device nodes should not be the worry of the disk driver, but
> rather that of an independent layer. The way it is now may have a history
> but that doesn't make it less confusing to me :(

Just a data point. We do that with dids in Sun Cluster. Nobody likes dids either. I usually file this in the "virtualization considered harmful" bucket.

 -- richard
Yusuf Goolamabbas
2006-May-11 07:56 UTC
[zfs-discuss] Re: Trying to replicate ZFS self-heal demo and not seeing fixed error
> > bash-3.00# dd if=/dev/urandom of=/dev/dsk/c1t10d0 bs=1024 count=20480
>
> A couple of things:
>
> (1) When you write to /dev/dsk, rather than /dev/rdsk, the results
> are cached in memory. So the on-disk state may have been unaltered.

That's why I also did a zpool export <poolname> followed by a zpool import <poolname>. According to the demo, this makes it go to stable storage. My contention is that, other than the pool name and the disk slice information, I did exactly what was described in the self-heal demo, yet did not see the appropriate response from ZFS.

> (2) When you write to /dev/rdsk/c-t-d, without specifying a slice,
> that actually refers to the entire disk *including* its EFI label.
> That was probably not your intent. When you give ZFS a c-t-d
> name with no slice, we format it with an EFI label and put all
> of the content (everything but the label) in s0.

You lost me here. I guess I will have to delve deeper into Solaris disk naming conventions. I would prefer to avoid thinking in terms of slices/partitions/disks etc.
Joerg Schilling
2006-May-15 12:18 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Darren J Moffat <Darren.Moffat at Sun.COM> wrote:
> There is a difference though as far as I can tell. Sometimes on Solaris
> we have p? for fdisk partitioning included and sometimes we don't;
> similarly we sometimes don't have t? for target. Personally I'd prefer
> us to be consistent always even if it leads to names like
> /dev/dsk/c0t0d0p0s0 if we are talking about the first Solaris VTOC slice
> c0t0d0 for the whole disk c0t0d0p0 for the whole Solaris VTOC.

This is a sore point in OpenSolaris....

The fact that the kernel makes implicit assumptions about how FDISK has to be interpreted (e.g. only one Solaris FDISK partition) prevents us from using some of the disks created on Linux.

> Then there is the issue of referencing FAT filesystems inside Windows
> Extended partitions which would give rise to stuff like this
> /dev/dsk/c0t0d0p0:1 at the moment :-) which is only really understood by
> pcfs.

Did you ever try to tell someone how to mount a specific FDISK partition (in case extended partitions are in use) without asking him to resort to trial and error?

I hope that this problem can be fixed in the future. Backwards compatibility should not be an issue, as the current system does not work well.

Jörg
Joerg Schilling
2006-May-15 12:25 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Frank Hofmann <Frank.Hofmann at Sun.COM> wrote:
> > There is a difference though as far as I can tell. Sometimes on Solaris we
> > have p? for fdisk partitioning included and sometimes we don't; similarly we
> > sometimes don't have t? for target. Personally I'd prefer us to be
> > consistent always even if it leads to names like /dev/dsk/c0t0d0p0s0 if we
> > are talking about the first Solaris VTOC slice c0t0d0 for the whole disk
> > c0t0d0p0 for the whole Solaris VTOC.
>
> I second the call for consistency, but think that this means dumping
> partitions/slices from the actual device name. A disk is a disk - one unit
> of storage. How it is subdivided and how/whether the subdivisions are made
> available as device nodes should not be the worry of the disk driver, but
> rather that of an independent layer. The way it is now may have a history
> but that doesn't make it less confusing to me :(

Thank you for posting this!

Having a separate partitioning layer is the only way to fix the various problems caused by the hacks in e.g. pcfs.

The main problem I see with this idea is that it may require throwing away parts of the old naming scheme.

Jörg
Darren J Moffat
2006-May-15 12:54 UTC
[zfs-discuss] Trying to replicate ZFS self-heal demo and not seeing fixed error
Joerg Schilling wrote:
> Darren J Moffat <Darren.Moffat at Sun.COM> wrote:
>
>> There is a difference though as far as I can tell. Sometimes on Solaris
>> we have p? for fdisk partitioning included and sometimes we don't;
>> similarly we sometimes don't have t? for target. Personally I'd prefer
>> us to be consistent always even if it leads to names like
>> /dev/dsk/c0t0d0p0s0 if we are talking about the first Solaris VTOC slice
>> c0t0d0 for the whole disk c0t0d0p0 for the whole Solaris VTOC.
>
> This is a sore point in OpenSolaris....
>
> The fact that the kernel makes implicit assumptions about how FDISK
> has to be interpreted (e.g. only one Solaris FDISK partition) prevents
> us from using some of the disks created on Linux

Been there, done that!

>> Then there is the issue of referencing FAT filesystems inside Windows
>> Extended partitions which would give rise to stuff like this
>> /dev/dsk/c0t0d0p0:1 at the moment :-) which is only really understood by
>> pcfs.
>
> Did you ever try to tell someone how to mount a specific FDISK partition
> (in case extended partitions are in use) without asking him to
> resort to trial and error?

Too many times!

-- 
Darren J Moffat