Bill Sommerfeld
2006-Nov-08 03:49 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
On a v40z running snv_51, I'm doing a "zpool replace z c1t4d0 c1t5d0".

(so, why am I doing the replace?  The outgoing disk has been reporting
read errors sporadically but with increasing frequency over time..)

zpool iostat -v shows writes going to the old (outgoing) disk as well as
to the replacement disk.  Is this intentional?

Seems counterintuitive as I'd think you'd want to touch a suspect disk
as little as possible and as nondestructively as possible...

A representative snapshot from "zpool iostat -v":

                   capacity     operations    bandwidth
pool             used  avail   read  write   read  write
-------------   -----  -----  -----  -----  -----  -----
z                306G   714G  1.43K    658  23.5M  1.11M
  raidz1         109G   231G  1.08K    392  22.3M   497K
    replacing       -      -      0   1012      0  5.72M
      c1t4d0        -      -      0    753      0  5.73M
      c1t5d0        -      -      0    790      0  5.72M
    c2t12d0         -      -    339    177  9.46M   149K
    c2t13d0         -      -    317    177  9.08M   149K
    c3t12d0         -      -    330    181  9.27M   147K
    c3t13d0         -      -    352    180  9.45M   146K
  raidz1         100G   240G    117    101   373K   225K
    c1t3d0          -      -     65     33  3.99M  64.1K
    c2t10d0         -      -     60     44  3.77M  63.2K
    c2t11d0         -      -     62     42  3.87M  63.4K
    c3t10d0         -      -     63     42  3.88M  62.3K
    c3t11d0         -      -     65     35  4.06M  61.8K
  raidz1        96.2G   244G    234    164   768K   415K
    c1t2d0          -      -    129     49  7.85M   112K
    c2t8d0          -      -    133     54  8.05M   112K
    c2t9d0          -      -    132     56  8.08M   113K
    c3t8d0          -      -    132     52  8.01M   113K
    c3t9d0          -      -    132     49  8.16M   112K

					- Bill
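For reference, the operation under discussion boils down to a handful of
commands; the pool and device names are the ones from the message above,
and the 5-second sampling interval is only an example:

    # Start replacing the suspect disk with the new one.
    zpool replace z c1t4d0 c1t5d0

    # Watch resilver progress and the temporary "replacing" group.
    zpool status -v z

    # Sample per-vdev I/O every 5 seconds; this is the view quoted above.
    zpool iostat -v z 5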
Erblichs
2006-Nov-08 09:54 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
Bill Sommerfield,

Are there any existing snaps?

Can you have any scripts that may be
removing aged files?

	Mitchell Erblich
	------------------
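Both questions can be checked directly from the shell; the pool name below
is the one from the original post:

    # Any snapshots in the pool?
    zfs list -t snapshot -r z

    # Any cron jobs that might be pruning aged files?
    crontab -l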
Bill Sommerfeld
2006-Nov-08 13:56 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
On Wed, 2006-11-08 at 01:54 -0800, Erblichs wrote:
> Bill Sommerfield,

that's not how my name is spelled

> Are there any existing snaps?

no.  why do you think this would matter?

> Can you have any scripts that may be
> removing aged files?

no; there was essentially no other activity on the pool other than the
"replace".

why do you think this would matter?

					- Bill
Erblichs
2006-Nov-10 03:18 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
Bill Sommerfield,

Because, first, I have seen a lot of I/O occur while a snapshot is being
aged out of a system.

I don't think that during the resilvering process accesses (reads, writes)
are completely stopped to the orig_dev.  I expect at least some meta reads
are going on.

With some normal sporadic read failure, accessing
the whole pool may force repeated reads for
the replace.

So, I was thinking that a read access
could ALSO be updating the znode.  This newer
time/date stamp is causing a lot of writes.

Depending on how the FS metadata and blocks are being accessed, the
orig_dev may also have some normal writes until it is offlined.

	Mitchell Erblich
	-----------------
Bill Sommerfeld
2006-Nov-10 03:42 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
On Thu, 2006-11-09 at 19:18 -0800, Erblichs wrote:
> Bill Sommerfield,

Again, that's not how my name is spelled.

> With some normal sporadic read failure, accessing
> the whole pool may force repeated reads for
> the replace.

please look again at the iostat I posted:

                   capacity     operations    bandwidth
pool             used  avail   read  write   read  write
-------------   -----  -----  -----  -----  -----  -----
z                306G   714G  1.43K    658  23.5M  1.11M
  raidz1         109G   231G  1.08K    392  22.3M   497K
    replacing       -      -      0   1012      0  5.72M
      c1t4d0        -      -      0    753      0  5.73M
      c1t5d0        -      -      0    790      0  5.72M
    c2t12d0         -      -    339    177  9.46M   149K
    c2t13d0         -      -    317    177  9.08M   149K
    c3t12d0         -      -    330    181  9.27M   147K
    c3t13d0         -      -    352    180  9.45M   146K
  raidz1         100G   240G    117    101   373K   225K
    c1t3d0          -      -     65     33  3.99M  64.1K
    c2t10d0         -      -     60     44  3.77M  63.2K
    c2t11d0         -      -     62     42  3.87M  63.4K
    c3t10d0         -      -     63     42  3.88M  62.3K
    c3t11d0         -      -     65     35  4.06M  61.8K
  raidz1        96.2G   244G    234    164   768K   415K
    c1t2d0          -      -    129     49  7.85M   112K
    c2t8d0          -      -    133     54  8.05M   112K
    c2t9d0          -      -    132     56  8.08M   113K
    c3t8d0          -      -    132     52  8.01M   113K
    c3t9d0          -      -    132     49  8.16M   112K

there were no (zero, none, nada, zilch) reads directed to the failing
device.  there were a lot of WRITES to the failing device; in fact, the
same volume of data was being written to BOTH the failing device and the
new device.

> So, I was thinking that a read access
> could ALSO be updating the znode.  This newer
> time/date stamp is causing a lot of writes.

that's not going to be significant as a source of traffic; again, look
at the above iostat, which was representative of the load throughout the
resilver.
Erblichs
2006-Nov-10 05:19 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
Bill, Sommerfeld, Sorry,

However, I am trying to explain what I think is
happening on your system and why I consider this
normal.

Most of the reads/FS "replace" are normally
at the block level.

To copy a FS, some level of reading MUST be done
at the orig_dev.
At what level and whether it is recorded as a
normal vnode read / mmap op for the direct and
indirect blocks is another story.

But it is being done. It is just not being
recorded in FS stats. Read stats are normally used
for normal FS object access requests.

Secondly, maybe starting with the "uberblock", the
rest of the meta data is probably being read. And
because of the normal async access of FSs, it would
not surprise me that each znode's access time
field is then updated. Remember, that unless you are just
touching a FS low-level (file) object, all writes are
preceded by at least 1 read.

	Mitchell Erblich
	----------------
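If access-time updates really were the source of the writes, that theory
is straightforward to test, since atime handling is a per-dataset
property.  The commands below are only a sketch using the pool name from
this thread; they were not actually run on the system in question:

    # See whether atime updates are enabled on the pool's datasets.
    zfs get -r atime z

    # Turn them off temporarily and watch whether the write stream to the
    # outgoing disk changes (reversible with 'zfs set atime=on z').
    zfs set atime=off z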
Al Hopper
2006-Nov-10 13:32 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
On Thu, 9 Nov 2006, Erblichs wrote:

> Bill, Sommerfeld, Sorry,
>
> However, I am trying to explain what I think is
> happening on your system and why I consider this
> normal.
>
> Most of the reads/FS "replace" are normally
                                      ^^^^^
> at the block level.
>
> To copy a FS, some level of reading MUST be done
                               ^^^^^^^^
> at the orig_dev.
> At what level and whether it is recorded as a
> normal vnode read / mmap op for the direct and
                ^^^^^
> indirect blocks is another story.
>
> But it is being done. It is just not being
> recorded in FS stats. Read stats are normally used
                        ^^^^^
> for normal FS object access requests.
>
> Secondly, maybe starting with the "uberblock", the
> rest of the meta data is probably being read. And
                                     ^^^^^^^^^^^
> because of the normal async access of FSs, it would
> not surprise me that each znode's access time
> field is then updated. Remember, that unless you are just
> touching a FS low-level (file) object, all writes are
> preceded by at least 1 read.
                       ^^^^^^^^
> 	Mitchell Erblich
> 	----------------

Mitchell - Bill is asking about WRITES and you're talking READS!  Your
posts are making absolutely no sense to me....

Al Hopper  Logical Approach Inc, Plano, TX.  al at logical-approach.com
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
             OpenSolaris Governing Board (OGB) Member - Feb 2006
Bill Sommerfeld
2006-Nov-10 14:47 UTC
[zfs-discuss] I/O patterns during a "zpool replace": why write to the disk being replaced?
On Thu, 2006-11-09 at 21:19 -0800, Erblichs wrote:
> Bill, Sommerfeld, Sorry,
>
> However, I am trying to explain what I think is
> happening on your system and why I consider this
> normal.

I'm not interested in speculation.  Please do not respond to this message.

> To copy a FS, some level of reading MUST be done
> at the orig_dev.

you appear to suffer from poor reading comprehension.

according to zpool iostat, the original device was not being read.  AT
ALL.  it was being WRITTEN.

I found this behavior unusual and wanted to know from someone actually
RESPONSIBLE for ZFS whether this was expected behavior or not.

I'd appreciate it if only people who have made changes to the ZFS
codebase found in opensolaris respond further to this thread.

					- Bill
Anton B. Rang
2006-Nov-10 15:47 UTC
[zfs-discuss] Re: I/O patterns during a "zpool replace": why write to the disk being replaced?
> I'd appreciate it if only people who have made changes to the ZFS
> codebase found in opensolaris respond further to this thread.

Well.  I haven't made changes, but I can read code.

When replacing a device, ZFS internally takes the device being replaced
and creates a mirror between the old and new device for the duration of
the replacement.  This is presumably done to leverage the existing
resilvering code to copy data from one device to the other.

There's nothing special done to prevent writes to either side of the
resulting mirror, which is why you see roughly equal amounts of data
being written to each side.  Every new block written to the disk being
replaced will be written to both the old and new device.
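The temporary mirror is also visible from the command line while the
operation runs: zpool status groups the old and new disks under a
"replacing" vdev, much as the zpool iostat -v output earlier in the thread
does.  The excerpt below is only an illustrative sketch, not output
captured from this system, and the exact layout varies by release:

    # zpool status z
      ...
            raidz1       ONLINE       0     0     0
              replacing  ONLINE       0     0     0
                c1t4d0   ONLINE       0     0     0
                c1t5d0   ONLINE       0     0     0
              c2t12d0    ONLINE       0     0     0
      ...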