Borja Marcos
2012-Mar-29 09:46 UTC
[zfs-discuss] Puzzling problem with zfs receive exit status
Hello,

I hope someone has an idea. I have a replication program that copies a dataset from one server to another. The replication mechanism is the obvious one, of course:

    zfs send -Ri snapshot(n-1) snapshot(n) > file
    scp file to the remote machine

(I do it this way instead of using a pipeline so that a network error won't interrupt a receive data stream.) Then, on the remote machine:

    zfs receive -Fd pool

It's been working perfectly for months, no issues. However, yesterday we began to see something weird: the zfs receive being executed on the remote machine is exiting with an exit status of 1, even though the replication is finished and I see the copied snapshots on the remote machine.

Any ideas? It's really puzzling. It seems that the replication is working (a zfs list -t snapshot shows the new snapshots correctly applied to the dataset), but I'm afraid there's some kind of corruption.

The OS is Solaris, SunOS 5.10 Generic_141445-09 i86pc i386 i86pc.

Thanks in advance,

Borja.
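[A rough sketch of the procedure described above, with hypothetical pool, dataset, snapshot, and file names (tank/data, snapN-1, snapN, /var/tmp/repl.zfs) standing in for the real ones:]

    # on the sending machine: incremental replication stream into a file
    zfs send -R -i tank/data@snapN-1 tank/data@snapN > /var/tmp/repl.zfs

    # copy the file over; a network error here only aborts the copy,
    # not an in-progress receive
    scp /var/tmp/repl.zfs remotehost:/var/tmp/repl.zfs

    # on the receiving machine: apply the stream to the pool
    zfs receive -Fd tank < /var/tmp/repl.zfs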
Ian Collins
2012-Mar-29 09:59 UTC
[zfs-discuss] Puzzling problem with zfs receive exit status
On 03/29/12 10:46 PM, Borja Marcos wrote:
> Hello,
>
> I hope someone has an idea.
>
> I have a replication program that copies a dataset from one server to another one. The replication mechanism is the obvious one, of course:
>
> zfs send -Ri snapshot(n-1) snapshot(n) > file
> scp file to the remote machine (I do it this way instead of using a pipeline so that a network error won't interrupt a receive data stream)
> and on the remote machine,
> zfs receive -Fd pool
>
> It's been working perfectly for months, no issues. However, yesterday we began to see something weird: the zfs receive being executed on the remote machine is exiting with an exit status of 1, even though the replication is finished, and I see the copied snapshots on the remote machine.
>
> Any ideas? It's really puzzling. It seems that the replication is working (a zfs list -t snapshot shows the new snapshots correctly applied to the dataset) but I'm afraid there's some kind of corruption.

Does zfs receive produce any warnings? Have you tried adding -v?

-- 
Ian.
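[For example, something along these lines on the receiving side; the pool and stream file names are only placeholders:]

    # verbose receive reports each snapshot as it is applied, plus any
    # complaint about datasets it could not destroy or roll back
    zfs receive -Fdv tank < /var/tmp/repl.zfs
    echo "zfs receive exit status: $?"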
Carsten John
2012-Mar-29 10:16 UTC
[zfs-discuss] Puzzling problem with zfs receive exit status
-----Original message-----
To: zfs-discuss at opensolaris.org
From: Borja Marcos <borjam at sarenet.es>
Sent: Thu 29-03-2012 11:49
Subject: [zfs-discuss] Puzzling problem with zfs receive exit status

> Hello,
>
> I hope someone has an idea.
>
> I have a replication program that copies a dataset from one server to another one. The replication mechanism is the obvious one, of course:
>
> zfs send -Ri snapshot(n-1) snapshot(n) > file
> scp file to the remote machine (I do it this way instead of using a pipeline so that a network error won't interrupt a receive data stream)
> and on the remote machine,
> zfs receive -Fd pool
>
> It's been working perfectly for months, no issues. However, yesterday we began to see something weird: the zfs receive being executed on the remote machine is exiting with an exit status of 1, even though the replication is finished, and I see the copied snapshots on the remote machine.
>
> Any ideas? It's really puzzling. It seems that the replication is working (a zfs list -t snapshot shows the new snapshots correctly applied to the dataset) but I'm afraid there's some kind of corruption.
>
> The OS is Solaris, SunOS 5.10 Generic_141445-09 i86pc i386 i86pc.
>
> Thanks in advance,
>
> Borja.

Hi Borja,

did you try to check the snapshot file with zstreamdump? It will validate the checksums.

Perhaps the information here
http://blog.richardelling.com/2009/10/check-integrity-of-zfs-send-streams.html
might be useful for you.

Carsten

-- 
Max Planck Institut fuer marine Mikrobiologie
- Network Administration -
Celsiustr. 1
D-28359 Bremen
Tel.: +49 421 2028568
Fax.: +49 421 2028565
PGP public key: http://www.mpi-bremen.de/Carsten_John.html
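[If your release ships zstreamdump, a minimal check of the stream file looks something like this (the file name is a placeholder); it reads the send stream from standard input and verifies the embedded checksums:]

    # summary of the stream contents, with checksum verification
    zstreamdump < /var/tmp/repl.zfs

    # -v adds per-record detail if more context is needed
    zstreamdump -v < /var/tmp/repl.zfs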
Borja Marcos
2012-Mar-29 11:33 UTC
[zfs-discuss] Puzzling problem with zfs receive exit status
On Mar 29, 2012, at 11:59 AM, Ian Collins wrote:

> Does zfs receive produce any warnings? Have you tried adding -v?

Thank you very much Ian and Carsten. Well, adding -v gave me the clue: it turns out that one of the old snapshots had a clone created from it.

zfs receive -v was complaining that it couldn't destroy an old snapshot, which wasn't visible but had been cloned (and forgotten) long ago. A truss of the zfs receive process showed it accessing the clone.

So zfs receive was doing its job, and the new snapshot was applied correctly, but it was exiting with a value of 1 without printing any warnings (unless -v is given), which I think is wrong.

I've destroyed the clone and everything has gone back to normal. Now zfs receive exits with 0.

Still, I'm not sure whether it could be a bug: the snapshot was cloned in November 2011 and had been sitting around for a long time. The pool dropped below 20% free space two days ago, so maybe that triggered something. Anyway, as I said, with the clone removed everything has gone back to normal.

Thank you very much,

Borja.
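[For anyone hitting the same thing, a quick way to spot such a forgotten clone on the receiving side (pool and dataset names are placeholders): the origin property of a clone points back at the snapshot it was created from.]

    # any dataset whose origin is not "-" is a clone of some snapshot
    zfs list -r -o name,origin tank

    # once identified, the clone can be destroyed (or "zfs promote"d if
    # its contents are still needed), after which zfs receive -F can
    # again destroy the snapshot the clone depended on
    zfs destroy tank/old-clone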
Richard Elling
2012-Mar-29 15:11 UTC
[zfs-discuss] Puzzling problem with zfs receive exit status
On Mar 29, 2012, at 4:33 AM, Borja Marcos wrote:

> On Mar 29, 2012, at 11:59 AM, Ian Collins wrote:
>
>> Does zfs receive produce any warnings? Have you tried adding -v?
>
> Thank you very much Ian and Carsten. Well, adding -v gave me the clue: it turns out that one of the old snapshots had a clone created from it.
>
> zfs receive -v was complaining that it couldn't destroy an old snapshot, which wasn't visible but had been cloned (and forgotten) long ago. A truss of the zfs receive process showed it accessing the clone.
>
> So zfs receive was doing its job, and the new snapshot was applied correctly, but it was exiting with a value of 1 without printing any warnings (unless -v is given), which I think is wrong.

You are correct. Both zfs and zpool have a bad case of "exit 1 if something isn't right." At Nexenta, I filed a bug against the ambiguity of the return code. You should consider filing a similar bug with Oracle. In the open-source ZFS implementations, there is some other work to get out of the way before properly tackling this, but that work is in progress :-)

> I've destroyed the clone and everything has gone back to normal. Now zfs receive exits with 0.
>
> Still, I'm not sure whether it could be a bug: the snapshot was cloned in November 2011 and had been sitting around for a long time. The pool dropped below 20% free space two days ago, so maybe that triggered something.
>
> Anyway, as I said, with the clone removed everything has gone back to normal.

good!

> Thank you very much,
>
> Borja.

-- richard

--
DTrace Conference, April 3, 2012, http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
Richard.Elling at RichardElling.com
+1-760-896-4422
Borja Marcos
2012-Mar-29 15:37 UTC
[zfs-discuss] Puzzling problem with zfs receive exit status
On Mar 29, 2012, at 5:11 PM, Richard Elling wrote:

>> Thank you very much Ian and Carsten. Well, adding -v gave me the clue: it turns out that one of the old snapshots had a clone created from it.
>>
>> zfs receive -v was complaining that it couldn't destroy an old snapshot, which wasn't visible but had been cloned (and forgotten) long ago. A truss of the zfs receive process showed it accessing the clone.
>>
>> So zfs receive was doing its job, and the new snapshot was applied correctly, but it was exiting with a value of 1 without printing any warnings (unless -v is given), which I think is wrong.
>
> You are correct. Both zfs and zpool have a bad case of "exit 1 if something isn't right."
> At Nexenta, I filed a bug against the ambiguity of the return code. You should consider
> filing a similar bug with Oracle. In the open-source ZFS implementations, there is some
> other work to get out of the way before properly tackling this, but that work is in progress :-)

I understand that either a warning or, at least, a syslog message with LOG_WARNING is in order.

Regarding the open source camp, yes, I'm using ZFS on FreeBSD as well :)

Borja.
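[In the meantime the replication script can emit that warning itself. A rough sketch of the receiving side (all pool, dataset, snapshot, and file names are placeholders), which treats the non-zero exit as fatal only when the expected snapshot really is missing and otherwise just logs the ambiguity to syslog:]

    zfs receive -Fdv tank < /var/tmp/repl.zfs 2> /var/tmp/recv.err
    status=$?
    if [ $status -ne 0 ]; then
        if zfs list -t snapshot tank/data@snapN > /dev/null 2>&1; then
            # the snapshot arrived anyway: log the ambiguous exit and move on
            logger -p daemon.warning -f /var/tmp/recv.err
        else
            # the snapshot really is missing: treat it as a hard failure
            cat /var/tmp/recv.err >&2
            exit $status
        fi
    fi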