Ricardo Correia
2006-Jun-09 16:55 UTC
[zfs-code] ztest failing in ztest_vdev_attach_detach()
Hi, I''m wondering if you could shed some light on an issue I''m having in porting libzpool to Linux. It seems that ztest is failing in ztest_vdev_attach_detach(). I think you will be able to reproduce it in OpenSolaris. I think the problem is related to the changes done in spa.c from revision 1.14 to revision 1.15 (changed in 26-May-2006). Could you please try running ztest for a few minutes with the latest sources (>26-May-2006) ? You will find the problem a lot faster if you change line 194 in ztest.c from: { ztest_vdev_attach_detach,&zopt_rarely}, to: { ztest_vdev_attach_detach,&zopt_always}, I think about 30-45 minutes should be enough to find the problem (usually it''s faster). Thank you.
Yes, I''m seeing this unexpectedly fail with EBUSY (errno 16) instead of ENOTSUP. This was definitely caused by the hot spare work, and it''s my fault for not running ztest more rigorously. I''ll file a bug and take a look. For now, you can change (in ztest.c): if (error == EOVERFLOW) expected_error = error; To: if (error == EOVERFLOW || error == EBUSY) expected_error = error; I''ve been able to run this way for 10 minutes, with the ''rarely -> always'' change. Sorry about that, - Eric On Fri, Jun 09, 2006 at 05:55:14PM +0100, Ricardo Correia wrote:> Hi, > > I''m wondering if you could shed some light on an issue I''m having in porting > libzpool to Linux. > > It seems that ztest is failing in ztest_vdev_attach_detach(). > > I think you will be able to reproduce it in OpenSolaris. I think the problem > is related to the changes done in spa.c from revision 1.14 to revision 1.15 > (changed in 26-May-2006). > > Could you please try running ztest for a few minutes with the latest sources > (>26-May-2006) ? > > You will find the problem a lot faster if you change line 194 in ztest.c > > from: { ztest_vdev_attach_detach,&zopt_rarely}, > to: { ztest_vdev_attach_detach,&zopt_always}, > > I think about 30-45 minutes should be enough to find the problem (usually it''s > faster). > > Thank you. > _______________________________________________ > zfs-code mailing list > zfs-code at opensolaris.org > http://opensolaris.org/mailman/listinfo/zfs-code-- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
This happens if you have: replacing foo bar And you try to replace ''foo'' with ''bar. Sure enough, the old code used to do: if (parent is replacing) return ENOTSUP if (initialize new vdev failed) return failure (EBUSY) Now it switched the order: if (initialize new vdev failed) return failure (EBUSY) if (parent is replacing) return ENOTSUP Both cases are correct, its just a matter of which error condition gets triggered first. You cannot trigger this through the CLI (since it refuses to issue the ioctl because said disk is actively in use). I''ll think some more about it, but most likely the correct fix is the one below. - Eric On Fri, Jun 09, 2006 at 10:12:51AM -0700, Eric Schrock wrote:> Yes, I''m seeing this unexpectedly fail with EBUSY (errno 16) instead of > ENOTSUP. This was definitely caused by the hot spare work, and it''s my > fault for not running ztest more rigorously. I''ll file a bug and take a > look. For now, you can change (in ztest.c): > > if (error == EOVERFLOW) > expected_error = error; > > To: > > if (error == EOVERFLOW || error == EBUSY) > expected_error = error; > > I''ve been able to run this way for 10 minutes, with the ''rarely -> > always'' change. > > Sorry about that, > > - Eric-- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
Ricardo Correia
2006-Jun-09 17:36 UTC
[zfs-code] ztest failing in ztest_vdev_attach_detach()
Ok, I''m glad to know my analysis was correct (I wasn''t sure about it, though). This means that the ztest port to Linux now doesn''t have any known bugs ;) Thank you for the fast response. On Friday 09 June 2006 18:26, you wrote:> This happens if you have: > > replacing > foo > bar > > And you try to replace ''foo'' with ''bar. Sure enough, the old code used > to do: > > if (parent is replacing) > return ENOTSUP > if (initialize new vdev failed) > return failure (EBUSY) > > Now it switched the order: > > if (initialize new vdev failed) > return failure (EBUSY) > if (parent is replacing) > return ENOTSUP > > Both cases are correct, its just a matter of which error condition gets > triggered first. You cannot trigger this through the CLI (since it > refuses to issue the ioctl because said disk is actively in use). I''ll > think some more about it, but most likely the correct fix is the one > below. > > - Eric > > On Fri, Jun 09, 2006 at 10:12:51AM -0700, Eric Schrock wrote: > > Yes, I''m seeing this unexpectedly fail with EBUSY (errno 16) instead of > > ENOTSUP. This was definitely caused by the hot spare work, and it''s my > > fault for not running ztest more rigorously. I''ll file a bug and take a > > look. For now, you can change (in ztest.c): > > > > if (error == EOVERFLOW) > > expected_error = error; > > > > To: > > > > if (error == EOVERFLOW || error == EBUSY) > > expected_error = error; > > > > I''ve been able to run this way for 10 minutes, with the ''rarely -> > > always'' change. > > > > Sorry about that, > > > > - Eric > > -- > Eric Schrock, Solaris Kernel Development > http://blogs.sun.com/eschrock
On Fri, Jun 09, 2006 at 06:36:23PM +0100, Ricardo Correia wrote:> Ok, I''m glad to know my analysis was correct (I wasn''t sure about it, though). > > This means that the ztest port to Linux now doesn''t have any known bugs ;) > > Thank you for the fast response.Sweet! Let us know if you had to make any changes to non-zfs_context code, and we can integrate the changes upstream to make future maintenance/other ports easier. FYI, the fix I''ll putback is slightly different from what I mentioned previously: ------- usr/src/cmd/ztest/ztest.c ------- Index: usr/src/cmd/ztest/ztest.c --- /ws/onnv-clone/usr/src/cmd/ztest/ztest.c Fri Jun 2 23:04:35 2006 +++ /export/eschrock/on-fixes/usr/src/cmd/ztest/ztest.c Fri Jun 9 10:46:35 2006 @@ -934,16 +934,16 @@ * * If newvd is already part of the pool, it should fail with * EBUSY. * * If newvd is too small, it should fail with EOVERFLOW. */ - if (pvd->vdev_ops != &vdev_mirror_ops && + if (newvd != NULL) + expected_error = EBUSY; + else if (pvd->vdev_ops != &vdev_mirror_ops && pvd->vdev_ops != &vdev_root_ops && (!replacing || pvd->vdev_ops == &vdev_replacing_ops)) expected_error = ENOTSUP; - else if (newvd != NULL) - expected_error = EBUSY; else if (newsize < oldsize) expected_error = EOVERFLOW; else if (ashift > oldvd->vdev_top->vdev_ashift) expected_error = EDOM; else We already had the check for EBUSY, it just needed to be moved ahead of ENOTSUP just like the associated code. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
Ricardo Correia
2006-Jun-09 18:48 UTC
[zfs-code] ztest failing in ztest_vdev_attach_detach()
Ok, the changes are available here: http://www.wizy.org/files/zfs-linux.patch A few notes: 1) The patch includes the header Solaris/Linux differences. I''m not sure what''s relevant, if anything. I have included some Solaris-specific macros and typedefs in sol_compat.h, so you will see some #includes removed. By the way, I''m not exactly a C or UNIX expert (not even close), so if there are any glaring errors in the #includes, please let me know :p 2) I don''t know if I got the necessary casts correctly. I had to do it to eliminate gcc warnings, I don''t know how it''ll work with the Sun Studio compiler. I have tested them with gcc-3.4.5 in both x86 and x86-64 modes. 3) gcc spits a warning in this code: ''struct foo bar = { 0 }''.. I had to change all such code to ''struct foo bar = {}''. The {0} syntax is correct standard C, however. 4) The directory structure is not the same as in the OpenSolaris sources, sorry about that. On Friday 09 June 2006 18:50, Eric Schrock wrote:> Sweet! Let us know if you had to make any changes to non-zfs_context > code, and we can integrate the changes upstream to make future > maintenance/other ports easier. FYI, the fix I''ll putback is slightly > ...
Mark Shellenbaum
2006-Jun-09 19:15 UTC
[zfs-code] ztest failing in ztest_vdev_attach_detach()
Ricardo Correia wrote:> Ok, the changes are available here: http://www.wizy.org/files/zfs-linux.patch > > A few notes: > > 1) The patch includes the header Solaris/Linux differences. I''m not sure > what''s relevant, if anything. I have included some Solaris-specific macros > and typedefs in sol_compat.h, so you will see some #includes removed. > > By the way, I''m not exactly a C or UNIX expert (not even close), so if there > are any glaring errors in the #includes, please let me know :p > > 2) I don''t know if I got the necessary casts correctly. I had to do it to > eliminate gcc warnings, I don''t know how it''ll work with the Sun Studio > compiler. I have tested them with gcc-3.4.5 in both x86 and x86-64 modes. > > 3) gcc spits a warning in this code: ''struct foo bar = { 0 }''.. I had to > change all such code to ''struct foo bar = {}''. The {0} syntax is correct > standard C, however.We are compiling this will both the Sun Compilers and gcc 3.4.3 We use a tool "cw" compiler wrapper to translate Sun Compiler options into gcc options. http://cvs.opensolaris.org/source/xref/on/usr/src/tools/cw/cw.c
On Fri, Jun 09, 2006 at 01:15:23PM -0600, Mark Shellenbaum wrote:> > We are compiling this will both the Sun Compilers and gcc 3.4.3 > We use a tool "cw" compiler wrapper to translate Sun Compiler options > into gcc options. >In particular, it looks like we''re using the folowing options: -Wall -Wno-unknown-pragmas -Wno-missing-braces -Wno-sign-compare -Wno-parentheses -Wno-uninitialized -Wno-implicit-function-declaration -Wno-unused -Wno-trigraphs -Wno-char-subscripts -Wno-switch As well as the c99 flag, which gets turned into ''-std=gnu99'' on gcc. - Eric -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock