"C. Bergström"
2009-Mar-03 21:35 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
For reasons which I don't care about, Sun may not apply to be a GSoC organization this year. However, I'm not discouraged from trying to propose some exciting ZFS related ideas. On or off list, feel free to send your vote, let me know if you can mentor, or if you know a company that could use it.

Here's more or less what I've collected...

1) Excess ditto block removal + other green-bytes zfs+ features - *open source* (very hard.. can't be done in two months)
2) raidz boot support (planning phase and suitable student already found. could use more docs/info for proposal)
3) Additional zfs compression choices (good for archiving non-text files?)
4) zfs cli interface to add safety checks (save your butt from deleting a pool worth more than your job)
5) Web or gui based admin interface
6) zfs defrag (was mentioned by someone working around petabytes of data..)
7) vdev evacuation as an upgrade path (which may depend on or take advantage of zfs resize/shrink code)
8) zfs restore/repair tools (being worked on already?)
9) Timeslider ported to kde4.2 ( *cough* couldn't resist, but put this on the list)
10) Did I miss something..

#2 Currently planning and collecting as much information for the proposal as possible. Today all ufs + solaris grub2 issues were resolved and will likely be committed upstream soon. There is a one-liner fix in the solaris kernel also needed, but that can be binary hacked worst case.

#5/9 This also may be possible as an outside project.. either web showcase or tighter desktop integration..

The rest may just be too difficult in a two month period, not something which can go upstream, or not enough time to really plan well enough.. Even if this isn't done for gsoc it may still be possible for the community to pursue some of these..

To be a mentor will most likely require answering daily/weekly technical questions, ideally being on irc and having patience. On top of this I'll be available to help as much as technically possible, keep the student motivated and the projects on schedule.

./Christopher

#ospkg irc.freenode.net - (Mostly OpenSolaris development rambling)
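A very rough illustration of what item 4 might look like (purely hypothetical, nothing like this exists today) is a thin wrapper that demands confirmation before destructive zpool subcommands; a student proposal could start from something like:

    #!/bin/sh
    # hypothetical guard wrapper around 'zpool destroy' (idea 4) --
    # only a sketch of the kind of safety check meant, not a real tool
    case "$1" in
      destroy)
        pool="$2"    # simplification: assumes no -f or other options
        printf "About to run 'zpool destroy %s'. Type the pool name to confirm: " "$pool"
        read answer
        [ "$answer" = "$pool" ] || { echo "aborted"; exit 1; }
        ;;
    esac
    exec /usr/sbin/zpool "$@"

The real work would of course be doing this inside the zpool command itself rather than around it.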
Richard Elling
2009-Mar-03 22:16 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
C. Bergström wrote:
> Here's more or less what I've collected...
> [...]
> 5) Web or gui based admin interface
> [...]

FWIW, I just took a look at the BUI in b108 and it seems to have garnered some love since the last time I looked at it (a year ago?). I encourage folks to take a fresh look at it.
  https://localhost:6789

-- richard
Nicolas Williams
2009-Mar-03 22:42 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Tue, Mar 03, 2009 at 11:35:40PM +0200, "C. Bergström" wrote:
> 7) vdev evacuation as an upgrade path (which may depend on or take
> advantage of zfs resize/shrink code)

IIRC Matt Ahrens has said on this list that vdev evacuation/pool shrinking is being worked on. So (7) would be duplication of effort.

> 8) zfs restore/repair tools (being worked on already?)

IIRC Jeff Bonwick has said on this list that uberblock rollback on import is now his higher priority. So working on (8) would be duplication of effort.

> 1) Excess ditto block removal + other green-bytes zfs+ features -
> *open source* (very hard.. can't be done in two months)

Using the new block pointer re-write code you might be able to deal with re-creating blocks with more/fewer ditto copies (and compression, ...) with incremental effort. But ask Matt Ahrens.

> 6) zfs defrag (was mentioned by someone working around petabytes of
> data..)

(6) probably depends on the new block pointer re-write code as well. But (6) may also be implied in vdev evac/pool shrink, so it may be duplication of effort.

Nico
--
Blake
2009-Mar-04 00:10 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
When I go here:

  http://opensolaris.org/os/project/isns/bui

I get an error. Where are you getting BUI from?

On Tue, Mar 3, 2009 at 5:16 PM, Richard Elling <richard.elling at gmail.com> wrote:
> FWIW, I just took a look at the BUI in b108 and it seems to have
> garnered some love since the last time I looked at it (a year ago?)
> I encourage folks to take a fresh look at it.
>   https://localhost:6789
>
> -- richard
Maurice Volaski
2009-Mar-04 00:20 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
> 10) Did I miss something..

Somehow, what I posted on the web forum didn't make it to the mailing list digest...

How about implementing dedup? This has been listed as an RFE for almost a year,
http://bugs.opensolaris.org/view_bug.do?bug_id=6677093
and discussed here,
http://www.opensolaris.org/jive/thread.jspa?messageID=256373.
--
Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
Tim
2009-Mar-04 01:14 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Tue, Mar 3, 2009 at 3:35 PM, "C. Bergström" <cbergstrom at netsyncro.com> wrote:
> Here's more or less what I've collected...
> [...]
> 10) Did I miss something..

I know plenty of "home users" would like the ability to add a single disk to a raid-z vdev in order to grow a disk at a time.

--Tim
Richard Elling
2009-Mar-04 01:36 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
Blake wrote:
> When I go here:
>
>   http://opensolaris.org/os/project/isns/bui
>
> I get an error. Where are you getting BUI from?

The BUI is in webconsole, which is available on your local machine at port 6789
  https://localhost:6789

If you want to access it remotely, you'll need to change the configuration as documented
  http://docs.sun.com/app/docs/doc/817-1985/gdhgt?a=view

-- richard
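For reference, the change described in that doc boils down to roughly the following (quoting from memory, so treat the exact property and service names as approximate and check the linked page for the authoritative steps):

    # let the webconsole (BUI) listen on more than just localhost
    svccfg -s svc:/system/webconsole setprop options/tcp_listen=true
    svcadm refresh svc:/system/webconsole
    /usr/sbin/smcwebserver restart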
Blake
2009-Mar-04 02:08 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
That's what I thought you meant, and I got excited thinking that you were talking about OpenSolaris :)

I'll see about getting the new packages and trying them out.

On Tue, Mar 3, 2009 at 8:36 PM, Richard Elling <richard.elling at gmail.com> wrote:
> The BUI is in webconsole, which is available on your local machine at
> port 6789
>   https://localhost:6789
>
> If you want to access it remotely, you'll need to change the configuration
> as documented
>   http://docs.sun.com/app/docs/doc/817-1985/gdhgt?a=view
>
> -- richard
Ghee Teo
2009-Mar-04 11:49 UTC
[zfs-discuss] [osol-discuss] zfs related google summer of code ideas - your vote
Hi Chris,

Great to have such a long list of suggestions! Though I think the information is a bit on the short side, and hard for a mentor/mentee to pick up. My suggestion is to create a short description for each, like those here:
http://live.gnome.org/SummerOfCode2009/Ideas

Title: 1-liner
 o Benefits: 1-liners
 o Requirements: 1/2 lines
 o Note: a short description of the specific things one needs to know.

Slap that on a wiki page, and any of the ZFS engineers who would like to mentor can simply put their name down on the page. What do you think?

A similar variant of #9 is being proposed on the GNOME page as well: *nautilus: time slider for btrfs*

-Ghee

C. Bergström wrote:
> Here's more or less what I've collected...
> [...]
> 10) Did I miss something..
Gary Mills
2009-Mar-04 15:52 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Tue, Mar 03, 2009 at 11:35:40PM +0200, "C. Bergström" wrote:
> Here's more or less what I've collected...
[..]
> 10) Did I miss something..

I suppose my RFE for two-level ZFS should be included, unless nobody intends to attach a ZFS file server to a SAN with ZFS on application servers.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Tim
2009-Mar-04 16:53 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Wed, Mar 4, 2009 at 9:52 AM, Gary Mills <mills at cc.umanitoba.ca> wrote:
> I suppose my RFE for two-level ZFS should be included, unless nobody
> intends to attach a ZFS file server to a SAN with ZFS on application
> servers.

Although it seems your idea fell on a lot of deaf ears, I personally think it's a great one. It would give people a reason to use Solaris as their server OS of choice as well as their fileserver/storage appliance. It would also give them something NetApp doesn't have, instead of a bunch of me-too's (no, we don't need to start a flame-fest, it's my opinion, years of reading this list haven't changed it).

As it stands, either you put your workload where the disks are, or you're wasting disk doing raid twice.

--Tim
Miles Nordin
2009-Mar-04 18:14 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
>>>>> "nw" == Nicolas Williams <Nicolas.Williams at sun.com> writes:

    nw> IIRC Jeff Bonwick has said on this list that uberblock
    nw> rollback on import is now his higher priority. So working on
    nw> (8) would be duplication of effort.

well...if your recovery tool worked by using an older ueberblock. But Anton had another idea: to write one that treats the pool being recovered as read-only. For example when someone drinks too much kool-aid and sets 'copies=2' on a pool with two unredundant vdevs, and then loses a disk and says ``ok now where's my data please,'' Anton's copy-out rescuing tool might help.
Miles Nordin
2009-Mar-04 18:20 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
>>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:

    gm> I suppose my RFE for two-level ZFS should be included,

Not that my opinion counts for much, but I wasn't deaf to it---I did respond.

I thought it was kind of based on a mistaken understanding. It included this strangeness of the upper ZFS ``informing'' the lower one when corruption had occurred on the network, and the lower ZFS was supposed to do something with the physical disks...to resolve corruption on the network? why? IIRC several others pointed out the same bogosity.

It makes slightly more sense in the write direction than the read direction maybe, but I still don't fully get the plan. Is it a new protocol to replace iSCSI? or NFS? or, what? Is it a re-invention of pNFS or Lustre, but with more work since you're starting from zero, and less architectural foresight?
Bob Friesenhahn
2009-Mar-04 18:28 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
I don't know if anyone has noticed that the topic is "google summer of code". There is only so much that a starving college student can accomplish from a dead start in 1-1/2 months. The ZFS equivalent of eliminating world hunger is not among the tasks which may be reasonably accomplished, yet tasks at this level of effort are all that I have seen mentioned here.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
"C. Bergström"
2009-Mar-04 18:45 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
Bob Friesenhahn wrote:
> I don't know if anyone has noticed that the topic is "google summer of
> code". There is only so much that a starving college student can
> accomplish from a dead start in 1-1/2 months. The ZFS equivalent of
> eliminating world hunger is not among the tasks which may be
> reasonably accomplished, yet tasks at this level of effort are all that
> I have seen mentioned here.

May I interject a bit.. I'm silently collecting this task list and, even outside of gsoc, may help try to arrange it from a community perspective. Of course this will be volunteer based unless /we/ get a sponsor or sun beats /us/ to it. So all the crazy ideas are welcome..

./C
Toby Thain
2009-Mar-04 19:09 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On 4-Mar-09, at 1:28 PM, Bob Friesenhahn wrote:
> I don't know if anyone has noticed that the topic is "google summer
> of code". There is only so much that a starving college student
> can accomplish from a dead start in 1-1/2 months. The ZFS
> equivalent of eliminating world hunger is not among the tasks which
> may be reasonably accomplished, yet tasks at this level of effort
> are all that I have seen mentioned here.

Item (4) (CLI sudden-death warnings) is not of world-hunger scope.
Item (5) (GUI) isn't rocket science.
Item (3) (more compressors) is a bit harder but doable.

The rest get into serious engineering, agreed. :)

--T
Wes Felter
2009-Mar-04 20:16 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
C. Bergström wrote:
> 10) Did I miss something..

T10 DIF support in zvols
T10 UNMAP/thin provisioning support in zvols
proportional scheduling for storage performance
slog and L2ARC on the same SSD

These are probably difficult but hopefully not "world hunger" level.

Wes Felter - wesley at felter.org
Nicolas Williams
2009-Mar-04 20:37 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Wed, Mar 04, 2009 at 02:16:53PM -0600, Wes Felter wrote:
> T10 UNMAP/thin provisioning support in zvols

That's probably simple enough, and sufficiently valuable too.
Richard Elling
2009-Mar-04 20:49 UTC
[zfs-discuss] schedulers [was: zfs related google summer of code ideas - your vote]
Wes Felter wrote:
> proportional scheduling for storage performance
> slog and L2ARC on the same SSD

The current scheduler is rather simple; there might be room for improvements -- but that may be a rather extended research topic.

But I'm curious as to why you would want to put both the slog and L2ARC on the same SSD? Is it because the slog tends to be really small and you don't need a large SSD for it? Or is it because you see the SSD vendors (finally) coming up with a reasonably priced SSD which is both large and handles small writes very quickly?
-- richard
Wes Felter
2009-Mar-04 21:02 UTC
[zfs-discuss] schedulers [was: zfs related google summer of code ideas - your vote]
Richard Elling wrote:
> Wes Felter wrote:
>> proportional scheduling for storage performance
>> slog and L2ARC on the same SSD
>
> The current scheduler is rather simple; there might be room for
> improvements -- but that may be a rather extended research topic.

Yes. For GSoC it would probably be wise to limit the scope to implementing an existing algorithm from the literature.

> But I'm curious as to why you would want to put both the slog and
> L2ARC on the same SSD?

I'm thinking about small NASes where you can't justify the cost of two or three SSDs. How much performance can you get with one SSD?

Wes Felter - wesley at felter.org
Julius Roberts
2009-Mar-04 22:53 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
2009/3/4 Tim <tim at tcsac.net>:
> I know plenty of "home users" would like the ability to add a single disk to
> a raid-z vdev in order to grow a disk at a time.

+1 for that.

--
Kind regards, Jules
David Dyer-Bennet
2009-Mar-04 23:22 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Wed, March 4, 2009 16:53, Julius Roberts wrote:
> 2009/3/4 Tim <tim at tcsac.net>:
>> I know plenty of "home users" would like the ability to add a single
>> disk to a raid-z vdev in order to grow a disk at a time.
>
> +1 for that.

In theory I'd like that a lot. I'm now committed to two two-disk mirrors precisely because I couldn't grow a RAIDZ (and would have to replace all three+ disks to expand it that way); I started with one mirror, added a second when I needed it, and when I need to increase again I will replace the disks in one of the mirrors one at a time, letting it resilver (yes, I know I'm not covered by redundancy during that process; I'll do it when my backups are up to date. It's a home server and I can have it out of service during the resilver, or in "no guarantee of saving changes" mode.).

And on consideration, depending on the expected lifespan of the system, I might be better off sticking with the mirror strategy anyway. Of course RAIDZ is more cost-efficient, but disks are cheap.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
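For anyone planning the same grow-by-replacement dance, it amounts to roughly the following (device names are invented for illustration; behaviour varies a bit between builds, so watch zpool status before and after):

    # swap one side of the mirror for a larger disk and let it resilver
    zpool replace tank c1t2d0 c1t4d0
    zpool status tank    # wait until the resilver finishes
    # then repeat for the other half of the mirror; the extra capacity
    # shows up once both disks are larger (on some builds only after an
    # export/import of the pool)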
Gary Mills
2009-Mar-05 00:35 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Wed, Mar 04, 2009 at 01:20:42PM -0500, Miles Nordin wrote:
> >>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:
>
>     gm> I suppose my RFE for two-level ZFS should be included,
>
> Not that my opinion counts for much, but I wasn't deaf to it---I did
> respond.

I appreciate that.

> I thought it was kind of based on a mistaken understanding. It included
> this strangeness of the upper ZFS ``informing'' the lower one when
> corruption had occurred on the network, and the lower ZFS was supposed
> to do something with the physical disks...to resolve corruption on the
> network? why? IIRC several others pointed out the same bogosity.

It's simply a consequence of ZFS's end-to-end error detection. There are many different components that could contribute to such errors. Since only the lower ZFS has data redundancy, only it can correct the error. Of course, if something in the data path consistently corrupts the data regardless of its origin, it won't be able to correct the error. The same thing can happen in the simple case, with one ZFS over physical disks.

> It makes slightly more sense in the write direction than the read
> direction maybe, but I still don't fully get the plan. Is it a new
> protocol to replace iSCSI? or NFS? or, what? Is it a re-invention
> of pNFS or Lustre, but with more work since you're starting from zero,
> and less architectural foresight?

I deliberately did not specify the protocol to keep the concept general. Anything that works and solves the problem would be good.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Toby Thain
2009-Mar-05 01:02 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On 4-Mar-09, at 7:35 PM, Gary Mills wrote:
> [...]
> It's simply a consequence of ZFS's end-to-end error detection.
> There are many different components that could contribute to such
> errors. Since only the lower ZFS has data redundancy, only it can
> correct the error.

Why aren't application-level checksums the answer to this "problem"?

--Toby

> Of course, if something in the data path
> consistently corrupts the data regardless of its origin, it won't be
> able to correct the error. The same thing can happen in the simple
> case, with one ZFS over physical disks.
> [...]
Bill Sommerfeld
2009-Mar-05 01:31 UTC
[zfs-discuss] schedulers [was: zfs related google summer of code ideas - your vote]
On Wed, 2009-03-04 at 12:49 -0800, Richard Elling wrote:
> But I'm curious as to why you would want to put both the slog and
> L2ARC on the same SSD?

Reducing part count in a small system.

For instance: adding L2ARC+slog to a laptop. I might only have one slot free to allocate to an SSD.

IMHO the right administrative interface for this is for zpool to allow you to add the same device to a pool as both cache and log, and let zfs figure out how to not step on itself when allocating blocks.

- Bill
Dave
2009-Mar-05 01:31 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
Gary Mills wrote:
> [...]
> It's simply a consequence of ZFS's end-to-end error detection.
> There are many different components that could contribute to such
> errors. Since only the lower ZFS has data redundancy, only it can
> correct the error. Of course, if something in the data path
> consistently corrupts the data regardless of its origin, it won't be
> able to correct the error. The same thing can happen in the simple
> case, with one ZFS over physical disks.

I would argue against building this into ZFS. Any corruption happening on the wire should not be the responsibility of ZFS. If you want to make sure your data is not corrupted over the wire, use IPSec. If you want to prevent corruption in RAM, use ECC sticks, etc.

--
Dave
Nathan Kroenert
2009-Mar-05 01:38 UTC
[zfs-discuss] schedulers [was: zfs related google summer of code ideas - your vote]
Hm - a ZilArc?? Or, slarc? Or L2ArZi?

I tried something sort of similar to this when fooling around, adding different *slices* for ZIL / L2ARC, but as I'm too poor to afford good SSDs my result was poor at best... ;)

Having ZFS manage some 'arbitrary fast stuff' and sorting out its own ZIL and L2ARC would be interesting, though, given the propensity for SSDs to be either fast read or fast write at the moment, you may well require some whacky knobs to get it to do what you actually want it to... hm.

Nathan.

Bill Sommerfeld wrote:
> On Wed, 2009-03-04 at 12:49 -0800, Richard Elling wrote:
>> But I'm curious as to why you would want to put both the slog and
>> L2ARC on the same SSD?
>
> Reducing part count in a small system.
>
> For instance: adding L2ARC+slog to a laptop. I might only have one slot
> free to allocate to an SSD.
>
> IMHO the right administrative interface for this is for zpool to allow
> you to add the same device to a pool as both cache and log, and let zfs
> figure out how to not step on itself when allocating blocks.
>
> - Bill

--
// Nathan Kroenert             nathan.kroenert at sun.com
// Senior Systems Engineer     Phone: +61 3 9869 6255
// Global Systems Engineering  Fax:   +61 3 9869 6288
// Level 7, 476 St. Kilda Road
// Melbourne 3004 Victoria Australia
Gary Mills
2009-Mar-05 01:49 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On Wed, Mar 04, 2009 at 06:31:59PM -0700, Dave wrote:
> Gary Mills wrote:
>> It's simply a consequence of ZFS's end-to-end error detection.
>> There are many different components that could contribute to such
>> errors. Since only the lower ZFS has data redundancy, only it can
>> correct the error. Of course, if something in the data path
>> consistently corrupts the data regardless of its origin, it won't be
>> able to correct the error. The same thing can happen in the simple
>> case, with one ZFS over physical disks.
>
> I would argue against building this into ZFS. Any corruption happening
> on the wire should not be the responsibility of ZFS. If you want to make
> sure your data is not corrupted over the wire, use IPSec. If you want to
> prevent corruption in RAM, use ECC sticks, etc.

But what if the `wire' is a SCSI bus? Would you want ZFS to do error correction in that case? There are many possible wires. Every component does its own error checking of some sort, but in its own domain. This brings us back to end-to-end error checking again. Since we are designing a filesystem, that's where the reliability should reside.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Dave
2009-Mar-05 02:43 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
Gary Mills wrote:
> [...]
> But what if the `wire' is a SCSI bus? Would you want ZFS to do error
> correction in that case? There are many possible wires. Every
> component does its own error checking of some sort, but in its own
> domain. This brings us back to end-to-end error checking again. Since
> we are designing a filesystem, that's where the reliability should
> reside.

ZFS can't eliminate or prevent all errors. You should have a split backplane/multiple controllers and a minimum 2-way mirror if you're concerned about this from a local component POV. Same with iSCSI. I run a minimum 2-way mirror from my ZFS server from 2 different NICs, over 2 gigabit switches w/trunking, to two different disk shelves for this reason.

I do not stack ZFS layers, since it degrades performance and really doesn't provide any benefit. What's your reason for stacking zpools? I can't recall the original argument for this.

--
Dave
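For concreteness, that layout is essentially one pool mirrored across LUNs that arrive over two independent paths. With invented device names it would look something like

    # each half of the mirror is a LUN from a different shelf, reached
    # over a different NIC/switch path
    zpool create tank mirror c3t600A0B80002A1C5Ed0 c3t600A0B80002B9D12d0

so losing a switch, NIC or shelf still leaves one intact copy.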
Darren J Moffat
2009-Mar-05 10:13 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
Wes Felter wrote:
> slog and L2ARC on the same SSD

You can do that already today. Create two slices using format(1M) and add the slices, rather than the whole disk, as the L2ARC or slog device.

However, IMO this is probably the wrong thing to do with current SSD technology. The slog wants a very fast write device; the L2ARC wants a very fast read device. Unless your SSD is fast at both reads and writes, using the same SSD for the slog and L2ARC isn't a good idea.

--
Darren J Moffat
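In command terms that is roughly the following, assuming the SSD shows up as c2t0d0 and slices 0 and 1 have already been laid out with format(1M) (device names are only illustrative):

    # one slice as the separate intent log, the other as L2ARC
    zpool add tank log c2t0d0s0
    zpool add tank cache c2t0d0s1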
Richard Elling
2009-Mar-05 17:30 UTC
[zfs-discuss] schedulers [was: zfs related google summer of code ideas - your vote]
Nathan Kroenert wrote:
> Hm - a ZilArc?? Or, slarc? Or L2ArZi?
>
> I tried something sort of similar to this when fooling around,
> adding different *slices* for ZIL / L2ARC, but as I'm too poor to
> afford good SSDs my result was poor at best... ;)

Perfectly predictable. zilstat will show you the size of iops in the ZIL writes. It seems consistent that latency-sensitive apps (NFS service, databases) have a need for many fast, small sync writes. While you do not pay a seek/rotate latency penalty in an SSD, you may pay a page erase penalty. For modern SSDs, the write-optimized ones use DRAM to help eliminate the page erase penalty, but also cost a lot more, because they have more parts. Meanwhile, the read-optimized SSDs tend to use MLC flash and pay a large erase penalty, but can read very fast.

The L2ARC is a read cache, so we can optimize the writes to L2ARC by making them large, thus reducing the impact of the erase penalty. Also, during the time required to write the L2ARC, the data is also in the ARC, so there is no read penalty during the write to L2ARC. The win is apparent after the data is evicted from the ARC and we can read it faster from the L2ARC than we can from the main pool. For this case, not paying the seek/rotate penalty is a huge win for SSDs.

To help you understand how this works, you might remember:
+ None of the data in the slog is in the main pool, yet.
+ All of the data in the L2ARC is in the ARC or main pool already.
-- richard
Miles Nordin
2009-Mar-05 19:03 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
>>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:

    gm> There are many different components that could contribute to
    gm> such errors.

yes of course.

    gm> Since only the lower ZFS has data redundancy, only it can
    gm> correct the error.

um, no?

An example already pointed out: kerberized NFS will detect network errors that sneak past the weak TCP checksum, and resend the data. This will work even on an unredundant, unchecksummed UFS filesystem to correct network-induced errors. There is no need for NFS to ``inform'' UFS so that UFS can use ``redundancy'' to ``correct the error''. UFS never hears anything, and doesn't have any redundancy. NFS resends the data. done.

iSCSI also has application-level CRCs, separately enableable for headers and data. not sure what FC has.

It doesn't make any sense to me that some higher layer would call back to the ZFS stack on the bottom, and tell it to twiddle with disks because there's a problem with the network.

An idea Richard brought up months ago was ``protection domains,'' that it might be good to expose ZFS checksums to higher levels to stretch a single protection domain as far as possible upwards in the stack.

Application-level checksums also form a single protection domain, for _reading_. Suppose corruption happens in RAM or on the network (where the ZFS backing store cannot detect it), while reading a gzip file on an NFS client. gzip will always warn you! This is end-to-end, and will warn you just as perfectly as hypothetical end-to-end networkified-ZFS. The problem: there's no way for gzip to ``retry''. You can run gunzip again, but it will just fail again and again because the file with network-induced errors is cached on the NFS client. It's the ``cached badness'' problem Richard alluded to. You would have to reboot the NFS client to clear its read cache, then try gunzip again. This is probably good enough in practice, but it sounds like there's room for improvement.

It's irrelevant in this scenario that the lower ZFS has ``redundancy''. All you have to do to fix the problem is resend the read over the network.

What would be nice to have, that we don't have, is a way of keeping ZFS block checksums attached to the data as it travels over the network until it reaches the something-like-an-NFS-client. Each part of the stack that caches data could be trained to either (1) validate ZFS block checksums, or (2) obey ``read no-cache'' commands passed down from the layer above. In the application-level gzip example, gzip has no way of doing (2), so extending the protection domain upward rather than pushing cache-flushing obedience downward seems more practical.

For writing, application-level checksums do NOT work at all, because you would write corrupt data to the disk, and notice only later when you read it back, when there's nothing you can do to fix it. ZFS redundancy will not help you here either, because you write corrupt data redundantly! With a single protection domain for writing, the write would arrive at ZFS along with a never-regenerated checksum wrapper-seal attached to it by the something-like-an-NFS-client. Just before ZFS sends the write to the disk driver, ZFS would crack the protection domain open, validate the checksum, reblock the write, and send it to disk with a new checksum. (so, ``single protection domain'' is really a single domain for reads, and two protection domains for writes) If the checksum does not match, ZFS must convince the writing client to resend---in the write direction I think cached bad data will be less of a problem.

I think a single protection domain, rather than the currently best-obtainable which is sliced domains where the slices butt up against each other as closely as possible, is an idea with merit. but it doesn't have anything whatsoever to do with the fact ZFS stores things redundantly on the platters. The whole thing would have just as much merit, and fix the new problem classes it addresses just as frequently!, for single-disk vdevs as for redundant vdevs.

    gm> Of course, if something in the data path consistently corrupts
    gm> the data regardless of its origin, it won't be able to correct
    gm> the error.

TCP does this all the time, right? see, watch this: +++ATH0 :)

that aside, your idea of ``the error'' seems too general, like the annoying marketing slicks with the ``healing'' and ``correcting'' stuff. stored, transmitted, cached errors are relevantly different, which also means corruption in the read and write directions are different.
Toby Thain
2009-Mar-05 20:39 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
On 5-Mar-09, at 2:03 PM, Miles Nordin wrote:
>>>>>> "gm" == Gary Mills <mills at cc.umanitoba.ca> writes:
>
>     gm> There are many different components that could contribute to
>     gm> such errors.
>
> yes of course.
>
>     gm> Since only the lower ZFS has data redundancy, only it can
>     gm> correct the error.
>
> um, no?
> ...
> For writing, application-level checksums do NOT work at all, because
> you would write corrupt data to the disk, and notice only later when
> you read it back, when there's nothing you can do to fix it.

Right, it would have to be combined with an always read-back policy in the application...?

--Toby

> ZFS
> redundancy will not help you here either, because you write corrupt
> data redundantly! With a single protection domain for writing, the
> write would arrive at ZFS along with a never-regenerated checksum
> wrapper-seal attached to it by the something-like-an-NFS-client.
> ...
Miles Nordin
2009-Mar-05 20:46 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
>>>>> "tt" == Toby Thain <toby at telegraphics.com.au> writes:

     c> For writing, application-level checksums do NOT work at all,
     c> because you would write corrupt data to the disk, and notice
     c> only later when you read it back

    tt> Right, it would have to be combined with an always read-back
    tt> policy in the application...?

aside from being wasteful, that won't work because of ``cached goodness''. :)

It's allegedly common for programs to read what they've just written during normal operation, so various caches usually consider whatever's written as ``recently used'' for purposes of expiring it from the read cache. If corruption happens below a cache (ex. in the NFS client), as it probably will, you won't detect it by read-after-write.
Dave
2009-Mar-06 19:40 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
C. Bergström wrote:
> Bob Friesenhahn wrote:
>> I don't know if anyone has noticed that the topic is "google summer of
>> code". There is only so much that a starving college student can
>> accomplish from a dead start in 1-1/2 months. The ZFS equivalent of
>> eliminating world hunger is not among the tasks which may be
>> reasonably accomplished, yet tasks at this level of effort are all that
>> I have seen mentioned here.
>
> May I interject a bit.. I'm silently collecting this task list and, even
> outside of gsoc, may help try to arrange it from a community
> perspective. Of course this will be volunteer based unless /we/ get a
> sponsor or sun beats /us/ to it. So all the crazy ideas are welcome..

I would really like to see a feature like 'zfs diff fs@snap1 fs@othersnap' that would report the paths of files that have either been added, deleted, or changed between snapshots. If this could be done at the ZFS level instead of the application level it would be very cool.

--
Dave
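Something along these lines, say (the output format and change codes here are entirely made up, just to sketch the kind of interface being asked for):

    # list what changed between two snapshots of a filesystem
    zfs diff tank/home@snap1 tank/home@othersnap
    M   /tank/home/docs/report.odt      (modified)
    +   /tank/home/photos/img_0412.jpg  (added)
    -   /tank/home/tmp/scratch.txt      (deleted)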
Tim Haley
2009-Mar-06 20:05 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
Dave wrote:
> [...]
> I would really like to see a feature like 'zfs diff fs@snap1
> fs@othersnap' that would report the paths of files that have either been
> added, deleted, or changed between snapshots. If this could be done at
> the ZFS level instead of the application level it would be very cool.

This is actually in the works. There is a functioning prototype.

-tim
Kyle Kakligian
2009-Mar-06 22:24 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
It's been suggested before, and I've heard it's in the FreeBSD port... support for spindown?

On Fri, Mar 6, 2009 at 11:40 AM, Dave <dave-zfs at dubkat.com> wrote:
> [...]
> I would really like to see a feature like 'zfs diff fs@snap1 fs@othersnap'
> that would report the paths of files that have either been added, deleted,
> or changed between snapshots. If this could be done at the ZFS level instead
> of the application level it would be very cool.
Bart Smaalders
2009-Mar-07 02:49 UTC
[zfs-discuss] zfs related google summer of code ideas - your vote
>> I would really like to see a feature like 'zfs diff fs@snap1 fs@othersnap'
>> that would report the paths of files that have either been added, deleted,
>> or changed between snapshots. If this could be done at the ZFS level instead
>> of the application level it would be very cool.

AFAIK, this is being actively developed, w/ a prototype working...

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com         http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
Cyril Plisko
2009-Mar-07 19:25 UTC
[zfs-discuss] [osol-discuss] zfs related google summer of code ideas - your vote
On Wed, Mar 4, 2009 at 10:37 PM, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> On Wed, Mar 04, 2009 at 02:16:53PM -0600, Wes Felter wrote:
>> T10 UNMAP/thin provisioning support in zvols
>
> That's probably simple enough, and sufficiently valuable too.

I evaluated such a feature implementation in ZFS (at large, not only for zvols) some time ago for a client of mine. The result was that it is certainly doable (and it is not too simple, btw). Alas, since then that client seems to have "adjusted" its priorities :(

--
Regards,
Cyril