Hi there,

I am working for a customer right now who is considering Red Hat Storage Server. One of the sought-after features there is bit-rot detection and, even better, (semi-automatic) bit-rot restoration.

I know neither RHSS nor GlusterFS has this kind of functionality at the moment (correct me if I'm wrong!), but I would like to propose a design for this as a sort of translator that can be stacked on, e.g., a (geo) replication translator.

Bit-rot detection can be done through check-summing. It should be a very low priority job running on one of the bricks. The job walks the complete file system and, per file, calculates the check-sum and compares it with the stored check-sum. If no check-sum is stored yet, the file has not been checked before, and the job stores the check-sum on all involved bricks.

Bit-rot restoration could be implemented by comparing the check-sums of the replicas. If there is a mismatch, a more thorough check must be performed, like running a check-sum on all replicas of that file again, doing a bit-wise compare, or whatever. If the files are still the same, the check-sum(s) must be replaced. If not, actual bit-rot has been detected. Now what to do? Which replica holds the clean version (the truth)? With an odd number of replicas one could simply make it a democratic process and have it fully automated. It should, however, save the version to be replaced in a separate store and notify the admin for verification. Another method would be to just notify the admin and do nothing.

The obvious place to store the check-sums would be in the extended attributes, but one could use a database for it.

I have watched the presentation Red Hat Summit 2012 - A Deep Dive Into Red Hat Storage <https://access.redhat.com/knowledge/videos/red-hat-summit-2012-deep-dive-red-hat-storage> by Jeff Darcy and I know he (and Red Hat) are very keen on extending the number of translators with useful functionality. I am no programmer myself, but would like to get involved in this kind of stuff.
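To make the proposal a bit more concrete, here is a minimal sketch of the two steps in Python. It is not a GlusterFS translator and uses no Gluster API. The function names (file_checksum, scrub, vote), the choice of SHA-256, and the plain dict used as the check-sum store are all assumptions made for illustration. The dict stands in for the "database" option; extended attributes would be the other route.

```python
import hashlib
from collections import Counter


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scrub(path, store):
    """Detection step: check one file against its stored check-sum.

    Returns "new" the first time a file is seen (the check-sum is
    recorded), "ok" on a match and "mismatch" otherwise.
    `store` is any dict-like mapping (hypothetical stand-in for
    extended attributes or a database).
    """
    key = str(path)
    current = file_checksum(path)
    if key not in store:
        store[key] = current
        return "new"
    return "ok" if store[key] == current else "mismatch"


def vote(replica_paths):
    """Restoration step: majority vote across the replicas of one file.

    Returns (winning_checksum, suspect_paths). If no check-sum has a
    strict majority, returns (None, all paths) so that an admin can
    decide instead.
    """
    sums = {str(p): file_checksum(p) for p in replica_paths}
    winner, count = Counter(sums.values()).most_common(1)[0]
    if count * 2 <= len(sums):
        return None, sorted(sums)
    return winner, sorted(p for p, s in sums.items() if s != winner)
```

A scrubber would call scrub() on every file it walks and call vote() on the replicas only after a mismatch. Note that the sketch treats any mismatch as suspect; a real implementation would also have to refresh the stored check-sum on legitimate writes, and would save the losing replica somewhere before overwriting it, as described above.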
Comments are very welcome!
Zach,

You are right, but I need it in RHSS (Red Hat Storage Server). Also, I would like to see the detect-and-repair functionality at the "logical" level, not at the brick level. That means it needs to be part of RHSS, probably as a translator of some sort.

Fred

On Sun, Jul 22, 2012 at 12:16 PM, Zach Underwood <zunder1990 at gmail.com> wrote:
> I may be wrong, but if you use btrfs as the brick filesystem, btrfs has a
> checksum feature. If you would like btrfs in a stable OS, you can use SUSE
> or wait for RHEL 7, which is due next year.
>
> On Sun, Jul 22, 2012 at 5:14 AM, Fred van Zwieten <fvzwieten at vxcompany.com> wrote:
>> [...]
>
> --
> Zach Underwood (RHCE,RHCSA,RHCT)
> My website <http://zachunderwood.me>
Hi Fred,

If you are interested in hearing more about RHSS, this really isn't the right forum for it. However, if you would like to discuss creating a translator for GlusterFS, we would love to talk to you. You should post this to the gluster-devel mailing list: http://lists.nongnu.org/mailman/listinfo/gluster-devel

Thanks,
JM

----- Fred van Zwieten <fvzwieten at vxcompany.com> wrote:
> [...]