Rumen Telbizov
2015-Mar-18 17:22 UTC
[Gluster-users] Access data directly from underlying storage
Hello everyone,

I am curious to know other people's experience with reading data straight from the underlying volume. According to the documentation it should be fine:

http://www.gluster.org/documentation/Technical_FAQ/

> *Can I directly access the data on the underlying storage volumes?*
>
> If you are just doing just read()/access()/stat() like operations, you
> should be fine. If you are not using any new features (like
> quota/geo-replication etc etc) then technically, you can modify (but surely
> not rename(2) and link(2)) the data inside.
>
> Note that this is not tested as part of gluster's release cycle and not
> recommended for production use.

The last sentence says this is not recommended for production use. Is there any concern besides the fact that it is not tested as part of the release cycle, or should one expect actual problems with the data being read this way?

I am interested *only* in read operations (readdir, stat, read data). All write operations will continue to go through the shared/mounted drive. What I want to know is whether the data I read will be consistent with the rest of the bricks and not corrupted in any way.

The reason I am looking into this is two-fold:

1. I have a lot of small files (hundreds of thousands) that need to be read and reread very frequently. Doing so directly from the underlying disk is way faster.
2. Once I have read them, both metadata and actual data are automatically cached in the OS filesystem cache, so subsequent reads take almost no time, since very few of those files actually change between reruns.

I run a three (or five) way mirror and nothing fancy (no geo-replication and such), so if I could, I would prefer to read off the local disk. But I want to be certain that those reads, in terms of correctness and consistency, will be equivalent to reading off the shared drive itself.

Thank you in advance for sharing your experience.
Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
Melkor Lord
2015-Mar-19 07:29 UTC
[Gluster-users] Access data directly from underlying storage
On Wed, Mar 18, 2015 at 6:22 PM, Rumen Telbizov <telbizov at gmail.com> wrote:

>> *Can I directly access the data on the underlying storage volumes?*
>>
>> If you are just doing just read()/access()/stat() like operations, you
>> should be fine. If you are not using any new features (like
>> quota/geo-replication etc etc) then technically, you can modify (but surely
>> not rename(2) and link(2)) the data inside.
>>
>> Note that this is not tested as part of gluster's release cycle and not
>> recommended for production use.
>
> The last sentence doesn't recommend it for production use. I was wondering
> if there's any other concern besides the fact that it's not tested as part
> of the release cycle or one could expect actual some problems with the data
> being read while doing so?
>
> What I am interested is *only* read operations (readdir, stat, read
> data). All the write operations will continue going over the shared/mounted
> drive. So what I want to know is that the data that I am reading will be
> consistent with the rest of the bricks and not corrupted in any way.

This is not necessarily a direct answer to your question, but I have tested something similar.

With a running volume (not mounted anywhere), I copied a file (a tarball) directly into the underlying FS directory to see how Gluster would react when a client mounted the volume afterwards. Some time after a client mounted the Gluster filesystem (FUSE client), the tarball I had copied onto one of the bricks was replicated to the other servers in my 3-replica test environment. I tested the tarball on each gluster server and it was perfectly consistent.

During my other tests I did things like what you intend to do: I mounted the gluster volume on a client and copied some big files there. While the copy was running, I accessed the resulting file directly on the servers to see if it was consistent (checking the first few KB of the file to verify the headers). I found nothing to complain about and everything seemed consistent to me, so I'd say that what you plan to do is fairly safe.

--
Unix _IS_ user friendly, it's just selective about who its friends are.
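
For anyone who wants to repeat this kind of spot check more systematically, here is a minimal sketch (not something either poster ran) that walks a brick directory read-only and compares each file's SHA-256 against the same path on the mounted volume. The function names are made up for illustration, and the two directories are whatever your brick path and mount point happen to be. It assumes a plain replicated volume, where a brick holds whole files under the same relative paths as the mount, and it skips the top-level `.glusterfs` directory that Gluster keeps on each brick for its own bookkeeping.

```python
import hashlib
import os
from pathlib import Path

# Gluster's internal bookkeeping directory at the top of each brick;
# it is not user data and should not be compared against the mount.
INTERNAL_DIRS = {".glusterfs"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_brick_files(brick_root):
    """Yield regular files under brick_root as paths relative to it."""
    brick_root = Path(brick_root)
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if Path(dirpath) == brick_root:
            # Prune the internal directory so os.walk never descends into it.
            dirnames[:] = [d for d in dirnames if d not in INTERNAL_DIRS]
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_file() and not full.is_symlink():
                yield full.relative_to(brick_root)


def compare_brick_to_mount(brick_root, mount_root):
    """Return sorted relative paths that are missing or differ on the mount."""
    mismatches = []
    for rel in iter_brick_files(brick_root):
        brick_file = Path(brick_root) / rel
        mount_file = Path(mount_root) / rel
        if not mount_file.is_file():
            mismatches.append(str(rel))
        elif file_digest(brick_file) != file_digest(mount_file):
            mismatches.append(str(rel))
    return sorted(mismatches)
```

Treat a reported mismatch as something to look into, not proof of corruption: a file that is being written during the scan can differ for a moment, and a local brick that still has to catch up with its replicas may legitimately hold an older copy than the one the mount serves. An empty result only says the two views agreed at the time of the scan.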