On Mon, May 1, 2017 at 10:39 PM, Gandalf Corvotempesta
<gandalf.corvotempesta at gmail.com> wrote:

> 2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> > Yes, this is precisely what all the other SDS with metadata servers do:
> > they keep a map, in a metadata server, of which servers a particular
> > file/blob is stored on.
>
> Not exactly. Other SDS have some servers dedicated to metadata and,
> personally, I don't like that approach.
>
> > GlusterFS doesn't do that. In GlusterFS, which bricks replicate each
> > other is always given, and the distribute layer on top of these
> > replication layers does the job of distributing and fetching the data.
> > Because replication happens at the brick level and not at the file
> > level, and distribution happens on top of replication and not at the
> > file level, there isn't much metadata that needs to be stored per file.
> > Hence no need for separate metadata servers.
>
> And this is great; that's why I'm talking about embedding a sort of
> database to be stored on all nodes: no metadata servers, only a mapping
> between files and servers.
>
> > If you know the path of the file, you can always know where the file is
> > stored using pathinfo:
> > Method-2 in the following link:
> > https://gluster.readthedocs.io/en/latest/Troubleshooting/gfid-to-path/
> >
> > You don't need any db.
>
> For the current Gluster, yes. I'm talking about a different thing.
>
> In a RAID, you have data stored somewhere on the array, with metadata
> defining how that data should be written or read. Obviously, the RAID
> metadata must be stored in a fixed position, or you won't be able to
> read it.
>
> Something similar could be added to Gluster (I don't know how hard it
> would be): you store a file mapping in a fixed position in Gluster, and
> then all Gluster clients can know where a file is by looking at this
> "metadata" stored in the fixed position.
>
> Like the ".gluster" directory: Gluster already uses some "internal"
> directories for internal operations (".shards", ".gluster", ".trash").
> Would a ".metadata" directory with the file mapping be hard to add?
>
> > Basically what you want, if I understood correctly, is:
> > if we add a 3rd node with just one disk, the data should automatically
> > rearrange itself, splitting into 3 categories (assuming replica-2):
> > 1) Files that are present on Node1, Node2
> > 2) Files that are present on Node2, Node3
> > 3) Files that are present on Node1, Node3
> >
> > As you can see, we arrive at a contradiction: every node would need at
> > least 2 bricks, but there is only 1 disk. Hence the contradiction. We
> > can't do what you are asking without brick splitting, i.e. we need to
> > split the disk into 2 bricks.
>
> I don't think so.
> Let's assume a replica 2.
>
> S1B1 + S2B1
>
> 1TB each, thus 1TB available (2TB/2)
>
> Adding a third 1TB disk should increase available space to 1.5TB (3TB/2)

I agree it should. Question is: how? What will be the resulting brick-map?

--
Pranith
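
For reference, the pathinfo lookup mentioned above can be run from any client
mount with getfattr. A minimal sketch, assuming a FUSE mount at /mnt/gv0 and
made-up server/brick names; the exact output shape depends on the volume
layout:

    # Ask the client-side translators which bricks actually hold /tmp/file1
    getfattr -n trusted.glusterfs.pathinfo /mnt/gv0/tmp/file1

    # Output looks roughly like this (volume and brick names invented):
    # trusted.glusterfs.pathinfo="(<DISTRIBUTE:gv0-dht> (<REPLICATE:gv0-replicate-0>
    #   <POSIX(/bricks/b1):server1:/bricks/b1/tmp/file1>
    #   <POSIX(/bricks/b1):server2:/bricks/b1/tmp/file1>))"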
On Mon, May 1, 2017 at 10:42 PM, Pranith Kumar Karampuri
<pkarampu at redhat.com> wrote:

> On Mon, May 1, 2017 at 10:39 PM, Gandalf Corvotempesta
> <gandalf.corvotempesta at gmail.com> wrote:
>
>> [...]
>>
>> I don't think so.
>> Let's assume a replica 2.
>>
>> S1B1 + S2B1
>>
>> 1TB each, thus 1TB available (2TB/2)
>>
>> Adding a third 1TB disk should increase available space to 1.5TB (3TB/2)
>
> I agree it should. Question is: how? What will be the resulting brick-map?

I don't see any solution that would work without at least 2 bricks on each
of the 3 servers.

--
Pranith
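
Concretely, one brick-map that satisfies this is to split each server's 1TB
disk into two ~0.5TB bricks and pair them across servers. A sketch of the
volume-create step under those assumptions (server and brick names are
invented; with the standard gluster CLI, consecutive bricks form a replica
pair):

    # Resulting replica pairs:
    #   (server1:/bricks/b1a, server2:/bricks/b2a)
    #   (server2:/bricks/b2b, server3:/bricks/b3a)
    #   (server3:/bricks/b3b, server1:/bricks/b1b)
    # Raw capacity 3TB, usable ~1.5TB with replica 2.
    gluster volume create gv0 replica 2 \
        server1:/bricks/b1a server2:/bricks/b2a \
        server2:/bricks/b2b server3:/bricks/b3a \
        server3:/bricks/b3b server1:/bricks/b1b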
2017-05-01 19:12 GMT+02:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> I agree it should. Question is: how? What will be the resulting brick-map?

This is why I'm suggesting adding a file mapping somewhere.
You could also use xattrs for this: "file1" is mapped to a GFID, and then, as
an xattr on that GFID, you could save the server/brick location. This way you
always know where a file is.

To keep it simple for non-developers like me (this is wrong, it's a
simplification):

"/tmp/file1" hashes to 306040e474f199e7969ec266afd10d93
The hash starts with "3", thus the file is located on brick3.

You don't need any metadata for this; the hash algorithm is the only thing
you need. But if you store the file-location mapping somewhere (for example
as an xattr on the GFID file), you can look up the file without using the
hash-based location:

ORIG_FILE="/tmp/file1"
GFID="306040e474f199e7969ec266afd10d93"
FILE_LOCATION=$(getfattr -n "file_location" "$GFID")

if [ -n "$FILE_LOCATION" ]; then
    echo "read from $FILE_LOCATION"
else
    echo "read from the location computed by the hash algorithm"
fi
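
To make the "hash decides the brick" half of this concrete, here is a toy
sketch of the simplified scheme described above. It is not GlusterFS's actual
DHT (which hashes file names into per-directory layout ranges), and the brick
names are invented:

    #!/bin/bash
    # Toy placement: hash the path and map the hash onto one of the bricks.
    BRICKS=(server1:/bricks/b1 server2:/bricks/b1 server3:/bricks/b1)

    path="/tmp/file1"
    hash=$(printf '%s' "$path" | md5sum | cut -d' ' -f1)   # 32 hex digits
    idx=$(( 0x${hash:0:8} % ${#BRICKS[@]} ))               # first 8 hex digits -> brick index
    echo "$path -> ${BRICKS[$idx]}"

    # The xattr lookup sketched in the mail above would then act as an
    # override: if a "file_location" attribute exists for the file's GFID,
    # read from there; otherwise fall back to the hash-computed brick.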