Hello,

So I have a volume on a gluster install (3.12.5) on which sharding was enabled at some point recently. (Don't know how it happened; it may have been an accidental run of an old script.) So it has been happily sharding behind our backs, and it shouldn't have.

I'd like to turn sharding off and revert the files back to normal. Some of these are sparse files, so I need to account for holes. There are more than enough of them that I need to write a tool to do it.

I saw notes ca. 3.7 saying the only way to do it was to read the files off on the client side, blow away the volume and start over. That would be extremely disruptive for us, and language I've seen reading tickets and old messages to this list makes me think it isn't needed anymore, but confirmation of that would be good.

The only discussion I can find is these videos[1]: http://opensource-storage.blogspot.com/2016/07/de-mystifying-gluster-shards.html , and some hints[2] that are old enough that I don't trust them without confirmation that nothing's changed. The videos don't acknowledge the existence of file holes. Also, the hint in [2] mentions using trusted.glusterfs.shard.file-size to get the size of a partly filled hole; that value looks like base64, but when I attempt to decode it, base64 complains about invalid input.

In short, I can't find sufficient information to reconstruct these. Has anyone written a current, step-by-step guide on reconstructing sharded files? Or has someone written a tool so I don't have to?

Thanks,

-j

[1] Why one would choose to annoy the crap out of their fellow gluster users by using video to convey about 80 bytes of ASCII-encoded information, I have no idea.
[2] http://lists.gluster.org/pipermail/gluster-devel/2017-March/052212.html
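A note on the base64 complaint: getfattr prefixes base64-encoded values with "0s" (and hex-encoded ones with "0x"), so feeding the displayed string straight into base64 -d usually fails unless that marker is stripped. Rather than decoding the printed form, a minimal sketch of reading the raw attribute directly follows; it assumes root access to a brick-side copy of the base file and that the value is a packed sequence of 64-bit big-endian integers with the logical file size in the first 8 bytes. That layout is a guess to verify against the shard translator source for your version, not a documented format.

# Sketch: dump the trusted.glusterfs.shard.file-size xattr from a brick
# copy of the base file and print its fields. Assumes Linux, Python 3,
# root (trusted.* xattrs are root-only), and the big-endian uint64 layout
# described above -- verify the field meanings before relying on them.
import os
import struct
import sys

path = sys.argv[1]  # base file as seen on a brick, not on the mount

raw = os.getxattr(path, "trusted.glusterfs.shard.file-size")
fields = struct.unpack(">%dQ" % (len(raw) // 8), raw[: len(raw) // 8 * 8])
print("raw length:", len(raw), "bytes")
print("fields (big-endian uint64):", fields)
print("probable logical file size:", fields[0])

Running it against a file whose size you already know (e.g. from the fuse mount) is the quickest way to confirm or refute the first-field assumption.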
On 20/04/2018 21:44, Jamie Lawrence wrote:
> In short, I can't find sufficient information to reconstruct these. Has anyone written a current, step-by-step guide on reconstructing sharded files? Or has someone written a tool so I don't have to?

Imho the easiest path would be to turn off sharding on the volume and simply copy the files (to a different directory, or rename and then copy, for instance). This should simply store the files without sharding.

my 2 cents.

Alessandro
Gandalf Corvotempesta
2018-Apr-22 09:39 UTC
[Gluster-users] Reconstructing files from shards
On Sun 22 Apr 2018, 10:46 Alessandro Briosi <ab1 at metalit.com> wrote:
> Imho the easiest path would be to turn off sharding on the volume and
> simply copy the files (to a different directory, or rename and then
> copy, for instance). This should simply store the files without sharding.

If you turn off sharding on a sharded volume with data in it, all sharded files would become unreadable.
From an old May 2017 email, where I asked the following:

"From the docs, I see you can identify the shards by the GFID:

# getfattr -d -m. -e hex path_to_file
# ls /bricks/*/.shard -lh | grep GFID

Is there a gluster tool/script that will recreate the file? Or can you just sort them properly and then simply cat/copy them back together?

cat shardGFID.1 .. shardGFID.X > thefile"

The response from Red Hat was:

"Yes, this should work, but you would need to include the base file (the 0th shard, if you will) first in the list of files that you're stitching up. In the happy case, you can test it by comparing the md5sum of the file from the mount to that of your stitched file."

We tested it with some VM files and it indeed worked fine. That was probably on 3.10.1 at the time.

-wk

On 4/20/2018 12:44 PM, Jamie Lawrence wrote:
> In short, I can't find sufficient information to reconstruct these. Has anyone written a current, step-by-step guide on reconstructing sharded files? Or has someone written a tool so I don't have to?
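One caution on the "sort them properly" part: a plain lexical sort puts <GFID>.10 before <GFID>.2, so the shard list has to be ordered numerically. A rough sketch of the stitching described above, assuming the <gfid>.<N> naming under a brick's .shard directory (all arguments are placeholders); it ignores holes, so it only matches the original for fully allocated files, and it should be run against copies rather than live bricks:

# Sketch: concatenate the base file and its shards in numeric order.
# Does NOT preserve holes -- only suitable for fully allocated files.
import glob
import shutil
import sys

base_file, shard_dir, gfid, out_path = sys.argv[1:5]

shards = sorted(
    glob.glob("%s/%s.*" % (shard_dir, gfid)),
    key=lambda p: int(p.rsplit(".", 1)[1]),  # numeric, not lexical, order
)

with open(out_path, "wb") as out:
    with open(base_file, "rb") as f:   # block 0 lives at the file's
        shutil.copyfileobj(f, out)     # original path, not under .shard
    for shard in shards:
        with open(shard, "rb") as f:
            shutil.copyfileobj(f, out)

As suggested in the Red Hat reply, comparing the md5sum of the result with a copy read from the mount is the obvious sanity check.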
> On Apr 23, 2018, at 10:49 AM, WK <wkmail at bneit.com> wrote:
>
> The response from Red Hat was:
>
> "Yes, this should work, but you would need to include the base file (the 0th shard, if you will) first in the list of files that you're stitching up. In the happy case, you can test it by comparing the md5sum of the file from the mount to that of your stitched file."
>
> We tested it with some VM files and it indeed worked fine. That was probably on 3.10.1 at the time.

Thanks for that, WK. Do you know if those images were sparse files? My understanding is that this will not work with files with holes.

Quoting from http://lists.gluster.org/pipermail/gluster-devel/2017-March/052212.html :

- - snip

1. A non-existent/missing shard anywhere between offset $SHARD_BLOCK_SIZE through ceiling ($FILE_SIZE/$SHARD_BLOCK_SIZE) indicates a hole. When you reconstruct data from a sharded file of this nature, you need to take care to retain this property.

2. The above is also true for partially filled shards between offset $SHARD_BLOCK_SIZE through ceiling ($FILE_SIZE/$SHARD_BLOCK_SIZE). What do I mean by partially filled shards? Shards whose sizes are not equal to $SHARD_BLOCK_SIZE.

In the above, $FILE_SIZE can be gotten from the 'trusted.glusterfs.shard.file-size' extended attribute on the base file (the 0th block).

- - snip

So it sounds like (although I am not sure, which is why I was writing in the first place) one would need to use `dd` or similar to read out ( ${trusted.glusterfs.shard.file-size} - ($SHARD_BLOCK_SIZE * count) ) bytes from the partial shard.

Although I also just realized the above quote fails to explain how, if a file has a hole smaller than $SHARD_BLOCK_SIZE, we know which shard(s) are holey. So I'm back to thinking reconstruction is undocumented and unsupported except for reading the files off on a client, blowing away the volume and reconstructing. Which is a problem.

-j
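Following the two rules quoted above, a hole-preserving reassembly would roughly have to: write each shard that exists at offset index * $SHARD_BLOCK_SIZE, write nothing where a shard is missing (that absence is the hole), and finally truncate the result to the logical size from the file-size xattr. The sketch below is written under the same assumptions as earlier in the thread - that the first 8 bytes of trusted.glusterfs.shard.file-size hold the logical size and that trusted.glusterfs.shard.block-size is an 8-byte big-endian value - which are my reading of the translator, not documented guarantees, and the path/GFID arguments are placeholders:

# Sketch of hole-preserving reassembly: existing shards are written at
# their block offsets, absent shards become holes (nothing is written
# there), and the output is truncated to the logical size from the
# file-size xattr. Xattr layout and shard naming are assumptions to
# verify; run against copies of the bricks, never live data.
import glob
import os
import shutil
import struct
import sys

base_file, shard_dir, gfid, out_path = sys.argv[1:5]

file_size = struct.unpack(
    ">Q", os.getxattr(base_file, "trusted.glusterfs.shard.file-size")[:8])[0]
block_size = struct.unpack(
    ">Q", os.getxattr(base_file, "trusted.glusterfs.shard.block-size")[:8])[0]

with open(out_path, "wb") as out:
    # Block 0 is the base file itself, at its original path on the brick.
    with open(base_file, "rb") as f:
        shutil.copyfileobj(f, out)
    # Remaining blocks: <gfid>.<index> under .shard; a missing index is a hole.
    for shard in glob.glob("%s/%s.*" % (shard_dir, gfid)):
        index = int(shard.rsplit(".", 1)[1])
        out.seek(index * block_size)
        with open(shard, "rb") as f:
            shutil.copyfileobj(f, out)
    # Fix the length when trailing blocks are holes or the last shard is partial.
    out.truncate(file_size)

If the layout guess is right, comparing the md5sum of the result against a copy read from the mount, as suggested earlier in the thread, is the check to run before trusting it on anything important.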
The short answer is - no, there exists no script currently that can piece the shards together into a single file.

Long answer:
IMO the safest way to convert from sharded to a single file _is_ by copying the data out into a new volume at the moment. Picking up the files from the individual bricks directly and joining them, although fast, is a strict no-no for many reasons - for example, when you have a replicated volume, the good copy needs to be carefully selected and must remain a good copy through the course of the copying process. There could be other consistency issues with file attributes changing while they are being copied. All of this is not possible unless you're open to taking the volume down.

The other option is to have the gluster client (perhaps in the shard translator itself) do the conversion in the background within the gluster translator stack, which is safer, but it would require that shard lock the file until the copying is complete, and until then no IO can happen on this file. (I haven't found the time to work on this, as there exists a workaround and I've been busy with other tasks. If anyone wants to volunteer to get this done, I'll be happy to help.)

But anyway, why is copying data into a new unsharded volume disruptive for you?

-Krutika

On Sat, Apr 21, 2018 at 1:14 AM, Jamie Lawrence <jlawrence at squaretrade.com> wrote:
> I'd like to turn sharding off and revert the files back to normal. Some of these are sparse files, so I need to account for holes. There are more than enough of them that I need to write a tool to do it.
>
> In short, I can't find sufficient information to reconstruct these. Has anyone written a current, step-by-step guide on reconstructing sharded files? Or has someone written a tool so I don't have to?
For me, the process of copying out the drive file from oVirt is a tedious, very manual process. Each VM has a single drive file with tens of thousands of shards. Typical VM size is 100G for me, and it's all mostly sparse. So, yes, a copy out from the gluster share is best.

Did the outstanding bug of adding bricks to a sharded domain causing data loss get fixed in release 3.12?

On April 27, 2018 12:00:15 AM EDT, Krutika Dhananjay <kdhananj at redhat.com> wrote:
> The short answer is - no, there exists no script currently that can piece the shards together into a single file.
>
> Long answer:
> IMO the safest way to convert from sharded to a single file _is_ by copying the data out into a new volume at the moment.
>
> But anyway, why is copying data into a new unsharded volume disruptive for you?

--
Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
> On Apr 26, 2018, at 9:00 PM, Krutika Dhananjay <kdhananj at redhat.com> wrote:
>
> But anyway, why is copying data into a new unsharded volume disruptive for you?

The copy itself isn't; blowing away the existing volume and recreating it is. That is for the usual reasons - storage on the cluster machines is not infinite, the cluster serves a purpose that humans rely on, downtime is expensive.

-j