thr3ads.net - Gluster users - [Gluster-users] Advice on rebuilding underlying filesystem [Apr 2014]

Andrew Smith

2014-Apr-11 21:32 UTC

[Gluster-users] Advice on rebuilding underlying filesystem

Hi, I have a problem which, I hope for your sake, is uncommon.

I built a Gluster volume with 8 bricks, 4 of 80TB and 4 of 68TB, for
a total capacity of about 600TB. The underlying filesystem
is BTRFS.
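
For reference, the raw capacity implied by those brick sizes can be checked with a line of shell arithmetic. This is only a sanity check of the figures in the post; it ignores filesystem overhead.

```shell
#!/bin/sh
# Raw capacity from the brick sizes given in the post (TB).
large=$((4 * 80))   # four 80TB bricks
small=$((4 * 68))   # four 68TB bricks
total=$((large + small))
echo "raw capacity: ${total} TB"
```

That gives 592TB, consistent with "about 600TB".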

I found out after the system was half full that BTRFS was a
bad idea. BTRFS doesn't preallocate inodes. It allocates some fraction
of the disk space to metadata and when it runs out, it allocates
more. This allocation process on large volumes is painfully slow
and brings effective write speeds down to only a few MB/s with long
timeouts. Writing to the volume is a big fat mess. Reading is still
fairly fast though, so access to my data by users is acceptable.

I need to keep this volume available and I don't have a second
copy of the hardware to rebuild the system on. So, I need to do
an in-situ transition from BTRFS to XFS.

To do this, I first cleared out some data to free up metadata space,
and then with much difficulty managed to do a 

   # gluster volume remove-brick 

I retired the removed brick and then reformatted it with XFS and added
it back to my Gluster volume. At this point, I thought I was nearly 
home. I thought I could retire a second brick and the data would 
be copied to the empty brick. However, this is not what happens.
Some data ends up on the newly added brick, but some of the data 
flows elsewhere, which due to the BTRFS problem is a nightmare.
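
The brick cycle described above (drain, reformat, re-add) could be sketched roughly as follows. This is a minimal sketch, not the exact commands that were run: the volume name, brick path and device name are hypothetical, the mount step is omitted, and the script only prints the commands so the plan can be reviewed before anything is executed.

```shell
#!/bin/sh
# Hypothetical names, not taken from the post: volume, brick host:path, device.
VOLNAME="bigvol"
BRICK="server1:/bricks/brick1"
DEVICE="/dev/sdX"

# Print the commands rather than running them, so the plan can be reviewed.
plan_cycle_brick() {
    vol=$1; brick=$2; dev=$3
    echo "gluster volume remove-brick $vol $brick start"   # migrate data off the brick
    echo "gluster volume remove-brick $vol $brick status"  # repeat until it reports completed
    echo "gluster volume remove-brick $vol $brick commit"  # detach the drained brick
    echo "mkfs.xfs -f -i size=512 $dev"                    # reformat; this wipes the device
    echo "gluster volume add-brick $vol $brick"            # return the brick to the volume
}

plan=$(plan_cycle_brick "$VOLNAME" "$BRICK" "$DEVICE")
printf '%s\n' "$plan"
```

The `-i size=512` inode size is a commonly suggested setting for XFS bricks, included here as an assumption rather than something stated in the post.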

I assume this is because when I took my volume from 8 bricks to 7, it 
became unbalanced. The data on the brick that I was retiring 
belongs on several different bricks and so I am not just doing a 
substitution.
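
A toy model may help show why retiring one brick scatters its files over all the others. The sketch below is not Gluster's actual placement algorithm (the real DHT translator assigns per-directory hash ranges to bricks); it simply assumes a file lives on brick `hash % number_of_bricks` and uses synthetic hash values in place of file names.

```shell
#!/bin/sh
# Toy model only: brick = hash % number_of_bricks. All numbers are illustrative.
old_bricks=8
new_bricks=7
retired=7       # 0-based index of the brick being removed
hashes=560      # synthetic hash values 0..559 stand in for file names

summary=""
dest=0
while [ "$dest" -lt "$new_bricks" ]; do
    count=0
    h=0
    while [ "$h" -lt "$hashes" ]; do
        # Count files that used to map to the retired brick and now map to $dest.
        if [ $((h % old_bricks)) -eq "$retired" ] && [ $((h % new_bricks)) -eq "$dest" ]; then
            count=$((count + 1))
        fi
        h=$((h + 1))
    done
    summary="$summary$dest:$count "
    dest=$((dest + 1))
done
echo "files received per remaining brick: $summary"
```

In this model the 70 files from the retired brick are spread evenly, 10 to each of the 7 remaining bricks, rather than all landing on one replacement.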

I need to be able to tell my Gluster volume to include all the bricks,
but not to write files to any of the BTRFS bricks, so that it puts data
only on the XFS brick. If I could somehow tell Gluster that these bricks
were full, that would suffice.
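
One knob that may come close to "telling Gluster the bricks are full" is the `cluster.min-free-disk` volume option, which, as far as I understand it, steers newly created files away from bricks whose free space is below the threshold. This is a sketch under assumptions: the volume name and the 55% threshold are hypothetical, and nothing in this thread confirms that the option also governs data moved by remove-brick or rebalance. The command is only printed, not run.

```shell
#!/bin/sh
# Hypothetical values: volume name and threshold are not from the post.
VOLNAME="bigvol"
MIN_FREE="55%"   # chosen above the free space of a roughly half-full brick

# Build the command as a string so it can be inspected before use.
cmd="gluster volume set $VOLNAME cluster.min-free-disk $MIN_FREE"
echo "$cmd"
```

The idea is that half-full BTRFS bricks would fall below the threshold while the freshly formatted XFS brick would not; the right value would need to be checked against the actual `df` output of each brick.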

I could do a "rebalance migrate-data" to make the data on the BTRFS
volumes more uniform, but I don't know how this will work. Does it
reposition the data brick by brick or file by file? Brick by brick would
be bad, since the last brick to rebalance would need to receive all the
data that it requires before it would get to write data out to free up
metadata space.
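
For orientation, the rebalance subcommands in question look roughly like this. The volume name is hypothetical and the exact spelling depends on the Gluster release: the 3.2 series used `migrate-data start`, while later releases fold data migration into a plain `start`. The sketch prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical volume name, not from the post.
VOLNAME="bigvol"

plan_rebalance() {
    echo "gluster volume rebalance $1 fix-layout start"  # recompute directory layouts only
    echo "gluster volume rebalance $1 start"             # fix layouts and migrate data
    echo "gluster volume rebalance $1 status"            # per-node progress counters
}

plan=$(plan_rebalance "$VOLNAME")
printf '%s\n' "$plan"
```

Running `fix-layout` alone moves no file data, which may make it the safer first step when writes to the old bricks are expensive.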

There is a "rebalance-brick" option in the man page, but I don't see that
documented. This may be useful, but I have no idea what it will do.

Is there a solution to my problem? "Wipe it and start over" is not helpful.
Any help on how I can predict where data will go will also help.
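
On the question of seeing where data lives, Gluster exposes some of this through extended attributes. The sketch below prints two `getfattr` invocations: one against a file on the client mount to see which brick backs it, and one against a directory on a brick (run as root on the server) to see the hash range that brick holds for that directory. The paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical paths, not from the post.
MOUNTED_FILE="/mnt/gluster/projects/data.bin"   # path on the client mount
BRICK_DIR="/bricks/brick1/projects"             # same directory on a brick

# Which brick currently holds this file?
where_cmd="getfattr -n trusted.glusterfs.pathinfo $MOUNTED_FILE"
# Which hash range does this brick own for this directory?
layout_cmd="getfattr -n trusted.glusterfs.dht -e hex $BRICK_DIR"

echo "$where_cmd"
echo "$layout_cmd"
```

This shows current placement, not a forecast; predicting placement after a layout change would mean comparing the ranges before and after a `fix-layout`.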

Andy

Machiel Groeneveld

2014-Apr-11 21:34 UTC

[Gluster-users] Advice on rebuilding underlying filesystem

Isn't that what replace-brick is for?
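
For readers unfamiliar with it, the replace-brick sequence in Gluster releases of this period looks roughly like the following. Volume and brick names are hypothetical, and the sketch assumes a destination brick that is not already part of the volume, which may not fit a setup with no spare hardware. The commands are printed, not executed.

```shell
#!/bin/sh
# Hypothetical names, not from the post.
VOLNAME="bigvol"
OLD_BRICK="server1:/bricks/btrfs1"
NEW_BRICK="server2:/bricks/xfs1"

plan_replace_brick() {
    echo "gluster volume replace-brick $1 $2 $3 start"   # begin copying old -> new
    echo "gluster volume replace-brick $1 $2 $3 status"  # poll until migration completes
    echo "gluster volume replace-brick $1 $2 $3 commit"  # switch the volume to the new brick
}

plan=$(plan_replace_brick "$VOLNAME" "$OLD_BRICK" "$NEW_BRICK")
printf '%s\n' "$plan"
```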

