Just wondering if anyone else is running into the same behavior with disperse volumes described below, and what I might be able to do about it.

I am using Ubuntu 18.04 LTS on Odroid HC-2 hardware (armhf) and have installed Gluster 4.1.2 via PPA. I have 12 member nodes, each with a single brick. I can successfully create a working volume via the command:

gluster volume create testvol1 disperse 12 redundancy 4 gluster01:/exports/sda/brick1/testvol1 gluster02:/exports/sda/brick1/testvol1 gluster03:/exports/sda/brick1/testvol1 gluster04:/exports/sda/brick1/testvol1 gluster05:/exports/sda/brick1/testvol1 gluster06:/exports/sda/brick1/testvol1 gluster07:/exports/sda/brick1/testvol1 gluster08:/exports/sda/brick1/testvol1 gluster09:/exports/sda/brick1/testvol1 gluster10:/exports/sda/brick1/testvol1 gluster11:/exports/sda/brick1/testvol1 gluster12:/exports/sda/brick1/testvol1

And start the volume:

gluster volume start testvol1

Mounting the volume on an x86-64 system, it performs as expected.

Mounting the same volume on an armhf system (such as one of the cluster members), I can create directories, but when I try to create a file I get an error and the file system unmounts/crashes:

root@gluster01:~# mount -t glusterfs gluster01:/testvol1 /mnt
root@gluster01:~# cd /mnt
root@gluster01:/mnt# ls
root@gluster01:/mnt# mkdir test
root@gluster01:/mnt# cd test
root@gluster01:/mnt/test# cp /root/notes.txt ./
cp: failed to close './notes.txt': Software caused connection abort
root@gluster01:/mnt/test# ls
ls: cannot open directory '.': Transport endpoint is not connected

I get many of these in glusterfsd.log:

The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 100 times between [2018-08-03 04:06:39.904166] and [2018-08-03 04:06:57.521895]

Furthermore, if a cluster member ducks out (reboots, loses connection, etc.) and needs healing, the self-heal daemon logs messages similar to the one above and cannot heal: there is no disk activity (verified via iotop), CPU usage is very high, and the volume heal info command indicates the volume still needs healing.

I tested all of the above in virtual environments using x86-64 VMs and self-heal worked as expected.

Again, this only happens when using disperse volumes. Should I be filing a bug report instead?
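For reference, with disperse 12 redundancy 4 each file is encoded into 12 - 4 = 8 data fragments plus 4 redundancy fragments, so the volume should tolerate the loss of any 4 bricks and offer roughly 8/12 of the raw capacity. A rough sketch of how the failing write could be captured with a more verbose client log before filing a bug (this assumes the standard mount.glusterfs log-level and log-file mount options and the volume/mount names used above):

gluster volume info testvol1        # should report the 8 + 4 brick layout
umount /mnt
mount -t glusterfs -o log-level=DEBUG,log-file=/var/log/glusterfs/testvol1-debug.log gluster01:/testvol1 /mnt
cp /root/notes.txt /mnt/test/       # reproduce the failing file create
tail -n 100 /var/log/glusterfs/testvol1-debug.log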
Yes, you should file a bug to track this issue and to share information. I would also like to have the logs under /var/log/glusterfs, especially the mount log (named mnt.log or similar).

Following are the points I would like to bring to your notice:

1 - Are you sure that all the bricks are UP?
2 - Are there any connection issues?
3 - It is possible that a bug caused the crash, so please check for a core dump created at the point where you mounted the volume and saw the ENOTCONN error.
4 - I am not very familiar with armhf and have not run glusterfs on this hardware, so we need to see if there is anything in the code that stops glusterfs from running on this architecture and setup.
5 - Please provide the output of gluster v info and gluster v status for the volume in the BZ.

---
Ashish

----- Original Message -----
From: "Fox" <foxxz.net at gmail.com>
To: gluster-users at gluster.org
Sent: Friday, August 3, 2018 9:51:30 AM
Subject: [Gluster-users] Disperse volumes on armhf
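A minimal sketch of commands that would gather the information requested above, run on one of the nodes. The commands are the standard gluster CLI; the log locations assume the default /var/log/glusterfs layout and a mount point of /mnt, and systemd-coredump may not be installed, so adjust as needed:

gluster volume info testvol1
gluster volume status testvol1
gluster volume heal testvol1 info
# client (mount) log; the file name is derived from the mount point
tail -n 200 /var/log/glusterfs/mnt.log
# brick and self-heal daemon logs
ls /var/log/glusterfs/bricks/
tail -n 200 /var/log/glusterfs/glustershd.log
# look for a core dump from the crashed client process
coredumpctl list glusterfs 2>/dev/null
ls -l /var/crash ./core* 2>/dev/null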
What is the endianness of the armhf CPU? Are you running a 32-bit or 64-bit operating system?

--
Milind
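One way to answer both questions directly on a node (a quick sketch: armhf builds are normally 32-bit little-endian, but these commands confirm it for the running system and for the installed binary; /usr/sbin/glusterfsd is the usual Ubuntu location and may differ):

uname -m                      # e.g. armv7l for a 32-bit armhf userland
getconf LONG_BIT              # 32 or 64
lscpu | grep -i 'byte order'  # Little Endian / Big Endian
file /usr/sbin/glusterfsd     # reports ELF class (32/64-bit) and endianness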