Just wondering if anyone else is running into the same behavior with disperse volumes described below, and what I might be able to do about it.

I am using Ubuntu 18.04 LTS on Odroid HC-2 hardware (armhf) and have installed Gluster 4.1.2 via PPA. I have 12 member nodes, each with a single brick. I can successfully create a working volume via the command:

gluster volume create testvol1 disperse 12 redundancy 4 gluster01:/exports/sda/brick1/testvol1 gluster02:/exports/sda/brick1/testvol1 gluster03:/exports/sda/brick1/testvol1 gluster04:/exports/sda/brick1/testvol1 gluster05:/exports/sda/brick1/testvol1 gluster06:/exports/sda/brick1/testvol1 gluster07:/exports/sda/brick1/testvol1 gluster08:/exports/sda/brick1/testvol1 gluster09:/exports/sda/brick1/testvol1 gluster10:/exports/sda/brick1/testvol1 gluster11:/exports/sda/brick1/testvol1 gluster12:/exports/sda/brick1/testvol1

And start the volume:

gluster volume start testvol1

Mounting the volume on an x86-64 system, it performs as expected.

Mounting the same volume on an armhf system (such as one of the cluster members), I can create directories, but when I try to create a file I get an error and the file system unmounts/crashes:

root@gluster01:~# mount -t glusterfs gluster01:/testvol1 /mnt
root@gluster01:~# cd /mnt
root@gluster01:/mnt# ls
root@gluster01:/mnt# mkdir test
root@gluster01:/mnt# cd test
root@gluster01:/mnt/test# cp /root/notes.txt ./
cp: failed to close './notes.txt': Software caused connection abort
root@gluster01:/mnt/test# ls
ls: cannot open directory '.': Transport endpoint is not connected

I get many of these in glusterfsd.log:

The message "W [MSGID: 101088] [common-utils.c:4316:gf_backtrace_save] 0-management: Failed to save the backtrace." repeated 100 times between [2018-08-03 04:06:39.904166] and [2018-08-03 04:06:57.521895]

Furthermore, if a cluster member ducks out (reboots, loses connection, etc.) and needs healing, the self-heal daemon logs messages similar to the one above and cannot heal: there is no disk activity (verified via iotop), CPU usage is very high, and the volume heal info command indicates the volume still needs healing.

I tested all of the above in virtual environments using x86-64 VMs and self-heal worked as expected.

Again, this only happens when using disperse volumes. Should I be filing a bug report instead?
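For reference, with disperse 12 redundancy 4 each file is encoded into 12 - 4 = 8 data fragments plus 4 redundancy fragments, so the volume should tolerate the loss of any 4 bricks and offer roughly 8/12 of the raw capacity. A rough sketch of how the failing write could be captured with a more verbose client log before filing a bug (this assumes the standard mount.glusterfs log-level and log-file mount options and the volume/mount names used above):

gluster volume info testvol1        # should report the 8 + 4 brick layout
umount /mnt
mount -t glusterfs -o log-level=DEBUG,log-file=/var/log/glusterfs/testvol1-debug.log gluster01:/testvol1 /mnt
cp /root/notes.txt /mnt/test/       # reproduce the failing file create
tail -n 100 /var/log/glusterfs/testvol1-debug.log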
Yes, you should file a bug to track this issue and to share information. I would also like to have the logs under /var/log/glusterfs, especially the mount log (named mnt.log or similar).

Following are the points I would like to bring to your notice:

1 - Are you sure that all the bricks are UP?
2 - Are there any connection issues?
3 - It is possible that a bug caused the crash, so please check for a core dump created at the point where you mounted the volume and saw the ENOTCONN error.
4 - I am not very familiar with armhf and have not run glusterfs on this hardware, so we need to see if there is anything in the code that stops glusterfs from running on this architecture and setup.
5 - Please provide the output of gluster v info and gluster v status for the volume in the BZ.

---
Ashish

----- Original Message -----
From: "Fox" <foxxz.net at gmail.com>
To: gluster-users at gluster.org
Sent: Friday, August 3, 2018 9:51:30 AM
Subject: [Gluster-users] Disperse volumes on armhf
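A minimal sketch of commands that would gather the information requested above, run on one of the nodes. The commands are the standard gluster CLI; the log locations assume the default /var/log/glusterfs layout and a mount point of /mnt, and systemd-coredump may not be installed, so adjust as needed:

gluster volume info testvol1
gluster volume status testvol1
gluster volume heal testvol1 info
# client (mount) log; the file name is derived from the mount point
tail -n 200 /var/log/glusterfs/mnt.log
# brick and self-heal daemon logs
ls /var/log/glusterfs/bricks/
tail -n 200 /var/log/glusterfs/glustershd.log
# look for a core dump from the crashed client process
coredumpctl list glusterfs 2>/dev/null
ls -l /var/crash ./core* 2>/dev/null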
What is the endianness of the armhf CPU? Are you running a 32-bit or 64-bit operating system?

--
Milind
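One way to answer both questions directly on a node (a quick sketch: armhf builds are normally 32-bit little-endian, but these commands confirm it for the running system and for the installed binary; /usr/sbin/glusterfsd is the usual Ubuntu location and may differ):

uname -m                      # e.g. armv7l for a 32-bit armhf userland
getconf LONG_BIT              # 32 or 64
lscpu | grep -i 'byte order'  # Little Endian / Big Endian
file /usr/sbin/glusterfsd     # reports ELF class (32/64-bit) and endianness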