Hi All,
After performing Strahil's checks and poking around some more, we found
that the problem was the underlying filesystem (XFS) reporting itself as
full when it wasn't. Following the information in the links below, we
found that mounting the bricks with 64-bit inodes (the XFS inode64 mount
option) fixed the problem.
https://serverfault.com/questions/357367/xfs-no-space-left-on-device-but-i-have-850gb-available
https://support.microfocus.com/kb/doc.php?id=7014318
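In case it helps anyone else, the change was essentially the mount option
below (shown for brick1 on mseas-data2; the fstab line is only
illustrative, and on older kernels a full unmount/mount may be needed
rather than a remount):
--------------------------------------------------------
# allow XFS to allocate inodes beyond the first 1TB of the device
mount -o remount,inode64 /mnt/brick1

# make it persistent across reboots, e.g. in /etc/fstab:
/dev/sda  /mnt/brick1  xfs  defaults,inode64  0 0
--------------------------------------------------------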
Thanks
Pat
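P.S. For reference, the checks Strahil suggests below map onto commands
roughly like the following (the hostnames and volume name are ours; use
whichever time-sync daemon and log level apply to your setup):
--------------------------------------------------------
# name resolution for each peer, short name and FQDN
getent hosts mseas-data2 mseas-data3

# time sync, depending on whether ntpd or chronyd is in use
ntpstat
chronyc tracking

# peer and brick health as seen from each server
gluster peer status
gluster volume status data-volume

# temporarily raise client log verbosity from ERROR while debugging
gluster volume set data-volume diagnostics.client-log-level INFO
--------------------------------------------------------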
On 3/12/20 4:24 PM, Strahil Nikolov wrote:
> On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>> Hi
>>
>> Yesterday we seemed to clear an issue with erroneous "No space left
>> on device" messages
>>
>> (https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html)
>>
>> I am now seeing "Stale file handle" messages coming from directories
>> I've just created.
>>
>> We are running gluster 3.7.11 in a distributed volume across 2 servers
>> (2 bricks each). For a "Stale file handle" on a newly created
>> directory, I've noticed that the directory does not appear in brick1
>> (it is in the other 3 bricks).
>>
>> In the cli.log on the server with brick1 I'm seeing messages like
>>
>> --------------------------------------------------------
>> [2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running
>> gluster with version 3.7.11
>> [2020-03-12 17:21:36.604587] I
>> [cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not
>> installed
>> [2020-03-12 17:21:36.605100] I [MSGID: 101190]
>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> [2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler]
>> 0-transport: disconnecting now
>> [2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with:
>> 0
>> --------------------------------------------------------
>>
>> I'm not sure why I would be getting any geo-replication messages; we
>> aren't using replication. The cli.log on the other server is showing
>>
>> --------------------------------------------------------
>> [2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running
>> gluster with version 3.7.11
>> [2020-03-12 17:27:08.302564] I [MSGID: 101190]
>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> [2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler]
>> 0-transport: disconnecting now
>> [2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with:
>> 0
>> --------------------------------------------------------
>>
>>
>> On the server with brick1, the etc-glusterfs-glusterd.vol.log is
>> showing
>>
>> --------------------------------------------------------
>> [2020-03-12 17:21:25.925394] I [MSGID: 106499]
>> [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume data-volume
>> [2020-03-12 17:21:25.946240] W [MSGID: 106217]
>> [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
>> uuid to hostname conversion
>> [2020-03-12 17:21:25.946282] W [MSGID: 106387]
>> [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
>> modification failed
>> [2020-03-12 17:21:36.617090] I [MSGID: 106487]
>> [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends]
>> 0-glusterd:
>> Received cli list req
>> [2020-03-12 17:21:15.577829] I [MSGID: 106488]
>> [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
>> Received get vol req
>> --------------------------------------------------------
>>
>> On the other server I'm seeing similar messages
>>
>> --------------------------------------------------------
>> [2020-03-12 17:26:57.024168] I [MSGID: 106499]
>> [glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management:
>> Received status volume req for volume data-volume
>> [2020-03-12 17:26:57.037269] W [MSGID: 106217]
>> [glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
>> uuid to hostname conversion
>> [2020-03-12 17:26:57.037299] W [MSGID: 106387]
>> [glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
>> modification failed
>> [2020-03-12 17:26:42.025200] I [MSGID: 106488]
>> [glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
>> Received get vol req
>> [2020-03-12 17:27:08.304267] I [MSGID: 106487]
>> [glusterd-handler.c:1472:__glusterd_handle_cli_list_friends]
>> 0-glusterd:
>> Received cli list req
>> --------------------------------------------------------
>>
>> And I've just noticed that I'm again seeing "No space left on device"
>> in the logs of brick1 (although there is 3.5 TB free)
>>
>> --------------------------------------------------------
>> [2020-03-12 17:19:54.576597] E [MSGID: 113027]
>> [posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of
>> /mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
>> failed [No space left on device]
>> [2020-03-12 17:19:54.576681] E [MSGID: 115056]
>> [server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698:
>> MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
>> (96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space
>> left on device) [No space left on device]
>> --------------------------------------------------------
>>
>> Any thoughts would be greatly appreciated. (Some additional
>> information below)
>>
>> Thanks
>>
>> Pat
>>
>> --------------------------------------------------------
>> server 1:
>> [root at mseas-data2 ~]# df -h
>> Filesystem      Size  Used Avail Use% Mounted on
>> /dev/sdb        164T  161T  3.5T  98% /mnt/brick2
>> /dev/sda        164T  159T  5.4T  97% /mnt/brick1
>>
>> [root at mseas-data2 ~]# df -i
>> Filesystem         Inodes    IUsed      IFree IUse% Mounted on
>> /dev/sdb       7031960320 31213790 7000746530    1% /mnt/brick2
>> /dev/sda       7031960320 28707456 7003252864    1% /mnt/brick1
>> --------------------------------------------------------
>>
>> --------------------------------------------------------
>> server 2:
>> [root at mseas-data3 ~]# df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda               91T   88T  3.9T  96% /export/sda/brick3
>> /dev/mapper/vg_Data4-lv_Data4
>>                        91T   89T  2.6T  98% /export/sdc/brick4
>>
>> [root at mseas-data3 glusterfs]# df -i
>> Filesystem              Inodes    IUsed      IFree IUse% Mounted on
>> /dev/sda            1953182464 10039172 1943143292    1% /export/sda/brick3
>> /dev/mapper/vg_Data4-lv_Data4
>>                     3906272768 11917222 3894355546    1% /export/sdc/brick4
>> --------------------------------------------------------
>>
>> --------------------------------------------------------
>> [root at mseas-data2 ~]# gluster volume info
>> --------------------------------------------------------
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 4
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Brick3: mseas-data3:/export/sda/brick3
>> Brick4: mseas-data3:/export/sdc/brick4
>> Options Reconfigured:
>> cluster.min-free-disk: 1%
>> nfs.export-volumes: off
>> nfs.disable: on
>> performance.readdir-ahead: on
>> diagnostics.brick-sys-log-level: WARNING
>> nfs.exports-auth-enable: on
>> server.allow-insecure: on
>> auth.allow: *
>> disperse.eager-lock: off
>> performance.open-behind: off
>> performance.md-cache-timeout: 60
>> network.inode-lru-limit: 50000
>> diagnostics.client-log-level: ERROR
>>
>> --------------------------------------------------------
>> [root at mseas-data2 ~]# gluster volume status data-volume detail
>> --------------------------------------------------------
>> Status of volume: data-volume
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data2:/mnt/brick1
>> TCP Port             : 49154
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 4601
>> File System          : xfs
>> Device               : /dev/sda
>> Mount Options        : rw
>> Inode Size           : 256
>> Disk Space Free      : 5.4TB
>> Total Disk Space     : 163.7TB
>> Inode Count          : 7031960320
>> Free Inodes          : 7003252864
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data2:/mnt/brick2
>> TCP Port             : 49155
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 7949
>> File System          : xfs
>> Device               : /dev/sdb
>> Mount Options        : rw
>> Inode Size           : 256
>> Disk Space Free      : 3.4TB
>> Total Disk Space     : 163.7TB
>> Inode Count          : 7031960320
>> Free Inodes          : 7000746530
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data3:/export/sda/brick3
>> TCP Port             : 49153
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 4650
>> File System          : xfs
>> Device               : /dev/sda
>> Mount Options        : rw
>> Inode Size           : 512
>> Disk Space Free      : 3.9TB
>> Total Disk Space     : 91.0TB
>> Inode Count          : 1953182464
>> Free Inodes          : 1943143292
>> ------------------------------------------------------------------------------
>> Brick                : Brick mseas-data3:/export/sdc/brick4
>> TCP Port             : 49154
>> RDMA Port            : 0
>> Online               : Y
>> Pid                  : 23772
>> File System          : xfs
>> Device               : /dev/mapper/vg_Data4-lv_Data4
>> Mount Options        : rw
>> Inode Size           : 256
>> Disk Space Free      : 2.6TB
>> Total Disk Space     : 90.9TB
>> Inode Count          : 3906272768
>> Free Inodes          : 3894355546
>>
> Hey Pat,
>
> The logs are not providing much information, but the following seems
> strange:
> 'Failed uuid to hostname conversion'
>
> Have you checked dns resolution (both short name and fqdn)?
> Also, check that the systems' ntp/chrony is in sync, and check 'gluster
> peer status' on all nodes.
>
> Is it possible that the client is not reaching all bricks ?
>
>
> P.S.: Consider increasing the log level, as the current level is not
> sufficient.
>
> Best Regards,
> Strahil Nikolov
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301