On March 12, 2020 8:06:14 PM GMT+02:00, Pat Haley <phaley at mit.edu> wrote:
>Hi
>
>Yesterday we seemed to clear an issue with erroneous "No space left on
>device" messages
>(https://lists.gluster.org/pipermail/gluster-users/2020-March/037848.html)
>
>I am now seeing "Stale file handle" messages coming from directories
>I've just created.
>
>We are running gluster 3.7.11 in a distributed volume across 2 servers
>(2 bricks each). For the "Stale file handle" for a newly created
>directory, I've noticed that the directory does not appear in brick1
>(it
>is in the other 3 bricks).
>
>In the cli.log on the server with brick1 I'm seeing messages like
>
>--------------------------------------------------------
>[2020-03-12 17:21:36.596908] I [cli.c:721:main] 0-cli: Started running
>gluster with version 3.7.11
>[2020-03-12 17:21:36.604587] I
>[cli-cmd-volume.c:1795:cli_check_gsync_present] 0-: geo-replication not installed
>[2020-03-12 17:21:36.605100] I [MSGID: 101190]
>[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>[2020-03-12 17:21:36.605155] I [socket.c:2356:socket_event_handler]
>0-transport: disconnecting now
>[2020-03-12 17:21:36.617433] I [input.c:36:cli_batch] 0-: Exiting with:
>0
>--------------------------------------------------------
>
>I'm not sure why I would be getting any geo-replication messages; we
>aren't using replication. The cli.log on the other server is showing
>
>--------------------------------------------------------
>[2020-03-12 17:27:08.172573] I [cli.c:721:main] 0-cli: Started running
>gluster with version 3.7.11
>[2020-03-12 17:27:08.302564] I [MSGID: 101190]
>[event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>[2020-03-12 17:27:08.302716] I [socket.c:2356:socket_event_handler]
>0-transport: disconnecting now
>[2020-03-12 17:27:08.304557] I [input.c:36:cli_batch] 0-: Exiting with:
>0
>--------------------------------------------------------
>
>
>On the server with brick1, the etc-glusterfs-glusterd.vol.log is
>showing
>
>--------------------------------------------------------
>[2020-03-12 17:21:25.925394] I [MSGID: 106499]
>[glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
>[2020-03-12 17:21:25.946240] W [MSGID: 106217]
>[glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
>uuid to hostname conversion
>[2020-03-12 17:21:25.946282] W [MSGID: 106387]
>[glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
>modification failed
>[2020-03-12 17:21:36.617090] I [MSGID: 106487]
>[glusterd-handler.c:1472:__glusterd_handle_cli_list_friends]
>0-glusterd:
>Received cli list req
>[2020-03-12 17:21:15.577829] I [MSGID: 106488]
>[glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
>Received get vol req
>--------------------------------------------------------
>
>On the other server I'm seeing similar messages
>
>--------------------------------------------------------
>[2020-03-12 17:26:57.024168] I [MSGID: 106499]
>[glusterd-handler.c:4331:__glusterd_handle_status_volume] 0-management: Received status volume req for volume data-volume
>[2020-03-12 17:26:57.037269] W [MSGID: 106217]
>[glusterd-op-sm.c:4630:glusterd_op_modify_op_ctx] 0-management: Failed
>uuid to hostname conversion
>[2020-03-12 17:26:57.037299] W [MSGID: 106387]
>[glusterd-op-sm.c:4734:glusterd_op_modify_op_ctx] 0-management: op_ctx
>modification failed
>[2020-03-12 17:26:42.025200] I [MSGID: 106488]
>[glusterd-handler.c:1533:__glusterd_handle_cli_get_volume] 0-glusterd:
>Received get vol req
>[2020-03-12 17:27:08.304267] I [MSGID: 106487]
>[glusterd-handler.c:1472:__glusterd_handle_cli_list_friends]
>0-glusterd:
>Received cli list req
>--------------------------------------------------------
>
>And I've just noticed that I'm again seeing "No space left on device" in
>the logs of brick1 (although there is 3.5 TB free)
>
>--------------------------------------------------------
>[2020-03-12 17:19:54.576597] E [MSGID: 113027]
>[posix.c:1427:posix_mkdir] 0-data-volume-posix: mkdir of
>/mnt/brick1/projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
>failed [No space left on device]
>[2020-03-12 17:19:54.576681] E [MSGID: 115056]
>[server-rpc-fops.c:512:server_mkdir_cbk] 0-data-volume-server: 5001698:
>MKDIR /projects/deep_sea_mining/Tide/2020/Mar06/ccfzR75deg_001
>(96e0b7e4-6b43-42ef-9896-86097b4208fe/ccfzR75deg_001) ==> (No space left
>on device) [No space left on device]
>--------------------------------------------------------
>
>Any thoughts would be greatly appreciated. (Some additional information
>below)
>
>Thanks
>
>Pat
>
>--------------------------------------------------------
>server 1:
>[root at mseas-data2 ~]# df -h
>Filesystem      Size  Used Avail Use% Mounted on
>/dev/sdb        164T  161T  3.5T  98% /mnt/brick2
>/dev/sda        164T  159T  5.4T  97% /mnt/brick1
>
>[root at mseas-data2 ~]# df -i
>Filesystem         Inodes    IUsed      IFree IUse% Mounted on
>/dev/sdb       7031960320 31213790 7000746530    1% /mnt/brick2
>/dev/sda       7031960320 28707456 7003252864    1% /mnt/brick1
>--------------------------------------------------------
>
>--------------------------------------------------------
>server 2:
>[root at mseas-data3 ~]# df -h
>Filesystem            Size  Used Avail Use% Mounted on
>/dev/sda               91T   88T  3.9T  96% /export/sda/brick3
>/dev/mapper/vg_Data4-lv_Data4
>                       91T   89T  2.6T  98% /export/sdc/brick4
>
>[root at mseas-data3 glusterfs]# df -i
>Filesystem              Inodes    IUsed      IFree IUse% Mounted on
>/dev/sda            1953182464 10039172 1943143292    1% /export/sda/brick3
>/dev/mapper/vg_Data4-lv_Data4
>                    3906272768 11917222 3894355546    1% /export/sdc/brick4
>--------------------------------------------------------
>
>--------------------------------------------------------
>[root at mseas-data2 ~]# gluster volume info
>--------------------------------------------------------
>Volume Name: data-volume
>Type: Distribute
>Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>Status: Started
>Number of Bricks: 4
>Transport-type: tcp
>Bricks:
>Brick1: mseas-data2:/mnt/brick1
>Brick2: mseas-data2:/mnt/brick2
>Brick3: mseas-data3:/export/sda/brick3
>Brick4: mseas-data3:/export/sdc/brick4
>Options Reconfigured:
>cluster.min-free-disk: 1%
>nfs.export-volumes: off
>nfs.disable: on
>performance.readdir-ahead: on
>diagnostics.brick-sys-log-level: WARNING
>nfs.exports-auth-enable: on
>server.allow-insecure: on
>auth.allow: *
>disperse.eager-lock: off
>performance.open-behind: off
>performance.md-cache-timeout: 60
>network.inode-lru-limit: 50000
>diagnostics.client-log-level: ERROR
>
>--------------------------------------------------------
>[root at mseas-data2 ~]# gluster volume status data-volume detail
>--------------------------------------------------------
>Status of volume: data-volume
>------------------------------------------------------------------------------
>Brick                : Brick mseas-data2:/mnt/brick1
>TCP Port             : 49154
>RDMA Port            : 0
>Online               : Y
>Pid                  : 4601
>File System          : xfs
>Device               : /dev/sda
>Mount Options        : rw
>Inode Size           : 256
>Disk Space Free      : 5.4TB
>Total Disk Space     : 163.7TB
>Inode Count          : 7031960320
>Free Inodes          : 7003252864
>------------------------------------------------------------------------------
>Brick                : Brick mseas-data2:/mnt/brick2
>TCP Port             : 49155
>RDMA Port            : 0
>Online               : Y
>Pid                  : 7949
>File System          : xfs
>Device               : /dev/sdb
>Mount Options        : rw
>Inode Size           : 256
>Disk Space Free      : 3.4TB
>Total Disk Space     : 163.7TB
>Inode Count          : 7031960320
>Free Inodes          : 7000746530
>------------------------------------------------------------------------------
>Brick                : Brick mseas-data3:/export/sda/brick3
>TCP Port             : 49153
>RDMA Port            : 0
>Online               : Y
>Pid                  : 4650
>File System          : xfs
>Device               : /dev/sda
>Mount Options        : rw
>Inode Size           : 512
>Disk Space Free      : 3.9TB
>Total Disk Space     : 91.0TB
>Inode Count          : 1953182464
>Free Inodes          : 1943143292
>------------------------------------------------------------------------------
>Brick                : Brick mseas-data3:/export/sdc/brick4
>TCP Port             : 49154
>RDMA Port            : 0
>Online               : Y
>Pid                  : 23772
>File System          : xfs
>Device               : /dev/mapper/vg_Data4-lv_Data4
>Mount Options        : rw
>Inode Size           : 256
>Disk Space Free      : 2.6TB
>Total Disk Space     : 90.9TB
>Inode Count          : 3906272768
>Free Inodes          : 3894355546
>
>--
>
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>Pat Haley Email: phaley at mit.edu
>Center for Ocean Engineering Phone: (617) 253-6824
>Dept. of Mechanical Engineering Fax: (617) 253-8125
>MIT, Room 5-213 http://web.mit.edu/phaley/www/
>77 Massachusetts Avenue
>Cambridge, MA 02139-4301
>
Hey Pat,
The logs are not providing much information, but the following seems strange:
'Failed uuid to hostname conversion'
Have you checked DNS resolution (both short name and FQDN)?
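For example, something like this on each server and on the client
(mseas-data2 and mseas-data3 come from your volume info; <peer-ip> and
<your-domain> are just placeholders for whatever your setup actually uses):

getent hosts mseas-data2
getent hosts mseas-data3
getent hosts mseas-data2.<your-domain>   # FQDN too, if you use one
getent hosts <peer-ip>   # reverse lookup should return the expected hostname
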
Also, check that NTP/chrony is in sync on the systems and check 'gluster peer
status' on all nodes.
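Something along these lines on both servers (use whichever of ntpd/chronyd
you actually run):

ntpstat                  # or: chronyc tracking
date -u                  # UTC time should match on both servers
gluster peer status      # peer should show 'Peer in Cluster (Connected)'
gluster pool list
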
Is it possible that the client is not reaching all bricks?
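A rough way to check (the brick ports below are just the ones shown in your
'gluster volume status data-volume detail' output and can change after a
brick restart):

gluster volume status data-volume clients   # on a server: clients seen per brick
telnet mseas-data2 24007                    # glusterd
telnet mseas-data2 49154
telnet mseas-data2 49155
telnet mseas-data3 49153
telnet mseas-data3 49154

Also look for 'disconnected from' messages in the client's mount log under
/var/log/glusterfs/ on the client machine.
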
P.S.: Consider increasing the log level, as the current level is not sufficient.
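For example (INFO is noisier, so revert once you have captured the problem):

gluster volume set data-volume diagnostics.client-log-level INFO
gluster volume set data-volume diagnostics.brick-log-level INFO
# revert later with:
gluster volume reset data-volume diagnostics.client-log-level
gluster volume reset data-volume diagnostics.brick-log-level
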
Best Regards,
Strahil Nikolov