David Gibbons
2014-Jun-17 12:37 UTC
[Gluster-users] Samba vfs_glusterfs no such file or directory
Hi All,

I am running into a strange error with samba and vfs_glusterfs.

Here is some version information:

[root@gfs-a-3 samba]# smbd -V
Version 3.6.20

[root@gfs-a-3 tmp]# glusterfsd --version
glusterfs 3.4.1 built on Oct 21 2013 09:23:23

Samba is configured in an AD environment, using winbind. Group resolution, user resolution, and cross-mapping of SIDs to IDs to usernames all work as expected. The vfs_glusterfs module is working perfectly for the vast majority of the users I have configured. A small percentage of the users, though, get an "access is denied" error when they attempt to access the share. They are all configured in the same way as the users that are working.
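For reference, the share is exported through the vfs_glusterfs module; a minimal smb.conf share definition along these lines is assumed here (the path, volfile server, and log file location are illustrative placeholders, and the volume name is taken from the 'shares' translator names in the logs below):

    [shares]
        # assumed share definition; adjust volume, path and server to the real setup
        path = /
        read only = no
        vfs objects = glusterfs
        glusterfs:volume = shares
        glusterfs:volfile_server = localhost
        glusterfs:logfile = /var/log/samba/glusterfs-shares.log
        glusterfs:loglevel = 10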
We initially thought that perhaps the number of groups the user was a member of was causing the issue. This still might be the case, but we're not sure how to verify that guess.

When we connect with a working user, with glusterfs:loglevel = 10, here are the last bits of the log file. I'm not really sure where the interesting lines are; any guidance would be much appreciated:

[2014-06-17 12:11:53.753289] D [client-handshake.c:1430:client_setvolume_cbk] 0-shares-client-5: clnt-lk-version = 1, server-lk-version = 0
[2014-06-17 12:11:53.753296] I [client-handshake.c:1456:client_setvolume_cbk] 0-shares-client-5: Connected to 172.16.10.13:49153, attached to remote volume '/mnt/a-3-shares-brick-2/brick'.
[2014-06-17 12:11:53.753301] I [client-handshake.c:1468:client_setvolume_cbk] 0-shares-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2014-06-17 12:11:53.753306] D [client-handshake.c:1318:client_post_handshake] 0-shares-client-5: No fds to open - notifying all parents child up
[2014-06-17 12:11:53.753313] D [client-handshake.c:486:client_set_lk_version] 0-shares-client-5: Sending SET_LK_VERSION
[2014-06-17 12:11:53.753320] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2014-06-17 12:11:53.753327] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 132, payload: 68, rpc hdr: 64
[2014-06-17 12:11:53.753344] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x32x Program: GlusterFS Handshake, ProgVers: 2, Proc: 4) to rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753353] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2014-06-17 12:11:53.753360] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 64, payload: 0, rpc hdr: 64
[2014-06-17 12:11:53.753373] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x33x Program: GlusterFS Handshake, ProgVers: 2, Proc: 3) to rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753381] I [afr-common.c:3698:afr_notify] 0-shares-replicate-2: Subvolume 'shares-client-5' came back up; going online.
[2014-06-17 12:11:53.753393] T [rpc-clnt.c:1302:rpc_clnt_record] 0-shares-client-5: Auth Info: pid: 0, uid: 0, gid: 0, owner:
[2014-06-17 12:11:53.753399] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 84, payload: 20, rpc hdr: 64
[2014-06-17 12:11:53.753413] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x34x Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753430] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x32x Program: GlusterFS Handshake, ProgVers: 2, Proc: 4) from rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753441] I [client-handshake.c:450:client_set_lk_version_cbk] 0-shares-client-5: Server lk version = 1
[2014-06-17 12:11:53.753451] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x33x Program: GlusterFS Handshake, ProgVers: 2, Proc: 3) from rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753474] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-shares-client-5: received rpc message (RPC XID: 0x34x Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) from rpc-transport (shares-client-5)
[2014-06-17 12:11:53.753483] D [dht-diskusage.c:80:dht_du_info_cbk] 0-shares-dht: on subvolume 'shares-replicate-2': avail_percent is: 95.00 and avail_space is: 1050826719232 and avail_inodes is: 99.00

And here is a log snip from the non-working user:
[2014-06-17 12:07:17.866693] W [socket.c:514:__socket_rwv] 0-shares-client-13: readv failed (No data available)
[2014-06-17 12:07:17.866699] D [socket.c:1962:__socket_proto_state_machine] 0-shares-client-13: reading from socket failed. Error (No data available), peer (172.16.10.13:49155)
[2014-06-17 12:07:17.866707] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2014-06-17 12:07:17.866716] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-13: cleaning up state in transport object 0x7f22300aaa60
[2014-06-17 12:07:17.866722] I [client.c:2097:client_rpc_notify] 0-shares-client-13: disconnected
[2014-06-17 12:07:17.866735] E [afr-common.c:3735:afr_notify] 0-shares-replicate-6: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-06-17 12:07:17.866743] D [socket.c:486:__socket_rwv] 0-shares-client-14: EOF on socket
[2014-06-17 12:07:17.866750] W [socket.c:514:__socket_rwv] 0-shares-client-14: readv failed (No data available)
[2014-06-17 12:07:17.866755] D [socket.c:1962:__socket_proto_state_machine] 0-shares-client-14: reading from socket failed. Error (No data available), peer (172.16.10.12:49162)
[2014-06-17 12:07:17.866761] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2014-06-17 12:07:17.866769] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-14: cleaning up state in transport object 0x7f2230085b60
[2014-06-17 12:07:17.866775] I [client.c:2097:client_rpc_notify] 0-shares-client-14: disconnected
[2014-06-17 12:07:17.866781] D [glfs-master.c:106:notify] 0-gfapi: got notify event 8
[2014-06-17 12:07:17.866787] D [socket.c:486:__socket_rwv] 0-shares-client-15: EOF on socket
[2014-06-17 12:07:17.866801] W [socket.c:514:__socket_rwv] 0-shares-client-15: readv failed (No data available)
[2014-06-17 12:07:17.866807] D [socket.c:1962:__socket_proto_state_machine] 0-shares-client-15: reading from socket failed. Error (No data available), peer (172.16.10.14:49159)
[2014-06-17 12:07:17.866813] D [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
[2014-06-17 12:07:17.866820] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-shares-client-15: cleaning up state in transport object 0x7f2230060c00
[2014-06-17 12:07:17.866827] I [client.c:2097:client_rpc_notify] 0-shares-client-15: disconnected
[2014-06-17 12:07:17.866832] E [afr-common.c:3735:afr_notify] 0-shares-replicate-7: All subvolumes are down. Going offline until atleast one of them comes back up.

Note that these log snips are from the same machine, minutes apart, same config other than the username that is connecting to the share. It almost appears as though the vfs_glusterfs interaction with the gluster volume is related to the username.

I am trying to relate this to other similar bugs I've been able to dig up online. Is there a limit to the number of clients that a gluster node can handle? What am I missing here?

Cheers,
Dave
Niels de Vos
2014-Jun-23 15:50 UTC
[Gluster-users] Samba vfs_glusterfs no such file or directory
On Tue, Jun 17, 2014 at 08:37:54AM -0400, David Gibbons wrote:
> Hi All,
>
> I am running into a strange error with samba and vfs_glusterfs.
>
> Here is some version information:
> [root@gfs-a-3 samba]# smbd -V
> Version 3.6.20
>
> [root@gfs-a-3 tmp]# glusterfsd --version
> glusterfs 3.4.1 built on Oct 21 2013 09:23:23
>
> Samba is configured in an AD environment, using winbind. Group resolution, user resolution, and cross-mapping of SIDs to IDs to usernames all work as expected. The vfs_glusterfs module is working perfectly for the vast majority of the users I have configured. A small percentage of the users, though, get an "access is denied" error when they attempt to access the share. They are all configured in the same way as the users that are working.
>
> We initially thought that perhaps the number of groups the user was a member of was causing the issue. This still might be the case, but we're not sure how to verify that guess.

Samba with vfs_glusterfs has a limit of approx. 93 groups. If 'id $USER' returns more than 93 groups, those users can run into various issues. 'Access is denied' is one of the most common errors they'll see.

The upcoming 3.5.1 release has a 'server.manage-gids' volume option. With this option enabled, the number of groups will be limited to 65535.
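To verify the guess above, counting the groups that winbind resolves for a working account and an affected account should show the difference; the account names below are placeholders, and the volume name is assumed to be 'shares' to match the translator names in the logs:

    # number of groups resolved for each user (placeholder account names)
    id -G 'DOMAIN\workinguser' | wc -w
    id -G 'DOMAIN\affecteduser' | wc -w

    # once the servers run 3.5.1, the workaround can be enabled per volume
    gluster volume set shares server.manage-gids on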
> When we connect with a working user, with glusterfs:loglevel = 10, here are the last bits of the log file. I'm not really sure where the interesting lines are; any guidance would be much appreciated:
>
> [... working-user log snipped; see the original message above ...]
>
> And here is a log snip from the non-working user:
>
> [... non-working-user log snipped; see the original message above ...]
>
> Note that these log snips are from the same machine, minutes apart, same config other than the username that is connecting to the share. It almost appears as though the vfs_glusterfs interaction with the gluster volume is related to the username.

I'm not sure how vfs_glusterfs fails when the user belongs to more than 93 groups. The sending of the READ procedure would fail; maybe this results in an incorrect assumption that the bricks are unreachable.

> I am trying to relate this to other similar bugs I've been able to dig up online. Is there a limit to the number of clients that a gluster node can handle?

No, not that I am aware of.

> What am I missing here?

Very little, I would also suspect that the number of groups that those problematic users belong to is too big.
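As a quick way to confirm that, comparing a working account and an affected account directly against the share should reproduce the error only for the latter; the share and account names below are placeholders:

    # expected to list the share contents
    smbclient //gfs-a-3/shares -U 'DOMAIN\workinguser' -c 'ls'
    # expected to fail with NT_STATUS_ACCESS_DENIED if the group limit is the cause
    smbclient //gfs-a-3/shares -U 'DOMAIN\affecteduser' -c 'ls'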
HTH,
Niels