thr3ads.net - Gluster users - [Gluster-users] Gluster-users Digest, Vol 41, Issue 16 [Sep 2011]

Jürgen Winkler

2011-Sep-07 10:31 UTC

[Gluster-users] Gluster-users Digest, Vol 41, Issue 16

Hi Phil,

We had the same problem; try compiling with debug options.
Yes, this sounds strange, but it helps when you are using SLES; the
glusterd works OK and you can start to work with it.

just put

export CFLAGS='-g3 -O0'

between %build and %configure in the glusterfs spec file.
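
As a rough sketch of that edit, the helper below prints a spec file with the
export line added directly after the %build line, so that it comes before
%configure. The function name add_debug_cflags and the file names in the usage
note are made up for illustration; only the export line itself comes from the
advice above.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the given spec file with the debug
# CFLAGS export inserted right after the %build line.
add_debug_cflags() {
    awk -v line="export CFLAGS='-g3 -O0'" '
        { print }                       # copy every original line unchanged
        /^%build[ \t]*$/ { print line } # then add the export after %build
    ' "$1"
}
```

Possible usage, assuming the spec file is called glusterfs.spec:
add_debug_cflags glusterfs.spec > glusterfs-debug.spec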



But be warned: don't use it with important data, especially if you are
planning to use the replication feature; this will cause data loss
sooner or later.

Cheers !





On 07.09.2011 11:21, gluster-users-request at gluster.org wrote:
> Send Gluster-users mailing list submissions to
> 	gluster-users at gluster.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> or, via email, send a message with subject or body 'help' to
> 	gluster-users-request at gluster.org
>
> You can reach the person managing the list at
> 	gluster-users-owner at gluster.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Gluster-users digest..."
>
>
> Today's Topics:
>
>     1. Re: Reading directly from brick (Reinis Rozitis)
>     2. Re: NFS secondary groups not working. (Di Pe)
>     3. Inconsistent md5sum of replicated file (Anthony Delviscio)
>     4. Re: Inconsistent md5sum of replicated file (Pranith Kumar K)
>     5. Problems with SLES 11 (Phil Bayfield)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 6 Sep 2011 23:24:24 +0300
> From: "Reinis Rozitis"<r at roze.lv>
> Subject: Re: [Gluster-users] Reading directly from brick
> To:<gluster-users at gluster.org>
> Message-ID:<F7DAC991835C44889BCDB281F977B692 at NeiRoze>
> Content-Type: text/plain; format=flowed; charset="utf-8";
> 	reply-type=original
>
>> Simple answer - no, it's not ever safe to do writes to an active Gluster
>> backend.
> The question was about reads, though, and there the answer is that it is
> perfectly fine (and faster) to do reads directly from the filesystem (in
> replicated setups), if you keep in mind that by doing so you lose Gluster's
> autoheal feature. E.g. if one of the gluster nodes goes down and a file is
> written in the meantime, then when the server comes back up the file won't
> show up if you access it directly, while it would when accessing it via the
> gluster mount point (you can work around it by manually triggering the
> self-heal).
>
>
>> I've heard that reads from glusterfs are around 20 times slower than from
>> ext3:
> "20 times" might be plucked out of thin air, but of course there is a
> significant overhead in serving a file from gluster, which basically
> involves network operations and additional metadata checks, versus fetching
> the file directly from iron.
>
>
> rr
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 6 Sep 2011 14:46:28 -0700
> From: Di Pe<dipeit at gmail.com>
> Subject: Re: [Gluster-users] NFS secondary groups not working.
> To: gluster-users at gluster.org
> Message-ID:
> 	<CAB9T+o+fAb+YasVxMsUsVmMw0Scp3BLSqc0Y_grusRmV11qejg at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Anand, has this issue been confirmed by gluster and is it in the pipe
> to get fixed, or do you need additional information? We are no gluster
> experts but are happy to help if we know who to provide additional
> debugging info to.
>
> On Mon, Aug 29, 2011 at 9:44 AM, Mike Hanby<mhanby at uab.edu> wrote:
>> I just noticed the problem happening on one client in our environment
>> (clients and servers running 3.2.2), other clients work fine.
>>
>> The clients and servers are all CentOS 5.6 x86_64
>>
>> I get the same permission denied using Gluster FUSE and Gluster NFS
>> mounts on this client.
>>
>> I'm not mounting it with ACL.
>>
>> The volume is a simple distributed volume with two servers.
>>
>>> -----Original Message-----
>>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-
>>> bounces at gluster.org] On Behalf Of Hubert-Jan Schaminee
>>> Sent: Saturday, August 27, 2011 10:10 AM
>>> To: Anand Avati
>>> Cc: gluster-users at gluster.org
>>> Subject: Re: [Gluster-users] NFS secondary groups not working.
>>>
>>> On Saturday 13-08-2011 at 20:22 [timezone +0530], Anand Avati wrote:
>>>>
>>>> On Sat, Aug 13, 2011 at 5:29 PM, Dipeit<dipeit at gmail.com>  wrote:
>>>>         We noticed this bug too using the gluster client. I'm
>>>>         surprised that not more people noticed this lack of posix
>>>>         compliance. This makes gluster really unusable in multiuser
>>>>         environments. Is that because gluster is mostly used in large
>>>>         web farms like pandora?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> GlusterFS is POSIX compliant w.r.t user groups. We have not seen this
>>>> issue in our testing. Can you give more info about your setup? Have
>>>> you mounted with -o acl or without? Anything unusual in the logs?
>>>>
>>>>
>>>> Avati
>>> I'm having the same problem here.
>>>
>>> I use the latest version (3.2.3, built on Aug 23 2011 19:54:51, from the
>>> download site) on CentOS 5.6 as the gluster servers, and Debian squeeze
>>> (same version) as client.
>>> I'm refused access to files and directories despite having correct
>>> group permissions.
>>>
>>> So I installed a clean CentOS client (also latest version) for a test
>>> and everything is working perfectly .... ?
>>>
>>> The Debian (squeeze) and CentOS systems used are 64-bit (repository from
>>> gluster.com).
>>> Using Debian testing (64 and 32 bits) and gluster from the Debian
>>> repository also denies me access in both the 64- and 32-bit versions.
>>>
>>> I assume the mixed environment explains why this bug is rare.
>>>
>>> The gluster installation used is a basic replicated setup with two
>>> servers, as described in the Gluster docs.
>>>
>>>
>>> Hubert-Jan Schaminée
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 6 Sep 2011 18:52:52 -0400
> From: Anthony Delviscio<adelviscio at gmail.com>
> Subject: [Gluster-users] Inconsistent md5sum of replicated file
> To: gluster-users at gluster.org
> Message-ID:
> 	<CAKE0inQy3Tjf3TB11kc+F_F-P7kN2CJ+eG+2FaRUxOe4tnzgwQ at mail.gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> I was wondering if anyone would be able to shed some light on how a file
> could end up with inconsistent md5sums on Gluster backend storage.
>
>
>
> Our configuration is running on Gluster v3.1.5 in a distribute-replicate
> setup consisting of 8 bricks.
>
> Our OS is Red Hat 5.6 x86_64.  Backend storage is an ext3 RAID 5.
>
>
>
> The 8 bricks are in RR DNS and are mounted for reading/writing via NFS
> automounts.
>
>
>
> When comparing md5sums of the file from two different NFS clients, they
> were different.
>
>
>
> The extended attributes of the files on backend storage are identical.  The
> file size and permissions are identical.  The stat data (excluding inode on
> backend storage file system) is identical.
>
> However, running md5sum on the two files, results in two different md5sums.
>
>
>
> Copying both files to another location/server and running the md5sum also
> results in no change - they're still different.
>
>
>
> Gluster logs do not show anything related to the filename in question.
> Triggering a self-healing operation didn't seem to do anything, and it may
> have to do with the fact that the extended attributes are identical.
>
>
>
> If more information is required, let me know and I will try to accommodate.
>
>
> Thank you
>
> ------------------------------
>
> Message: 4
> Date: Wed, 7 Sep 2011 14:13:56 +0530
> From: Pranith Kumar K<pranithk at gluster.com>
> Subject: Re: [Gluster-users] Inconsistent md5sum of replicated file
> To: Anthony Delviscio<adelviscio at gmail.com>
> Cc: gluster-users at gluster.org
> Message-ID:<4E672ECC.7050703 at gluster.com>
> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>
> Hi Anthony,
>         Could you send the output of getfattr -d -m . -e hex
> <filepath> on both the bricks, and also the stat output on both the
> backends? Give the outputs for its parent directory also.
>
> Pranith.
>
> On 09/07/2011 04:22 AM, Anthony Delviscio wrote:
>> I was wondering if anyone would be able to shed some light on how a
>> file could end up with inconsistent md5sums on Gluster backend storage.
>>
>> Our configuration is running on Gluster v3.1.5 in a
>> distribute-replicate setup consisting of 8 bricks.
>>
>> Our OS is Red Hat 5.6 x86_64. Backend storage is an ext3 RAID 5.
>>
>> The 8 bricks are in RR DNS and are mounted for reading/writing via NFS
>> automounts.
>>
>> When comparing md5sums of the file from two different NFS clients,
>> they were different.
>>
>> The extended attributes of the files on backend storage are
>> identical. The file size and permissions are identical. The stat data
>> (excluding inode on backend storage file system) is identical.
>>
>> However, running md5sum on the two files, results in two different
>> md5sums.
>>
>> Copying both files to another location/server and running the md5sum
>> also results in no change - they're still different.
>>
>> Gluster logs do not show anything related to the filename in
>> question. Triggering a self-healing operation didn't seem to do
>> anything and it may have to do with the fact that the extended
>> attributes are identical.
>>
>> If more information is required, let me know and I will try to
>> accommodate.
>>
>> Thank you
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
> ------------------------------
>
> Message: 5
> Date: Wed, 7 Sep 2011 10:15:43 +0100
> From: Phil Bayfield<phil at techlightenment.com>
> Subject: [Gluster-users] Problems with SLES 11
> To: gluster-users at gluster.org
> Message-ID:
> 	<CAFXH-fW0DBE9YomJzAtvdFAWaf5Zpq-TfbfTPb+K7gBu-R+06Q at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi there,
>
> I compiled and installed the latest version of Gluster on a couple of SLES
> 11 SP1 boxes, everything up to this point seemed ok.
>
> I start the daemon on both boxes, and both are listening on 24007.
>
> I issue a "gluster peer probe" command on one of the boxes and the daemon
> instantly dies; I restart it and it shows:
>
> # gluster peer status
> Number of Peers: 1
>
> Hostname: mckalcpap02
> Uuid: 00000000-0000-0000-0000-000000000000
> State: Establishing Connection (Connected)
>
> I attempted to run the probe on the other box, the daemon crashes, now as I
> start the daemon on each box the daemon just crashes on the other box.
>
> The log output immediately prior to the crash is as follows:
>
> [2011-06-07 08:05:10.700710] I [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req mckalcpap02 24007
> [2011-06-07 08:05:10.701058] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: mckalcpap02
> [2011-06-07 08:05:10.701086] I [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: mckalcpap02 (24007)
> [2011-06-07 08:05:10.702832] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
> [2011-06-07 08:05:10.703110] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)
>
> If I use the IP address the same thing happens:
>
> [2011-06-07 08:07:12.873075] I [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.9.54.2 24007
> [2011-06-07 08:07:12.873410] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: 10.9.54.2
> [2011-06-07 08:07:12.873438] I [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.9.54.2 (24007)
> [2011-06-07 08:07:12.875046] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
> [2011-06-07 08:07:12.875280] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)
>
> There is no firewall issue:
>
> # telnet mckalcpap02 24007
> Trying 10.9.54.2...
> Connected to mckalcpap02.
> Escape character is '^]'.
>
> Following restart (which crashes the other node) the log output is as
> follows:
>
> [2011-06-07 08:10:09.616486] I [glusterd.c:564:init] 0-management: Using /etc/glusterd as working directory
> [2011-06-07 08:10:09.617619] C [rdma.c:3933:rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
> [2011-06-07 08:10:09.617676] E [rdma.c:4812:init] 0-rdma.management: Failed to initialize IB Device
> [2011-06-07 08:10:09.617700] E [rpc-transport.c:741:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
> [2011-06-07 08:10:09.617724] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
> [2011-06-07 08:10:09.617830] I [glusterd.c:88:glusterd_uuid_init] 0-glusterd: retrieved UUID: 1e344f5d-6904-4d14-9be2-8f0f44b97dd7
> [2011-06-07 08:10:11.258098] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
> Given volfile:
> +------------------------------------------------------------------------------+
>    1: volume management
>    2:     type mgmt/glusterd
>    3:     option working-directory /etc/glusterd
>    4:     option transport-type socket,rdma
>    5:     option transport.socket.keepalive-time 10
>    6:     option transport.socket.keepalive-interval 2
>    7: end-volume
>    8:
>
> +------------------------------------------------------------------------------+
> [2011-06-07 08:10:11.258431] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)
> [2011-06-07 08:10:11.280533] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.9.54.2:1023)
> [2011-06-07 08:10:11.280595] W [socket.c:1494:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.9.54.2:24007)
> [2011-06-07 08:10:17.256235] E [socket.c:1685:socket_connect_finish] 0-management: connection to 10.9.54.2:24007 failed (Connection refused)
>
> There are no logs on the node which crashes.
>
> I've tried various possible solutions from searching the net but am not
> getting anywhere; can anyone advise how to proceed?
>
> Thanks,
> Phil.
>
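
For Message 4 in the digest above, the requested diagnostics could be
collected along these lines. This is only a sketch: the function name
brick_diag_cmds and the brick path are hypothetical, and the path is assumed
to contain no whitespace. It prints the getfattr and stat commands for a file
and its parent directory rather than running them, so the list can be
reviewed first and then run on each brick server.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print, but do not run, the diagnostics asked
# for in Message 4 for one brick-side file and its parent directory.
brick_diag_cmds() {
    file=$1
    parent=$(dirname "$file")
    for target in "$file" "$parent"; do
        # getfattr flags are the ones quoted in the digest: -d -m . -e hex
        printf 'getfattr -d -m . -e hex %s\n' "$target"
        printf 'stat %s\n' "$target"
    done
}

# Example with a made-up brick path:
brick_diag_cmds /export/brick1/data/example.bin
```

The printed commands can be piped to sh on each brick once they look right.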

Phil Bayfield

2011-Sep-07 11:12 UTC

[Gluster-users] Gluster-users Digest, Vol 41, Issue 16

Hi Jürgen,

Thanks for your advice. I'll set up some VMs later and give this a try.

On my prod boxes I've compiled and installed 3.0.8 this morning, as I had used
3.0.x previously without issue.

Using glusterfs-volgen based configuration it's up and running nicely
without any problems.

Cheers,
Phil.

On 7 September 2011 11:31, Jürgen Winkler <juergen.winkler at xidras.com> wrote:
> Hi Phil,
>
> We had the same problem; try compiling with debug options.
> Yes, this sounds strange, but it helps when you are using SLES; the glusterd
> works OK and you can start to work with it.
>
> just put
>
> export CFLAGS='-g3 -O0'
>
> between %build and %configure in the glusterfs spec file.
>
>
>
> But be warned: don't use it with important data, especially if you are
> planning to use the replication feature; this will cause data loss sooner
> or later.
>
> Cheers !
>
>
>
>
>
> Am 07.09.2011 11:21, schrieb gluster-users-request at
gluster.**org<gluster-users-request at gluster.org>
> :
>
>> Send Gluster-users mailing list submissions to
>>        gluster-users at gluster.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>       
http://gluster.org/cgi-bin/**mailman/listinfo/gluster-users<http://gluster.org/cgi-bin/mailman/listinfo/gluster-users>
>> or, via email, send a message with subject or body 'help' to
>>        gluster-users-request at gluster.**org<gluster-users-request
at gluster.org>
>>
>> You can reach the person managing the list at
>>        gluster-users-owner at gluster.**org<gluster-users-owner at
gluster.org>
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Gluster-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Reading directly from brick (Reinis Rozitis)
>>    2. Re: NFS secondary groups not working. (Di Pe)
>>    3. Inconsistent md5sum of replicated file (Anthony Delviscio)
>>    4. Re: Inconsistent md5sum of replicated file (Pranith Kumar K)
>>    5. Problems with SLES 11 (Phil Bayfield)
>>
>>
>> ------------------------------**------------------------------**
>> ----------
>>
>> Message: 1
>> Date: Tue, 6 Sep 2011 23:24:24 +0300
>> From: "Reinis Rozitis"<r at roze.lv>
>> Subject: Re: [Gluster-users] Reading directly from brick
>> To:<gluster-users at gluster.org>
>> Message-ID:<**F7DAC991835C44889BCDB281F977B6**92 at NeiRoze>
>> Content-Type: text/plain; format=flowed; charset="utf-8";
>>        reply-type=original
>>
>>  Simple answer - no, it's not ever safe to do writes to an active
Gluster
>>> backend.
>>>
>> Question was about reads though and then the answer is it is perfectly
>> fine
>> (and faster) to do reads directly from the filesystem (in replicated
>> setups)
>> if you keep in mind that by doing so you lose the Glusters autoheal
>> eature  - eg if one of the gluster nodes goes down and there is a file
>> written meanwhile when the server comes up if you access the file
directly
>> it won't show up while it would when accessing it via the gluster
mount
>> point (you can work arround it by manually triggering the self heal).
>>
>>
>>  I've heard that reads from glusterfs are around 20 times slower
than from
>>> ext3:
>>>
>> "20 times" might be fetched out of thin air but of course
there is a
>> significant overhead of serving a file from a gluster which basically
>> involves network operations and additional meta data checks versus
>> fetching
>> the file directly from iron.
>>
>>
>> rr
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 6 Sep 2011 14:46:28 -0700
>> From: Di Pe<dipeit at gmail.com>
>> Subject: Re: [Gluster-users] NFS secondary groups not working.
>> To: gluster-users at gluster.org
>> Message-ID:
>>        <CAB9T+o+fAb+**YasVxMsUsVmMw0Scp3BLSqc0Y_**
>> grusRmV11qejg at
mail.gmail.com<CAB9T%2Bo%2BfAb%2BYasVxMsUsVmMw0Scp3BLSqc0Y_grusRmV11qejg at
mail.gmail.com>
>> >
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Anand, has this issue been confirmed by gluster and is it in the pipe
>> to get fixed or do you need .additional information? We are no gluster
>> experts but are happy to help if we know who to provide additional
>> debugging info.
>>
>> On Mon, Aug 29, 2011 at 9:44 AM, Mike Hanby<mhanby at uab.edu> 
wrote:
>>
>>> I just noticed the problem happening on one client in our
environment
>>> (clients and servers running 3.2.2), other clients work fine.
>>>
>>> The clients and servers are all CentOS 5.6 x86_64
>>>
>>> I get the same permission denied using Gluster FUSE and Gluster NFS
>>> mounts on this client.
>>>
>>> I'm not mounting it with ACL.
>>>
>>> The volume is a simple distributed volume with two servers.
>>>
>>>  -----Original Message-----
>>>> From: gluster-users-bounces at
gluster.**org<gluster-users-bounces at gluster.org>[mailto:
>>>> gluster-users-
>>>> bounces at gluster.org] On Behalf Of Hubert-Jan Schaminee
>>>> Sent: Saturday, August 27, 2011 10:10 AM
>>>> To: Anand Avati
>>>> Cc: gluster-users at gluster.org
>>>> Subject: Re: [Gluster-users] NFS secondary groups not working.
>>>>
>>>> Op zaterdag 13-08-2011 om 20:22 uur [tijdzone +0530], schreef
Anand
>>>> Avati:
>>>>
>>>>>
>>>>> On Sat, Aug 13, 2011 at 5:29 PM, Dipeit<dipeit at
gmail.com>  wrote:
>>>>> ? ? ? ? We noticed this bug too using the gluster client.
I'm
>>>>> ? ? ? ? surprised that not more people noticed this lack of
posix
>>>>> ? ? ? ? compliance. This makes gluster really unusable in
multiuser
>>>>> ? ? ? ? environments. Is that because gluster is mostly
used in large
>>>>> ? ? ? ? web farms like pandora?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> GlusterFS is POSIX compliant w.r.t user groups. We have not
seen this
>>>>> issue in our testing. Can you give more info about your
setup? Have
>>>>> you mounted with -o acl or without? Anything unusual in the
logs?
>>>>>
>>>>>
>>>>> Avati
>>>>>
>>>> I'm having the same problem here.
>>>>
>>>> I use the latest version (3.2.3 build on Aug 23 2011 19:54:51
of the
>>>> download site) on a Centos 5.6 as a gluster servers, Debian
squeeze
>>>> (same version) as client.
>>>> I'm refused access to files and directories despite having
correct
>>>> group permissions.
>>>>
>>>> So I installed a clean Centos client (also latest version) for
a test
>>>> and everything is working perfectly .... ?
>>>>
>>>> The used Debian (squeeze) and Centos are 64 bits (repository
from
>>>> gluster.com).
>>>> Using Debian testing (64 and 32 bits) and gluster from the
Debian
>>>> repository also denies me access in 64 and 32 bits version.
>>>>
>>>> I assume the mixed environment explains why this bug is rare.
>>>>
>>>> The used gluster installation is a basic replicated setup one
with two
>>>> servers like described in the de Gluster docs.
>>>>
>>>>
>>>> Hubert-Jan Schamin?e
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ______________________________**_________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>>
http://gluster.org/cgi-bin/**mailman/listinfo/gluster-users<http://gluster.org/cgi-bin/mailman/listinfo/gluster-users>
>>>>
>>> ______________________________**_________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>>
http://gluster.org/cgi-bin/**mailman/listinfo/gluster-users<http://gluster.org/cgi-bin/mailman/listinfo/gluster-users>
>>>
>>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 6 Sep 2011 18:52:52 -0400
>> From: Anthony Delviscio<adelviscio at gmail.com**>
>> Subject: [Gluster-users] Inconsistent md5sum of replicated file
>> To: gluster-users at gluster.org
>> Message-ID:
>>        <CAKE0inQy3Tjf3TB11kc+F_F-**P7kN2CJ+eG+2FaRUxOe4tnzgwQ@**
>>
mail.gmail.com<CAKE0inQy3Tjf3TB11kc%2BF_F-P7kN2CJ%2BeG%2B2FaRUxOe4tnzgwQ at
mail.gmail.com>
>> >
>> Content-Type: text/plain; charset="windows-1252"
>>
>> I was wondering if anyone would be able to shed some light on how a
file
>> could end up with inconsistent md5sums on Gluster backend storage.
>>
>>
>>
>> Our configuration is running on Gluster v3.1.5 in a
distribute-replicate
>> setup consisting of 8 bricks.
>>
>> Our OS is Red Hat 5.6 x86_64.  Backend storage is an ext3 RAID 5.
>>
>>
>>
>> The 8 bricks are in RR DNS and are mounted for reading/writing via NFS
>> automounts.
>>
>>
>>
>> When comparing md5sums of the file from two different NFS clients, they
>> were
>> different.
>>
>>
>>
>> The extended attributes of the files on backend storage are identical.
>>  The
>> file size and permissions are identical.  The stat data (excluding
inode
>> on
>> backend storage file system) is identical.
>>
>> However, running md5sum on the two files, results in two different
>> md5sums.
>>
>>
>>
>> Copying both files to another location/server and running the md5sum
also
>> results in no change ? they?re still different.
>>
>>
>>
>> Gluster logs do not show anything related to the filename in question.
>>  Triggering
>> a self-healing operation didn?t seem to do anything and it may have to
do
>> with the fact that the extended attributes are identical.
>>
>>
>>
>> If more information is required, let me know and I will try to
>> accommodate.
>>
>>
>> Thank you
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL:<http://gluster.org/**pipermail/gluster-users/**
>>
attachments/20110906/4628faa2/**attachment-0001.htm<http://gluster.org/pipermail/gluster-users/attachments/20110906/4628faa2/attachment-0001.htm>
>> >
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 7 Sep 2011 14:13:56 +0530
>> From: Pranith Kumar K<pranithk at gluster.com>
>> Subject: Re: [Gluster-users] Inconsistent md5sum of replicated file
>> To: Anthony Delviscio<adelviscio at gmail.com**>
>> Cc: gluster-users at gluster.org
>> Message-ID:<4E672ECC.7050703@**gluster.com <4E672ECC.7050703 at
gluster.com>
>> >
>> Content-Type: text/plain; charset="windows-1252";
Format="flowed"
>>
>> hi Anthony,
>>        Could you send the output of the getfattr -d -m . -e hex
>> <filepath>  on both the bricks and also the stat output on the
both the
>> backends. Give the outputs for its parent directory also.
>>
>> Pranith.
>>
>> On 09/07/2011 04:22 AM, Anthony Delviscio wrote:
>>
>>> I was wondering if anyone would be able to shed some light on how a
>>> file could end up with inconsistent md5sums on Gluster backend
storage.
>>>
>>> Our configuration is running on Gluster v3.1.5 in a
>>> distribute-replicate setup consisting of 8 bricks.
>>>
>>> Our OS is Red Hat 5.6 x86_64.Backend storage is an ext3 RAID 5.
>>>
>>> The 8 bricks are in RR DNS and are mounted for reading/writing via
NFS
>>> automounts.
>>>
>>> When comparing md5sums of the file from two different NFS clients,
>>> they were different.
>>>
>>> The extended attributes of the files on backend storage are
>>> identical.The file size and permissions are identical.The stat data
>>> (excluding inode on backend storage file system) is identical.
>>>
>>> However, running md5sum on the two files, results in two different
>>> md5sums.
>>>
>>> Copying both files to another location/server and running the
md5sum
>>> also results in no change ? they?re still different.
>>>
>>> Gluster logs do not show anything related to the filename in
>>> question.Triggering a self-healing operation didn?t seem to do
>>> anything and it may have to do with the fact that the extended
>>> attributes are identical.
>>>
>>> If more information is required, let me know and I will try to
>>> accommodate.
>>>
>>> Thank you
>>>
>>>
>>> ______________________________**_________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>>
http://gluster.org/cgi-bin/**mailman/listinfo/gluster-users<http://gluster.org/cgi-bin/mailman/listinfo/gluster-users>
>>>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL:<http://gluster.org/**pipermail/gluster-users/**
>>
attachments/20110907/86d14cab/**attachment-0001.htm<http://gluster.org/pipermail/gluster-users/attachments/20110907/86d14cab/attachment-0001.htm>
>> >
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Wed, 7 Sep 2011 10:15:43 +0100
>> From: Phil Bayfield<phil at techlightenment.**com <phil at
techlightenment.com>
>> >
>> Subject: [Gluster-users] Problems with SLES 11
>> To: gluster-users at gluster.org
>> Message-ID:
>>        <CAFXH-**fW0DBE9YomJzAtvdFAWaf5Zpq-**TfbfTPb+K7gBu-R+06Q at
mail.**
>> gmail.com<CAFXH-fW0DBE9YomJzAtvdFAWaf5Zpq-TfbfTPb%2BK7gBu-R%2B06Q at
mail.gmail.com>
>> >
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi there,
>>
>> I compiled and installed the latest version of Gluster on a couple of
SLES
>> 11 SP1 boxes, everything up to this point seemed ok.
>>
>> I start the daemon on both boxes, and both are listening on 24007.
>>
>> I issue a "gluster peer probe"  command on one of the boxes
and the daemon
>> instantly dies, I restart it and it shows:
>>
>> # gluster peer status
>> Number of Peers: 1
>>
>> Hostname: mckalcpap02
>> Uuid: 00000000-0000-0000-0000-**000000000000
>> State: Establishing Connection (Connected)
>>
>> I attempted to run the probe on the other box, the daemon crashes, now
as
>> I
>> start the daemon on each box the daemon just crashes on the other box.
>>
>> The log output immediately prior to the crash is as follows:
>>
>> [2011-06-07 08:05:10.700710] I
>> [glusterd-handler.c:623:**glusterd_handle_cli_probe] 0-glusterd:
Received
>> CLI
>> probe req mckalcpap02 24007
>> [2011-06-07 08:05:10.701058] I [glusterd-handler.c:391:**
>> glusterd_friend_find]
>> 0-glusterd: Unable to find hostname: mckalcpap02
>> [2011-06-07 08:05:10.701086] I
>> [glusterd-handler.c:3422:**glusterd_probe_begin] 0-glusterd: Unable to
>> find
>> peerinfo for host: mckalcpap02 (24007)
>> [2011-06-07 08:05:10.702832] I [glusterd-handler.c:3404:**
>> glusterd_friend_add]
>> 0-glusterd: connect returned 0
>> [2011-06-07 08:05:10.703110] I
>> [glusterd-handshake.c:317:**glusterd_set_clnt_mgmt_**program] 0-: Using
>> Program
>> glusterd clnt mgmt, Num (1238433), Version (1)
>>
>> If I use the IP address the same thing happens:
>>
>> [2011-06-07 08:07:12.873075] I
>> [glusterd-handler.c:623:**glusterd_handle_cli_probe] 0-glusterd:
Received
>> CLI
>> probe req 10.9.54.2 24007
>> [2011-06-07 08:07:12.873410] I [glusterd-handler.c:391:**
>> glusterd_friend_find]
>> 0-glusterd: Unable to find hostname: 10.9.54.2
>> [2011-06-07 08:07:12.873438] I
>> [glusterd-handler.c:3422:**glusterd_probe_begin] 0-glusterd: Unable to
>> find
>> peerinfo for host: 10.9.54.2 (24007)
>> [2011-06-07 08:07:12.875046] I [glusterd-handler.c:3404:**
>> glusterd_friend_add]
>> 0-glusterd: connect returned 0
>> [2011-06-07 08:07:12.875280] I
>> [glusterd-handshake.c:317:**glusterd_set_clnt_mgmt_**program] 0-: Using
>> Program
>> glusterd clnt mgmt, Num (1238433), Version (1)
>>
>> There is no firewall issue:
>>
>> # telnet mckalcpap02 24007
>> Trying 10.9.54.2...
>> Connected to mckalcpap02.
>> Escape character is '^]'.
>>
>> Following restart (which crashes the other node) the log output is as
>> follows:
>>
>> [2011-06-07 08:10:09.616486] I [glusterd.c:564:init] 0-management: Using
>> /etc/glusterd as working directory
>> [2011-06-07 08:10:09.617619] C [rdma.c:3933:rdma_init] 0-rpc-transport/rdma:
>> Failed to get IB devices
>> [2011-06-07 08:10:09.617676] E [rdma.c:4812:init] 0-rdma.management: Failed
>> to initialize IB Device
>> [2011-06-07 08:10:09.617700] E [rpc-transport.c:741:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2011-06-07 08:10:09.617724] W [rpcsvc.c:1288:rpcsvc_transport_create]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2011-06-07 08:10:09.617830] I [glusterd.c:88:glusterd_uuid_init]
>> 0-glusterd: retrieved UUID: 1e344f5d-6904-4d14-9be2-8f0f44b97dd7
>> [2011-06-07 08:10:11.258098] I [glusterd-handler.c:3404:glusterd_friend_add]
>> 0-glusterd: connect returned 0
>> Given volfile:
>> +------------------------------------------------------------------------------+
>>   1: volume management
>>   2:     type mgmt/glusterd
>>   3:     option working-directory /etc/glusterd
>>   4:     option transport-type socket,rdma
>>   5:     option transport.socket.keepalive-time 10
>>   6:     option transport.socket.keepalive-interval 2
>>   7: end-volume
>>   8:
>>
>> +------------------------------------------------------------------------------+
>> [2011-06-07 08:10:11.258431] I
>> [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program
>> glusterd clnt mgmt, Num (1238433), Version (1)
>> [2011-06-07 08:10:11.280533] W [socket.c:1494:__socket_proto_state_machine]
>> 0-socket.management: reading from socket failed. Error (Transport endpoint
>> is not connected), peer (10.9.54.2:1023)
>> [2011-06-07 08:10:11.280595] W [socket.c:1494:__socket_proto_state_machine]
>> 0-management: reading from socket failed. Error (Transport endpoint is not
>> connected), peer (10.9.54.2:24007)
>> [2011-06-07 08:10:17.256235] E [socket.c:1685:socket_connect_finish]
>> 0-management: connection to 10.9.54.2:24007 failed (Connection refused)
>>
>> There are no logs on the node which crashes.
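
When sifting through longer glusterd logs, it can help to pull out only the warning, error and critical entries. The following is a minimal sketch, assuming the line layout seen in the excerpts above (`[timestamp] LEVEL [file:line:function] domain: message`) and assuming each entry has been rejoined onto a single line, since mail wrapping splits them. The names `LOG_RE`, `Entry`, `parse` and `problems` are illustrative and not part of Gluster.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Layout inferred from the excerpts in this thread, not from Gluster docs:
# [timestamp] LEVEL [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S*):\s*(?P<msg>.*)$"
)


class Entry(NamedTuple):
    ts: str
    level: str
    func: str
    domain: str
    msg: str


def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line that matches the assumed layout."""
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            yield Entry(m["ts"], m["level"], m["func"], m["domain"], m["msg"])


def problems(lines: Iterable[str], levels=("W", "E", "C")) -> List[Entry]:
    """Keep only entries whose level letter is in `levels`."""
    return [e for e in parse(lines) if e.level in levels]
```

Lines that do not match the pattern, such as the volfile dump, are skipped rather than treated as errors.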
>>
>> I've tried various possible solutions from searching the net but am not
>> getting anywhere. Can anyone advise how to proceed?
>>
>> Thanks,
>> Phil.
>>
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>


-- 
Phil Bayfield
Development Manager
Alchemy Social, part of Techlightenment, an Experian company

Office 202 | 89 Worship Street | London | EC2A 2BF

t:   +44 (0) 207 392 2618
m: +44 (0) 7825 561 091
e:  phil at techlightenment.com
skype: phil.tl

www.techlightenment.com

Pranith Kumar K

2011-Sep-07 11:41 UTC

[Gluster-users] Gluster-users Digest, Vol 41, Issue 16

Hi,
       Could you please elaborate on how replication causes data loss?
Please let us know the test case which led you to this.

Pranith.

On 09/07/2011 04:01 PM, Jürgen Winkler wrote:
> Hi Phil,
>
> We had the same problem; try compiling with debug options.
> Yes, this sounds strange, but it helps when you are using SLES:
> glusterd works OK and you can start to work with it.
>
> Just put
>
> export CFLAGS='-g3 -O0'
>
> between %build and %configure in the glusterfs spec file.
>
>
>
> But be warned: don't use it with important data, especially when you are
> planning to use the replication feature; this will cause data loss
> sooner or later.
>
> Cheers !
>
>
>
>
>
> On 07.09.2011 11:21, gluster-users-request at gluster.org wrote:
>> Today's Topics:
>>
>>     1. Re: Reading directly from brick (Reinis Rozitis)
>>     2. Re: NFS secondary groups not working. (Di Pe)
>>     3. Inconsistent md5sum of replicated file (Anthony Delviscio)
>>     4. Re: Inconsistent md5sum of replicated file (Pranith Kumar K)
>>     5. Problems with SLES 11 (Phil Bayfield)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 6 Sep 2011 23:24:24 +0300
>> From: "Reinis Rozitis"<r at roze.lv>
>> Subject: Re: [Gluster-users] Reading directly from brick
>> To:<gluster-users at gluster.org>
>> Message-ID:<F7DAC991835C44889BCDB281F977B692 at NeiRoze>
>> Content-Type: text/plain; format=flowed; charset="utf-8";
>>     reply-type=original
>>
>>> Simple answer - no, it's not ever safe to do writes to an active Gluster
>>> backend.
>> The question was about reads though, and then the answer is that it is
>> perfectly fine (and faster) to do reads directly from the filesystem (in
>> replicated setups), if you keep in mind that by doing so you lose Gluster's
>> autoheal feature. E.g. if one of the gluster nodes goes down and a file is
>> written in the meantime, then when the server comes back up the file won't
>> show up if you access it directly, while it would when accessing it via the
>> gluster mount point (you can work around it by manually triggering the
>> self heal).
>>
>>
>>> I've heard that reads from glusterfs are around 20 times slower than from
>>> ext3:
>> "20 times" might be plucked out of thin air, but of course there is a
>> significant overhead in serving a file from gluster, which basically
>> involves network operations and additional metadata checks, versus fetching
>> the file directly from iron.
>>
>>
>> rr
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 6 Sep 2011 14:46:28 -0700
>> From: Di Pe<dipeit at gmail.com>
>> Subject: Re: [Gluster-users] NFS secondary groups not working.
>> To: gluster-users at gluster.org
>> Message-ID:
>> <CAB9T+o+fAb+YasVxMsUsVmMw0Scp3BLSqc0Y_grusRmV11qejg at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Anand, has this issue been confirmed by gluster and is it in the pipe
>> to get fixed or do you need additional information? We are no gluster
>> experts but are happy to help if we know whom to provide additional
>> debugging info to.
>>
>> On Mon, Aug 29, 2011 at 9:44 AM, Mike Hanby<mhanby at uab.edu> wrote:
>>> I just noticed the problem happening on one client in our
>>> environment (clients and servers running 3.2.2), other clients work
>>> fine.
>>>
>>> The clients and servers are all CentOS 5.6 x86_64
>>>
>>> I get the same permission denied using Gluster FUSE and Gluster NFS
>>> mounts on this client.
>>>
>>> I'm not mounting it with ACL.
>>>
>>> The volume is a simple distributed volume with two servers.
>>>
>>>> -----Original Message-----
>>>> From: gluster-users-bounces at gluster.org
>>>> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Hubert-Jan Schaminee
>>>> Sent: Saturday, August 27, 2011 10:10 AM
>>>> To: Anand Avati
>>>> Cc: gluster-users at gluster.org
>>>> Subject: Re: [Gluster-users] NFS secondary groups not working.
>>>>
>>>> On Saturday 13-08-2011 at 20:22 [timezone +0530], Anand Avati wrote:
>>>>>
>>>>> On Sat, Aug 13, 2011 at 5:29 PM, Dipeit<dipeit at gmail.com>  wrote:
>>>>>         We noticed this bug too using the gluster client. I'm
>>>>>         surprised that not more people noticed this lack of POSIX
>>>>>         compliance. This makes gluster really unusable in multiuser
>>>>>         environments. Is that because gluster is mostly used in large
>>>>>         web farms like pandora?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> GlusterFS is POSIX compliant w.r.t. user groups. We have not seen this
>>>>> issue in our testing. Can you give more info about your setup? Have
>>>>> you mounted with -o acl or without? Anything unusual in the logs?
>>>>>
>>>>>
>>>>> Avati
>>>> I'm having the same problem here.
>>>>
>>>> I use the latest version (3.2.3, built on Aug 23 2011 19:54:51, from the
>>>> download site) on CentOS 5.6 for the gluster servers and Debian squeeze
>>>> (same version) as the client.
>>>> I'm refused access to files and directories despite having correct
>>>> group permissions.
>>>>
>>>> So I installed a clean CentOS client (also latest version) for a test
>>>> and everything is working perfectly .... ?
>>>>
>>>> The Debian (squeeze) and CentOS systems used are 64-bit (repository from
>>>> gluster.com).
>>>> Using Debian testing (64 and 32 bit) and gluster from the Debian
>>>> repository also denies me access, in both the 64- and 32-bit versions.
>>>>
>>>> I assume the mixed environment explains why this bug is rare.
>>>>
>>>> The gluster installation used is a basic replicated setup with two
>>>> servers, as described in the Gluster docs.
>>>>
>>>>
>>>> Hubert-Jan Schaminée
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 6 Sep 2011 18:52:52 -0400
>> From: Anthony Delviscio<adelviscio at gmail.com>
>> Subject: [Gluster-users] Inconsistent md5sum of replicated file
>> To: gluster-users at gluster.org
>> Message-ID:
>> <CAKE0inQy3Tjf3TB11kc+F_F-P7kN2CJ+eG+2FaRUxOe4tnzgwQ at mail.gmail.com>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> I was wondering if anyone would be able to shed some light on how a file
>> could end up with inconsistent md5sums on Gluster backend storage.
>>
>> Our configuration is running on Gluster v3.1.5 in a distribute-replicate
>> setup consisting of 8 bricks.
>>
>> Our OS is Red Hat 5.6 x86_64.  Backend storage is an ext3 RAID 5.
>>
>> The 8 bricks are in RR DNS and are mounted for reading/writing via NFS
>> automounts.
>>
>> When comparing md5sums of the file from two different NFS clients, they were
>> different.
>>
>> The extended attributes of the files on backend storage are identical.  The
>> file size and permissions are identical.  The stat data (excluding inode on
>> backend storage file system) is identical.
>>
>> However, running md5sum on the two files results in two different md5sums.
>>
>> Copying both files to another location/server and running the md5sum also
>> results in no change - they're still different.
>>
>> Gluster logs do not show anything related to the filename in question.
>> Triggering a self-healing operation didn't seem to do anything and it may
>> have to do with the fact that the extended attributes are identical.
>>
>> If more information is required, let me know and I will try to accommodate.
>>
>>
>> Thank you
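
When two replica copies have the same size and metadata but different checksums, it can help to find where the contents diverge. The sketch below is one possible approach, not something taken from the thread: it hashes each copy in chunks and reports the byte offset of the first mismatch. The function names and any file paths passed in are hypothetical; the inputs would be the two copies of the file, for example after copying them off the bricks as described above.

```python
import hashlib
from typing import Optional


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a: str, path_b: str,
                     chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # One chunk is a prefix of the other: the files differ in length.
                return offset + min(len(a), len(b))
            if not a:
                # Both files ended at the same point with no mismatch.
                return None
            offset += len(a)
```

Knowing the offset makes it possible to inspect the surrounding bytes of each copy and judge which one looks damaged.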
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 7 Sep 2011 14:13:56 +0530
>> From: Pranith Kumar K<pranithk at gluster.com>
>> Subject: Re: [Gluster-users] Inconsistent md5sum of replicated file
>> To: Anthony Delviscio<adelviscio at gmail.com>
>> Cc: gluster-users at gluster.org
>> Message-ID:<4E672ECC.7050703 at gluster.com>
>> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>>
>> hi Anthony,
>>         Could you send the output of getfattr -d -m . -e hex
>> <filepath> on both the bricks and also the stat output on both the
>> backends? Give the outputs for its parent directory also.
>>
>> Pranith.
>>
>> On 09/07/2011 04:22 AM, Anthony Delviscio wrote:
>>> I was wondering if anyone would be able to shed some light on how a
>>> file could end up with inconsistent md5sums on Gluster backend storage.
>>>
>>> Our configuration is running on Gluster v3.1.5 in a
>>> distribute-replicate setup consisting of 8 bricks.
>>>
>>> Our OS is Red Hat 5.6 x86_64. Backend storage is an ext3 RAID 5.
>>>
>>> The 8 bricks are in RR DNS and are mounted for reading/writing via NFS
>>> automounts.
>>>
>>> When comparing md5sums of the file from two different NFS clients,
>>> they were different.
>>>
>>> The extended attributes of the files on backend storage are
>>> identical. The file size and permissions are identical. The stat data
>>> (excluding inode on backend storage file system) is identical.
>>>
>>> However, running md5sum on the two files results in two different
>>> md5sums.
>>>
>>> Copying both files to another location/server and running the md5sum
>>> also results in no change - they're still different.
>>>
>>> Gluster logs do not show anything related to the filename in
>>> question. Triggering a self-healing operation didn't seem to do
>>> anything and it may have to do with the fact that the extended
>>> attributes are identical.
>>>
>>> If more information is required, let me know and I will try to
>>> accommodate.
>>>
>>> Thank you
>>>
>>>
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Wed, 7 Sep 2011 10:15:43 +0100
>> From: Phil Bayfield<phil at techlightenment.com>
>> Subject: [Gluster-users] Problems with SLES 11
>> To: gluster-users at gluster.org
>> Message-ID:
>> <CAFXH-fW0DBE9YomJzAtvdFAWaf5Zpq-TfbfTPb+K7gBu-R+06Q at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi there,
>>
>> I compiled and installed the latest version of Gluster on a couple of SLES
>> 11 SP1 boxes; everything up to this point seemed OK.
>>
>> I start the daemon on both boxes, and both are listening on 24007.
>>
>> I issue a "gluster peer probe" command on one of the boxes and the daemon
>> instantly dies. I restart it and it shows:
>>
>> # gluster peer status
>> Number of Peers: 1
>>
>> Hostname: mckalcpap02
>> Uuid: 00000000-0000-0000-0000-000000000000
>> State: Establishing Connection (Connected)
>>
>> I attempted to run the probe on the other box and the daemon crashes. Now,
>> as I start the daemon on each box, the daemon just crashes on the other box.
>>
>> The log output immediately prior to the crash is as follows:
>>
>> [2011-06-07 08:05:10.700710] I
>> [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI
>> probe req mckalcpap02 24007
>> [2011-06-07 08:05:10.701058] I [glusterd-handler.c:391:glusterd_friend_find]
>> 0-glusterd: Unable to find hostname: mckalcpap02
>> [2011-06-07 08:05:10.701086] I
>> [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find
>> peerinfo for host: mckalcpap02 (24007)
>> [2011-06-07 08:05:10.702832] I [glusterd-handler.c:3404:glusterd_friend_add]
>> 0-glusterd: connect returned 0
>> [2011-06-07 08:05:10.703110] I
>> [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program
>> glusterd clnt mgmt, Num (1238433), Version (1)
>>
>> If I use the IP address the same thing happens:
>>
>> [2011-06-07 08:07:12.873075] I
>> [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI
>> probe req 10.9.54.2 24007
>> [2011-06-07 08:07:12.873410] I [glusterd-handler.c:391:glusterd_friend_find]
>> 0-glusterd: Unable to find hostname: 10.9.54.2
>> [2011-06-07 08:07:12.873438] I
>> [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find
>> peerinfo for host: 10.9.54.2 (24007)
>> [2011-06-07 08:07:12.875046] I [glusterd-handler.c:3404:glusterd_friend_add]
>> 0-glusterd: connect returned 0
>> [2011-06-07 08:07:12.875280] I
>> [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program
>> glusterd clnt mgmt, Num (1238433), Version (1)
>>
>> There is no firewall issue:
>>
>> # telnet mckalcpap02 24007
>> Trying 10.9.54.2...
>> Connected to mckalcpap02.
>> Escape character is '^]'.
>>
>> Following restart (which crashes the other node) the log output is as
>> follows:
>>
>> [2011-06-07 08:10:09.616486] I [glusterd.c:564:init] 0-management: Using
>> /etc/glusterd as working directory
>> [2011-06-07 08:10:09.617619] C [rdma.c:3933:rdma_init] 0-rpc-transport/rdma:
>> Failed to get IB devices
>> [2011-06-07 08:10:09.617676] E [rdma.c:4812:init] 0-rdma.management: Failed
>> to initialize IB Device
>> [2011-06-07 08:10:09.617700] E [rpc-transport.c:741:rpc_transport_load]
>> 0-rpc-transport: 'rdma' initialization failed
>> [2011-06-07 08:10:09.617724] W [rpcsvc.c:1288:rpcsvc_transport_create]
>> 0-rpc-service: cannot create listener, initing the transport failed
>> [2011-06-07 08:10:09.617830] I [glusterd.c:88:glusterd_uuid_init]
>> 0-glusterd: retrieved UUID: 1e344f5d-6904-4d14-9be2-8f0f44b97dd7
>> [2011-06-07 08:10:11.258098] I [glusterd-handler.c:3404:glusterd_friend_add]
>> 0-glusterd: connect returned 0
>> Given volfile:
>> +------------------------------------------------------------------------------+
>>    1: volume management
>>    2:     type mgmt/glusterd
>>    3:     option working-directory /etc/glusterd
>>    4:     option transport-type socket,rdma
>>    5:     option transport.socket.keepalive-time 10
>>    6:     option transport.socket.keepalive-interval 2
>>    7: end-volume
>>    8:
>>
>> +------------------------------------------------------------------------------+
>> [2011-06-07 08:10:11.258431] I
>> [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program
>> glusterd clnt mgmt, Num (1238433), Version (1)
>> [2011-06-07 08:10:11.280533] W [socket.c:1494:__socket_proto_state_machine]
>> 0-socket.management: reading from socket failed. Error (Transport endpoint
>> is not connected), peer (10.9.54.2:1023)
>> [2011-06-07 08:10:11.280595] W [socket.c:1494:__socket_proto_state_machine]
>> 0-management: reading from socket failed. Error (Transport endpoint is not
>> connected), peer (10.9.54.2:24007)
>> [2011-06-07 08:10:17.256235] E [socket.c:1685:socket_connect_finish]
>> 0-management: connection to 10.9.54.2:24007 failed (Connection refused)
>>
>> There are no logs on the node which crashes.
>>
>> I've tried various possible solutions from searching the net but am not
>> getting anywhere. Can anyone advise how to proceed?
>>
>> Thanks,
>> Phil.
>>
