It seems that the connection gets dropped (or cannot even be established). Is ssh
authentication set up properly for the second volume?
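Both errors in that log would fit a dying transport: pickle.load() raises EOFError
when the stream it reads from ends before a complete pickle arrives, and the later
UnpicklingError ("invalid load key") is the same kind of thing, just with stray
bytes arriving instead of a clean end-of-stream. A quick illustration of the
EOFError case (just a hypothetical snippet, not gsyncd code):

    import pickle
    from io import BytesIO

    # Minimal sketch: if the remote end of the RPC stream goes away before
    # a complete pickle arrives, pickle.load() on the now-empty stream
    # raises EOFError -- the same exception repce.recv() reports above.
    try:
        pickle.load(BytesIO(b""))
    except EOFError as exc:
        print("EOFError: %s" % exc)

So I'd first verify that ssh from the master node to the slave works
non-interactively for that volume and that the connection stays up.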
Csaba
On Thu, Jun 30, 2011 at 4:22 PM, Adrian Carpenter <tac12 at wbic.cam.ac.uk> wrote:
> Hi Csaba,
>
> I'm now seeing consistent errors with a second volume:
>
> [2011-06-30 06:08:48.299174] I [monitor(monitor):19:set_state] Monitor: new state: OK
> [2011-06-30 09:27:46.875745] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> EOFError
> [2011-06-30 09:27:58.413588] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 09:27:58.413830] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 09:27:58.479687] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 09:28:03.963303] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 09:28:03.963587] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
> [2011-06-30 09:34:35.592005] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> EOFError
> [2011-06-30 09:34:45.595258] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 09:34:45.595668] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 09:34:45.661334] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 09:34:51.145607] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 09:34:51.145898] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
> [2011-06-30 12:35:54.394453] E [syncdutils:131:exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> UnpicklingError: invalid load key, '???'.
> [2011-06-30 12:36:05.839510] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-06-30 12:36:05.839916] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-06-30 12:36:05.905232] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:user-volume -> file:///geo-tank/user-volume
> [2011-06-30 12:36:11.413764] I [master:181:crawl] GMaster: new master is a747062e-1caa-4cb3-9f86-34d03486a842
> [2011-06-30 12:36:11.414047] I [master:187:crawl] GMaster: primary master with volume id a747062e-1caa-4cb3-9f86-34d03486a842 ...
>
>
> Adrian
> On 28 Jun 2011, at 11:16, Csaba Henk wrote:
>
>> Hi Adrian,
>>
>>
>> On Tue, Jun 28, 2011 at 12:04 PM, Adrian Carpenter <tac12 at wbic.cam.ac.uk> wrote:
>>> Thanks Csaba,
>>>
>>> So far as I am aware nothing has tampered with the xattrs, and all the bricks
>>> etc. are time-synchronised. Anyway, I did as you suggested; now for one volume
>>> (I have three being geo-rep'd) I consistently get this:
>>>
>>> OSError: [Errno 12] Cannot allocate memory
>>
>> Do you get this consistently, or randomly but recurring, or was it spotted
>> once or a few times and then gone?
>>
>>>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
>>>    cls.raise_oserr()
>>>  File "/opt/glusterfs/3.2.1/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
>>>    raise OSError(errn, os.strerror(errn))
>>> OSError: [Errno 12] Cannot allocate memory
>>
>> If you've seen it more than once, how much does the stack trace vary? Is it
>> exactly the same, or not exactly but crashing in the same function (just on a
>> different code path), or not exactly but at least in the libcxattr module,
>> or quite different?
>>
>> What Python version do you use? If you use Python 2.4.* with an external
>> ctypes, then which source did you take ctypes from, and which version?
>>
>> Thanks,
>> Csaba
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>