I am running glusterfs on a bunch of Linux (openSUSE) machines. I installed from source and have it running fine. However, when I try to add a mount to fstab, the system complains about glusterfs being an unknown file system. I probably missed a step in the installation or some other detail. Any pointers?

Thanks,
Sean
Hi Sean,

Do you have mount.glusterfs properly installed in the "/sbin" directory? Also, please state the version of glusterfs you are using.

Regards

On Fri, Dec 19, 2008 at 9:27 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> I am running glusterfs on a bunch of Linux (openSUSE) machines. I installed from source and have it
> running fine. However, when I try to add a mount to fstab, the system complains about glusterfs being
> an unknown file system. I probably missed a step in the installation or some other detail. Any pointers?
>
> Thanks,
> Sean

--
Harshavardhana
[y4m4 on #gluster at irc.freenode.net]
"Samudaya TantraShilpi"
Z Research Inc - http://www.zresearch.com
On Fri, Dec 19, 2008 at 6:56 PM, Harshavardhana Ranganath <harsha at zresearch.com> wrote:
> Hi Sean,
>
> Do you have mount.glusterfs properly installed in the "/sbin" directory? Also, please state the
> version of glusterfs you are using.

Both you and Keith came to the same correct conclusion: I do not have mount.glusterfs in /sbin. I am using 1.3.13, mainly for testing, so I will probably unmount things, uninstall, and then reinstall 1.4rcX. It sounds like there are a number of new features that would be worth testing.

Thanks to you and Keith for the help.

Sean
Most definitely use 1.4 unless there's a particular need to use 1.3 (and I can't think of any). It's more efficient in nearly all respects from what I can tell, especially with AFR. It has many more features and seems to be quite a bit more stable (I am using RC4 in my environment currently).

Also, I'd look at the configure options and disable the features you're sure you won't need (--disable-bdb, for example). If you're not sure what you don't need, just go for the whole thing :)

Keith

At 06:15 PM 12/19/2008, Sean Davis wrote:
>Both you and Keith came to the same correct conclusion: I do not have
>mount.glusterfs in /sbin. I am using 1.3.13, mainly for testing, so I
>will probably unmount things, uninstall, and then reinstall 1.4rcX.
>It sounds like there are a number of new features that would be worth testing.
>
>Thanks to you and Keith for the help.
>
>Sean
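For reference, a source build along the lines Keith describes might look like the sketch below. The tarball name is illustrative, the --prefix value is the one Sean mentions later in the thread, and --disable-bdb is the example flag Keith gives; adjust all of them to your setup.

# unpack and build glusterfs 1.4.0rc6 from source, skipping the Berkeley DB backend
tar xzf glusterfs-1.4.0rc6.tar.gz
cd glusterfs-1.4.0rc6
./configure --prefix=/usr/local --disable-bdb
make
make install   # run as root
# check whether the mount helper ended up somewhere mount(8) can find it
ls -l /usr/local/sbin/mount.glusterfs /sbin/mount.glusterfs 2>/dev/null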
On Sat, Dec 20, 2008 at 6:09 AM, Keith Freedman <freedman at freeformit.com> wrote:
> Most definitely use 1.4 unless there's a particular need to use 1.3 (and I can't think of any).
> It's more efficient in nearly all respects from what I can tell, especially with AFR. It has many
> more features and seems to be quite a bit more stable (I am using RC4 in my environment currently).
>
> Also, I'd look at the configure options and disable the features you're sure you won't need
> (--disable-bdb, for example). If you're not sure what you don't need, just go for the whole thing :)

I have installed 1.4rc6. No compile errors, etc. However, I still do not have mount.glusterfs in /sbin. I used a custom install location (--prefix=/usr/local) as it is a shared file system for the cluster. Can I simply copy the mount.glusterfs file to the various /sbin locations for each machine and expect things to work?

Sean
At 11:30 AM 12/20/2008, Sean Davis wrote:
>I have installed 1.4rc6. No compile errors, etc. However, I still do
>not have mount.glusterfs in /sbin. I used a custom install location
>(--prefix=/usr/local) as it is a shared file system for the cluster.
>Can I simply copy the mount.glusterfs file to the various /sbin
>locations for each machine and expect things to work?

Yeah, it's probably in /usr/local/sbin. I'd either copy it or put a hard link there; test it on one machine and see if that solves the problem.
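As a concrete sketch of Keith's suggestion (the /usr/local/sbin location is his guess, so check the actual path first):

# confirm where the helper was installed
ls -l /usr/local/sbin/mount.glusterfs
# then either copy it to where mount(8) looks for helpers...
cp /usr/local/sbin/mount.glusterfs /sbin/mount.glusterfs
# ...or hard-link it so both names point at the same file
ln /usr/local/sbin/mount.glusterfs /sbin/mount.glusterfs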
On Sun, Dec 21, 2008 at 3:41 AM, Keith Freedman <freedman at freeformit.com> wrote:
> Yeah, it's probably in /usr/local/sbin. I'd either copy it or put a hard link there; test it on
> one machine and see if that solves the problem.

That's the weird thing: it isn't there. The only place I find it is in the original src directory. It did get made, but it appears it never got copied to ANY destination. I'll just link out from there; not a problem. I'm no expert on makefiles, but I guess I can look through there to see if there is anything funny.

Thanks again,
Sean
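Once mount.glusterfs is reachable from /sbin (a link from the source tree, as Sean describes, should also work), the fstab entry that originally failed should be accepted. A sketch, where the volfile path and /home mount point are the ones that appear later in this thread and the source-tree path is a placeholder:

# create the link on each machine; the exact location inside the build tree depends on your source layout
ln -s <path-to-built-source-tree>/mount.glusterfs /sbin/mount.glusterfs
# example /etc/fstab line for a glusterfs client mount
/etc/glusterfs/glusterfs-home.vol  /home  glusterfs  defaults  0  0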
Keith Freedman
2008-Dec-23 12:43 UTC
[Gluster-users] 1.4.0RC6 AFR problems (backtrace info attached)
here's the backtrace info from 2 of my crashes; a logfile excerpt is at the end.

(gdb) bt
#0  0x0000000000e6dbf2 in afr_truncate_wind (frame=0x7fada8ad9330, this=0xe6e770) at afr-inode-write.c:1145
#1  0x0000000000e72c7d in afr_write_pending_pre_op_cbk (frame=0x7fada8ad9330, cookie=0x8, this=0x185e740, op_ret=<value optimized out>, op_errno=<value optimized out>, xattr=<value optimized out>) at afr-transaction.c:431
#2  0x00000000001212e0 in default_xattrop_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=-1465023696, dict=0xe72b30) at defaults.c:1015
#3  0x000000000060edb0 in posix_xattrop (frame=0x7fada8ad8f10, this=0x1857920, loc=0x7fada8ad96d0, optype=GF_XATTROP_ADD_ARRAY, xattr=0x7fada8ada440) at posix.c:2474
#4  0x0000000000122090 in default_xattrop (frame=0x7fada8ad79a0, this=0x185c9d0, loc=0x7fada8ad96d0, flags=GF_XATTROP_ADD_ARRAY, dict=0x7fada8ada440) at defaults.c:1026
#5  0x0000000000e7374b in afr_write_pending_pre_op (frame=0x7fada8ad9330, this=0x185e740) at afr-transaction.c:494
#6  0x0000000000e73985 in afr_lock_rec (frame=0x7fada8ad9330, this=0x185e740, child_index=2) at afr-transaction.c:690
#7  0x0000000000e74044 in afr_lock_cbk (frame=0x7fada8ad9330, cookie=<value optimized out>, this=0x185e740, op_ret=<value optimized out>, op_errno=0) at afr-transaction.c:617
#8  0x000000000081ed2c in pl_inodelk (frame=0x7fada8ad88d0, this=0x185c9d0, loc=<value optimized out>, cmd=7, flock=0x412e6e70) at internal.c:157
#9  0x0000000000e73d4b in afr_lock_rec (frame=0x7fada8ad9330, this=<value optimized out>, child_index=0) at afr-transaction.c:709
#10 0x0000000000e73f40 in afr_transaction (frame=0x7fada8ad9330, this=0x185e740, type=AFR_DATA_TRANSACTION) at afr-transaction.c:856
#11 0x0000000000e6f062 in afr_truncate (frame=0x7fada8ada480, this=0x185e740, loc=<value optimized out>, offset=0) at afr-inode-write.c:1229
#12 0x00000000018d6bc0 in fuse_setattr (req=<value optimized out>, ino=<value optimized out>, attr=0x412e7000, valid=<value optimized out>, fi=<value optimized out>) at fuse-bridge.c:810
#13 0x0000000001099173 in do_setattr (req=0x7fada8ad91f0, nodeid=214525249896, inarg=<value optimized out>) at fuse_lowlevel.c:486
#14 0x00000000018d7d35 in fuse_thread_proc (data=0x185f070) at fuse-bridge.c:2506
#15 0x00000031f360729a in start_thread () from /lib64/libpthread.so.0
#16 0x00000031f2ae439d in clone () from /lib64/libc.so.6

(gdb) bt
#0  0x0000000000e6dbf2 in afr_truncate_wind (frame=0x1917520, this=0xe6e770) at afr-inode-write.c:1145
#1  0x0000000000e72c7d in afr_write_pending_pre_op_cbk (frame=0x1917520, cookie=0x8, this=0x1718740, op_ret=<value optimized out>, op_errno=<value optimized out>, xattr=<value optimized out>) at afr-transaction.c:431
#2  0x00000000001212e0 in default_xattrop_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=26304544, dict=0x0) at defaults.c:1015
#3  0x000000000060edb0 in posix_xattrop (frame=0x19176e0, this=0x1711920, loc=0x1918520, optype=GF_XATTROP_ADD_ARRAY, xattr=0x1917610) at posix.c:2474
#4  0x0000000000122090 in default_xattrop (frame=0x1915f60, this=0x17169d0, loc=0x1918520, flags=GF_XATTROP_ADD_ARRAY, dict=0x1917610) at defaults.c:1026
#5  0x0000000000e7374b in afr_write_pending_pre_op (frame=0x1917520, this=0x1718740) at afr-transaction.c:494
#6  0x0000000000e73985 in afr_lock_rec (frame=0x1917520, this=0x1718740, child_index=2) at afr-transaction.c:690
#7  0x0000000000e74044 in afr_lock_cbk (frame=0x1917520, cookie=<value optimized out>, this=0x1718740, op_ret=<value optimized out>, op_errno=0) at afr-transaction.c:617
#8  0x000000000081ed2c in pl_inodelk (frame=0x1915b00, this=0x17169d0, loc=<value optimized out>, cmd=7, flock=0x42f4be70) at internal.c:157
#9  0x0000000000e73d4b in afr_lock_rec (frame=0x1917520, this=<value optimized out>, child_index=0) at afr-transaction.c:709
#10 0x0000000000e73f40 in afr_transaction (frame=0x1917520, this=0x1718740, type=AFR_DATA_TRANSACTION) at afr-transaction.c:856
#11 0x0000000000e6f062 in afr_truncate (frame=0x19183c0, this=0x1718740, loc=<value optimized out>, offset=0) at afr-inode-write.c:1229
#12 0x0000000005f00bc0 in fuse_setattr (req=<value optimized out>, ino=<value optimized out>, attr=0x42f4c000, valid=<value optimized out>, fi=<value optimized out>) at fuse-bridge.c:810
#13 0x00000000039d7173 in do_setattr (req=0x19179f0, nodeid=214525249896, inarg=<value optimized out>) at fuse_lowlevel.c:486
#14 0x0000000005f01d35 in fuse_thread_proc (data=0x1719070) at fuse-bridge.c:2506
#15 0x00000031f360729a in start_thread () from /lib64/libpthread.so.0
#16 0x00000031f2ae439d in clone () from /lib64/libc.so.6

logfile excerpt:

60: #end-volume
+-----
2008-12-23 00:28:38 E [socket.c:708:socket_connect_finish] home2: connection failed (Connection timed out)
pending frames:

Signal received: 11
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
tv_nsec 1
package-string: glusterfs 1.4.0rc6
/lib64/libc.so.6[0x31f2a322a0]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so(afr_truncate_wind+0x72)[0xe6dbf2]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so(afr_write_pending_pre_op_cbk+0xcd)[0xe72c7d]
/usr/local/lib/libglusterfs.so.0(default_xattrop_cbk+0x20)[0x1212e0]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/storage/posix.so(posix_xattrop+0x1e0)[0x60edb0]
/usr/local/lib/libglusterfs.so.0(default_xattrop+0xc0)[0x122090]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so(afr_write_pending_pre_op+0x4fb)[0xe7374b]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so[0xe73985]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so(afr_lock_cbk+0xa4)[0xe74044]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/features/posix-locks.so(pl_inodelk+0x11c)[0x81ed2c]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so[0xe73d4b]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so(afr_transaction+0x110)[0xe73f40]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/cluster/afr.so(afr_truncate+0x1f2)[0xe6f062]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/mount/fuse.so[0x6753bc0]
/usr/local/lib/libfuse.so.2[0x1099173]
/usr/local/lib/glusterfs/1.4.0rc6/xlator/mount/fuse.so[0x6754d35]
/lib64/libpthread.so.0[0x31f360729a]
/lib64/libc.so.6(clone+0x6d)[0x31f2ae439d]
---------
Version      : glusterfs 1.4.0rc6 built on Dec 23 2008 00:22:39
TLA Revision : glusterfs--mainline--3.0--patch-792
Starting Time: 2008-12-23 00:41:14
Command line : /usr/local/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/glusterfs-home.vol /home
given volfile
+-----
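For anyone who needs to produce a report like this, the backtrace above is standard gdb output rather than anything gluster-specific: point gdb at the glusterfs binary and the core file it dumped, then run bt. The core file name below is illustrative.

# load the crashed binary and its core dump; the core path depends on your system's core pattern
gdb /usr/local/sbin/glusterfs /path/to/core
# at the (gdb) prompt, print the stack of the crashing thread:
# (gdb) bt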
Amar (ಅಮರ್ ತುಂಬಳ್ಳಿ)
2008-Dec-23 19:47 UTC
[Gluster-users] 1.4.0RC6 AFR problems (backtrace info attached)
Hi Keith,

Thanks for these logs. Very helpful.

Regards,

2008/12/23 Keith Freedman <freedman at freeformit.com>
> here's the backtrace info from 2 of my crashes; a logfile excerpt is at the end

--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Super Storage!
Anand Avati
2008-Dec-23 19:58 UTC
[Gluster-users] 1.4.0RC6 AFR problems (backtrace info attached)
Keith,

Thanks for the reports. Fixed this bug; it will be available in the next release or off the tla.

Thanks!
avati

2008/12/23 Keith Freedman <freedman at freeformit.com>:
> here's the backtrace info from 2 of my crashes; a logfile excerpt is at the end
> (gdb) bt
Hi Keith.

Sorry for the previous email, it was a bit out of place.

Would you mind sharing how you recovered from this issue? I'm going to stress test a solution based on GlusterFS next week, including pulling a live disk offline in the middle of work, and would appreciate any hints you might share regarding recovering from the failures.

Regards.

2008/12/23 Keith Freedman <freedman at freeformit.com>

> So, I had a drive failure on one of my boxes and it led to the discovery of numerous issues today:
>
> 1) When a drive is failing and one of the AFR servers is dealing with IO errors, the other one freaks
> out and sometimes crashes, but doesn't seem to ever network timeout.
>
> 2) When starting gluster on the server with the new empty drive, it gave me a bunch of errors about
> things being out of sync and to delete a file from all but the preferred server. This struck me as odd,
> since the thing was empty, so I used favorite-child, but this isn't a preferred solution long term.
>
> 3) One of the directories had 20GB of data in it. I went to do an ls of the directory and had to wait
> while it auto-healed all the files. While this is helpful, it would be nice to have gotten back the
> directory listing without having to wait for 20GB of data to get sent over the network.
>
> 4) While the other server was down, the up server kept failing (signal 11?) and I had to constantly
> remount the filesystem. It was giving me messages about the other node being down, which was fine,
> but then it'd just die after a while, consistently.
Replies inline.

> 1) When a drive is failing and one of the AFR servers is dealing with IO errors, the other one freaks
> out and sometimes crashes, but doesn't seem to ever network timeout.

This was the same issue as (4).

> 2) When starting gluster on the server with the new empty drive, it gave me a bunch of errors about
> things being out of sync and to delete a file from all but the preferred server. This struck me as odd,
> since the thing was empty, so I used favorite-child, but this isn't a preferred solution long term.

Sure, this should not happen. Not yet fixed; I will be looking at it today.

> 3) One of the directories had 20GB of data in it. I went to do an ls of the directory and had to wait
> while it auto-healed all the files. While this is helpful, it would be nice to have gotten back the
> directory listing without having to wait for 20GB of data to get sent over the network.

Currently this behavior is not going to be changed (at least until 1.4.0), because this can happen only when it is self-healing, and it makes sure things are OK when a file is accessed for the first time. As it works fine now, we don't want to make a code change this close to a stable release.

> 4) While the other server was down, the up server kept failing (signal 11?) and I had to constantly
> remount the filesystem. It was giving me messages about the other node being down, which was fine,
> but then it'd just die after a while, consistently.

This is fixed in tla. We have made a QA release to the internal team; once it passes basic tests, we will be making the next 'RC' release.

Regards,
Amar
At 03:05 PM 12/24/2008, Amar Tumballi (bulde) wrote:
>>3) One of the directories had 20GB of data in it. I went to do an ls of the directory and had to wait
>>while it auto-healed all the files. While this is helpful, it would be nice to have gotten back the
>>directory listing without having to wait for 20GB of data to get sent over the network.
>
>Currently this behavior is not going to be changed (at least until 1.4.0), because this can happen only
>when it is self-healing, and it makes sure things are OK when a file is accessed for the first time.
>As it works fine now, we don't want to make a code change this close to a stable release.

I understand the purpose of the functionality and would normally be fine with it, but it's just an inconvenient approach. Ideally (in 1.4.1, perhaps), it would return the directory listing to the requester and then do the actual data transfer in the background, since asking for a directory listing doesn't imply that one actually cares about the individual file data at this point in time.

Also, if this is the case, then if one of the entries in the directory is itself a directory, does that whole directory get auto-healed at the same time, or just the files within the current directory? In other words, will this cause an auto-heal of an entire directory tree, which would be terribly inconvenient if one has to wait all that time?

>>4) While the other server was down, the up server kept failing (signal 11?) and I had to constantly
>>remount the filesystem. It was giving me messages about the other node being down, which was fine,
>>but then it'd just die after a while, consistently.
>
>This is fixed in tla. We have made a QA release to the internal team; once it passes basic tests, we
>will be making the next 'RC' release.

I'll do some testing once the next RC is available for download.
At 02:45 PM 12/24/2008, Stas Oskin wrote:
>Hi Keith.
>
>Sorry for the previous email, it was a bit out of place.
>
>Would you mind sharing how you recovered from this issue?

I think Amar's responses to my email will be helpful. Especially given that some of my issues were bugs that are fixed or being fixed, my particular method of recovery wouldn't necessarily apply.

>I'm going to stress test a solution based on GlusterFS next week, including pulling a live disk offline
>in the middle of work, and would appreciate any hints you might share regarding recovering from the failures.

I think with the fixed bugs, it should be as easy as I expected. Once you have an empty underlying filesystem (with no gluster extended attributes), AFR should auto-heal the entire directory without a problem. It tried to do this but hit a bug, which was overcome by setting the option favorite-child in the AFR translator. This isn't necessarily an ideal production run-time configuration, but it's reasonable to set it to recover from a drive failure and then unset it after the recovery is complete.

As for the specifics of forcing auto-heal, I used the find command from the wiki:

find /GLUSTERMOUNTPOINT -type f -exec head -1 {} \; > /dev/null

It can be interesting to tail -f the gluster logfile in another window while this goes on.

I've found the script "whodir", posted a while ago, to be helpful when I'm having trouble re-mounting the filesystem after gluster crashes. It lists the processes whose current directory or executable lives under a given directory:

--whodir--
#!/bin/sh
# Print the pid and command line of every process whose cwd or exe symlink
# in /proc points somewhere under the directory given as $1.
DIR=$1
find /proc 2>/dev/null | grep -E 'cwd|exe' | xargs ls -l 2>/dev/null | \
  grep "> $DIR" | sed 's/  */ /g' | cut -f8 -d' ' | cut -f3 -d/ | \
  sort | uniq | while read line; do echo $line $(cat /proc/$line/cmdline); done

>Regards.
>
>2008/12/23 Keith Freedman <freedman at freeformit.com>
>
>> So, I had a drive failure on one of my boxes and it led to the discovery of numerous issues today:
>> [...]
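A quick usage sketch for the script above, assuming it is saved as "whodir" and the gluster mount is /home as elsewhere in this thread:

chmod +x whodir
./whodir /home
# prints roughly "<pid> <command line>" for each process keeping the mount busy;
# stop those processes before retrying umount and remounting the gluster volume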