Hi-

I've got a problem where certain batches of files written out to gluster have disappeared. Also, newly created files sometimes don't show up in ls output unless they are named explicitly to ls and other tools.

In my export folder, everything appears fine.

I have found that when I touch the missing file in gluster, it comes back and shows a file size, but appears empty. I've tried unmounting, restarting all glusterfsds, and remounting, and it stayed the same. Also, this problem did not show up immediately after setting up the filesystem, at least during basic tests. Any ideas?

My config:
Fedora Core 10 x64
Infiniband transport to most hosts.
Two 5-host raid0 configs with one raid1 over the top of them.
10 TB total, 5 TB effective.

thx-

Jeremy
Hi Jeremy,

What GlusterFS version are you using, and what is the configuration? You can try installing the latest stable GlusterFS and let us know if the problem repeats.

Pavan

On 04/11/09 01:31 -0600, Jeremy Enos wrote:
> Hi-
> I've got a problem where certain batches of files written out to gluster
> have disappeared. Also, newly created files sometimes don't show up to
> ls unless they are explicitly specified to ls and other tools.
>
> In my export folder, everything appears fine.
>
> I have found that when I touch the missing file in gluster, it comes
> back, shows a file size, but appears empty. I've tried umounting,
> restarting all glusterfsds, remounting, and it stayed the same. Also,
> this problem did not show up immediately after setting up the
> filesystem, at least during basic tests. Any ideas?
>
> My config:
> Fedora Core 10 x64
> Infiniband transport to most hosts.
> 2 5 host raid0 configs with one raid1 over the top of them.
> 10 TB total, 5 TB effective.
>
> thx-
>
> Jeremy
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
On Wed, Nov 04, 2009 at 01:31:30AM -0600, Jeremy Enos wrote:
> Hi-
> I've got a problem where certain batches of files written out to gluster
> have disappeared. Also, newly created files sometimes don't show up to
> ls unless they are explicitly specified to ls and other tools.
>
> In my export folder, everything appears fine.
>
> I have found that when I touch the missing file in gluster, it comes
> back, shows a file size, but appears empty. I've tried umounting,
> restarting all glusterfsds, remounting, and it stayed the same. Also,
> this problem did not show up immediately after setting up the
> filesystem, at least during basic tests. Any ideas?

What is your configuration? I experienced similar problems with unify after a disk crash. The namespace (replicated) was not rebuilt correctly after replacing the failing unit and I had to add some files manually (OK, using a script, but an intervention was needed). No data loss, only a bit of tweaking and tampering ;).

Krzysztof
plain text send...

Jeremy Enos wrote:
> What kind of tweaking and tampering was necessary to recover the lost
> data?
>
> Jeremy
>
> My configuration:
> Oh yes- of course- don't know why I left this out. Version and config
> files follow.
>
> [jenos@ac glusterfs]$ rpm -qa |grep gluster
> glusterfs-common-2.0.7-1.fc10.x86_64
> glusterfs-client-2.0.7-1.fc10.x86_64
>
>
> [jenos@ac glusterfs]$ cat glusterfs.vol
> #-----------IB remotes------------------
> volume remote1
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac11
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote2
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac12
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote3
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac13
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote4
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac14
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote5
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac15
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote6
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac16
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote7
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac17
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote8
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac18
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote9
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac19
>   option remote-subvolume ibstripe
> end-volume
>
> volume remote10
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host ac20
>   option remote-subvolume ibstripe
> end-volume
>
> #----------Stripe and Replicate------------------
>
> volume stripe1
>   type cluster/stripe
>   option block-size 1MB
>   subvolumes remote1 remote2 remote3 remote4 remote5
> end-volume
>
> volume stripe2
>   type cluster/stripe
>   option block-size 1MB
>   subvolumes remote6 remote7 remote8 remote9 remote10
> end-volume
>
> volume replicate
>   type cluster/replicate
>   option metadata-self-heal on
>   subvolumes stripe1 stripe2
> end-volume
>
> #------------Performance Options-------------------
>
> volume readahead
>   type performance/read-ahead
>   option page-count 4 # 2 is default option
>   option force-atime-update off # default is off
>   subvolumes replicate
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   option cache-size 1MB
>   subvolumes readahead
> end-volume
>
> volume cache
>   type performance/io-cache
>   option cache-size 1GB
>   subvolumes writebehind
> end-volume
>
> [jenos@ac glusterfs]$ cat glusterfsd.vol
> volume posix
>   type storage/posix
>   option directory /export
> end-volume
>
> volume locks
>   type features/locks
>   subvolumes posix
> end-volume
>
> volume ibstripe
>   type performance/io-threads
>   option thread-count 4
>   subvolumes locks
> end-volume
>
> volume server-ib
>   type protocol/server
>   option transport-type ib-verbs/server
>   option auth.addr.ibstripe.allow *
>   subvolumes ibstripe
> end-volume
>
> volume server-tcp
>   type protocol/server
>   option transport-type tcp/server
>   option auth.addr.ibstripe.allow *
>   subvolumes ibstripe
> end-volume
>
> [jenos@ac glusterfs]$
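With this many stacked volumes, it can help to read a volfile programmatically and print what each volume sits on. The following is only a minimal Python sketch, not a GlusterFS tool: it assumes nothing beyond the plain volfile layout shown above (volume / type / option / subvolumes / end-volume, with # starting a comment), and the function name parse_volfile is made up for illustration.

```python
def parse_volfile(text):
    """Parse volfile text into {volume_name: {"type", "options", "subvolumes"}}.

    Assumption: '#' always starts a comment, as in the files quoted above.
    """
    volumes = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        words = line.split()
        key = words[0]
        if key == "volume" and len(words) > 1:
            # Start of a new volume definition.
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue  # ignore anything outside a volume block
        elif key == "type" and len(words) > 1:
            current["type"] = words[1]
        elif key == "option" and len(words) > 2:
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            # Order is preserved exactly as written in the file.
            current["subvolumes"] = words[1:]
    return volumes
```

Running this over the client volfile on each host and comparing the subvolumes list of the replicate volume would show whether all clients agree on the order.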
Jeremy Enos wrote:
> Krzysztof Strasburger wrote:
>> On Mon, Nov 23, 2009 at 07:39:09PM -0600, Jeremy Enos wrote:
>>
>>> I have another clue to report:
>>> So I have my export directory as:
>>> /export
>>> Mounted as:
>>> /scratch
>>>
>>> If I do "ls -lR /scratch", it's supposed to synchronize all files and
>>> metadata, right? Well, it doesn't seem to be doing that.
>>>
>>> I have approx 100 files in one problematic folder. Only 50 show up to
>>> ls. That is, until I list it specifically. They also don't show up in
>>> the export directory until ls'd by name in /scratch.
>>>
>>> ls /scratch/file* # results in files1-49 being listed
>>> ls /export/file* # same result as above
>>> ls /export/file50.dat # no such file or directory
>>> ls /scratch/file50.dat # lists file as if nothing was ever wrong
>>> ls /export/file50.dat # shows up now after specific ls call in /scratch
>>> ls /scratch/file* # results in files 1-50 being listed now (magic?)
>>> ls /export/file* # also results in files 1-50 being listed now
>>>
>> OK, this seems to be the same problem (as mine) in a different configuration.
>> Only the first subvolume of replicated data is checked.
>>
>>>>>> volume stripe1
>>>>>>   type cluster/stripe
>>>>>>   option block-size 1MB
>>>>>>   subvolumes remote1 remote2 remote3 remote4 remote5
>>>>>> end-volume
>>>>>>
>>>>>> volume stripe2
>>>>>>   type cluster/stripe
>>>>>>   option block-size 1MB
>>>>>>   subvolumes remote6 remote7 remote8 remote9 remote10
>>>>>> end-volume
>>>>>>
>>>>>> volume replicate
>>>>>>   type cluster/replicate
>>>>>>   option metadata-self-heal on
>>>>>>   subvolumes stripe1 stripe2
>>>>>> end-volume
>>>>>>
>> Glusterfs developers claim that it is unsafe, to shuffle subvolumes, as the
>> first one is used as the lock server.
>> But it should be safe (IMHO) to workaround in the following manner:
>> 1. umount the replicated volume on all clients,
>> 2. modify the config file (everywhere!):
>>    subvolumes stripe2 stripe1
>> 3. mount it again.
>> Now stripe2 appears as the first subvolume and ls -R should do the
>> synchronization, as expected.
>> Krzysztof
>>
> Thank you for the suggestion! I can't wait to try it out. One
> question though- if I had it unmounted everywhere, and just used one
> client to mount the fs with shuffled volumes, then ls -lR, then
> unmount, then remount with unshuffled volumes, then remount
> everywhere- would that be expected to have the same effect? i.e.
> Just using a single client in a temporarily shuffled config to force
> the sync?
> thx-
>
> Jeremy

Hi Krzysztof-

I tried your suggestion (on all clients), but the symptom I described above still exists, mostly. It used to be that once a missing (hidden) file was ls'd by name, it would appear. That's not the case anymore. It still shows up to the explicit "ls file51.dat", but doesn't show up to "ls file*".

Another clue though- On one of 33 clients, a missing file shows a file size. On the other 32, it is listed as a zero byte file.

Thanks for your suggestions.

Jeremy
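The difference between "ls file*" and "ls file51.dat" described above can be checked for a whole directory at once. Below is a minimal Python sketch, assuming the list of expected file names is known in advance; the function name find_hidden_files is hypothetical. Note that, as reported earlier in this thread, an explicit lookup on the gluster mount may itself change what later listings show, so the directory listing is taken first.

```python
import os


def find_hidden_files(directory, expected_names):
    """Split expected_names into (listed, hidden, absent).

    listed: returned by the directory listing
    hidden: missing from the listing but found by an explicit lookup
    absent: not found either way
    """
    # Take the directory listing once, before any per-file lookups.
    listed_now = set(os.listdir(directory))
    listed, hidden, absent = [], [], []
    for name in expected_names:
        if name in listed_now:
            listed.append(name)
        elif os.path.lexists(os.path.join(directory, name)):
            # Explicit lookup by name succeeds although the listing missed it.
            hidden.append(name)
        else:
            absent.append(name)
    return listed, hidden, absent
```

For example, calling find_hidden_files("/scratch", names), with names built from whatever naming scheme the job used, returns the hidden files as the second element. Repeating the call on several clients would show whether they disagree.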