Kotresh Hiremath Ravishankar
2016-Sep-22 09:15 UTC
[Gluster-users] 3.8.3 Bitrot signature process
Hi Amudhan,

As of now it is hard coded, based on some testing results; that part is not
tune-able yet. Only scrubber throttling is tune-able. As I have told you,
because the brick process has an open fd, the bitrot signer process is not
picking the file up for signing. Please raise a bug; we will take a look at it.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Amudhan P" <amudhan83 at gmail.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> Cc: "Gluster Users" <gluster-users at gluster.org>
> Sent: Thursday, September 22, 2016 2:37:25 PM
> Subject: Re: 3.8.3 Bitrot signature process
>
> Hi Kotresh,
>
> It is the same behaviour on a replicated volume also: the file's fd opens
> after 120 seconds in the brick pid.
>
> Calculating the signature for a 100MB file took 15m57s.
>
> How can I increase CPU usage? In your earlier mail you said "To limit
> the usage of CPU, throttling is done using token bucket algorithm".
> Is there any possibility of increasing the bitrot hash calculation speed?
>
> Thanks,
> Amudhan
>
> On Thu, Sep 22, 2016 at 11:44 AM, Kotresh Hiremath Ravishankar <
> khiremat at redhat.com> wrote:
>
> > Hi Amudhan,
> >
> > Thanks for the confirmation. If that's the case, please try with a
> > dist-rep volume and see if you observe similar behavior.
> >
> > In any case please raise a bug for the same with your observations.
> > We will work on it.
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "Amudhan P" <amudhan83 at gmail.com>
> > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > Cc: "Gluster Users" <gluster-users at gluster.org>
> > > Sent: Thursday, September 22, 2016 11:25:28 AM
> > > Subject: Re: 3.8.3 Bitrot signature process
> > >
> > > Hi Kotresh,
> > >
> > > 2280 is a brick process; I have not tried with a dist-rep volume yet.
> > >
> > > I have not seen any fd in the bitd process on any of the nodes, and
> > > bitd's CPU usage is almost always 0%, occasionally rising to 0.3%.
> > >
> > > Thanks,
> > > Amudhan
> > >
> > > On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <
> > > khiremat at redhat.com> wrote:
> > > > Hi Amudhan,
> > > >
> > > > No, the bitrot signer is a separate process by itself and is not
> > > > part of the brick process. I believe the process 2280 is a brick
> > > > process? Did you check with a dist-rep volume? Is the same behavior
> > > > observed there as well? We need to figure out why the brick process
> > > > is holding that fd for such a long time.
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > >> From: "Amudhan P" <amudhan83 at gmail.com>
> > > >> To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > >> Sent: Wednesday, September 21, 2016 8:15:33 PM
> > > >> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >>
> > > >> Hi Kotresh,
> > > >>
> > > >> As soon as the fd closes in the brick1 pid, I can see the bitrot
> > > >> signature for the file in the brick.
> > > >>
> > > >> So it looks like the fd is opened by the brick process to
> > > >> calculate the signature.
> > > >>
> > > >> Output of the file:
> > > >>
> > > >> -rw-r--r-- 2 root root 250M Sep 21 18:32
> > > >> /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >>
> > > >> getfattr: Removing leading '/' from absolute path names
> > > >> # file: media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >> trusted.bit-rot.signature=0x010200000000000000e9474e4cc673c0c227a6e807e04aa4ab1f88d3744243950a290869c53daa65df
> > > >> trusted.bit-rot.version=0x020000000000000057d6af3200012a13
> > > >> trusted.ec.config=0x0000080501000200
> > > >> trusted.ec.size=0x000000003e800000
> > > >> trusted.ec.version=0x0000000000001f400000000000001f40
> > > >> trusted.gfid=0x4c091145429448468fffe358482c63e1
> > > >>
> > > >> stat /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >>   File: '/media/disk1/brick1/data/G/test59-bs10M-c100.nul'
> > > >>   Size: 262144000  Blocks: 512000  IO Block: 4096  regular file
> > > >> Device: 811h/2065d  Inode: 402653311  Links: 2
> > > >> Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
> > > >> Access: 2016-09-21 18:34:43.722712751 +0530
> > > >> Modify: 2016-09-21 18:32:41.650712946 +0530
> > > >> Change: 2016-09-21 19:14:41.698708914 +0530
> > > >>  Birth: -
> > > >>
> > > >> On the other 2 bricks in the same set, the signature is still not
> > > >> updated for the same file.
> > > >>
> > > >> On Wed, Sep 21, 2016 at 6:48 PM, Amudhan P <amudhan83 at gmail.com> wrote:
> > > >>
> > > >> > Hi Kotresh,
> > > >> >
> > > >> > I am very sure no read was going on from the mount point.
> > > >> >
> > > >> > I did the same test again, but this time I unmounted the mount
> > > >> > point after writing the data to it.
> > > >> >
> > > >> > After 120 seconds I am seeing this file's fd entry in the brick 1 pid:
> > > >> >
> > > >> > getfattr -m. -e hex -d test59-bs10
> > > >> > # file: test59-bs10M-c100.nul
> > > >> > trusted.bit-rot.version=0x020000000000000057bed574000ed534
> > > >> > trusted.ec.config=0x0000080501000200
> > > >> > trusted.ec.size=0x000000003e800000
> > > >> > trusted.ec.version=0x0000000000001f400000000000001f40
> > > >> > trusted.gfid=0x4c091145429448468fffe358482c63e1
> > > >> >
> > > >> > ls -l /proc/2280/fd
> > > >> > lr-x------ 1 root root 64 Sep 21 13:08 19 ->
> > > >> > /media/disk1/brick1/.glusterfs/4c/09/4c091145-4294-4846-8fff-e358482c63e1
> > > >> >
> > > >> > Volume is an EC 4+1.
> > > >> >
> > > >> > On Wed, Sep 21, 2016 at 6:17 PM, Kotresh Hiremath Ravishankar <
> > > >> > khiremat at redhat.com> wrote:
> > > >> >
> > > >> >> Hi Amudhan,
> > > >> >>
> > > >> >> If you see the ls output, some process has an fd opened in the
> > > >> >> backend. That is the reason bitrot is not considering it for
> > > >> >> signing. Could you please observe whether the signing happens
> > > >> >> after 120 secs of the closure of
> > > >> >> "/media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764".
> > > >> >> If so, we need to figure out who holds this fd for such a long
> > > >> >> time, and also whether this issue is specific to EC volumes.
> > > >> >>
> > > >> >> Thanks and Regards,
> > > >> >> Kotresh H R
> > > >> >>
> > > >> >> ----- Original Message -----
> > > >> >> > From: "Amudhan P" <amudhan83 at gmail.com>
> > > >> >> > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > >> >> > Cc: "Gluster Users" <gluster-users at gluster.org>
> > > >> >> > Sent: Wednesday, September 21, 2016 4:56:40 PM
> > > >> >> > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> >
> > > >> >> > Hi Kotresh,
> > > >> >> >
> > > >> >> > Writing a new file:
> > > >> >> >
> > > >> >> > getfattr -m. -e hex -d /media/disk2/brick2/data/G/test58-bs10M-c100.nul
> > > >> >> > getfattr: Removing leading '/' from absolute path names
> > > >> >> > # file: media/disk2/brick2/data/G/test58-bs10M-c100.nul
> > > >> >> > trusted.bit-rot.version=0x020000000000000057da8b23000b120e
> > > >> >> > trusted.ec.config=0x0000080501000200
> > > >> >> > trusted.ec.size=0x000000003e800000
> > > >> >> > trusted.ec.version=0x0000000000001f400000000000001f40
> > > >> >> > trusted.gfid=0x6e7c49e6094e443585bff21f99fd8764
> > > >> >> >
> > > >> >> > Running ls -l on the brick 2 pid:
> > > >> >> >
> > > >> >> > ls -l /proc/30162/fd
> > > >> >> > lr-x------ 1 root root 64 Sep 21 16:22 59 ->
> > > >> >> > /media/disk2/brick2/.glusterfs/quanrantine
> > > >> >> > lrwx------ 1 root root 64 Sep 21 16:22 6 ->
> > > >> >> > /var/lib/glusterd/vols/glsvol1/run/10.1.2.2-media-disk2-brick2.pid
> > > >> >> > lr-x------ 1 root root 64 Sep 21 16:25 60 ->
> > > >> >> > /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
> > > >> >> > lr-x------ 1 root root 64 Sep 21 16:22 61 ->
> > > >> >> > /media/disk2/brick2/.glusterfs/quanrantine
> > > >> >> >
> > > >> >> > find /media/disk2/ -samefile
> > > >> >> > /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
> > > >> >> > /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
> > > >> >> > /media/disk2/brick2/data/G/test58-bs10M-c100.nul
> > > >> >> >
> > > >> >> > On Wed, Sep 21, 2016 at 3:28 PM, Kotresh Hiremath Ravishankar <
> > > >> >> > khiremat at redhat.com> wrote:
> > > >> >> >
> > > >> >> > > Hi Amudhan,
> > > >> >> > >
> > > >> >> > > Don't grep for the filename; glusterfs maintains a hardlink
> > > >> >> > > in the .glusterfs directory for each file. Just check
> > > >> >> > > 'ls -l /proc/<respective brick pid>/fd' for any fds opened
> > > >> >> > > for a file in .glusterfs and check if it's the same file.
> > > >> >> > >
> > > >> >> > > Thanks and Regards,
> > > >> >> > > Kotresh H R
> > > >> >> > >
> > > >> >> > > ----- Original Message -----
> > > >> >> > > > From: "Amudhan P" <amudhan83 at gmail.com>
> > > >> >> > > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > >> >> > > > Cc: "Gluster Users" <gluster-users at gluster.org>
> > > >> >> > > > Sent: Wednesday, September 21, 2016 1:33:10 PM
> > > >> >> > > > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> > > >
> > > >> >> > > > Hi Kotresh,
> > > >> >> > > >
> > > >> >> > > > I have used the below command to check for any open fd
> > > >> >> > > > for a file:
> > > >> >> > > >
> > > >> >> > > > "ls -l /proc/*/fd | grep filename"
> > > >> >> > > >
> > > >> >> > > > As soon as the write completes there are no open fds;
> > > >> >> > > > if there is any alternate option, please let me know and
> > > >> >> > > > I will try that as well.
> > > >> >> > > >
> > > >> >> > > > Also, below is the scrub status in my test setup. The
> > > >> >> > > > number of skipped files is slowly reducing day by day. I
> > > >> >> > > > think files are skipped because the bitrot signature
> > > >> >> > > > process has not completed yet.
> > > >> >> > > >
> > > >> >> > > > Where can I see the scrub-skipped files?
> > > >> >> > > >
> > > >> >> > > > Volume name : glsvol1
> > > >> >> > > > State of scrub: Active (Idle)
> > > >> >> > > > Scrub impact: normal
> > > >> >> > > > Scrub frequency: daily
> > > >> >> > > > Bitrot error log location: /var/log/glusterfs/bitd.log
> > > >> >> > > > Scrubber error log location: /var/log/glusterfs/scrub.log
> > > >> >> > > >
> > > >> >> > > > ========================================================
> > > >> >> > > > Node: localhost
> > > >> >> > > > Number of Scrubbed files: 1644
> > > >> >> > > > Number of Skipped files: 1001
> > > >> >> > > > Last completed scrub time: 2016-09-20 11:59:58
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:26
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > ========================================================
> > > >> >> > > > Node: 10.1.2.3
> > > >> >> > > > Number of Scrubbed files: 1644
> > > >> >> > > > Number of Skipped files: 1001
> > > >> >> > > > Last completed scrub time: 2016-09-20 10:50:00
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:38:17
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > ========================================================
> > > >> >> > > > Node: 10.1.2.4
> > > >> >> > > > Number of Scrubbed files: 981
> > > >> >> > > > Number of Skipped files: 1664
> > > >> >> > > > Last completed scrub time: 2016-09-20 12:38:01
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:35:19
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > ========================================================
> > > >> >> > > > Node: 10.1.2.1
> > > >> >> > > > Number of Scrubbed files: 1263
> > > >> >> > > > Number of Skipped files: 1382
> > > >> >> > > > Last completed scrub time: 2016-09-20 11:57:21
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:37:17
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > ========================================================
> > > >> >> > > > Node: 10.1.2.2
> > > >> >> > > > Number of Scrubbed files: 1644
> > > >> >> > > > Number of Skipped files: 1001
> > > >> >> > > > Last completed scrub time: 2016-09-20 11:59:25
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:18
> > > >> >> > > > Error count: 0
> > > >> >> > > > ========================================================
> > > >> >> > > >
> > > >> >> > > > Thanks
> > > >> >> > > > Amudhan
> > > >> >> > > >
> > > >> >> > > > On Wed, Sep 21, 2016 at 11:45 AM, Kotresh Hiremath Ravishankar <
> > > >> >> > > > khiremat at redhat.com> wrote:
> > > >> >> > > >
> > > >> >> > > > > Hi Amudhan,
> > > >> >> > > > >
> > > >> >> > > > > I don't think it's a limitation of reading data from
> > > >> >> > > > > the brick. To limit the usage of CPU, throttling is
> > > >> >> > > > > done using a token bucket algorithm; the log message
> > > >> >> > > > > shown is related to it. But even then, I think it
> > > >> >> > > > > should not take 12 minutes for checksum calculation
> > > >> >> > > > > unless there is an fd open (it might be internal).
> > > >> >> > > > > Could you please cross verify whether any fds are
> > > >> >> > > > > opened on that file by looking into /proc? I will also
> > > >> >> > > > > test it in the meantime and get back to you.
> > > >> >> > > > >
> > > >> >> > > > > Thanks and Regards,
> > > >> >> > > > > Kotresh H R
> > > >> >> > > > >
> > > >> >> > > > > ----- Original Message -----
> > > >> >> > > > > > From: "Amudhan P" <amudhan83 at gmail.com>
> > > >> >> > > > > > To: "Kotresh Hiremath Ravishankar" <khiremat at redhat.com>
> > > >> >> > > > > > Cc: "Gluster Users" <gluster-users at gluster.org>
> > > >> >> > > > > > Sent: Tuesday, September 20, 2016 3:19:28 PM
> > > >> >> > > > > > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> > > > > >
> > > >> >> > > > > > Hi Kotresh,
> > > >> >> > > > > >
> > > >> >> > > > > > Please correct me if I am wrong: once a file write
> > > >> >> > > > > > completes and its fds close, bitrot waits for 120
> > > >> >> > > > > > seconds, then starts hashing and updates the
> > > >> >> > > > > > signature for the file in the brick.
> > > >> >> > > > > >
> > > >> >> > > > > > But my feeling is that bitrot takes too much time to
> > > >> >> > > > > > complete the hashing.
> > > >> >> > > > > >
> > > >> >> > > > > > Below is a test result I would like to share.
> > > >> >> > > > > >
> > > >> >> > > > > > Writing data to the below path using dd:
> > > >> >> > > > > >
> > > >> >> > > > > > /mnt/gluster/data/G (mount point)
> > > >> >> > > > > > -rw-r--r-- 1 root root 10M Sep 20 12:19 test53-bs10M-c1.nul
> > > >> >> > > > > > -rw-r--r-- 1 root root 100M Sep 20 12:19 test54-bs10M-c10.nul
> > > >> >> > > > > >
> > > >> >> > > > > > No other write or read process is going on.
> > > >> >> > > > > >
> > > >> >> > > > > > Checking the file data in one of the bricks:
> > > >> >> > > > > >
> > > >> >> > > > > > -rw-r--r-- 2 root root 2.5M Sep 20 12:23 test53-bs10M-c1.nul
> > > >> >> > > > > > -rw-r--r-- 2 root root 25M Sep 20 12:23 test54-bs10M-c10.nul
> > > >> >> > > > > >
> > > >> >> > > > > > The files' stat and getfattr info from the brick,
> > > >> >> > > > > > after the write process completed:
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
> > > >> >> > > > > >   File: 'test53-bs10M-c1.nul'
> > > >> >> > > > > >   Size: 2621440  Blocks: 5120  IO Block: 4096  regular file
> > > >> >> > > > > > Device: 821h/2081d  Inode: 536874168  Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
> > > >> >> > > > > > Access: 2016-09-20 12:23:28.798886647 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:23:28.998886646 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
> > > >> >> > > > > >   File: 'test54-bs10M-c10.nul'
> > > >> >> > > > > >   Size: 26214400  Blocks: 51200  IO Block: 4096  regular file
> > > >> >> > > > > > Device: 821h/2081d  Inode: 536874169  Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
> > > >> >> > > > > > Access: 2016-09-20 12:23:42.902886624 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:23:44.378886622 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
> > > >> >> > > > > > # file: test53-bs10M-c1.nul
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000000a00000
> > > >> >> > > > > > trusted.ec.version=0x00000000000000500000000000000050
> > > >> >> > > > > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
> > > >> >> > > > > > # file: test54-bs10M-c10.nul
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000006400000
> > > >> >> > > > > > trusted.ec.version=0x00000000000003200000000000000320
> > > >> >> > > > > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
> > > >> >> > > > > >
> > > >> >> > > > > > The files' stat and getfattr info from the brick,
> > > >> >> > > > > > after the bitrot signature was updated:
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
> > > >> >> > > > > >   File: 'test53-bs10M-c1.nul'
> > > >> >> > > > > >   Size: 2621440  Blocks: 5120  IO Block: 4096  regular file
> > > >> >> > > > > > Device: 821h/2081d  Inode: 536874168  Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
> > > >> >> > > > > > Access: 2016-09-20 12:25:31.494886450 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:27:00.994886307 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
> > > >> >> > > > > > # file: test53-bs10M-c1.nul
> > > >> >> > > > > > trusted.bit-rot.signature=0x0102000000000000006de7493c5c90f643357c268fbaaf461c1567e0334e4948023ce17268403aa37a
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000000a00000
> > > >> >> > > > > > trusted.ec.version=0x00000000000000500000000000000050
> > > >> >> > > > > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
> > > >> >> > > > > >   File: 'test54-bs10M-c10.nul'
> > > >> >> > > > > >   Size: 26214400  Blocks: 51200  IO Block: 4096  regular file
> > > >> >> > > > > > Device: 821h/2081d  Inode: 536874169  Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
> > > >> >> > > > > > Access: 2016-09-20 12:25:47.510886425 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:38:05.954885243 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
> > > >> >> > > > > > # file: test54-bs10M-c10.nul
> > > >> >> > > > > > trusted.bit-rot.signature=0x010200000000000000394c345f0b0c63ee652627a62eed069244d35c4d5134e4f07d4eabb51afda47e
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000006400000
> > > >> >> > > > > > trusted.ec.version=0x00000000000003200000000000000320
> > > >> >> > > > > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
> > > >> >> > > > > >
> > > >> >> > > > > > (Actual time taken to read the file from the brick for md5sum:)
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test53-bs10M-c1.nul
> > > >> >> > > > > > 8354dcaa18a1ecb52d0895bf00888c44  test53-bs10M-c1.nul
> > > >> >> > > > > >
> > > >> >> > > > > > real 0m0.045s
> > > >> >> > > > > > user 0m0.007s
> > > >> >> > > > > > sys 0m0.003s
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test54-bs10M-c10.nul
> > > >> >> > > > > > bed3c0a4a1407f584989b4009e9ce33f  test54-bs10M-c10.nul
> > > >> >> > > > > >
> > > >> >> > > > > > real 0m0.166s
> > > >> >> > > > > > user 0m0.062s
> > > >> >> > > > > > sys 0m0.011s
> > > >> >> > > > > >
> > > >> >> > > > > > As you can see, the 'test54-bs10M-c10.nul' file took
> > > >> >> > > > > > around 12 minutes to update the bitrot signature
> > > >> >> > > > > > (please refer to the stat output for the file).
> > > >> >> > > > > >
> > > >> >> > > > > > What would be the cause of such a slow read? Is
> > > >> >> > > > > > there any limitation on reading data from the brick?
> > > >> >> > > > > >
> > > >> >> > > > > > Also, I am seeing this line in bitd.log; what does it mean?
> > > >> >> > > > > >
> > > >> >> > > > > > [bit-rot.c:1784:br_rate_limit_signer] 0-glsvol1-bit-rot-0: [Rate Limit Info] "tokens/sec (rate): 131072, maxlimit: 524288
> > > >> >> > > > > >
> > > >> >> > > > > > Thanks
> > > >> >> > > > > > Amudhan P
> > > >> >> > > > > >
> > > >> >> > > > > > On Mon, Sep 19, 2016 at 1:00 PM, Kotresh Hiremath Ravishankar <
> > > >> >> > > > > > khiremat at redhat.com> wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > > Hi Amudhan,
> > > >> >> > > > > > >
> > > >> >> > > > > > > Thanks for testing out the bitrot feature, and
> > > >> >> > > > > > > sorry for the delayed response. Please find the
> > > >> >> > > > > > > answers inline.
> > > >> >> > > > > > >
> > > >> >> > > > > > > Thanks and Regards,
> > > >> >> > > > > > > Kotresh H R
> > > >> >> > > > > > >
> > > >> >> > > > > > > ----- Original Message -----
> > > >> >> > > > > > > > From: "Amudhan P" <amudhan83 at gmail.com>
> > > >> >> > > > > > > > To: "Gluster Users" <gluster-users at gluster.org>
> > > >> >> > > > > > > > Sent: Friday, September 16, 2016 4:14:10 PM
> > > >> >> > > > > > > > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Hi,
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Can anyone reply to this mail?
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > On Tue, Sep 13, 2016 at 12:49 PM, Amudhan P <
> > > >> >> > > > > > > > amudhan83 at gmail.com> wrote:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Hi,
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > I am testing the bitrot feature in Gluster 3.8.3
> > > >> >> > > > > > > > with a disperse EC volume 4+1.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > When I write a single small file (< 10MB), after
> > > >> >> > > > > > > > 2 seconds I can see the bitrot signature in the
> > > >> >> > > > > > > > bricks for the file, but when I write multiple
> > > >> >> > > > > > > > files of different sizes (> 10MB) it takes a
> > > >> >> > > > > > > > long time (> 24hrs) to see the bitrot signature
> > > >> >> > > > > > > > on all the files.
> > > >> >> > > > > > >
> > > >> >> > > > > > > The default timeout for signing to happen is 120
> > > >> >> > > > > > > seconds, so the signing will happen 120 secs after
> > > >> >> > > > > > > the last fd gets closed on that file. So if the
> > > >> >> > > > > > > file is being written continuously, it will not be
> > > >> >> > > > > > > signed until 120 secs after its last fd is closed.
> > > >> >> > > > > > >
> > > >> >> > > > > > > > My questions are:
> > > >> >> > > > > > > > 1. I have enabled the scrub schedule as hourly
> > > >> >> > > > > > > > and throttle as normal; does this make any
> > > >> >> > > > > > > > impact in delaying the bitrot signature?
> > > >> >> > > > > > >
> > > >> >> > > > > > > No.
> > > >> >> > > > > > >
> > > >> >> > > > > > > > 2. Other than "bitd.log", where else can I watch
> > > >> >> > > > > > > > the current status of bitrot, like the number of
> > > >> >> > > > > > > > files added for signature and file status?
> > > >> >> > > > > > >
> > > >> >> > > > > > > The signature will happen 120 sec after the last
> > > >> >> > > > > > > fd closure, as said above. There is no status
> > > >> >> > > > > > > command which tracks the signature of the files,
> > > >> >> > > > > > > but there is a bitrot status command which tracks
> > > >> >> > > > > > > the number of files scrubbed:
> > > >> >> > > > > > >
> > > >> >> > > > > > > #gluster vol bitrot <volname> scrub status
> > > >> >> > > > > > >
> > > >> >> > > > > > > > 3. Where can I confirm that all the files in the
> > > >> >> > > > > > > > brick are bitrot signed?
> > > >> >> > > > > > >
> > > >> >> > > > > > > As said, the signing information of all the files
> > > >> >> > > > > > > is not tracked.
> > > >> >> > > > > > >
> > > >> >> > > > > > > > 4. Is there any file read size limit in bitrot?
> > > >> >> > > > > > >
> > > >> >> > > > > > > I didn't get this. Could you please elaborate?
> > > >> >> > > > > > >
> > > >> >> > > > > > > > 5. Are there options for tuning bitrot for
> > > >> >> > > > > > > > faster signing of files?
> > > >> >> > > > > > >
> > > >> >> > > > > > > The bitrot feature is mainly to detect silent
> > > >> >> > > > > > > corruption (bitflips) of files due to long term
> > > >> >> > > > > > > storage. Hence the default is that signing happens
> > > >> >> > > > > > > 120 sec after the last fd closure. There is a
> > > >> >> > > > > > > tunable which can change the default 120 sec, but
> > > >> >> > > > > > > that's only for testing purposes and we don't
> > > >> >> > > > > > > recommend it:
> > > >> >> > > > > > >
> > > >> >> > > > > > > gluster vol get master features.expiry-time
> > > >> >> > > > > > >
> > > >> >> > > > > > > For testing purposes, you can change this default and test.
> > > >> >> > > > > > >
> > > >> >> > > > > > > > Thanks
> > > >> >> > > > > > > > Amudhan
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > _______________________________________________
> > > >> >> > > > > > > > Gluster-users mailing list
> > > >> >> > > > > > > > Gluster-users at gluster.org
> > > >> >> > > > > > > > http://www.gluster.org/mailman/listinfo/gluster-users
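
Kotresh's suggestion in the thread above, checking 'ls -l /proc/<brick
pid>/fd' for the file's .glusterfs hardlink, can be scripted. The sketch
below assumes the standard .glusterfs/<first two hex chars>/<next two>/<gfid
in uuid form> layout, which matches the paths quoted in this thread (e.g.
trusted.gfid 0x4c091145... mapping to
.glusterfs/4c/09/4c091145-4294-4846-8fff-e358482c63e1); it must run as root
to read other processes' fd tables:

#!/bin/bash
# Sketch: report any process fds open on a file's .glusterfs hardlink.
# Paths are the ones from this thread; adjust BRICK/FILE for your setup.
BRICK=/media/disk1/brick1
FILE="$BRICK/data/G/test59-bs10M-c100.nul"

# Read the 16-byte gfid as hex and build the backend hardlink path:
# .glusterfs/<first 2 hex chars>/<next 2>/<gfid in uuid form>.
gfid=$(getfattr --absolute-names -n trusted.gfid -e hex "$FILE" \
       | awk -F'=0x' '/^trusted.gfid/ {print $2}')
uuid=$(echo "$gfid" | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/')
link="$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$uuid"
echo "backend hardlink: $link"

# Scan every process's fd table for a symlink to that hardlink.
for fd in /proc/[0-9]*/fd/*; do
    [ "$(readlink "$fd" 2>/dev/null)" = "$link" ] && echo "held open by: $fd"
done

When it prints something like /proc/2280/fd/19, the holder is pid 2280,
which in the test above was the brick process.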
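
For question 3 in the quoted thread ("where can I confirm that all the
files in the brick are bitrot signed?") there is no built-in command, but
the getfattr outputs above suggest an approximation: a file appears to be
signed exactly when it carries the trusted.bit-rot.signature xattr. A
sketch under that assumption, run directly on a brick:

#!/bin/bash
# Sketch: list files on a brick that do not yet carry a bitrot signature.
# Assumes "signed" == "trusted.bit-rot.signature xattr present".
BRICK=/media/disk1/brick1
find "$BRICK/data" -type f | while read -r f; do
    getfattr --absolute-names -n trusted.bit-rot.signature "$f" \
        >/dev/null 2>&1 || echo "not signed yet: $f"
done

Re-running this over time would also account for the "Number of Skipped
files" in the scrub status slowly shrinking as signing catches up.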
Hi Kotresh,

I have raised the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1378466

Thanks
Amudhan

On Thu, Sep 22, 2016 at 2:45 PM, Kotresh Hiremath Ravishankar <
khiremat at redhat.com> wrote:
> Hi Amudhan,
>
> As of now it is hard coded, based on some testing results; that part is
> not tune-able yet. Only scrubber throttling is tune-able. As I have told
> you, because the brick process has an open fd, the bitrot signer process
> is not picking the file up for signing. Please raise a bug; we will take
> a look at it.
>
> [...]
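
On the throttling question that closes the thread: the bitd.log line
quoted above ([Rate Limit Info] "tokens/sec (rate): 131072, maxlimit:
524288) reads like a token bucket refilled at 131072 tokens/sec with a
burst capacity of 524288. Assuming one token per byte hashed (an
assumption drawn from those numbers, not verified against the source),
sustained hashing is capped near 128KiB/s, so a 100MB file needs roughly
800s, about 13 minutes, which is the same ballpark as the 15m57s reported
above. A runnable sketch of that pacing (the per-block sha256sum only
models the cost; the real signer computes one hash over the whole file):

#!/bin/bash
# Token-bucket pacing sketch (illustrative only, not gluster's code).
# Usage: ./pace.sh <file>
FILE="$1"
RATE=131072      # tokens added per second (from the bitd.log line)
MAX=524288       # bucket capacity ("maxlimit")
BLKSZ=131072     # bytes hashed per iteration; assume 1 token per byte
tokens=$MAX
blk=0
size=$(stat -c %s "$FILE")
nblk=$(( (size + BLKSZ - 1) / BLKSZ ))
while [ "$blk" -lt "$nblk" ]; do
    if [ "$tokens" -lt "$BLKSZ" ]; then
        # Bucket empty: wait for a refill before hashing the next block.
        sleep 1
        tokens=$(( tokens + RATE ))
        [ "$tokens" -gt "$MAX" ] && tokens=$MAX
        continue
    fi
    tokens=$(( tokens - BLKSZ ))
    dd if="$FILE" bs=$BLKSZ skip=$blk count=1 2>/dev/null | sha256sum >/dev/null
    blk=$(( blk + 1 ))
done
echo "hashed $nblk blocks of $FILE"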
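
Finally, the one exposed knob mentioned in the thread, features.expiry-time,
changes when signing starts (the delay after the last fd close), not how
fast the hash runs. Per Kotresh's caveat above, changing it is for testing
only; something like:

# "glsvol1" is the volume from this thread; 120 seconds is the default.
gluster volume get glsvol1 features.expiry-time
gluster volume set glsvol1 features.expiry-time 10    # sign ~10s after close (testing only)
gluster volume set glsvol1 features.expiry-time 120   # restore the default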