Ove K. Pettersen
2012-Feb-05 21:12 UTC
[Gluster-users] Would difference in size (and content) of a file on replicated bricks be healed?
Hi... Started playing with gluster. And the heal functions is my "target" for testing. Short description of my test ---------------------------- * 4 replicas on single machine * glusterfs mounted locally * Create file on glusterfs-mounted directory: date >data.txt * Append to file on one of the bricks: hostname >>data.txt * Trigger a self-heal with: stat data.txt => Modified file on the brick is still different from the three other bricks. Q1: Is the modified file supposed to get corrected in my case? Q2: Is my test-case invalid for what gluster is supposed to handle? I would be very happy to receive an opinion on this. All the details are put at the bottom Regards, Ove Details and logs ---------------- My environment * single machine with OpenSuse 12.1 * gluster 3.3beta2 * xfs is used for the brick-fs (each one to a separate disk) Volume Name: d1 Type: Replicate Status: Started Number of Bricks: 4 Transport-type: tcp Bricks: Brick1: glusterdev:/b1 Brick2: glusterdev:/b2 Brick3: glusterdev:/b3 Brick4: glusterdev:/b4 Options Reconfigured: diagnostics.brick-log-level: DEBUG diagnostics.client-log-level: DEBUG * My glustervolume is mountet as glusterfs on /d1 glusterdev:/d1 52403200 33024 52370176 1% /d1 1) I put a file on the glusterfs date >/d1/data.txt 2) Checking the storage on the bricks: ls -l /b1 /b2 /b3 /b4 /b1: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt /b2: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt /b3: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt /b4: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt Everything is OK. 3) Then append a line on to /b1/data.txt hostname >>/b1/data.txt By just appending, I keep the inode and all the extended attributes (checked with attr -l on the /b1/data.txt file). 4) Check again: ls -l /b1 /b2 /b3 /b4 /b1: total 8 -rw-r--r-- 1 root root 40 Feb 5 21:40 data.txt /b2: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt /b3: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt /b4: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt Now we have a mismatch between the bricks. Both in contents and also the size of the file. 4) Trigger a self-heal by using stat cd /d1 ; stat data.txt File: `data.txt' Size: 29 Blocks: 16 IO Block: 131072 regular file Device: 1fh/31d Inode: 18446744070822815228 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2012-02-05 21:40:32.382157697 +0100 Modify: 2012-02-05 21:40:32.388157658 +0100 Change: 2012-02-05 21:40:32.389157657 +0100 Birth: - 5) stat gives the correct size of 29 which was the size of the file stored on the glusterfs. 6) DEBUG-log from the client when running this stat-command below. 7) Check again the bricks: ls -l /b1 /b2 /b3 /b4 /b1: total 8 -rw-r--r-- 1 root root 40 Feb 5 21:40 data.txt /b2: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt /b3: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt /b4: total 8 -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt 8) Brick /b1 still has a bad copy of the file. In the logs it looks like the difference in size is detected. It also starts some self-healing actions. But then for some reason finds out no healing is needed. Marked a few lines with <<<<< ???? which looks suspect. ===============CUT Client Log [2012-02-05 21:41:01.206102] D [client-handshake.c:179:client_start_ping] 0-d1-client-0: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:01.206201] D [client-handshake.c:179:client_start_ping] 0-d1-client-1: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:01.206224] D [client-handshake.c:179:client_start_ping] 0-d1-client-2: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:01.206241] D [client-handshake.c:179:client_start_ping] 0-d1-client-3: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:03.279124] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279182] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279202] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279218] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279232] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.279251] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.279266] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.279280] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.279294] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 [2012-02-05 21:41:03.279309] D [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] 0-d1-replicate-0: returning read_child: 1 [2012-02-05 21:41:03.279324] D [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: Source selected as 1 for / [2012-02-05 21:41:03.279341] D [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: Building lookup response from 1 [2012-02-05 21:41:03.279561] D [afr-common.c:1945:afr_lookup] 0-d1-replicate-0: failed to get the gfid from dict [2012-02-05 21:41:03.280265] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280299] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280317] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280332] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280346] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.280377] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.280392] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.280405] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.280419] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 [2012-02-05 21:41:03.280433] D [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] 0-d1-replicate-0: returning read_child: 2 [2012-02-05 21:41:03.280507] D [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: Source selected as 2 for /data.txt [2012-02-05 21:41:03.280530] D [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: Building lookup response from 2 [2012-02-05 21:41:03.280548] I [afr-common.c:1076:afr_detect_self_heal_by_iatt] 0-d1-replicate-0: size differs for /data.txt [2012-02-05 21:41:03.280576] I [afr-common.c:1225:afr_launch_self_heal] 0-d1-replicate-0: background data self-heal triggered. path: /data.txt [2012-02-05 21:41:03.280612] D [afr-self-heal-metadata.c:78:afr_sh_metadata_done] 0-d1-replicate-0: proceeding to data check on /data.txt [2012-02-05 21:41:03.281307] D [afr-lk-common.c:1397:afr_nonblocking_inodelk] 0-d1-replicate-0: attempting data lock range 0 0 by 140191242417704 [2012-02-05 21:41:03.281734] D [afr-self-heal-data.c:1136:afr_sh_data_post_nonblocking_inodelk_cbk] 0-d1-replicate-0: Non Blocking data inodelks done for /data.txt by 140191242417704. Proceeding to self-heal [2012-02-05 21:41:03.282815] D [afr-self-heal-data.c:650:afr_sh_data_fix] 0-d1-replicate-0: Pending matrix for: 140191242417704 [2012-02-05 21:41:03.282896] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.282940] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.282978] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.283015] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.283050] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.283095] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.283145] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.283191] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.283229] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 <<<<<< ????? [2012-02-05 21:41:03.283264] D [afr-self-heal-data.c:657:afr_sh_data_fix] 0-d1-replicate-0: No self-heal needed for /data.txt <<<<<< ????? [2012-02-05 21:41:03.283306] D [afr-self-heal-data.c:320:afr_sh_data_finish] 0-d1-replicate-0: finishing data selfheal of /data.txt [2012-02-05 21:41:03.283344] D [afr-lk-common.c:410:transaction_lk_op] 0-d1-replicate-0: lk op is for a self heal [2012-02-05 21:41:03.283379] D [afr-lk-common.c:597:afr_unlock_inodelk] 0-d1-replicate-0: attempting data unlock range 0 0 by 140191242417704 [2012-02-05 21:41:03.283879] I [afr-self-heal-common.c:2022:afr_self_heal_completion_cbk] 0-d1-replicate-0: background data self-heal completed on /data.txt [2012-02-05 21:41:03.283960] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284004] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284042] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284098] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284136] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.284173] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.284216] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.284253] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.284290] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 [2012-02-05 21:41:03.284325] D [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] 0-d1-replicate-0: returning read_child: 2 [2012-02-05 21:41:03.284361] D [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: Source selected as 2 for /data.txt [2012-02-05 21:41:03.284398] D [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: Building lookup response from 2 [2012-02-05 21:41:03.285622] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-2: Number of locks cleared=0 [2012-02-05 21:41:03.285690] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-1: Number of locks cleared=0 [2012-02-05 21:41:03.285779] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-3: Number of locks cleared=0 [2012-02-05 21:41:03.285880] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-0: Number of locks cleared=0 ===============CUT Client Log
David Coulson
2012-Feb-05 21:20 UTC
[Gluster-users] Would difference in size (and content) of a file on replicated bricks be healed?
On 2/5/12 4:12 PM, Ove K. Pettersen wrote:> Hi... > > Started playing with gluster. And the heal functions is my "target" > for testing. > > Short description of my test > ---------------------------- > * 4 replicas on single machine > * glusterfs mounted locally > * Create file on glusterfs-mounted directory: date >data.txtUsually it is bad to write directly to the brick.> * Append to file on one of the bricks: hostname >>data.txt > * Trigger a self-heal with: stat data.txt > => Modified file on the brick is still different from the three other > bricks.What a surprise :-)> > Q1: Is the modified file supposed to get corrected in my case? > > Q2: Is my test-case invalid for what gluster is supposed to handle?Probably, since the file has no gfid attributes, gluster has no clue what to do with it. If you delete it, it should self-heal.> > I would be very happy to receive an opinion on this. > > All the details are put at the bottom > > > Regards, > > Ove > > > > > > > Details and logs > ---------------- > My environment > * single machine with OpenSuse 12.1 > * gluster 3.3beta2 > * xfs is used for the brick-fs (each one to a separate disk) > > Volume Name: d1 > Type: Replicate > Status: Started > Number of Bricks: 4 > Transport-type: tcp > Bricks: > Brick1: glusterdev:/b1 > Brick2: glusterdev:/b2 > Brick3: glusterdev:/b3 > Brick4: glusterdev:/b4 > Options Reconfigured: > diagnostics.brick-log-level: DEBUG > diagnostics.client-log-level: DEBUG > > * My glustervolume is mountet as glusterfs on /d1 > glusterdev:/d1 52403200 33024 52370176 1% /d1 > > > > 1) I put a file on the glusterfs > date >/d1/data.txt > > 2) Checking the storage on the bricks: ls -l /b1 /b2 /b3 /b4 > /b1: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > /b2: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > /b3: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > /b4: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > > Everything is OK. > > 3) Then append a line on to /b1/data.txt > hostname >>/b1/data.txt > > By just appending, I keep the inode and all the extended attributes > (checked with attr -l on the /b1/data.txt file). > > > 4) Check again: ls -l /b1 /b2 /b3 /b4 > /b1: > total 8 > -rw-r--r-- 1 root root 40 Feb 5 21:40 data.txt > > /b2: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > /b3: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > /b4: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > Now we have a mismatch between the bricks. Both in contents and also > the size of the file. > > 4) Trigger a self-heal by using stat > cd /d1 ; stat data.txt > > File: `data.txt' > Size: 29 Blocks: 16 IO Block: 131072 regular file > Device: 1fh/31d Inode: 18446744070822815228 Links: 1 > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) > Access: 2012-02-05 21:40:32.382157697 +0100 > Modify: 2012-02-05 21:40:32.388157658 +0100 > Change: 2012-02-05 21:40:32.389157657 +0100 > Birth: - > > > 5) stat gives the correct size of 29 which was the size of the file > stored on the glusterfs. > 6) DEBUG-log from the client when running this stat-command below. > > 7) Check again the bricks: ls -l /b1 /b2 /b3 /b4 > /b1: > total 8 > -rw-r--r-- 1 root root 40 Feb 5 21:40 data.txt > > /b2: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > /b3: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > /b4: > total 8 > -rw-r--r-- 1 root root 29 Feb 5 21:40 data.txt > > > 8) Brick /b1 still has a bad copy of the file. > > > > In the logs it looks like the difference in size is detected. > It also starts some self-healing actions. > But then for some reason finds out no healing is needed. > Marked a few lines with <<<<< ???? which looks suspect. > > ===============CUT Client Log > [2012-02-05 21:41:01.206102] D > [client-handshake.c:179:client_start_ping] 0-d1-client-0: returning as > transport is already disconnected OR there are no frames (0 || 0) > [2012-02-05 21:41:01.206201] D > [client-handshake.c:179:client_start_ping] 0-d1-client-1: returning as > transport is already disconnected OR there are no frames (0 || 0) > [2012-02-05 21:41:01.206224] D > [client-handshake.c:179:client_start_ping] 0-d1-client-2: returning as > transport is already disconnected OR there are no frames (0 || 0) > [2012-02-05 21:41:01.206241] D > [client-handshake.c:179:client_start_ping] 0-d1-client-3: returning as > transport is already disconnected OR there are no frames (0 || 0) > [2012-02-05 21:41:03.279124] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.279182] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.279202] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.279218] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.279232] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 0 character innocent > [2012-02-05 21:41:03.279251] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 1 character innocent > [2012-02-05 21:41:03.279266] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 2 character innocent > [2012-02-05 21:41:03.279280] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 3 character innocent > [2012-02-05 21:41:03.279294] D > [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number > of sources: 0 > [2012-02-05 21:41:03.279309] D > [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] > 0-d1-replicate-0: returning read_child: 1 > [2012-02-05 21:41:03.279324] D > [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: > Source selected as 1 for / > [2012-02-05 21:41:03.279341] D > [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: > Building lookup response from 1 > [2012-02-05 21:41:03.279561] D [afr-common.c:1945:afr_lookup] > 0-d1-replicate-0: failed to get the gfid from dict > [2012-02-05 21:41:03.280265] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.280299] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.280317] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.280332] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.280346] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 0 character innocent > [2012-02-05 21:41:03.280377] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 1 character innocent > [2012-02-05 21:41:03.280392] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 2 character innocent > [2012-02-05 21:41:03.280405] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 3 character innocent > [2012-02-05 21:41:03.280419] D > [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number > of sources: 0 > [2012-02-05 21:41:03.280433] D > [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] > 0-d1-replicate-0: returning read_child: 2 > [2012-02-05 21:41:03.280507] D > [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: > Source selected as 2 for /data.txt > [2012-02-05 21:41:03.280530] D > [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: > Building lookup response from 2 > [2012-02-05 21:41:03.280548] I > [afr-common.c:1076:afr_detect_self_heal_by_iatt] 0-d1-replicate-0: > size differs for /data.txt > [2012-02-05 21:41:03.280576] I > [afr-common.c:1225:afr_launch_self_heal] 0-d1-replicate-0: background > data self-heal triggered. path: /data.txt > [2012-02-05 21:41:03.280612] D > [afr-self-heal-metadata.c:78:afr_sh_metadata_done] 0-d1-replicate-0: > proceeding to data check on /data.txt > [2012-02-05 21:41:03.281307] D > [afr-lk-common.c:1397:afr_nonblocking_inodelk] 0-d1-replicate-0: > attempting data lock range 0 0 by 140191242417704 > [2012-02-05 21:41:03.281734] D > [afr-self-heal-data.c:1136:afr_sh_data_post_nonblocking_inodelk_cbk] > 0-d1-replicate-0: Non Blocking data inodelks done for /data.txt by > 140191242417704. Proceeding to self-heal > [2012-02-05 21:41:03.282815] D > [afr-self-heal-data.c:650:afr_sh_data_fix] 0-d1-replicate-0: Pending > matrix for: 140191242417704 > [2012-02-05 21:41:03.282896] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.282940] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.282978] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.283015] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.283050] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 0 character innocent > [2012-02-05 21:41:03.283095] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 1 character innocent > [2012-02-05 21:41:03.283145] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 2 character innocent > [2012-02-05 21:41:03.283191] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 3 character innocent > [2012-02-05 21:41:03.283229] D > [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number > of sources: 0 <<<<<< ????? > [2012-02-05 21:41:03.283264] D > [afr-self-heal-data.c:657:afr_sh_data_fix] 0-d1-replicate-0: No > self-heal needed for /data.txt <<<<<< ????? > [2012-02-05 21:41:03.283306] D > [afr-self-heal-data.c:320:afr_sh_data_finish] 0-d1-replicate-0: > finishing data selfheal of /data.txt > [2012-02-05 21:41:03.283344] D [afr-lk-common.c:410:transaction_lk_op] > 0-d1-replicate-0: lk op is for a self heal > [2012-02-05 21:41:03.283379] D > [afr-lk-common.c:597:afr_unlock_inodelk] 0-d1-replicate-0: attempting > data unlock range 0 0 by 140191242417704 > [2012-02-05 21:41:03.283879] I > [afr-self-heal-common.c:2022:afr_self_heal_completion_cbk] > 0-d1-replicate-0: background data self-heal completed on /data.txt > [2012-02-05 21:41:03.283960] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.284004] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.284042] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.284098] D > [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] > 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] > [2012-02-05 21:41:03.284136] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 0 character innocent > [2012-02-05 21:41:03.284173] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 1 character innocent > [2012-02-05 21:41:03.284216] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 2 character innocent > [2012-02-05 21:41:03.284253] D > [afr-self-heal-common.c:604:afr_find_child_character_type] > 0-d1-replicate-0: child 3 character innocent > [2012-02-05 21:41:03.284290] D > [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number > of sources: 0 > [2012-02-05 21:41:03.284325] D > [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] > 0-d1-replicate-0: returning read_child: 2 > [2012-02-05 21:41:03.284361] D > [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: > Source selected as 2 for /data.txt > [2012-02-05 21:41:03.284398] D > [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: > Building lookup response from 2 > [2012-02-05 21:41:03.285622] D > [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-2: Number of > locks cleared=0 > [2012-02-05 21:41:03.285690] D > [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-1: Number of > locks cleared=0 > [2012-02-05 21:41:03.285779] D > [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-3: Number of > locks cleared=0 > [2012-02-05 21:41:03.285880] D > [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-0: Number of > locks cleared=0 > > ===============CUT Client Log > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Whit Blauvelt
2012-Feb-05 21:22 UTC
[Gluster-users] Would difference in size (and content) of a file on replicated bricks be healed?
Not sure if you're asking your questions precisely enough. The clues may be in your inclusions, but I'm not going to read all that to figure it out so I'll ask directly:> Short description of my test > ---------------------------- > * 4 replicas on single machine > * glusterfs mounted locally > * Create file on glusterfs-mounted directory: date >data.txtDid you write it through a gluster (or nfs) mount of the gluster file system, or sneak around behind it with a direct local mount of the partition?> * Append to file on one of the bricks: hostname >>data.txtAgain, through a gluster/nfs mount, or a local mount?> * Trigger a self-heal with: stat data.txtAgain, stat'ing it on a gluster/nfs mount, or a local mount?> => Modified file on the brick is still different from the three > other bricks.Again ... well you know.> Q1: Is the modified file supposed to get corrected in my case? > > Q2: Is my test-case invalid for what gluster is supposed to handle?If you're writing to files through mounts which gluster isn't handling, you'll get inconsistent stuff. Eventually gluster should catch up. But by design you should do everything through gluster. (An exception may be if you want a faster local file read ... anything that writes or touches the file should be through gluster in any case.) Best, Whit