Hi All,

I run rsync in an automated environment, and it sometimes crashes and leaves a core dump file. From the core dump, gdb shows:

(gdb) bt
#0  add_dirs_to_tree (parent_ndx=-1, from_flist=0x56c590, dir_cnt=1) at flist.c:1422
#1  0x0000000000409eab in send_file_list (f=16, argc=-1, argv=0x56c238) at flist.c:2068
#2  0x0000000000419052 in client_run (f_in=16, f_out=16, pid=-1, argc=1, argv=0x56c230) at main.c:1033
#3  0x000000000041a09a in main (argc=2, argv=0x56c230) at main.c:1260
(gdb) bt f
#0  add_dirs_to_tree (parent_ndx=-1, from_flist=0x56c590, dir_cnt=1) at flist.c:1422
        file = (struct file_struct *) 0x0
~~~~ it crashes in add_dirs_to_tree() when it dereferences a NULL pointer.
        i = 2
        dp = (int32_t *) 0x2a983f2f28
        parent_dp = (int32_t *) 0x0
(gdb) p *((struct file_list *)0x56c590)->sorted[0]
$4 = {dirname = 0x0, modtime = 1197492871, len32 = 4096, mode = 16895, flags = 5, basename = "."}
(gdb) p *((struct file_list *)0x56c590)->sorted[1]
$5 = {dirname = 0x0, modtime = 1197488045, len32 = 16384, mode = 16832, flags = 4, basename = "l"}

From the mode, it looks like both of them are directories, so S_ISDIR() should be 1 and thus "dir_cnt--" should get executed for each, but dir_cnt still shows as 1 later. Weird.

(gdb) p *((struct file_list *)0x56c590)->sorted[2]
Cannot access memory at address 0x0
~~~~ this is where file becomes a NULL pointer, when i is 2.
(gdb) p dir_cnt
$11 = 1

Any idea what is going on here? I can provide the core dump, the rsync binary, or other information if needed.

Thanks!

--
Ming Zhang

@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
--------------------------------------------
On Wed, Dec 12, 2007 at 09:44:47PM -0500, Ming Zhang wrote:
> from the mode, it looks that both of them are directories, so S_ISDIR()
> should be 1 and thus "dir_cnt--" should get executed, but later show
> dir_cnt is still 1.

I wonder if dir_cnt was 3 when there were only 2 dir entries? I see a way in the code that the dir_count global could get incremented, and then later the make_file() call could return NULL, indicating that the current entry should not be included. I'm moving the incrementing of the dir_count variable to avoid this.

If you would, please test the latest dev version (from the most recent nightly tar file or the git repo).

Thanks for your very detailed bug report!

..wayne..
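The ordering problem described above can be illustrated in isolation. This is only a sketch under stated assumptions, not rsync's actual code: struct entry, make_entry(), build_list_buggy() and build_list_fixed() are hypothetical stand-ins, and the "excluded" name is an invented trigger for the case where the real make_file() returns NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a file-list entry; not rsync's file_struct. */
struct entry {
    const char *name;
    int is_dir;
};

/* Hypothetical filter: returns NULL for an entry that should be left
 * out of the list, the way make_file() can decline an entry. */
const struct entry *make_entry(const struct entry *src)
{
    if (strcmp(src->name, "excluded") == 0)
        return NULL;
    return src;
}

/* Buggy ordering: the directory counter is bumped before we know
 * whether the entry is kept, so it can exceed the number of
 * directories that actually end up in the list. */
int build_list_buggy(const struct entry *in, size_t n,
                     const struct entry **out, size_t *out_len)
{
    int dir_count = 0;
    *out_len = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].is_dir)
            dir_count++;                 /* counted too early */
        const struct entry *e = make_entry(&in[i]);
        if (e == NULL)
            continue;                    /* dropped, but already counted */
        out[(*out_len)++] = e;
    }
    return dir_count;
}

/* Fixed ordering: count a directory only after the entry is accepted. */
int build_list_fixed(const struct entry *in, size_t n,
                     const struct entry **out, size_t *out_len)
{
    int dir_count = 0;
    *out_len = 0;
    for (size_t i = 0; i < n; i++) {
        const struct entry *e = make_entry(&in[i]);
        if (e == NULL)
            continue;
        if (e->is_dir)
            dir_count++;                 /* counted only when kept */
        out[(*out_len)++] = e;
    }
    return dir_count;
}
```

With three directory inputs of which one is excluded, the buggy version reports 3 directories for a 2-entry list, while the fixed version reports 2. A consumer that loops until the counter reaches zero would walk past the end of the 2-entry list in the buggy case, which is consistent with the NULL entry at index 2 in the original backtrace.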
PS: not sure if you saw the half-open socket issue I reported last month? Will you fix it or leave it there?

Ming

On Mon, 2008-01-14 at 17:12 -0500, Matt McCutchen wrote:
> On Fri, 2007-12-14 at 13:34 -0500, Ming Zhang wrote:
> > sorry but seems it still crashes somehow.
> >
> > global dir_count is 23.
> >
> > (gdb) bt
> > #0  0x000000000040544a in add_dirs_to_tree (parent_ndx=10, from_flist=0x57c5d0, dir_cnt=4) at flist.c:1423
> > #1  0x00000000004090ef in send_extra_file_list (f=17, at_least=1000) at flist.c:1729
> > #2  0x00000000004122f3 in send_files (f_in=17, f_out=17) at sender.c:189
> > #3  0x0000000000419087 in client_run (f_in=17, f_out=17, pid=-1, argc=1, argv=0x56c230) at main.c:1041
> > #4  0x000000000041a09a in main (argc=2, argv=0x56c230) at main.c:1260
> > (gdb) bt f
> > #0  0x000000000040544a in add_dirs_to_tree (parent_ndx=10, from_flist=0x57c5d0, dir_cnt=4) at flist.c:1423
> >         file = (struct file_struct *) 0x13
> >         i = 0
> >         dp = (int32_t *) 0x0
> >         parent_dp = (int32_t *) 0x2a983f3c68
> > #1  0x00000000004090ef in send_extra_file_list (f=17, at_least=1000) at flist.c:1729
> >         file = Variable "file" is not available.
> > (gdb) p *(struct file_list *)0x57c5d0
> > $4 = {next = 0x0, prev = 0x57c570, files = 0x0, sorted = 0x0, file_pool = 0x56da70, pool_boundary = 0x2a984f5138, used = 0, malloced = 0, low = 0, high = -1, ndx_start = 192, ndx_end = 187, parent_ndx = 10, in_progress = 0, to_redo = 0}
>
> Ming, can you still reproduce this crash with the latest development
> rsync? If so, I hope Wayne will investigate it before rsync 3.0.0 is
> released.
>
> Matt

--------------------------------------------
Hi Matt,

No news is good news: so far Q&A and I have not seen the crash again. Thanks a lot for the fix!

Ming

On Mon, 2008-01-14 at 17:12 -0500, Matt McCutchen wrote:
> Ming, can you still reproduce this crash with the latest development
> rsync? If so, I hope Wayne will investigate it before rsync 3.0.0 is
> released.
>
> Matt

--------------------------------------------