When I use the preallocate patch and create a 77GB file using the function, I get a CPU spike on the server side. The spike lasts about 20 minutes and uses about 20%-25% of the CPU associated with the rsync instance creating the file. The spike is directly linked to the time it takes to create the file.

I compiled rsync using cygwin CVS. I initially suspected the implementation of posix_fallocate in cygwin. I posted the same question to their board, and Corinna indicated she can create a 40GB file in a few milliseconds, so I'm wondering if something else is going on at this point.

Does anyone else get this issue? I want to confirm it isn't just a problem with my environment. I've replicated it on other machines, but they are all using the same cygwin build I created. It would be great to get a confirmation (positive or negative) from someone running it under Linux. The preallocation is working; it just takes a while to create the file.

Rob
On Mon, 2007-11-05 at 06:25 -0700, Rob Bosch wrote:
> When I use the preallocate patch and create a 77GB file using the function I
> get a CPU spike on the server-side.

I don't think reproducing the CPU spike on Linux will be meaningful, because Linux treats posix_fallocate differently from Cygwin: Cygwin just reserves space for the file, while Linux actually writes a byte into every disk block of the file. We need to figure out why rsync on your Cygwin is giving a different result from Corinna's Cygwin.

I did notice that the preallocation code incorrectly uses 32-bit ints for some file offsets; it should use OFF_Ts. The attached patch fixes this. However, the mistake didn't affect the main call to posix_fallocate, so it doesn't explain the problem you are having.

Here are two things I suggest that you try:

1. Strace rsync and make sure posix_fallocate is getting called with the correct arguments.

2. Write your own program that allocates a 77GB file with posix_fallocate and see if the same thing happens.

Matt

-------------- next part --------------
A non-text attachment was scrubbed...
Name: preallocate-off-t.diff
Type: text/x-patch
Size: 850 bytes
Desc: not available
Url : http://lists.samba.org/archive/rsync/attachments/20071118/29d2c77a/preallocate-off-t.bin
On Sat, 2008-02-23 at 16:43 -0700, Rob Bosch wrote:
> Matt's patch worked great for cygwin (preallocate.diff). The same approach
> is not working well under CentOS since it writes out all those 0's for the
> files using the posix_fallocate function. It seems to me that under Linux
> the ftruncate function may be better suited for the implementation?

IIUC, ftruncate doesn't preallocate the file; it only changes the logical size. I did a little test on my computer (using reiserfs) of copying a 146 MB file to a new file, a posix_fallocated file, and an ftruncated file. The copies took 6.1 s, 4.9 s, and 6.3 s, respectively; ftruncate was no better than calling nothing at all.

> I tested using ftruncate in lieu of posix_fallocate and it appears to create
> the files almost instantaneously on both XFS and Ext3 (the only file system
> formats I've tested). Filefrag reports just a single extent on both file
> systems (file created was 77GB). Oddly, filefrag reports a perfect
> fragmentation of 172 with the file I created on Ext2 but it only had 1
> extent. I'm not sure I fully understand the "perfect" fragmentation number.

For a meaningful test, you should actually write 77GB of data into a new file and an ftruncated file and see if there's any difference in the resulting fragmentation.

In your patch, you should use fallocate in place of ftruncate. If your glibc is like mine and doesn't provide direct access to fallocate, you'll have to use syscall and __NR_fallocate.

Matt
Rob Bosch wrote:
> I took a stab at modifying the preallocate.diff patch file replacing it with
> ftruncate (attached). Do you think the file looks OK for Linux (obviously
> cygwin should use posix_fallocate)? I replaced posix_fallocate with
> ftruncate and also removed the check for HAVE_POSIX_FALLOCATE.
>
> Maybe this should be preallocate-linux.diff?

How will this help on Linux? Changing the file size with ftruncate() doesn't change the allocation of disk blocks at all on standard Linux filesystems; it just changes the size field in the inode. You can preallocate file space with recent Linux kernels, but it's not done with ftruncate().

-- Jamie
Rob Bosch wrote:
> Destination file on XFS
> - ftruncate, 59GB file, Execution time 52776 secs, 1235 extents
> - posix_fallocate, 59GB file, Execution time 53919 secs, 11 extents

Any idea why glibc's posix_fallocate makes any difference? Doesn't it simply write a lot of zeros? In that case, why doesn't rsync writing lots of data sequentially result in the same number of extents?

-- Jamie
On Mon, Feb 25, 2008 at 08:48:22AM -0700, Rob Bosch wrote:
> The odd thing is that a huge amount of the file was resent again even
> though the files are identical at the source and destination.

A local transfer needs --no-whole-file if you want it to use the rsync algorithm (which uses more disk I/O, so it's not the default).

..wayne..
Rob Bosch wrote:
> > Any idea why glibc's posix_fallocate makes any difference?
> >
> > Doesn't it simply write a lot of zeros? In that case, why doesn't
> > rsync writing lots of data sequentially result in the same number of
> > extents?
>
> The destination server had a lot of other processes running at the same
> time. I suspect this concurrency is causing the additional extents, since
> many processes were writing to the same disk array.

Oh, that's quite reasonable.

> I thought only 1235 extents without posix_fallocate was pretty good. Under
> cygwin and NTFS, an equivalent file was generating around 25000 extents.

I guess it will depend a lot on what the other processes are doing. Though, did I get the right impression that NTFS generates lots of extents for small writes even when nothing else is running?

-- Jamie
> Though, did I get the right impression that NTFS generates lots of
> extents for small writes even when nothing else is running?

The fragmentation on NTFS was a problem even when nothing else was running on the server. The preallocation patch made all the difference on NTFS and cygwin. In that world it is a must-have, since it increases read performance by orders of magnitude while not impacting the write side; the cygwin posix_fallocate is very efficient.

Rob
Rob Bosch wrote:
> > Though, did I get the right impression that NTFS generates lots of
> > extents for small writes even when nothing else is running?
>
> The fragmentation on NTFS was a problem even when nothing else was running
> on the server. The preallocation patch made all the difference on NTFS and
> cygwin. In that world it is a must-have since it increases performance on
> the reads by orders of magnitude while not impacting the write-side since
> the cygwin posix_fallocate is very efficient.

Was that simply due to writing too-small blocks to NTFS? In other words, would increasing the size of write() calls have fixed it instead, without leaving allocated but unused disk space in the case of a user abort with --partial, --partial-dir or --inplace?

-- Jamie
Rob Bosch wrote:
> > Was that simply due to writing too-small blocks to NTFS? In other
> > words, would increasing the size of write() calls have fixed it
> > instead, without leaving allocated but unused disk space in the case
> > of a user-abort with --partial, --partial-dir or --inplace?
>
> It could have been a function of the block size, but I don't think so. I
> never tested the strategies you list. Under cygwin the posix_fallocate
> function is extremely efficient in that it immediately allocates the file
> and does no writing, yet still provides a single-extent file if it can be
> provided (just like fallocate if supported in the kernel). Given that
> solved the problem, I didn't pursue any other alternatives.

I'm thinking that if rsync is aborted after it calls posix_fallocate, you might have some large amount of disk space used but not indicated in the file size. That seems like a bad state to leave a filesystem in, because it's mostly invisible. That means the prealloc option should be off by default (on all systems).

I'm thinking that if large writes would fix the NTFS problem, they wouldn't leave the filesystem like that following an abort, so that could be safely turned on by default, and everyone would benefit from improved NTFS performance.

Another possibility is to incrementally preallocate large chunks, e.g. 1MiB at a time while writing. That wouldn't leave huge preallocated space, and should improve NTFS performance enough, so it could perhaps be turned on by default.

-- Jamie
Rob Bosch wrote:
> The patch truncates the file with ftruncate if a transfer fails in
> receiver.c. This should avoid the problem you mention.

I was thinking of a user abort (Control-C) or crash, but this is good.

> Even if this didn't
> occur, the file would exist on the FS with the predefined size. It would be
> in the allocation table and exist on the disk (you can see it under Windows
> Explorer). It wouldn't have data in the entire file size, but it is still a
> valid, if sparse, file.

Ok. This is both better and worse :-) Better than I thought, because some other filesystems (e.g. Veritas VxFS?) allow space to be reserved without it counting in the file size, so it's more hidden there. Worse, because it means --prealloc will break --append --whole-file, and the combination would sometimes be useful (e.g. for copying a large, growing log file), and occasionally the copy is aborted by a crash or user abort.

> The writes in larger chunks won't fully solve the problem unless you have a
> machine that does not do much concurrency. My Windows machine using NTFS
> experienced high fragmentation in ALL files, not just large ones.

I'm thinking large writes would fix smaller files even better than large files, since a large write would write _all_ of a small file at once :-) For perspective, I'm thinking of write() calls whose size is a megabyte or a few.

It's not the number of fragments which limits throughput; it's the size of them. Perhaps NTFS would not fragment individual large write() calls, so even high concurrency wouldn't make any difference?

-- Jamie
I ran rsync on the 59GB file again without preallocate on XFS. It created only 383 extents...very low fragmentation. Rob
On Mon, Feb 25, 2008 at 09:28:18PM -0700, Rob Bosch wrote:
> I reran this test with the --no-whole-file option and received the exact
> same results. Any idea on why so much data is being sent when the files
> are exactly the same on both sides?

Yeah, I hadn't noticed that your transfer had already implied that option. I'm looking into the sending of large files to see if there is a problem with the checksum-matching algorithm.

..wayne..
Wayne, thanks for your help on this issue. It turned out to be a user error (mine), since the client was the pre5 client instead of the pre10. I reran the test with the pre10 client as you suggested, and here are the results. The only odd thing I noticed is that even though all the data matched, the file was recreated on the receiving side. If there is a 100% match, shouldn't it just leave the file as is, even if the -I option is selected? Or is that caused by a different option I have set up? FYI, the fragmentation was only 27 extents for a 59GB file...I really like XFS!

false_alarms=53852 hash_hits=460671 matches=460671
sender finished FILENAME
send_files phase=1
send_files phase=2
send files finished
total: matches=460671  hash_hits=460671  false_alarms=53852 data=0
rsync[4552] (sender) heap statistics:
  arena:       524288   (bytes from sbrk)
  ordblks:          2   (chunks not in use)
  smblks:           0
  hblks:            0   (chunks from mmap)
  hblkhd:     2686976   (bytes from mmap)
  allmem:     3211264   (bytes from sbrk + mmap)
  usmblks:   22151168
  fsmblks:          0
  uordblks:   3202504   (bytes used)
  fordblks:      8760   (bytes free)
  keepcost:      8624   (bytes in releasable chunk)

Number of files: 1
Number of files transferred: 1
Total file size: 60381007872 bytes
Total transferred file size: 60381007872 bytes
Literal data: 0 bytes
Matched data: 60381007872 bytes
File list size: 33
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 96
Total bytes received: 3685395

sent 96 bytes  received 3685395 bytes  492.61 bytes/sec
total size is 60381007872  speedup is 16383.44
_exit_cleanup(code=0, file=main.c, line=1060): about to call exit(0)

ptime 1.0 for Win32, Freeware - http://www.pc-tools.net/
Copyright(C) 2002, Jem Berkes

=== rsync.exe -I --no-whole-file --port=888 -vvv --compress-level=9 --stats
Execution time: 7481.359 s