Robin Lee Powell
2022-Aug-04 00:39 UTC
Is there a better way to transfer data that doesn't use so much cache?
On Wed, Aug 03, 2022 at 02:04:22PM -0400, Rob Campbell via rsync wrote:
> I've created a script that syncs (and removes) data from as many as 4
> places and puts them all in one of 2 directories. The commands are:
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.nef'
> -f'+ *.jp*g' -f'+ *.tif' -f'+ *.xmp' -f'+ /*' -f'- *'
> "$D850/DCIM/100ND850/" $STAGINGP/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.nef'
> -f'+ *.jp*g' -f'+ *.tif' -f'+ *.xmp' -f'+ /*' -f'- *' "$Z9/DCIM/100NCZ_9/"
> $STAGINGP/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4'
> -f'+ /*' -f'- *' "$DASHCAM/CARDV/VIDEO/" $STAGINGV/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'-
> Screenshots/' -f'+ *.nef' -f'+ *.jpg' -f'+ *.jp*g' -f'+ *.png' -f'+ *.dng'
> -f'+ *.gif' -f'- *.thumbnails' -f'- *.android' -f'+ */' -f'+ DCIM/*' -f'+
> Snapbridge/*' -f'+ Pictures/*' -f'+ Download/*' -f'+ Textgram/*' -f'- *'
> $PHONE/ $STAGINGP/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4'
> -f'+ *.avi' -f'+ *.mov' -f'+ *.mp*g' -f'+ *.3gp' -f'- *'
> --files-from=<(find $PHONE -type f ! -path "*Download*" ! -path
> "*.trashed*" ! -iname ????????????????????????????????.mp4 ! -iname
> '*.mp4\.*')/ $STAGINGV/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4'
> -f'+ *.avi' -f'+ *.mov' -f'+ *.mp*g' -f'+ *.3gp' -f'+ Movies/*' -f'+
> *Recordings/*' -f'+ DCIM/*' -f'+ Snapbridge/*' -f'- */' -f'- *' $PHONE/
> $STAGINGV/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4'
> -f'- *' --files-from=<(find $PHONE -iname
> ????????????????????????????????.mp4) / $STAGINGV/TIKTOK/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *'
> $PHONE/Downloads/ $COMPUTER/Downloads/
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'-
> screenshot*' -'f- Screenshot*' -f'- Boondocks/' -f'- Dilbert/' -f'+ *.png'
> -f'+ *.jp*g' -f'+ *.dng' -f'+ *.gif' -f'- *20*/' -f'- *' -f'+ */' -f'-
> $STAGINGP/' $MYPICS/ $STAGINGP/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+
> Screenshot*.png' -f'- Staging/' -f'- *' $MYPICS/ $STAGINGP/Screenshots/ |
> tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.3gpp'
> -f'+ *.mp4' -f'+ *.mp*g' -f'+ *.avi' -f'+ *.asf' -f'+ *.wmv' -f'- *'
> $HOME/Downloads $STAGINGV/ | tee -a $LOG
>
> rsync -avt --progress --remove-source-files --info=progress2 -f'+ *.mp4'
> -f'+ *.mp*g' -f'+ *.avi' -f'+ *.asf' -f'+ *.wmv' -f'+ *.3gpp' -f'- *'
> $MYVIDEOS/ $STAGINGV/ | tee -a $LOG
>
>
> The problem isn't that there are many syncs because the problem happens on
> the first one that runs.

You didn't actually say what the problem *is*.

I can infer from the subject that you think it's bad that rsync is
using a bunch of disk/buffer cache, but that's not rsync, that's
Linux, and it's by design; Linux uses as much RAM as it possibly can
for disk cache, always. This improves performance. In a
well-performing Linux system, the "free" column of "free -h" is very
low, and the "available" column is very high.

> Before any of them run I run:
>
> sudo free -w -h;sync && echo 1 > /proc/sys/vm/drop_caches;free -w -h
>
> I do not run this before each one because it sometimes takes a while to
> /proc/sys/vm/drop_caches

That's a great way to substantially reduce performance; why are you
doing that?

> Is there something in the logic that can be done to make this perform
> better or should I use something other than rsync or is what I am getting
> as good as it will get regardless of what I use?
>
> Some of these directories can be over a gig. Most of these are media files
> and should have exif data that has the timestamp so maybe I can get rid of
> -t but it is easier to keep the timestamp of the file rather than running
> exiftool to also use the create date to "touch" the file but maybe using
> exiftool is a faster way?
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> In all things, Be Intentional.
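The difference between the "free" and "available" columns discussed in this message can also be read straight from /proc/meminfo. The snippet below is only a sketch, assuming a Linux kernel recent enough to report MemAvailable; the function name meminfo_mib is made up for illustration and is not part of rsync or procps.

```shell
#!/bin/sh
# meminfo_mib [FILE]: print MemFree, Cached (page cache) and MemAvailable in MiB.
# FILE defaults to /proc/meminfo, whose values are reported in kB.
# meminfo_mib is a hypothetical helper name used only for this example.
meminfo_mib() {
    awk '
        $1 == "MemFree:"      { free  = $2 }
        $1 == "Cached:"       { cache = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            printf "free=%d cached=%d available=%d\n",
                   free / 1024, cache / 1024, avail / 1024
        }
    ' "${1:-/proc/meminfo}"
}

# Show the live numbers when /proc/meminfo is readable (Linux only).
[ -r /proc/meminfo ] && meminfo_mib
```

A low "free" figure next to a high "available" figure means most of the RAM is holding cache that the kernel can reclaim on demand, which is the healthy state described above.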
Dan Stromberg
2022-Aug-04 02:09 UTC
Is there a better way to transfer data that doesn't use so much cache?
On Wed, Aug 3, 2022 at 5:41 PM Robin Lee Powell via rsync <rsync at lists.samba.org> wrote:
> On Wed, Aug 03, 2022 at 02:04:22PM -0400, Rob Campbell via rsync wrote:
> > The problem isn't that there are many syncs because the problem happens on
> > the first one that runs.
>
> You didn't actually say what the problem *is*.
>
> I can infer from the subject that you think it's bad that rsync is
> using a bunch of disk/buffer cache, but that's not rsync, that's
> Linux, and it's by design; Linux uses as much RAM as it possibly can
> for disk cache, always. This improves performance. In a
> well-performing Linux system, the "free" column of "free -h" is very
> low, and the "available" column is very high.

Linux does indeed try to put your RAM to good use, and often that means
caching data from disk in RAM.

However, if you transfer a large amount of data and do not intend to
retransmit that data any time soon, then the memory isn't really put to
good use, and can actually cause your system to slow down significantly -
particularly if there's a lot of such data transferred.

It is, however, theoretically possible to skip the buffer cache using
O_DIRECT, but that requires your application to have O_DIRECT support, or
to use something like https://stromberg.dnsalias.org/~strombrg/libodirect/

HTH.
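One way to experiment with the O_DIRECT idea from the shell is GNU dd, which opens its input and output with O_DIRECT when given iflag=direct and oflag=direct. What follows is a minimal sketch, assuming GNU coreutils dd and a filesystem that accepts O_DIRECT (some, such as tmpfs on older kernels, reject it). copy_direct is a hypothetical helper name; it is not part of rsync or libodirect, and unlike rsync -a it copies a single file without preserving timestamps or removing the source.

```shell
#!/bin/sh
# copy_direct SRC DST: copy one file while asking the kernel to bypass the
# page cache (O_DIRECT) on both the read and the write side.
# Assumptions: GNU dd; copy_direct is an illustrative name, not an rsync feature.
copy_direct() {
    src=$1
    dst=$2
    # bs=1M keeps transfers large and block-aligned, which O_DIRECT needs.
    if dd if="$src" of="$dst" bs=1M iflag=direct oflag=direct \
          status=none 2>/dev/null; then
        return 0
    fi
    # Fallback: dd or the filesystem refused O_DIRECT, so do a normal
    # (cached) copy and say so on stderr.
    echo "copy_direct: O_DIRECT unavailable for $src -> $dst, using cp" >&2
    cp -- "$src" "$dst"
}
```

For example, `copy_direct "$MYVIDEOS/clip.mp4" "$STAGINGV/clip.mp4"` (the file name is made up) could be compared against a plain cp while watching `free -w -h` to see whether the cache growth differs on a given system.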