Dear useRs,

I have a weird issue with md5sum() and I'm not sure where to start.

I have a repository on GitHub, with a local Git installation connected to RStudio. I am working on Windows 10 and a colleague of mine works on Linux. We both pull the latest commits of all files, but the checksums are different.
Even stranger (to me at least), I get a different checksum for the local file (downloaded through Git via pulling) and for the same file downloaded manually from GitHub. The checksum of the manual download from GitHub is the same as that of my colleague on Linux.
This happens to all text-based files (Rmd, MD, CSV...) but not to non-editable files (PDF, XLSX...).

For example (I have shortened the paths):

> library(tools)

> md5sum(file.choose()) # local repo
D:\\...\\SSFAcomparisonPaper\\README.md
"e3b08fc2ab8b3c8b57e681f862a77f32"

> md5sum(file.choose()) # downloaded from GitHub
C:\\Users\\...\\Downloads\\README.md
"05fab51e18b962a9f3266c7b79016ce6"

> md5sum(file.choose()) # local repo
D:\\...\\SSFAcomparisonPaper\\...\\SSFA_GuineaPigs_plot.pdf
"d9b331642bfd0d192e4eff5808b2a30f"

> md5sum(file.choose()) # downloaded from GitHub
C:\\Users\\...\\Downloads\\SSFA_GuineaPigs_plot.pdf
"d9b331642bfd0d192e4eff5808b2a30f"

I am not sure whether it is an issue with the algorithm of md5sum() or an R/RStudio/Git/GitHub/Windows issue, so I would be grateful if you could help me sort it out.

Thank you in advance,
Ivan

-- 
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
Sounds like a newline discrepancy issue. Highly unlikely to be an R issue.
On Tue, 2 Feb 2021 17:01:05 +0100
Ivan Calandra <calandra at rgzm.de> wrote:

> This happens to all text-based files (Rmd, MD, CSV...) but not to
> non-editable files (PDF, XLSX...).

This is probably caused by Git helpfully converting text files from LF (0x0A) line endings to CR LF (0x0D 0x0A) when checking out the repository clone on Windows (and back when checking in). The copy stored in the repository keeps LF endings, which would explain why the manual download from GitHub matches the Linux checksum. This configuration option, core.autocrlf, is described in Pro Git:

https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#_core_autocrlf

-- 
Best regards,
Ivan
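
To check the line-ending explanation without involving Git at all, something like the following rough sketch could be tried. It writes the same text twice, once with LF and once with CR LF endings, and compares the checksums. The file contents, the temporary files and the helper has_cr() are made up for illustration and are not taken from the repository in question.

```r
library(tools)

# Hypothetical sample text; any text with at least one line break will do.
text <- c("# README", "Some text.")

lf_file   <- tempfile(fileext = ".md")
crlf_file <- tempfile(fileext = ".md")

# Write raw bytes so that R itself does not translate the newlines.
writeBin(charToRaw(paste0(paste(text, collapse = "\n"), "\n")), lf_file)
writeBin(charToRaw(paste0(paste(text, collapse = "\r\n"), "\r\n")), crlf_file)

# Same visible text, different bytes, so the MD5 sums should differ.
sums <- unname(md5sum(c(lf_file, crlf_file)))
print(sums[1] == sums[2])

# Hypothetical helper: does the file contain any CR (0x0D) byte?
has_cr <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  any(bytes == as.raw(0x0D))
}
print(has_cr(lf_file))
print(has_cr(crlf_file))

# Crude normalisation: drop every CR byte (this would also drop a lone CR),
# then hash again. The result should match the LF file.
bytes <- readBin(crlf_file, what = "raw", n = file.size(crlf_file))
normalised <- tempfile(fileext = ".md")
writeBin(bytes[bytes != as.raw(0x0D)], normalised)
print(unname(md5sum(normalised)) == unname(md5sum(lf_file)))
```

Running has_cr() on the local README.md and on the copy downloaded from GitHub should show whether the two files differ in this way. Binary files such as PDF are not converted by Git, which would fit the identical checksums reported for the plot.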