gabrielle aban steinberg
2021-Nov-02 22:30 UTC
[R] Fwd: Merging multiple csv files to new file
Hello, I would like to merge 18 csv files into a master data csv file, but each file has a different number of columns (mostly found in one or more of the other csv files) and a different number of rows.

I have tried something like the following in R Studio (cloud):

all_data_fit_files <- rbind("dailyActivity_merged.csv",
"dailyCalories_merged.csv", "dailyIntensities_merged.csv",
"dailySteps_merged.csv", "heartrate_seconds_merged.csv",
"hourlyCalories_merged.csv", "hourlyIntensities_merged.csv",
"hourlySteps_merged.csv", "minuteCaloriesNarrow_merged.csv",
"minuteCaloriesWide_merged.csv", "minuteIntensitiesNarrow_merged.csv",
"minuteIntensitiesWide_merged.csv", "minuteMETsNarrow_merged.csv",
"minuteSleep_merged.csv", "minuteStepsNarrow_merged.csv",
“minuteStepsWide_merged.csv", "sleepDay_merged.csv",
"minuteStepsWide_merged.csv", "sleepDay_merged.csv",
"weightLogInfo_merged.csv")

But I am getting the following error:

Error: unexpected input in "rlySteps_merged.csv",
"minuteCaloriesNarrow_merged.csv", "minuteCaloriesWide_merged.csv",
"minuteIntensitiesNarrow_merged.csv",
"minuteIntensitiesWide_merged.csv", "minuteMETsNarrow_merged.csv"

(Maybe the R Studio free trial/usage is underpowered for my project?)
Gabrielle,

Why would you expect that to work? rbind() binds together the rows of R data structures (some variety of data.frame) that have exactly the same columns in the same order, producing a larger object of that type. You are not giving rbind() variables that hold the data; you are giving it the names of files of comma-separated values.

If you have many files with different numbers of columns and some overlap among them, you need to decide quite a few things first. If a file has, say, 4 columns out of a possible 20 unique columns across all the files, do you want to add the 16 missing columns to its contents after reading it in, and rearrange it into a specific column order? What will you fill the new columns with? NA is a popular choice, but you need to decide. You then need to repeat the same thing with all the other files: read in one with 6 columns, add 14 filled as you wish, and rearrange the columns into the same order.

When done, you have an assortment of variables of class data.frame (or similar), and you can use rbind() on those variables to get a result. But it may not be what you want. You may actually want more of a database-style merge, combining columns from each file on the same userID field or whatever. rbind() is not the function for that, and I won't go on to give a long tutorial.

My main point is that what you are doing is at the wrong level. You need to read all the files into variables before doing any further calculations in R.
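A minimal sketch of the pad-with-NA-then-rbind() approach described above, assuming NA is an acceptable fill value. The helper name bind_with_na() and the two toy data frames standing in for csv files are hypothetical, for illustration only:

```r
# Stack data frames that have different (overlapping) sets of columns:
# add each missing column filled with NA, put all columns in the same
# order, then rbind() the results.
bind_with_na <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]  # same column order in every data frame
  })
  do.call(rbind, padded)
}

# Hypothetical toy data standing in for two of the csv files
steps    <- data.frame(Id = c(1, 2), ActivityDay = c("4/12/2016", "4/12/2016"),
                       StepTotal = c(1000, 2500))
calories <- data.frame(Id = 3, ActivityDay = "4/13/2016", Calories = 1800)

combined <- bind_with_na(list(steps, calories))
print(combined)
```

With real files, the list would first be built by reading them, e.g. lapply(file_names, read.csv), where file_names is a (hypothetical) character vector of csv paths. Stacking rows like this only makes sense if the files really describe comparable records.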
1. Think more carefully about the appropriate data structure for what you wish to do. It's unlikely to be .csv files, however.

In the absence of the above, a simple (but perhaps inappropriate) default is:

2. Read the files into R and combine them into a list. (You will need to read about lists in R if you don't know what these are, of course.)

3. Save your list as an .RData file. See ?save and ?load for details. But do note that such files are special binary files that are only (easily, anyway) readable by R.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
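Steps 2 and 3 might look something like the following in R. This is a sketch only: the two small stand-in files, their names and their columns are made up so that the example runs on its own.

```r
# Hypothetical stand-in files so the example is self-contained
demo_dir <- tempdir()
write.csv(data.frame(Id = 1:2, StepTotal = c(1000, 2500)),
          file.path(demo_dir, "dailySteps_demo.csv"), row.names = FALSE)
write.csv(data.frame(Id = 1, TotalMinutesAsleep = 420),
          file.path(demo_dir, "sleepDay_demo.csv"), row.names = FALSE)

csv_files <- file.path(demo_dir, c("dailySteps_demo.csv", "sleepDay_demo.csv"))

# Step 2: one list element per file, named after the file (without ".csv")
fit_data <- lapply(csv_files, read.csv)
names(fit_data) <- sub("\\.csv$", "", basename(csv_files))

# Step 3: save the list to a binary .RData file, then restore it;
# load() recreates the object under its original name, fit_data
rdata_file <- file.path(demo_dir, "fit_data.RData")
save(fit_data, file = rdata_file)
rm(fit_data)
load(rdata_file)
str(fit_data)
```

Each data frame keeps its own columns, so no decision about filling or aligning columns is needed at this stage.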
It might be easier to settle on the desired final csv layout and use Python to copy the rows via line reads. Python doesn't care about the data type in a given "cell", numeric or char, whereas the type errors R would encounter would make the task very difficult.
The error message arises because you are sometimes delimiting character strings with non-ASCII open and close double quotes, '“' and '”', instead of the old-fashioned ones, '"', which have no open or close variants. This is a language syntax error, so R didn't try to compute anything.

The others' comments are still valid: you need to read the files named by these strings to produce R datasets, and then combine the datasets.

-Bill
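A small sketch for spotting and fixing this kind of problem, assuming the code is available as a character string. find_non_ascii() and straighten_quotes() are hypothetical helper names, and only the left and right double quotation marks (U+201C, U+201D) are replaced:

```r
# Report the position and code point of every non-ASCII character in a string
find_non_ascii <- function(code) {
  codes <- utf8ToInt(code)
  pos <- which(codes > 127L)
  data.frame(position = pos, codepoint = sprintf("U+%04X", codes[pos]))
}

# Replace typographic double quotes with the plain ASCII '"' (code point 34)
straighten_quotes <- function(code) {
  codes <- utf8ToInt(code)
  codes[codes %in% c(0x201CL, 0x201DL)] <- 34L
  intToUtf8(codes)
}

# A shortened version of the problem line, with a curly opening quote
bad <- "c(\"hourlySteps_merged.csv\", \u201cminuteStepsWide_merged.csv\")"
print(find_non_ascii(bad))

fixed <- straighten_quotes(bad)
print(parse(text = fixed))  # parses once the quotes are plain ASCII
```

Base R's tools::showNonASCII() can also be used to list the elements of a character vector that contain non-ASCII characters.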
> (Maybe the R Studio free trial/usage is underpowered for my project?)

- R is a computer language, as well as a program for interpreting R source code.
- RStudio Desktop is an editor with "features" intended to make using R easy. It cannot "do" anything without R being installed.
- R is completely free. There is no "trial" period for using R. There are no "crippled" versions of R.
- RStudio Desktop has both free and paid versions, but they have very nearly identical capabilities. The most significant difference is that you get tech support with the paid version. [1]

So no, your difficulty lies not with what you downloaded but with how you are expressing your desires with the R language (with or without RStudio), and others have suggested ways you could correct that.

[1] https://www.rstudio.com/products/rstudio/
On 02/11/2021 6:30 p.m., gabrielle aban steinberg wrote:
> all_data_fit_files <- rbind("dailyActivity_merged.csv",
> "dailyCalories_merged.csv", "dailyIntensities_merged.csv", ...

That just puts the names together. You need to read each file, figure out how the resulting data frames get merged (they have different columns, so that will take some thinking), and then do it and write it out.

Duncan Murdoch
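One way to do the read, merge and write-out steps, sketched under two assumptions: the files share key columns (here Id and ActivityDate, from two made-up stand-in files), and a full outer join is what is wanted. Reduce() applies merge() pairwise across the list of data frames:

```r
# Hypothetical stand-in files so the example is self-contained
demo_dir <- tempdir()
write.csv(data.frame(Id = c(1, 2), ActivityDate = c("4/12/2016", "4/12/2016"),
                     TotalSteps = c(1000, 2500)),
          file.path(demo_dir, "activity_demo.csv"), row.names = FALSE)
write.csv(data.frame(Id = c(1, 3), ActivityDate = c("4/12/2016", "4/12/2016"),
                     Calories = c(1800, 2100)),
          file.path(demo_dir, "calories_demo.csv"), row.names = FALSE)

# Read each file
csv_files <- file.path(demo_dir, c("activity_demo.csv", "calories_demo.csv"))
dfs <- lapply(csv_files, read.csv)

# Merge them one after another; with no `by`, merge() joins on all shared
# column names, and all = TRUE keeps rows found in only one file (NA-filled)
master <- Reduce(function(x, y) merge(x, y, all = TRUE), dfs)

# Write the result out
write.csv(master, file.path(demo_dir, "master_demo.csv"), row.names = FALSE)
print(master)
```

Because merge() joins on every column the two data frames have in common, it is worth checking that the shared columns really identify the same records before merging many files this way.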
Hi Gabrielle,

I get the feeling that you are trying to merge data in which each file contains different variables, but the same subjects have contributed the data. This is a very wild guess, but it may provide some insight.

# assume that subjects are identified by a variable named "subjectID"
# create a vector of all your filenames
my_filenames <- c("dailyActivity_merged.csv", "dailyCalories_merged.csv",
  "dailyIntensities_merged.csv") # ... followed by the rest of your filenames
# step through the filenames, reading each one and merging it into the final data frame
my_df <- NULL
for (filename in my_filenames) {
  next_df <- read.csv(filename)
  if (is.null(my_df)) {
    my_df <- next_df
  } else {
    # all = TRUE keeps subjects missing from either data frame, filling with NA
    my_df <- merge(my_df, next_df, by = "subjectID", all = TRUE)
  }
}

I doubt that this will work the first time, but it will be a lot easier to debug than throwing it all into a black box and seeing what comes out.

Jim
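merge() has no fill argument; whether unmatched rows are kept is controlled by all (or all.x / all.y). A toy comparison of the default inner join with all = TRUE, using made-up data and the subjectID column assumed in the loop:

```r
# Made-up data: subject 1 only in a, subject 3 only in b
a <- data.frame(subjectID = c(1, 2), steps = c(1000, 2500))
b <- data.frame(subjectID = c(2, 3), calories = c(1800, 2100))

inner <- merge(a, b, by = "subjectID")              # default: only subjectID 2
outer <- merge(a, b, by = "subjectID", all = TRUE)  # subjects 1, 2 and 3, NA-filled
print(inner)
print(outer)
```

With the default, subjects that are missing from any one file silently disappear from the combined data, which is usually not what is wanted when merging many files.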