Jim Lemon
2022-Sep-01 09:31 UTC
[R] Combine two dataframe with different row number and interpolation between values
Hi Javad, You seem to have the data frame join worked out, so here is a function to interpolate over sequences of NAs. I have found it quite useful. # interpolate over sequences of NAs # NA sequences at the beginning of a file are replaced with the first nonNA value # NA sequences at the end are filled with the last nonNA value # sequences of all NAs are returned unaltered interpNA<-function(x) { if(any(is.na(x))) { xrle<-rle(is.na(x)) begin<-end<-1 # if the sequence begins with NA # set the initial run of NAs to the 1st non-NA if(is.na(x[1])) x[1:xrle$lengths[1]]<-x[xrle$lengths[1]+1] for(i in 1:length(xrle$values)) { # if this is a run of NA, fill it if(xrle$values[i]) { end<-begin+xrle$lengths[i]+(i>1) if(is.na(x[end])) x[begin:(end-1)]<-x[begin] else x[begin:end]<-seq(x[begin],x[end],length.out=1+end-begin) begin<-end+xrle$lengths[i]-1 } else { # set begin to the end of the non-NA values begin<-end+xrle$lengths[i]-1 } } } return(x) } Jim On Thu, Sep 1, 2022 at 3:29 PM javad bayat <j.bayat194 at gmail.com> wrote:> > Dear Tim; > The dplyr did not work for me. My data frames have exactly similar columns > but different row numbers. > Mr Petr sent this code and the code worked but it copied the second > dataframe into the first one and did not replace the corresponding row. > > df3 = merge(df1, df2, all = TRUE) > Regarding filling the "NA" data, it does not matter for me to interpolate > between numbers or put the mean of numbers. > Sincerely > > > > > > > On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote: > > > Can I interest you in the join functions in dplyr? > > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/ > > > > Filling in missing data is a useful practice when the fake (simulated) > > data is a small proportion of all data. When 2/3 of the data is fake one > > must wonder if anything based on those numbers is real or an artifact of > > assumptions made to generate the numbers. > > > > I deal with weather data and some weather stations are set up to average > > measurements between recorded values and others take a single reading at > > regular intervals. How you interpolate might depend on which option > > describes your data. The alternative is to use the method for recording ws > > and apply it to the other data that will be merged with these data. I > > assume there is more data, otherwise I see little point in expanding these > > values out. > > > > Tim > > > > -----Original Message----- > > From: R-help <r-help-bounces at r-project.org> On Behalf Of javad bayat > > Sent: Wednesday, August 31, 2022 2:09 AM > > To: r-help at r-project.org > > Subject: [R] Combine two dataframe with different row number and > > interpolation between values > > > > [External Email] > > > > Dear all, > > I am trying to combine two large dataframe in order to make a dataframe > > with exactly the dimension of the second dataframe. > > The first df is as follows: > > > > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d > > rep(c(1:365,1:365,1:365,1:365,1:365),each=8), > > h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by > > 3),seq(3,24, by = 3),seq(3,24, by = 3)),365), > > ws = rnorm(1:14600, mean=20)) > > > head(df1) > > y d h ws > > 1 2010 1 3 20.71488 > > 2 2010 1 6 19.70125 > > 3 2010 1 9 21.00180 > > 4 2010 1 12 20.29236 > > 5 2010 1 15 20.12317 > > 6 2010 1 18 19.47782 > > > > The data in the "ws" column were measured with 3 hours frequency and I > > need data with one hour frequency. I have made a second df as follows with > > one hour frequency for the "ws" column. > > > > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d > > rep(c(1:365,1:365,1:365,1:365,1:365),each=24), > > h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA") > > > head(df2) > > y d h ws > > 1 2010 1 1 NA > > 2 2010 1 2 NA > > 3 2010 1 3 NA > > 4 2010 1 4 NA > > 5 2010 1 5 NA > > 6 2010 1 6 NA > > > > What I am trying to do is combine these two dataframes so as to the rows in > > df1 (based on the values of "y", "d", "h" columns) that have values > > exactly similar to df2's rows copied in its place in the new df (df3). > > For example, in the first dataframe the first row was measured at 3 > > o'clock on the first day of 2010 and this row must be placed on the third > > row of the second dataframe which has a similar value (2010, 1, 3). Like > > the below > > table: > > y d h ws > > 1 2010 1 1 NA > > 2 2010 1 2 NA > > 3 2010 1 3 20.71488 > > 4 2010 1 4 NA > > 5 2010 1 5 NA > > 6 2010 1 6 19.70125 > > > > But regarding the values of the "ws" column for df2 that do not have value > > (at 4 and 5 o'clock), I need to interpolate between the before and after > > values to fill in the missing data of the "ws". > > I have tried the following codes but they did not work correctly. > > > > > df3 = merge(df1, df2, by = "y") > > Error: cannot allocate vector of size 487.9 Mb or > > > library(dplyr) > > > df3<- df1%>% full_join(df2) > > > > > > Is there any way to do this? > > Sincerely > > > > > > > > > > > > -- > > Best Regards > > Javad Bayat > > M.Sc. Environment Engineering > > Alternative Mail: bayat194 at yahoo.com > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D&reserved=0 > > PLEASE do read the posting guide > > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3D&reserved=0 > > and provide commented, minimal, self-contained, reproducible code. > > > > > -- > Best Regards > Javad Bayat > M.Sc. Environment Engineering > Alternative Mail: bayat194 at yahoo.com > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
PIKAL Petr
2022-Sep-01 10:29 UTC
[R] Combine two dataframe with different row number and interpolation between values
Hi You could also use approxfun to fill NA values fff <- approxfun(1:dim(df3)[1], df3$ws.x) df3$ws2 <- fff(1:dim(df3)[1]) head(df3) y d h ws.x ws.y ws2 1 2010 1 1 NA NA NA 2 2010 1 2 NA NA NA 3 2010 1 3 20.46314 NA 20.46314 4 2010 1 4 NA NA 20.74001 5 2010 1 5 NA NA 21.01689 6 2010 1 6 21.29376 NA 21.29376 And fill starting or trailing NA by na.locf function from zoo package or change starting or trailing NA to some value before calling approxfun. Cheers Petr> -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of Jim Lemon > Sent: Thursday, September 1, 2022 11:31 AM > To: javad bayat <j.bayat194 at gmail.com>; r-help mailing list <r-help at r- > project.org> > Subject: Re: [R] Combine two dataframe with different row number and > interpolation between values > > Hi Javad, > You seem to have the data frame join worked out, so here is a function > to interpolate over sequences of NAs. I have found it quite useful. > > # interpolate over sequences of NAs > # NA sequences at the beginning of a file are replaced with the first > nonNA value > # NA sequences at the end are filled with the last nonNA value > # sequences of all NAs are returned unaltered > interpNA<-function(x) { > if(any(is.na(x))) { > xrle<-rle(is.na(x)) > begin<-end<-1 > # if the sequence begins with NA > # set the initial run of NAs to the 1st non-NA > if(is.na(x[1])) x[1:xrle$lengths[1]]<-x[xrle$lengths[1]+1] > for(i in 1:length(xrle$values)) { > # if this is a run of NA, fill it > if(xrle$values[i]) { > end<-begin+xrle$lengths[i]+(i>1) > if(is.na(x[end])) x[begin:(end-1)]<-x[begin] > else x[begin:end]<-seq(x[begin],x[end],length.out=1+end-begin) > begin<-end+xrle$lengths[i]-1 > } > else { > # set begin to the end of the non-NA values > begin<-end+xrle$lengths[i]-1 > } > } > } > return(x) > } > > Jim > > On Thu, Sep 1, 2022 at 3:29 PM javad bayat <j.bayat194 at gmail.com> wrote: > > > > Dear Tim; > > The dplyr did not work for me. My data frames have exactly similarcolumns> > but different row numbers. > > Mr Petr sent this code and the code worked but it copied the second > > dataframe into the first one and did not replace the corresponding row. > > > df3 = merge(df1, df2, all = TRUE) > > Regarding filling the "NA" data, it does not matter for me tointerpolate> > between numbers or put the mean of numbers. > > Sincerely > > > > > > > > > > > > > > On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert at ufl.edu> > wrote: > > > > > Can I interest you in the join functions in dplyr? > > > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/ > > > > > > Filling in missing data is a useful practice when the fake (simulated) > > > data is a small proportion of all data. When 2/3 of the data is fakeone> > > must wonder if anything based on those numbers is real or an artifactof> > > assumptions made to generate the numbers. > > > > > > I deal with weather data and some weather stations are set up toaverage> > > measurements between recorded values and others take a single readingat> > > regular intervals. How you interpolate might depend on which option > > > describes your data. The alternative is to use the method forrecording ws> > > and apply it to the other data that will be merged with these data. I > > > assume there is more data, otherwise I see little point in expandingthese> > > values out. > > > > > > Tim > > > > > > -----Original Message----- > > > From: R-help <r-help-bounces at r-project.org> On Behalf Of javad bayat > > > Sent: Wednesday, August 31, 2022 2:09 AM > > > To: r-help at r-project.org > > > Subject: [R] Combine two dataframe with different row number and > > > interpolation between values > > > > > > [External Email] > > > > > > Dear all, > > > I am trying to combine two large dataframe in order to make adataframe> > > with exactly the dimension of the second dataframe. > > > The first df is as follows: > > > > > > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d > > > rep(c(1:365,1:365,1:365,1:365,1:365),each=8), > > > h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by > > > 3),seq(3,24, by = 3),seq(3,24, by = 3)),365), > > > ws = rnorm(1:14600, mean=20)) > > > > head(df1) > > > y d h ws > > > 1 2010 1 3 20.71488 > > > 2 2010 1 6 19.70125 > > > 3 2010 1 9 21.00180 > > > 4 2010 1 12 20.29236 > > > 5 2010 1 15 20.12317 > > > 6 2010 1 18 19.47782 > > > > > > The data in the "ws" column were measured with 3 hours frequency and I > > > need data with one hour frequency. I have made a second df as followswith> > > one hour frequency for the "ws" column. > > > > > > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d > > > rep(c(1:365,1:365,1:365,1:365,1:365),each=24), > > > h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA") > > > > head(df2) > > > y d h ws > > > 1 2010 1 1 NA > > > 2 2010 1 2 NA > > > 3 2010 1 3 NA > > > 4 2010 1 4 NA > > > 5 2010 1 5 NA > > > 6 2010 1 6 NA > > > > > > What I am trying to do is combine these two dataframes so as to therows in> > > df1 (based on the values of "y", "d", "h" columns) that have values > > > exactly similar to df2's rows copied in its place in the new df (df3). > > > For example, in the first dataframe the first row was measured at 3 > > > o'clock on the first day of 2010 and this row must be placed on thethird> > > row of the second dataframe which has a similar value (2010, 1, 3).Like> > > the below > > > table: > > > y d h ws > > > 1 2010 1 1 NA > > > 2 2010 1 2 NA > > > 3 2010 1 3 20.71488 > > > 4 2010 1 4 NA > > > 5 2010 1 5 NA > > > 6 2010 1 6 19.70125 > > > > > > But regarding the values of the "ws" column for df2 that do not havevalue> > > (at 4 and 5 o'clock), I need to interpolate between the before andafter> > > values to fill in the missing data of the "ws". > > > I have tried the following codes but they did not work correctly. > > > > > > > df3 = merge(df1, df2, by = "y") > > > Error: cannot allocate vector of size 487.9 Mb or > > > > library(dplyr) > > > > df3<- df1%>% full_join(df2) > > > > > > > > > Is there any way to do this? > > > Sincerely > > > > > > > > > > > > > > > > > > -- > > > Best Regards > > > Javad Bayat > > > M.Sc. Environment Engineering > > > Alternative Mail: bayat194 at yahoo.com > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > > > > > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.et > hz.ch%2Fmailman%2Flistinfo%2Fr- > help&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d90 > 8da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C6379 > 75232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL > CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&a > mp;sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D&a > mp;reserved=0 > > > PLEASE do read the posting guide > > > > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r- > project.org%2Fposting- > guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc > 23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7 > C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw > MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C% > 7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3 > D&reserved=0 > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > > > -- > > Best Regards > > Javad Bayat > > M.Sc. Environment Engineering > > Alternative Mail: bayat194 at yahoo.com > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.