javad bayat
2022-Sep-01 05:23 UTC
[R] Combine two dataframe with different row number and interpolation between values
Dear Tim; The dplyr did not work for me. My data frames have exactly similar columns but different row numbers. Mr Petr sent this code and the code worked but it copied the second dataframe into the first one and did not replace the corresponding row.> df3 = merge(df1, df2, all = TRUE)Regarding filling the "NA" data, it does not matter for me to interpolate between numbers or put the mean of numbers. Sincerely On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert at ufl.edu> wrote:> Can I interest you in the join functions in dplyr? > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/ > > Filling in missing data is a useful practice when the fake (simulated) > data is a small proportion of all data. When 2/3 of the data is fake one > must wonder if anything based on those numbers is real or an artifact of > assumptions made to generate the numbers. > > I deal with weather data and some weather stations are set up to average > measurements between recorded values and others take a single reading at > regular intervals. How you interpolate might depend on which option > describes your data. The alternative is to use the method for recording ws > and apply it to the other data that will be merged with these data. I > assume there is more data, otherwise I see little point in expanding these > values out. > > Tim > > -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of javad bayat > Sent: Wednesday, August 31, 2022 2:09 AM > To: r-help at r-project.org > Subject: [R] Combine two dataframe with different row number and > interpolation between values > > [External Email] > > Dear all, > I am trying to combine two large dataframe in order to make a dataframe > with exactly the dimension of the second dataframe. > The first df is as follows: > > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d > rep(c(1:365,1:365,1:365,1:365,1:365),each=8), > h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by > 3),seq(3,24, by = 3),seq(3,24, by = 3)),365), > ws = rnorm(1:14600, mean=20)) > > head(df1) > y d h ws > 1 2010 1 3 20.71488 > 2 2010 1 6 19.70125 > 3 2010 1 9 21.00180 > 4 2010 1 12 20.29236 > 5 2010 1 15 20.12317 > 6 2010 1 18 19.47782 > > The data in the "ws" column were measured with 3 hours frequency and I > need data with one hour frequency. I have made a second df as follows with > one hour frequency for the "ws" column. > > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d > rep(c(1:365,1:365,1:365,1:365,1:365),each=24), > h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA") > > head(df2) > y d h ws > 1 2010 1 1 NA > 2 2010 1 2 NA > 3 2010 1 3 NA > 4 2010 1 4 NA > 5 2010 1 5 NA > 6 2010 1 6 NA > > What I am trying to do is combine these two dataframes so as to the rows in > df1 (based on the values of "y", "d", "h" columns) that have values > exactly similar to df2's rows copied in its place in the new df (df3). > For example, in the first dataframe the first row was measured at 3 > o'clock on the first day of 2010 and this row must be placed on the third > row of the second dataframe which has a similar value (2010, 1, 3). Like > the below > table: > y d h ws > 1 2010 1 1 NA > 2 2010 1 2 NA > 3 2010 1 3 20.71488 > 4 2010 1 4 NA > 5 2010 1 5 NA > 6 2010 1 6 19.70125 > > But regarding the values of the "ws" column for df2 that do not have value > (at 4 and 5 o'clock), I need to interpolate between the before and after > values to fill in the missing data of the "ws". > I have tried the following codes but they did not work correctly. > > > df3 = merge(df1, df2, by = "y") > Error: cannot allocate vector of size 487.9 Mb or > > library(dplyr) > > df3<- df1%>% full_join(df2) > > > Is there any way to do this? > Sincerely > > > > > > -- > Best Regards > Javad Bayat > M.Sc. Environment Engineering > Alternative Mail: bayat194 at yahoo.com > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D&reserved=0 > PLEASE do read the posting guide > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3D&reserved=0 > and provide commented, minimal, self-contained, reproducible code. >-- Best Regards Javad Bayat M.Sc. Environment Engineering Alternative Mail: bayat194 at yahoo.com [[alternative HTML version deleted]]
Jim Lemon
2022-Sep-01 09:31 UTC
[R] Combine two dataframe with different row number and interpolation between values
Hi Javad,
You seem to have the data frame join worked out, so here is a function
to interpolate over sequences of NAs. I have found it quite useful.
# interpolate over sequences of NAs
# NA sequences at the beginning of a file are replaced with the first
nonNA value
# NA sequences at the end are filled with the last nonNA value
# sequences of all NAs are returned unaltered
interpNA<-function(x) {
if(any(is.na(x))) {
xrle<-rle(is.na(x))
begin<-end<-1
# if the sequence begins with NA
# set the initial run of NAs to the 1st non-NA
if(is.na(x[1])) x[1:xrle$lengths[1]]<-x[xrle$lengths[1]+1]
for(i in 1:length(xrle$values)) {
# if this is a run of NA, fill it
if(xrle$values[i]) {
end<-begin+xrle$lengths[i]+(i>1)
if(is.na(x[end])) x[begin:(end-1)]<-x[begin]
else x[begin:end]<-seq(x[begin],x[end],length.out=1+end-begin)
begin<-end+xrle$lengths[i]-1
}
else {
# set begin to the end of the non-NA values
begin<-end+xrle$lengths[i]-1
}
}
}
return(x)
}
Jim
On Thu, Sep 1, 2022 at 3:29 PM javad bayat <j.bayat194 at gmail.com>
wrote:>
> Dear Tim;
> The dplyr did not work for me. My data frames have exactly similar columns
> but different row numbers.
> Mr Petr sent this code and the code worked but it copied the second
> dataframe into the first one and did not replace the corresponding row.
> > df3 = merge(df1, df2, all = TRUE)
> Regarding filling the "NA" data, it does not matter for me to
interpolate
> between numbers or put the mean of numbers.
> Sincerely
>
>
>
>
>
>
> On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert at
ufl.edu> wrote:
>
> > Can I interest you in the join functions in dplyr?
> > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/
> >
> > Filling in missing data is a useful practice when the fake (simulated)
> > data is a small proportion of all data. When 2/3 of the data is fake
one
> > must wonder if anything based on those numbers is real or an artifact
of
> > assumptions made to generate the numbers.
> >
> > I deal with weather data and some weather stations are set up to
average
> > measurements between recorded values and others take a single reading
at
> > regular intervals. How you interpolate might depend on which option
> > describes your data. The alternative is to use the method for
recording ws
> > and apply it to the other data that will be merged with these data. I
> > assume there is more data, otherwise I see little point in expanding
these
> > values out.
> >
> > Tim
> >
> > -----Original Message-----
> > From: R-help <r-help-bounces at r-project.org> On Behalf Of
javad bayat
> > Sent: Wednesday, August 31, 2022 2:09 AM
> > To: r-help at r-project.org
> > Subject: [R] Combine two dataframe with different row number and
> > interpolation between values
> >
> > [External Email]
> >
> > Dear all,
> > I am trying to combine two large dataframe in order to make a
dataframe
> > with exactly the dimension of the second dataframe.
> > The first df is as follows:
> >
> > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d
> > rep(c(1:365,1:365,1:365,1:365,1:365),each=8),
> > h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by >
> 3),seq(3,24, by = 3),seq(3,24, by = 3)),365),
> > ws = rnorm(1:14600, mean=20))
> > > head(df1)
> > y d h ws
> > 1 2010 1 3 20.71488
> > 2 2010 1 6 19.70125
> > 3 2010 1 9 21.00180
> > 4 2010 1 12 20.29236
> > 5 2010 1 15 20.12317
> > 6 2010 1 18 19.47782
> >
> > The data in the "ws" column were measured with 3 hours
frequency and I
> > need data with one hour frequency. I have made a second df as follows
with
> > one hour frequency for the "ws" column.
> >
> > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d
> > rep(c(1:365,1:365,1:365,1:365,1:365),each=24),
> > h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA")
> > > head(df2)
> > y d h ws
> > 1 2010 1 1 NA
> > 2 2010 1 2 NA
> > 3 2010 1 3 NA
> > 4 2010 1 4 NA
> > 5 2010 1 5 NA
> > 6 2010 1 6 NA
> >
> > What I am trying to do is combine these two dataframes so as to the
rows in
> > df1 (based on the values of "y", "d",
"h" columns) that have values
> > exactly similar to df2's rows copied in its place in the new df
(df3).
> > For example, in the first dataframe the first row was measured at 3
> > o'clock on the first day of 2010 and this row must be placed on
the third
> > row of the second dataframe which has a similar value (2010, 1, 3).
Like
> > the below
> > table:
> > y d h ws
> > 1 2010 1 1 NA
> > 2 2010 1 2 NA
> > 3 2010 1 3 20.71488
> > 4 2010 1 4 NA
> > 5 2010 1 5 NA
> > 6 2010 1 6 19.70125
> >
> > But regarding the values of the "ws" column for df2 that do
not have value
> > (at 4 and 5 o'clock), I need to interpolate between the before and
after
> > values to fill in the missing data of the "ws".
> > I have tried the following codes but they did not work correctly.
> >
> > > df3 = merge(df1, df2, by = "y")
> > Error: cannot allocate vector of size 487.9 Mb or
> > > library(dplyr)
> > > df3<- df1%>% full_join(df2)
> >
> >
> > Is there any way to do this?
> > Sincerely
> >
> >
> >
> >
> >
> > --
> > Best Regards
> > Javad Bayat
> > M.Sc. Environment Engineering
> > Alternative Mail: bayat194 at yahoo.com
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >
> >
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D&reserved=0
> > PLEASE do read the posting guide
> >
https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3D&reserved=0
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Best Regards
> Javad Bayat
> M.Sc. Environment Engineering
> Alternative Mail: bayat194 at yahoo.com
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
PIKAL Petr
2022-Sep-01 09:33 UTC
[R] Combine two dataframe with different row number and interpolation between values
Hallo
This code do the merging without repeating the NA row.
df3 = merge(df1, df2, by=c("y", "d", "h"), all.y =
TRUE)
Cheers
Petr
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of javad
bayat
> Sent: Thursday, September 1, 2022 7:24 AM
> To: Ebert,Timothy Aaron <tebert at ufl.edu>
> Cc: r-help at r-project.org
> Subject: Re: [R] Combine two dataframe with different row number and
> interpolation between values
>
> Dear Tim;
> The dplyr did not work for me. My data frames have exactly similar columns
but> different row numbers.
> Mr Petr sent this code and the code worked but it copied the second
dataframe> into the first one and did not replace the corresponding row.
> > df3 = merge(df1, df2, all = TRUE)
> Regarding filling the "NA" data, it does not matter for me to
interpolate
between> numbers or put the mean of numbers.
> Sincerely
>
>
>
>
>
>
> On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert at
ufl.edu>
> wrote:
>
> > Can I interest you in the join functions in dplyr?
> > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/
> >
> > Filling in missing data is a useful practice when the fake (simulated)
> > data is a small proportion of all data. When 2/3 of the data is fake
> > one must wonder if anything based on those numbers is real or an
> > artifact of assumptions made to generate the numbers.
> >
> > I deal with weather data and some weather stations are set up to
> > average measurements between recorded values and others take a single
> > reading at regular intervals. How you interpolate might depend on
> > which option describes your data. The alternative is to use the method
> > for recording ws and apply it to the other data that will be merged
> > with these data. I assume there is more data, otherwise I see little
> > point in expanding these values out.
> >
> > Tim
> >
> > -----Original Message-----
> > From: R-help <r-help-bounces at r-project.org> On Behalf Of
javad bayat
> > Sent: Wednesday, August 31, 2022 2:09 AM
> > To: r-help at r-project.org
> > Subject: [R] Combine two dataframe with different row number and
> > interpolation between values
> >
> > [External Email]
> >
> > Dear all,
> > I am trying to combine two large dataframe in order to make a
> > dataframe with exactly the dimension of the second dataframe.
> > The first df is as follows:
> >
> > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d
> > = rep(c(1:365,1:365,1:365,1:365,1:365),each=8),
> > h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by >
> 3),seq(3,24, by = 3),seq(3,24, by = 3)),365),
> > ws = rnorm(1:14600, mean=20))
> > > head(df1)
> > y d h ws
> > 1 2010 1 3 20.71488
> > 2 2010 1 6 19.70125
> > 3 2010 1 9 21.00180
> > 4 2010 1 12 20.29236
> > 5 2010 1 15 20.12317
> > 6 2010 1 18 19.47782
> >
> > The data in the "ws" column were measured with 3 hours
frequency and I
> > need data with one hour frequency. I have made a second df as follows
> > with one hour frequency for the "ws" column.
> >
> > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d
> > = rep(c(1:365,1:365,1:365,1:365,1:365),each=24),
> > h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA")
> > > head(df2)
> > y d h ws
> > 1 2010 1 1 NA
> > 2 2010 1 2 NA
> > 3 2010 1 3 NA
> > 4 2010 1 4 NA
> > 5 2010 1 5 NA
> > 6 2010 1 6 NA
> >
> > What I am trying to do is combine these two dataframes so as to the
> > rows in
> > df1 (based on the values of "y", "d",
"h" columns) that have values
> > exactly similar to df2's rows copied in its place in the new df
(df3).
> > For example, in the first dataframe the first row was measured at 3
> > o'clock on the first day of 2010 and this row must be placed on
the
> > third row of the second dataframe which has a similar value (2010, 1,
> > 3). Like the below
> > table:
> > y d h ws
> > 1 2010 1 1 NA
> > 2 2010 1 2 NA
> > 3 2010 1 3 20.71488
> > 4 2010 1 4 NA
> > 5 2010 1 5 NA
> > 6 2010 1 6 19.70125
> >
> > But regarding the values of the "ws" column for df2 that do
not have
> > value (at 4 and 5 o'clock), I need to interpolate between the
before
> > and after values to fill in the missing data of the "ws".
> > I have tried the following codes but they did not work correctly.
> >
> > > df3 = merge(df1, df2, by = "y")
> > Error: cannot allocate vector of size 487.9 Mb or
> > > library(dplyr)
> > > df3<- df1%>% full_join(df2)
> >
> >
> > Is there any way to do this?
> > Sincerely
> >
> >
> >
> >
> >
> > --
> > Best Regards
> > Javad Bayat
> > M.Sc. Environment Engineering
> > Alternative Mail: bayat194 at yahoo.com
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >
> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat
> > .ethz.ch%2Fmailman%2Flistinfo%2Fr-
> help&data=05%7C01%7Ctebert%40ufl
> >
> .edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace6
> 0a62331e
> >
> 1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8
> eyJWIjoiMC4w
> >
> LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C
> %7C%7C
> >
> &sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D
> &rese
> > rved=0
> > PLEASE do read the posting guide
> >
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r
> > -project.org%2Fposting-
> guide.html&data=05%7C01%7Ctebert%40ufl.edu%
> >
> 7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a623
> 31e1b84%
> >
> 7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> oiMC4wLjAwM
> >
> DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7
> C&
> >
> sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3D&r
> eserved=0
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Best Regards
> Javad Bayat
> M.Sc. Environment Engineering
> Alternative Mail: bayat194 at yahoo.com
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.