thr3ads.net - R help - [R] add an automatized linear regression in a function [May 2012]

jeff6868

2012-May-03 13:45 UTC

[R] add an automatized linear regression in a function

Dear R users,

At the moment I have a script and a function that calculate correlation
matrices between all my data files. For each file, the function picks the
best-correlated file and uses it to fill the missing values (NAs) in the
file being analysed: the data from the best-correlated file are copied
automatically into the gaps. If the best-correlated file has no data at a
gap, the data from the second-best-correlated file are used instead.
The problem is that, for now, the raw data from the best-correlated file
are copied as they are.

So I need to adapt these raw data to the file being filled. I would
therefore like to automate a linear regression between the two files, run
after the best (or second-best) correlated file has been selected.
Instead of filling the first file with the raw data from the best-correlated
file, the function should fill it with the values estimated by the
regression, so that the filled data are more precise.
The idea is to run lm() on the two files, extract the coefficients of the
regression line, calculate the estimated values for the whole file (NAs
included), and finally fill the gaps with these estimates. I hope I have
explained my problem clearly.
Here's the function:

process.all <- function(df.list, mat){
    f <- function(station)
        na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])

    g <- function(station){
        x <- df.list[[station]]
        if(any(is.na(x$data))){
            mat[row(mat) == col(mat)] <- -Inf
            nas <- which(is.na(x$data))
            ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
            for(i in nas){
                for(y in ord){
                    if(!is.na(df.list[[y]]$data[i])){
                        x$data[i] <- df.list[[y]]$data[i]
                        break
                    }
                }
            }
        }
        x
    }

    n <- length(df.list)
    nms <- names(df.list)
    max.cor <- sapply(seq.int(n), get.max.cor, corhiver2008capt1)
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), g)
    names(df.list) <- nms
    df.list
}
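
To see which candidates the line computing `ord` keeps, here is a small illustration on a made-up 4 x 4 correlation matrix (the values and the station names `st1` to `st4` are hypothetical, not from the thread). Once the diagonal is set to -Inf, the station itself sorts last, so dropping positions 1 and ncol(mat) removes the best-correlated station (which f() has already used) and the station itself.

```r
# Hypothetical correlation matrix for four stations (values are made up).
mat <- matrix(c(1.0, 0.9, 0.5, 0.7,
                0.9, 1.0, 0.6, 0.4,
                0.5, 0.6, 1.0, 0.3,
                0.7, 0.4, 0.3, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("st", 1:4), paste0("st", 1:4)))

station <- 1
mat[row(mat) == col(mat)] <- -Inf   # a station must never be matched with itself

# All stations ranked by their correlation with station 1, best first.
full <- order(mat[station, ], decreasing = TRUE)

# Drop the first entry (best match) and the last one (the station itself).
ord <- full[-c(1, ncol(mat))]

full
ord
```

With these numbers `full` is 2, 4, 3, 1 and `ord` is 4, 3, so g() only ever looks at the second-best station and the ones after it.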

I succeeded with a small data.frame I created, but I don't know how to do
it in this particular case.
Thanks a lot for your help!
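
The steps described in the question (fit lm(), extract the coefficients, estimate every row, fill only the gaps) can be sketched on a small made-up example. The vectors `target` and `donor` are invented for illustration and stand in for the `data` columns of two files; this is only one way to write it, not the poster's actual small test.

```r
# Made-up example: 'target' has gaps, 'donor' is the best-correlated series.
target <- c(2.1, NA, 6.2, 7.9, NA, 12.1)
donor  <- c(1, 2, 3, 4, 5, 6)

# Rows with NA in 'target' are dropped by lm()'s default na.action (na.omit).
fit <- lm(target ~ donor)
b   <- unname(coef(fit))            # b[1] = intercept, b[2] = slope

# Estimated value for every row, gaps included.
estimated <- b[1] + b[2] * donor

# Replace only the missing values; observed values are left untouched.
gaps   <- is.na(target)
filled <- target
filled[gaps] <- estimated[gaps]
filled
```

Computing the estimates by hand from coef() gives the same values as predict(fit, data.frame(donor = donor)), which is usually the more convenient call.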


--
View this message in context:
http://r.789695.n4.nabble.com/add-an-automatized-linear-regression-in-a-function-tp4606047.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-May-04 16:34 UTC


Statistically speaking, I don't believe in what you want, but a solution
could be

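na.fill <- function(x, y){
     i <- is.na(x$data)          # gaps in the file being filled
     xx <- y$data                # predictor: data from the correlated file
     new <- data.frame(xx=xx)
     # fit on the complete pairs, predict for every row, keep only the gaps
     x$data[i] <- predict(lm(x$data~xx, na.action=na.exclude), new)[i]
     x
}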
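
A quick way to check na.fill() is to run it on toy data. The two data frames below are invented for illustration; each has a `data` column, as the code in this thread assumes. The function is repeated so that the example runs on its own.

```r
# na.fill() as posted above, repeated so the example is self-contained.
na.fill <- function(x, y){
    i <- is.na(x$data)
    xx <- y$data
    new <- data.frame(xx = xx)
    x$data[i] <- predict(lm(x$data ~ xx, na.action = na.exclude), new)[i]
    x
}

# Invented stations: b is roughly 2 * a + 1, with two gaps.
a <- data.frame(data = c(1, 2, 3, 4, 5, 6))
b <- data.frame(data = c(3.1, NA, 7.0, 8.9, NA, 13.0))

filled <- na.fill(b, a)
filled$data   # gaps replaced by regression estimates, observed values unchanged
```

If the correlated file is itself NA at a gap, predict() returns NA for that row and the gap stays unfilled. Note also that the zoo package exports a function named na.fill(); a definition in the workspace, like the one here, masks it.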

and in process.all, change function g() to

     g <- function(station){
         x <- df.list[[station]]
         if(any(is.na(x$data))){
             mat[row(mat) == col(mat)] <- -Inf
             nas <- which(is.na(x$data))
             ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
             for(y in ord){
                 if(all(!is.na(df.list[[y]]$data[nas]))){
                     xx <- df.list[[y]]$data
                     new <- data.frame(xx=xx)
                     x$data[nas] <- predict(lm(x$data~xx, na.action=na.exclude), new)[nas]
                     break
                 }
             }
         }
         x
     }
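
The loop in g() can be tried in isolation with a small sketch. The helper name `fill_from_donors` and the toy vectors are hypothetical, introduced only for illustration; the helper takes plain vectors instead of the data frames in df.list, but otherwise follows the same logic: go through the candidates in order and use the first one that has a value at every gap.

```r
# Hypothetical helper mirroring the loop in g(): try donors in the given order
# and use the first one that has a value at every gap of the target.
fill_from_donors <- function(target, donors, ord){
    nas <- which(is.na(target))
    for(y in ord){
        xx <- donors[[y]]
        if(!anyNA(xx[nas])){
            fit <- lm(target ~ xx, na.action = na.exclude)
            target[nas] <- predict(fit, data.frame(xx = xx))[nas]
            break
        }
    }
    target
}

# Toy data: the first donor is missing at one of the gaps, so it is skipped.
target <- c(10, NA, 14, 16, NA, 20)
donors <- list(c(1, 2, 3, 4, NA, 6),
               c(5, 6, 7, 8, 9, 10))

res <- fill_from_donors(target, donors, ord = c(1, 2))
res
```

As in g(), a candidate is used only if it covers all gaps at once; if none does, the gaps remain NA.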


Hope this helps,

Rui Barradas

