Dear users,

I'm a fairly new R user from France, and I have a problem building a correlation matrix. I have temperature data for each weather station in my study area and for each year (for example, one data file for weather station No. 1 for 2009, one for station No. 2 for 2010, ...). So I have 70 weather stations, with one data file per year since 2005. Each station has 4 temperature sensors. Every data file has exactly the same structure: date & hour, sensor1, sensor2, sensor3, sensor4. Here's an example:

time              sensor1  sensor2  sensor3  sensor4
01/01/2008 00:00  -0.25    -2.43    -3.25    -2.37
01/01/2008 00:15  -0.18    -2.37    -3.18    -2.25
01/01/2008 00:30  -0.25    -2.5     -3.37    -2.56
01/01/2008 00:45  -0.25    -2.37    -3.31    -2.37

I need a correlation matrix for each sensor across the different stations (one correlation matrix between all the sensor 1 series of the 70 stations, another one for sensor 2, ...). For each year and each station I have to find the best correlation. For example, which of the 70 weather stations is best correlated with station 1 for sensor 1? And with station 2? ... and so on, for each sensor and each station.

Example:

Sensor 1 for the year 2009

             Station 1  Station 2  Station 3  [...]
Station 1    1          0.910      0.748
Station 2    0.910      1          0.6
Station 3    0.748      0.6        1
[...]

And the same for the years 2005, 2006, 2007, 2008, 2009, 2010 and 2011, for each of the 4 sensors.

Do you have any idea how I can do this in R? Should I first merge all the sensors into one file, or can I work with the data in separate files (as I have them at the moment)?

Thank you very much for all your answers!

-- 
View this message in context: http://r.789695.n4.nabble.com/correlation-matrix-between-data-from-different-files-tp4552226p4552226.html
Sent from the R help mailing list archive at Nabble.com.
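The "best correlation" lookup described in the question can be done directly on a correlation matrix. Below is a minimal R sketch that uses only the three-station values from the sensor 1 / 2009 example as input. The object names `cm`, `best` and `best_r` are illustrative. The idea is to blank out the diagonal, because every station correlates perfectly with itself, and then take the largest remaining value in each row.

```r
# Example correlation matrix from the question (sensor 1, year 2009)
cm <- matrix(c(1,     0.910, 0.748,
               0.910, 1,     0.6,
               0.748, 0.6,   1),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste("Station", 1:3), paste("Station", 1:3)))

# Ignore each station's correlation with itself (always 1)
diag(cm) <- NA

# For each row (station), the name of the most strongly correlated other station
best <- colnames(cm)[apply(cm, 1, which.max)]
names(best) <- rownames(cm)

# ... and the corresponding correlation value
best_r <- apply(cm, 1, max, na.rm = TRUE)

best
best_r
```

`which.max()` skips `NA`, so the blanked diagonal is never selected. With these numbers, stations 1 and 2 pick each other (0.910), and station 3's best match is station 1 (0.748).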
Rui Barradas
2012-Apr-13 16:18 UTC
[R] correlation matrix between data from different files
Hello,

jeff6868 wrote:
> Should I first merge all the sensors in one file or could I do it with
> data in separate files (like I have for the moment)?

You don't need to merge all the files, but you do need some preprocessing. If you put all the data for one year into a 3-D array, you can then simply use 'cor'.

I've made up some fake data in files named "station1_2009.dat", etc. (only 6 stations), each with the same number of observations.
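The following self-contained sketch shows why the 3-D array reduces the problem to a single `cor` call. Random numbers stand in for the station files, and the sizes, the seed and the injected missing value are assumptions chosen for illustration. Fixing the sensor index drops that dimension and leaves an observations x stations matrix. `cor()` correlates the columns of that matrix, so the result is a stations x stations matrix.

```r
set.seed(1)                      # reproducible fake data
nobs     <- 96                   # e.g. one day of 15-minute readings (assumption)
Sensors  <- paste("sensor", 1:4, sep = "")
Stations <- paste("station", 1:6, sep = "")

# 3-D array: observations x sensors x stations
temps <- array(rnorm(nobs * length(Sensors) * length(Stations)),
               dim = c(nobs, length(Sensors), length(Stations)),
               dimnames = list(NULL, Sensors, Stations))

# Selecting one sensor leaves a 96 x 6 matrix (observations x stations)
s1  <- temps[, "sensor1", ]
cm1 <- cor(s1)                   # 6 x 6, station by station

# Simulate one missing reading at station 3 (assumption: real data may have gaps)
temps[10, "sensor1", "station3"] <- NA
cm_default  <- cor(temps[, "sensor1", ])   # NA propagates to station3's row/column
cm_pairwise <- cor(temps[, "sensor1", ], use = "pairwise.complete.obs")
```

By default, `cor()` returns `NA` for any station that has a missing reading. Setting `use = "pairwise.complete.obs"` is one way to ignore the gaps. The trade-off is that each pair of stations may then be compared over a slightly different set of time stamps.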
If you have 70 stations per year, you'll need an automated way to access the files. Something like the function below would solve part of that problem. What follows assumes that the number of observations is the same in all files.

# This function builds file names following the pattern above,
# e.g. "station1_2009.dat", ..., "station70_2009.dat"
filenames <- function(y, n=70){
    tmp <- paste("station", seq_len(n), sep="")
    tmp <- paste(tmp, y, sep="_")
    paste(tmp, "dat", sep=".")
}

Sensors   <- paste("sensor", 1:4, sep="")
Stations  <- paste("station", 1:6, sep="")
nsensors  <- length(Sensors)
nstations <- length(Stations)

year   <- 2009
fnames <- filenames(year, nstations)

# If nobs is the same in all files, any one will do.
nobs <- nrow(read.table(fnames[1], header=TRUE))

# 3-D array: observations x sensors x stations
yr2009 <- array(NA, dim=c(nobs, nsensors, nstations))
for(i in seq_len(nstations)){
    tmp <- read.table(fnames[i], header=TRUE)
    yr2009[ , , i] <- as.matrix(tmp[, Sensors])
}
dimnames(yr2009) <- list(seq.int(nobs), Sensors, Stations)

# correlations for sensor 1 (a stations x stations matrix)
cor(yr2009[ , 1, ])

# a list of correlation matrices, one per sensor
cor2009 <- lapply(Sensors, function(s) cor(yr2009[ , s, ]))
names(cor2009) <- Sensors
cor2009$sensor1

Don't pay much attention to the file-reading part; what matters is creating and filling the array.

Hope this helps,

Rui Barradas
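The question also asks which station is best correlated with each station, for every year (2005 to 2011) and every sensor. Below is a minimal sketch that assumes one observations x sensors x stations array per year, built the same way as `yr2009`. For this sketch the arrays are replaced by random fake data. The helper names `best_by_sensor` and `fake_year` are hypothetical. Using `use = "pairwise.complete.obs"` is an assumption made so that missing readings do not turn whole rows into `NA`.

```r
# For one year's array (obs x sensors x stations), return one row per
# sensor and station giving the most strongly correlated *other* station
best_by_sensor <- function(arr) {
  res <- lapply(dimnames(arr)[[2]], function(s) {
    cm <- cor(arr[, s, ], use = "pairwise.complete.obs")
    diag(cm) <- NA                               # ignore self-correlation
    idx <- apply(cm, 1, which.max)               # column index of best match per row
    data.frame(sensor  = s,
               station = rownames(cm),
               best    = colnames(cm)[idx],
               r       = cm[cbind(seq_len(nrow(cm)), idx)],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}

# Fake data standing in for the yearly arrays built from the files
set.seed(42)
Sensors  <- paste("sensor", 1:4, sep = "")
Stations <- paste("station", 1:6, sep = "")
years    <- 2005:2011
fake_year <- function(nobs = 200) {
  array(rnorm(nobs * length(Sensors) * length(Stations)),
        dim = c(nobs, length(Sensors), length(Stations)),
        dimnames = list(NULL, Sensors, Stations))
}
arrays <- lapply(years, function(y) fake_year())
names(arrays) <- years

# One data frame per year, stacked with a 'year' column
results <- do.call(rbind, lapply(names(arrays), function(y)
  cbind(year = y, best_by_sensor(arrays[[y]]), stringsAsFactors = FALSE)))

head(results)
```

Each row of `results` gives, for one year, sensor and station, the best-matching other station and its correlation. For example, `subset(results, station == "station1" & sensor == "sensor1")` lists station 1's best sensor 1 partner for each year.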