thr3ads.net - R help - [R] reading long matrix [Dec 2005]

Colin Beale

2005-Dec-22 17:06 UTC

[R] reading long matrix

Hi,

I need some help finding a function to read a large text file into an
array in R. The data are essentially presence / absence / NA data for many
species and come as a grid, with each species name (after two spaces) at the
beginning of the matrix defining the map for that species. An excerpt could
therefore be:

  SPECIES1
999001099
900110109
011101000
901100101
110100019
901110019

  SPECIES2
999000099
900110119
011101100
901010101
110000019
900000019

  SPECIES3
999001099
900100109
011100010
901100100
110100019
901110019

where 9 is actually NA, 0 is absence and 1 is presence. The final array I want to
create should have dimensions that are the x and y coordinates and the number of
species (known in advance); in this example, dim = c(9,6,3). It would be
neat if the code could also read the species names into the appropriate names
attribute, but this is a refinement that I could probably do myself if someone can
help me read the data into R and into an array in the first place. I'm currently
thinking a line-by-line approach using readLines might be the best option, but
I've got a very long file (well over 100 species, each a matrix of 70 x 100
data points), which I expect would make this option rather time consuming,
especially as the next dataset has 1300 species and a much larger grid...

Any hints would be gratefully received.

Colin Beale
Macaulay Land Use Research Institute

jim holtman

2005-Dec-22 19:07 UTC

Here is a way of reading the data into a 'list'.  You can convert the list
to an array of the proper dimensions.
> input <- scan('/tempxx.txt.r', what='')
Read 21 items
> input
 [1] "SPECIES1"  "999001099" "900110109" "011101000" "901100101" "110100019"
 [7] "901110019" "SPECIES2"  "999000099" "900110119" "011101100" "901010101"
[13] "110000019" "900000019" "SPECIES3"  "999001099" "900100109" "011100010"
[19] "901100100" "110100019" "901110019"
> # find the names
> breaks <- grep("[[:alpha:]][[:alnum:]]+", input)
> # determine the sizes
> map <- cbind(breaks, diff(c(breaks, length(input)+1)))
> out <- list()
> # repeat for each data block
> for (i in 1:nrow(map)){
+     .set <- NULL
+     for (j in 1:(map[i, 2] - 1)){
+         .set <- rbind(.set, strsplit(input[map[i, 1] + j], '')[[1]])
+     }
+     out[[input[map[i, 1]]]] <- .set
+ }
> out
$SPECIES1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "9"  "9"  "9"  "0"  "0"  "1"  "0"  "9"  "9"
[2,] "9"  "0"  "0"  "1"  "1"  "0"  "1"  "0"  "9"
[3,] "0"  "1"  "1"  "1"  "0"  "1"  "0"  "0"  "0"
[4,] "9"  "0"  "1"  "1"  "0"  "0"  "1"  "0"  "1"
[5,] "1"  "1"  "0"  "1"  "0"  "0"  "0"  "1"  "9"
[6,] "9"  "0"  "1"  "1"  "1"  "0"  "0"  "1"  "9"

$SPECIES2
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "9"  "9"  "9"  "0"  "0"  "0"  "0"  "9"  "9"
[2,] "9"  "0"  "0"  "1"  "1"  "0"  "1"  "1"  "9"
[3,] "0"  "1"  "1"  "1"  "0"  "1"  "1"  "0"  "0"
[4,] "9"  "0"  "1"  "0"  "1"  "0"  "1"  "0"  "1"
[5,] "1"  "1"  "0"  "0"  "0"  "0"  "0"  "1"  "9"
[6,] "9"  "0"  "0"  "0"  "0"  "0"  "0"  "1"  "9"

$SPECIES3
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "9"  "9"  "9"  "0"  "0"  "1"  "0"  "9"  "9"
[2,] "9"  "0"  "0"  "1"  "0"  "0"  "1"  "0"  "9"
[3,] "0"  "1"  "1"  "1"  "0"  "0"  "0"  "1"  "0"
[4,] "9"  "0"  "1"  "1"  "0"  "0"  "1"  "0"  "0"
[5,] "1"  "1"  "0"  "1"  "0"  "0"  "0"  "1"  "9"
[6,] "9"  "0"  "1"  "1"  "1"  "0"  "0"  "1"  "9"
>
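
The conversion from the list to an array mentioned at the top of this message could look something like the sketch below. It is not part of the original reply: the 'rows' object is typed in by hand (first two species only) so that the example runs on its own, and 'arr' is just an illustrative name. It assumes every species has a matrix of the same size.

```r
# Rebuild a small 'out' list like the one printed above (two species,
# typed in by hand so the sketch is self-contained).
rows <- list(
  SPECIES1 = c("999001099", "900110109", "011101000",
               "901100101", "110100019", "901110019"),
  SPECIES2 = c("999000099", "900110119", "011101100",
               "901010101", "110000019", "900000019")
)
out <- lapply(rows, function(x) do.call(rbind, strsplit(x, "")))

# Stack the character matrices into a rows x columns x species array.
# unlist() concatenates the matrices column by column, which is the
# order array() expects.
arr <- array(as.integer(unlist(out)),
             dim = c(dim(out[[1]]), length(out)),
             dimnames = list(NULL, NULL, names(out)))

# Recode 9 as missing.
arr[arr == 9L] <- NA
```

Here dim(arr) is rows x columns x species. If the x-by-y layout from the question, dim = c(9,6,3), is wanted instead, aperm(arr, c(2, 1, 3)) swaps the first two dimensions.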


--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?

Liaw, Andy

2005-Dec-22 19:13 UTC

Here's one possibility, if you know the number of species and the numbers of
rows and columns beforehand, and the dimensions are the same for all
species.

readSpeciesMap <- function(fname, nspecies, nr, nc) {
    spcnames <- character(nspecies)
    spcdata <- array(0, c(nc, nr, nspecies))
    ## open the file for reading, and close it upon exit.
    f <- file(fname, open="r")
    on.exit(close(f))
    for (i in seq(along=spcnames)) {
        ## read the name
        spcnames[i] <- readLines(f, 1)[[1]]
        ## read the grid
        spcdata[, , i] <- as.numeric(unlist(strsplit(readLines(f, nr), "")))
        ## pick up the empty line
        readLines(f, 1)
    }
    ## replace the 9s with NAs
    spcdata[spcdata == 9] <- NA
    dimnames(spcdata)[[3]] <- spcnames
    ## "transpose" the array in each species
    aperm(spcdata, c(2, 1, 3))
}

Using the example you supplied (saved in the file "species.txt"):

> readSpeciesMap("species.txt", 3, 6, 9)
, ,   SPECIES1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   NA   NA   NA    0    0    1    0   NA   NA
[2,]   NA    0    0    1    1    0    1    0   NA
[3,]    0    1    1    1    0    1    0    0    0
[4,]   NA    0    1    1    0    0    1    0    1
[5,]    1    1    0    1    0    0    0    1   NA
[6,]   NA    0    1    1    1    0    0    1   NA

, ,   SPECIES2

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   NA   NA   NA    0    0    0    0   NA   NA
[2,]   NA    0    0    1    1    0    1    1   NA
[3,]    0    1    1    1    0    1    1    0    0
[4,]   NA    0    1    0    1    0    1    0    1
[5,]    1    1    0    0    0    0    0    1   NA
[6,]   NA    0    0    0    0    0    0    1   NA

, ,   SPECIES3

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   NA   NA   NA    0    0    1    0   NA   NA
[2,]   NA    0    0    1    0    0    1    0   NA
[3,]    0    1    1    1    0    0    0    1    0
[4,]   NA    0    1    1    0    0    1    0    0
[5,]    1    1    0    1    0    0    0    1   NA
[6,]   NA    0    1    1    1    0    0    1   NA
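
If the counts are not known beforehand, they can be inferred from the file itself. The following is only a sketch of that idea, not part of the function above: readSpeciesMapAll is a hypothetical name, the temporary file holds just the first two species of the example, and it assumes that name lines start with a letter (after trimming the leading spaces) and that every grid has the same size. It reads the whole file with a single readLines call rather than one block at a time.

```r
# Write the first two species from the example to a temporary file so the
# sketch runs on its own (the real file name would replace 'tmp').
tmp <- tempfile(fileext = ".txt")
writeLines(c("  SPECIES1", "999001099", "900110109", "011101000",
             "901100101", "110100019", "901110019", "",
             "  SPECIES2", "999000099", "900110119", "011101100",
             "901010101", "110000019", "900000019"), tmp)

# Hypothetical helper: reads the whole file in one call and infers the
# number of species and the grid size from the file itself.
readSpeciesMapAll <- function(fname) {
    lines <- trimws(readLines(fname))
    lines <- lines[nzchar(lines)]              # drop the blank separator lines
    is.name <- grepl("^[[:alpha:]]", lines)    # name lines start with a letter
    grid <- lines[!is.name]
    nspecies <- sum(is.name)
    nr <- length(grid) / nspecies              # rows per species
    nc <- nchar(grid[1])                       # columns per row
    stopifnot(nr == round(nr), all(nchar(grid) == nc))
    vals <- as.integer(unlist(strsplit(grid, "")))
    vals[vals == 9L] <- NA                     # 9 codes a missing value
    # Values arrive row by row, so fill an nc x nr x nspecies array and
    # swap the first two dimensions to get rows x columns x species.
    arr <- aperm(array(vals, c(nc, nr, nspecies)), c(2, 1, 3))
    dimnames(arr) <- list(NULL, NULL, lines[is.name])
    arr
}

maps <- readSpeciesMapAll(tmp)
dim(maps)
```

Because the names are trimmed here, the third dimension is labelled "SPECIES1" rather than "  SPECIES1".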

Andy


Gabor Grothendieck

2005-Dec-23 01:53 UTC

One way to do this is to use read.fwf.  I have borrowed Jim's
use of scan and a similar calculation to get the indexes of the
species-name lines, breaks.  We then determine the common number
of rows and columns in each species.

The second group of statements replaces all 9's with spaces,
so that they become NAs when parsed as numbers, and then sets
up a text connection to the resulting character vector.  These lines
are then read in by read.fwf, nr rows at a time, and the result is
unlist'ed to a numeric vector, nums.  The last statement
reshapes it into an array and adds the species names as
the names of the last dimension.

# read data in
L <- scan("clipboard", what = "")
breaks <- grep("^[[:alpha:]]", L)   # positions of the species-name lines
nr <- breaks[2] - breaks[1] - 1     # rows per species
nc <- nchar(L[2])                   # columns per row

# parse numbers
n <- length(L[-breaks]) / nr        # number of species
con <- textConnection(gsub("9", " ", L[-breaks]))
nums <- unlist(replicate(n, read.fwf(con, widths = rep(1, nc), n = nr)))
close(con)
# dimnames must be a list with one entry per dimension
result <- array(nums, c(nr, nc, n), list(NULL, NULL, L[breaks]))
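
To see the blank-to-NA step in isolation, here is a small sketch using only the first two grid rows of the example. The object names 'rows' and 'block' are illustrative and not from the code above; the sketch relies on read.table's documented behaviour that blank fields in numeric columns are read as missing values.

```r
# Two grid rows from the example with every 9 replaced by a space.
rows <- gsub("9", " ", c("999001099", "900110109"))
con <- textConnection(rows)

# One single-character field per column; blank fields come back as NA.
block <- read.fwf(con, widths = rep(1, nchar(rows[1])), n = 2)
close(con)

block
```

Each call to read.fwf on the same open connection carries on from where the previous call stopped, which is what lets the replicate call read one species per iteration.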

