Felix Wave
2007-Apr-12 12:47 UTC
[R] data file import - numbers and letters in a matrix(!)
Hello,

I have a problem importing a data file. It seems very tricky. I have a text file (shown at the end of this mail). Every file has a different number of measurements, each of which starts with "START OF HEIGHT DATA" and ends with "END OF HEIGHT DATA".

I imported the file into a matrix, but the letters before the numbers are my problem (S= ,S=,x=,y=). Because of the letters and the space after "S=", I get a different number of columns in my matrix, and with letters in the matrix I can't do calculations.

My question: is it possible to import the file so that I get only 3 columns containing numbers, without letters like x= and y=?

Thanks a lot
Felix

My R Code:
----------

# na.strings = "S="

Measure1 <- matrix(scan("data.dat", n= 5063 * 4, skip = 20, what = character() ), 5063, 3, byrow = TRUE)
Measure2 <- matrix(scan("data.dat", n= 5063 * 4, skip = 5220, what = character() ), 5063, 3, byrow = TRUE)

My data file:
-----------

FILEDATE:02.02.2007
...

START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
...
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.00000000
S=10 y=0.1 x=0.00075557
...
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
...

START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643

The imported matrix:

      [,1]           [,2]           [,3]           [,4]
 [6,] "S="           "9"            "y=4.9"        "x=1.67278117"
 [7,] "S="           "9"            "y=5.0"        "x=1.74873257"
 [8,] "S=10"         "y=0.0"        "x=0.00000000" "S=10"
 [9,] "y=0.1"        "x=0.00075557" "S=10"         "y=0.2"
[10,] "x=0.00277444" "S=10"         "y=0.3"        "x=0.00605958"
Gabor Grothendieck
2007-Apr-12 14:19 UTC
[R] data file import - numbers and letters in a matrix(!)
Try pasting this into an R session:

Lines.raw <- "FILEDATE:02.02.2007
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
...
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.00000000
S=10 y=0.1 x=0.00075557
...
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
"

# next line would be replaced by
# something like: Lines <- readLines("myfile.dat")
Lines <- readLines(textConnection(Lines.raw))

# extract those lines that contain an "="
Lines <- grep("=", Lines, value = TRUE)

# get col names by removing all but letters & spaces from line 1
cn <- gsub("[^a-zA-Z ]", "", Lines[1])
cn <- scan(textConnection(cn), what = "")

# remove anything that is not a number, dot or space and read in
Lines <- gsub("[^ .0-9]", "", Lines)
DF <- read.table(textConnection(Lines), col.names = cn)
closeAllConnections()
DF
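The code above returns all measurements in one data frame, whereas the original question asks for separate measurements (Measure1, Measure2). The following is only a sketch of one way to keep them apart, not part of the reply: the function name read_height_blocks and the block column are made up for illustration. It assumes every data line begins with "S=" (optionally indented) and that values are non-negative plain decimals, since the cleanup regex would also strip minus signs and exponents.

```r
# Hypothetical helper (not from the thread): read every
# "START OF HEIGHT DATA" ... "END OF HEIGHT DATA" block into one data frame
# and tag each row with the index of the block it came from.
read_height_blocks <- function(lines) {
  # Running count of START markers seen so far = block index of each line
  block <- cumsum(grepl("^START OF HEIGHT DATA", lines))
  # Keep only data lines of the form "S=<n> y=<n> x=<n>"
  is_data <- grepl("^\\s*S=", lines)
  # Strip everything except digits, dots and spaces, as in the reply above
  cleaned <- gsub("[^ .0-9]", "", lines[is_data])
  df <- read.table(text = cleaned, col.names = c("S", "y", "x"))
  df$block <- block[is_data]
  df
}

# Small made-up input with two blocks; for a real file use readLines()
demo <- c(
  "FILEDATE:02.02.2007",
  "START OF HEIGHT DATA",
  "S= 0 y=0.0 x=0.00000000",
  "S=10 y=0.1 x=0.00075557",
  "END OF HEIGHT DATA",
  "START OF HEIGHT DATA",
  "S= 0 y=0.0 x=0.00000000",
  "END OF HEIGHT DATA"
)
DF <- read_height_blocks(demo)

# One data frame per measurement block, as a list
measurements <- split(DF[c("S", "y", "x")], DF$block)
```

With this layout, measurements[[1]] and measurements[[2]] would play the role of Measure1 and Measure2 without hard-coding skip= or row counts.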
Adaikalavan Ramasamy
2007-Apr-12 15:34 UTC
[R] data file import - numbers and letters in a matrix(!)
Here are the contents of my "testdata.txt":

-----------------------------------------------------
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.00000000
S=10 y=0.1 x=0.00075557
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
-----------------------------------------------------

If you have access to a shell, you can try preprocessing the input file for read.delim using

cat testdata.txt | grep -v "^START" | grep -v "^END" | sed 's/ //g' | sed 's/S=//' | sed 's/y=/\t/' | sed 's/x=/\t/'

or here is my ugly fix in R

my.read.file <- function(file){
  v1 <- readLines( con=file, n=-1)
  # drop the START/END marker lines (grepl is safe when nothing matches)
  v2 <- v1[ !grepl( "^START|^END", v1 ) ]
  # remove all spaces, including the one after "S="
  v3 <- gsub(" ", "", v2)
  # turn each label into a single separating space
  v4 <- gsub( "S=|y=|x=", " ", v3 )
  v5 <- gsub("^ ", "", v4)
  m <- t( sapply( strsplit(v5, split=" "), as.numeric ) )
  colnames(m) <- c("S", "y", "x" )
  return(m)
}

my.read.file( "testdata.txt" )

Regards, Adai
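Both replies work around the same root cause: scan() splits on whitespace, so "S= 9" yields two tokens while "S=10" yields one, and the rows of the matrix drift out of alignment. As a rough illustration (a sketch under the assumption that every field has the form label=value, with variable names chosen here for the example), removing the whitespace after "=" before scanning gives three tokens per line:

```r
# Two sample lines from the post: one with a space after "S=", one without
raw <- c("S= 9 y=5.0 x=1.74873257", "S=10 y=0.0 x=0.00000000")

# Count whitespace-separated tokens per line, as scan() sees them
n_tokens <- lengths(strsplit(trimws(raw), "\\s+"))
n_tokens  # 4 and 3

# Remove whitespace directly after "=" so every line has three tokens
fixed <- gsub("=\\s+", "=", raw)
tokens <- scan(text = fixed, what = character(), quiet = TRUE)

# Drop the "S=", "y=", "x=" labels and fill a 3-column numeric matrix by row
m <- matrix(as.numeric(sub("^[A-Za-z]+=", "", tokens)),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("S", "y", "x")))
```

This keeps the scan()/matrix() approach from the original question; the marker lines and file header would still need to be filtered out first, for example as in the replies above.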