Felix Wave
2007-Apr-12 12:47 UTC
[R] data file import - numbers and letters in a matrix(!)
Hello,
I have a problem importing a data file. It seems very tricky.
I have a text file (see the end of this mail). Every file contains a different number of
measurements, each of which starts with "START OF HEIGHT DATA" and ends with "END OF HEIGHT DATA".
I imported the file into a matrix, but the labels before the numbers (S= , S=, x=, y=) are my problem.
Because of the letters and the space after "S=", I get a different number of columns in my matrix,
and with letters in my matrix I can't do any calculations.
My question: is it possible to import the file so that I get 3 columns containing only numbers,
without labels like x=, y=?
Thanks a lot
Felix
My R Code:
----------
# na.strings = "S="
Measure1 <- matrix(scan("data.dat", n = 5063 * 4, skip = 20, what = character()), 5063, 3, byrow = TRUE)
Measure2 <- matrix(scan("data.dat", n = 5063 * 4, skip = 5220, what = character()), 5063, 3, byrow = TRUE)
My data file:
-----------
FILEDATE:02.02.2007
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
...
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.00000000
S=10 y=0.1 x=0.00075557
...
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
The imported matrix:
      [,1]           [,2]           [,3]           [,4]
 [6,] "S="           "9"            "y=4.9"        "x=1.67278117"
 [7,] "S="           "9"            "y=5.0"        "x=1.74873257"
 [8,] "S=10"         "y=0.0"        "x=0.00000000" "S=10"
 [9,] "y=0.1"        "x=0.00075557" "S=10"         "y=0.2"
[10,] "x=0.00277444" "S=10"         "y=0.3"        "x=0.00605958"
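The shifted rows in the matrix above come from the space after "S=": scan() splits on whitespace, so "S= 9 ..." yields four tokens while "S=10 ..." yields three, and filling a fixed number of columns row by row then drifts out of step. A minimal sketch reproducing this with two lines copied from the data file (it assumes an R version whose scan() accepts the text= argument):

```r
# Two lines taken from the data file; note the space after "S=" in the first
l1 <- "S= 9 y=5.0 x=1.74873257"
l2 <- "S=10 y=0.0 x=0.00000000"

# scan() splits on whitespace, so the two lines give different token counts
tok1 <- scan(text = l1, what = character(), quiet = TRUE)
tok2 <- scan(text = l2, what = character(), quiet = TRUE)
length(tok1)  # 4: "S=" "9" "y=5.0" "x=1.74873257"
length(tok2)  # 3: "S=10" "y=0.0" "x=0.00000000"

# Removing the space after "S=" first makes every line split into 3 tokens
fixed <- gsub("S= +", "S=", c(l1, l2))
sapply(strsplit(fixed, " +"), length)  # 3 3
```

The labels are still attached after this step, so the values remain character strings; the replies below show ways to strip them.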
Gabor Grothendieck
2007-Apr-12 14:19 UTC
[R] data file import - numbers and letters in a matrix(!)
Try pasting this into an R session:
Lines.raw <- "FILEDATE:02.02.2007
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
...
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.00000000
S=10 y=0.1 x=0.00075557
...
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
...
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
"
# next line would be replaced by
# something like: Lines <- readLines("myfile.dat")
Lines <- readLines(textConnection(Lines.raw))
# extract those lines that contain an =
Lines <- grep("=", Lines, value = TRUE)
# get col names by removing all but letters & spaces from line 1
cn <- gsub("[^a-zA-Z ]", "", Lines[1])
cn <- scan(textConnection(cn), what = "")
# remove anything that is not a number, dot or space and read in
Lines <- gsub("[^ .0-9]", "", Lines)
DF <- read.table(textConnection(Lines), col.names = cn)
closeAllConnections()
DF
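Because every file holds a different number of measurements, hard-coding skip = 20 and skip = 5220 is fragile. A minimal sketch, assuming the same START/END marker lines and non-negative values (the character class [^ .0-9] used above also strips a minus sign), that numbers the blocks with cumsum() over the marker lines and returns one data frame per measurement. The short Lines vector and the Measure1, Measure2 names are hypothetical:

```r
# Hypothetical example input with two measurement blocks;
# with a real file this would be: Lines <- readLines("data.dat")
Lines <- c("FILEDATE:02.02.2007",
           "START OF HEIGHT DATA",
           "S= 0 y=0.0 x=0.00000000",
           "S= 0 y=0.1 x=0.00055643",
           "S=10 y=0.0 x=0.00000000",
           "END OF HEIGHT DATA",
           "START OF HEIGHT DATA",
           "S= 0 y=0.0 x=0.00000000",
           "S= 0 y=0.1 x=0.00055643",
           "END OF HEIGHT DATA")

# block counter: goes up by one at every START line
block <- cumsum(grepl("^START OF HEIGHT DATA", Lines))
# a line is inside a block when more START than END markers precede it
inside <- block > cumsum(grepl("^END OF HEIGHT DATA", Lines))
# keep only the data lines (they contain "=") that lie inside a block
keep <- inside & grepl("=", Lines)

# same cleaning as above: keep only digits, dots and spaces
num <- gsub("[^ .0-9]", "", Lines[keep])
DF <- read.table(text = num, col.names = c("S", "y", "x"))

# one data frame per measurement block
measures <- split(DF, block[keep])
names(measures) <- paste0("Measure", names(measures))
measures
```

Splitting on the marker lines means the code does not need to know in advance how many measurements a file contains or how long each one is.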
On 4/12/07, Felix Wave <felix-wave at vr-web.de> wrote:
Adaikalavan Ramasamy
2007-Apr-12 15:34 UTC
[R] data file import - numbers and letters in a matrix(!)
Here are the contents of my "testdata.txt":
-----------------------------------------------------
START OF HEIGHT DATA
S= 0 y=0.0 x=0.00000000
S= 0 y=0.1 x=0.00055643
S= 9 y=4.9 x=1.67278117
S= 9 y=5.0 x=1.74873257
S=10 y=0.0 x=0.00000000
S=10 y=0.1 x=0.00075557
S=99 y=5.3 x=1.94719490
END OF HEIGHT DATA
-----------------------------------------------------
If you have access to a shell, you can try preprocessing the input file into
tab-separated numbers for read.delim (with header = FALSE) using

cat testdata.txt | grep -v "^START" | grep -v "^END" | sed 's/ //g' | sed 's/S=//' | sed 's/y=/\t/' | sed 's/x=/\t/'

or here is my ugly fix in R:
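my.read.file <- function(file) {
  v1 <- readLines(con = file, n = -1)
  # drop the START/END marker lines (grepl is safe even when nothing matches)
  v2 <- v1[!grepl("^START|^END", v1)]
  # remove all spaces: "S= 0 y=0.0 x=0.00000000" -> "S=0y=0.0x=0.00000000"
  v3 <- gsub(" ", "", v2)
  # turn each label into a single-space separator
  v4 <- gsub("S=|y=|x=", " ", v3)
  # strip the leading space left by "S="
  v5 <- gsub("^ ", "", v4)
  # split on spaces, convert to numeric, one row per input line
  m <- t(sapply(strsplit(v5, split = " "), as.numeric))
  colnames(m) <- c("S", "y", "x")
  return(m)
}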
my.read.file( "testdata.txt" )
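An alternative to stripping the labels, sketched here as a hypothetical helper read.height.data() (not part of Adai's reply), is to capture the three values directly with the base R functions regexec() and regmatches(). Unlike the label-deleting approaches, the character class [-0-9.] also keeps a minus sign, in case a file ever contains negative values; the temporary test file below is made up for illustration:

```r
# Hypothetical helper: extract the three numbers on each data line with a
# regular expression instead of deleting the labels first
read.height.data <- function(file) {
  v <- readLines(file)
  # keep only data lines, i.e. those starting with "S="
  v <- v[grepl("^S=", v)]
  # capture the values after S=, y= and x= (optional spaces after each "=")
  pat <- "^S= *([-0-9.]+) +y= *([-0-9.]+) +x= *([-0-9.]+)"
  m <- regmatches(v, regexec(pat, v))
  # each element is c(whole match, S, y, x); keep the three captured groups
  out <- t(sapply(m, function(z) as.numeric(z[2:4])))
  colnames(out) <- c("S", "y", "x")
  out
}

# Usage with a temporary file holding a few lines of test data
tmp <- tempfile(fileext = ".txt")
writeLines(c("START OF HEIGHT DATA",
             "S= 0 y=0.0 x=0.00000000",
             "S= 9 y=4.9 x=1.67278117",
             "S=10 y=0.1 x=0.00075557",
             "END OF HEIGHT DATA"), tmp)
res <- read.height.data(tmp)
res
```

Because the pattern names each label explicitly, a line that does not match it produces a row of NA values instead of silently shifting the columns.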
Regards, Adai