Dear R-WinEdit users,

I have a simple question, but somehow I cannot find the answer, although I have tried a lot!
I have an ASCII file and I want to import it into R, so that every character is defined by [i;j].
The rows are not of the same length.

The file looks like the following shortened abstract example:

name: xxxxx xxxx
age: 9.9.99
record number: 999
title: xxxxx xxxx xxx
keywords: xxx xx

"white space"

name: yyyy yyyyyyyyyyyy
age: 8.8.88
record number: 8
title: yyyy yy yyyy
keywords: yyyyyyyyyyy yyyyyyyy yyy

"white space"

I would be very grateful for your help!

Michael Graber
michael.graber at mail.uni-wuerzburg.de
Dear R-WinEdit users,

I have a simple question, but somehow I cannot find the answer even though I tried a lot!

I have an unstructured ASCII-file and I want to import it into a matrix m in R, so that every character is defined by m[i;j]. The rows are not of the same length.

The file looks like the following shortened abstract example:

name: xxxxx xxxx
age: 9.9.99
record number: 999
title: xxxxx xxxx xxx
keywords: xxx xx

"white space"

name: yyyy yyyyyyyyyyyy
age: 8.8.88
record number: 8
title: yyyy yy yyyy
keywords: yyyyyyyyyyy yyyyyyyy yyy

"white space"

The result should be for example: m[1;1]=n

I would be very grateful for your help!

Michael Graber
michael.graber at mail.uni-wuerzburg.de
Michael:

Ah ... the bane of real data analysts everywhere: getting the data from its original format into an (R-)usable form for data analysis.

This has nothing to do with R-WinEdit, AFAICS.

My approach would be to simply use readLines() to read the lines in as character strings and then process them with grep() and/or regexpr() to extract the bits I wanted. If the formatting is fixed, substring() may also be useful. You will also need to convert the resulting character representations of numbers to numerics via as.numeric().

If you haven't worked with regular expressions before (?regexp), you will find this a bit of a chore; but it is well worth the effort, as they are invaluable for this sort of thing. There are numerous web tutorials to help (google on 'regular expressions').

Cheers,

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Michael Graber
> Sent: Tuesday, May 10, 2005 8:45 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] converting an ASCII file to a matrix
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
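The readLines()/grep()/regexpr()/substring()/as.numeric() route described in this reply might look roughly like the sketch below. It is only an illustration under assumptions: the temporary file, the variable names, and the choice of the "record number" field are made up here, and the file contents simply mirror the abstract example from the question.

```r
# Hypothetical sample file mirroring the abstract example in the question
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "name: xxxxx xxxx",
  "age: 9.9.99",
  "record number: 999",
  "title: xxxxx xxxx xxx",
  "keywords: xxx xx",
  "",
  "name: yyyy yyyyyyyyyyyy",
  "age: 8.8.88",
  "record number: 8",
  "title: yyyy yy yyyy",
  "keywords: yyyyyyyyyyy yyyyyyyy yyy"
), tmp)

# Read every line of the file as one character string
txt <- readLines(tmp)

# Keep only the lines that start with the field of interest
rec_lines <- grep("^record number:", txt, value = TRUE)

# Locate the first run of digits in each line and cut it out
pos <- regexpr("[0-9]+", rec_lines)
rec_chr <- substring(rec_lines, pos, pos + attr(pos, "match.length") - 1)

# Convert the character representation to numeric
rec_num <- as.numeric(rec_chr)
```

The same pattern (select lines with grep(), locate the value with regexpr(), extract with substring()) can be repeated for the other fields; only the numeric ones need as.numeric().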
Michael Graber wrote:

> Dear R-WinEdit users,

a) What is an R-WinEdit user?

b) I guess you mean R-WinEdt (without an i), implying the plug-in for the WinEdt editor? WinEdit is another editor that does not support R very closely, AFAIK.

c) The following questions are completely unrelated to any editor, so why do you ask only a very small (empty?) subset of the R community?

> I have a simple question, but somehow I cannot find the answer even
> though I tried a lot!
>
> I have an unstructured ASCII-file and I want to import it into a matrix
> m in R, so that every character is defined by m[i;j]. The rows are not

d) What does m[i;j] mean? If we are speaking R, I guess you mean m[i,j]?

> of the same length.
>
> The file looks like the following shortened abstract example:
>
> name: xxxxx xxxx
> age: 9.9.99
> record number: 999
> title: xxxxx xxxx xxx
> keywords: xxx xx
>
> "white space"
>
> name: yyyy yyyyyyyyyyyy
> age: 8.8.88
> record number: 8
> title: yyyy yy yyyy
> keywords: yyyyyyyyyyy yyyyyyyy yyy
>
> "white space"
>
> The result should be for example: m[1;1]=n

So what about reading all lines and storing the separate characters as vectors in a list, using strsplit()?

  L <- strsplit(readLines(filename), "")
  L[[i]][j]

A matrix seems to be the wrong way to go with unequal line lengths.

Uwe Ligges
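A small self-contained sketch of the list-of-characters idea follows. The temporary file and its four lines are assumptions made up for the illustration; only the strsplit(readLines(...), "") call comes from the message above.

```r
# Hypothetical input: a few lines of unequal length, including an empty one
tmp <- tempfile(fileext = ".txt")
writeLines(c("name: xxxxx xxxx", "age: 9.9.99", "", "name: yyyy"), tmp)

# One list element per line, each holding that line's single characters
L <- strsplit(readLines(tmp), "")

first_char <- L[[1]][1]   # character j = 1 of line i = 1
line_len   <- lengths(L)  # number of characters per line; 0 for the empty line
```

Because each list element keeps its own length, no padding is needed, which is the reason a list fits ragged lines better than a matrix.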
Uwe Ligges wrote:

> So what about reading all lines and storing the separate characters as
> vectors in a list, using strsplit()?
>
>   L <- strsplit(readLines(filename), "")
>   L[[i]][j]
>
> A matrix seems to be the wrong way to go with unequal line lengths.

Let me add: what about reading it in using read.dcf(), a function that is designed for data like that specified above? That is much more appropriate than looking at single characters, I think.

Uwe Ligges
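A minimal sketch of the read.dcf() suggestion, under assumptions: the "white space" placeholders in the example are taken to be blank lines separating records, each record line is taken to have the form "field: value", and the temporary file is made up for the illustration.

```r
# Hypothetical file: two records in "field: value" form, separated by a blank line
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "name: xxxxx xxxx",
  "age: 9.9.99",
  "record number: 999",
  "title: xxxxx xxxx xxx",
  "keywords: xxx xx",
  "",
  "name: yyyy yyyyyyyyyyyy",
  "age: 8.8.88",
  "record number: 8",
  "title: yyyy yy yyyy",
  "keywords: yyyyyyyyyyy yyyyyyyy yyy"
), tmp)

# read.dcf() returns a character matrix: one row per record, one column per field
m <- read.dcf(tmp)

record_names <- m[, "name"]
record_nums  <- as.numeric(m[, "record number"])
```

Here the matrix is indexed by record and field name rather than by character position, so m[2, "age"] addresses the age field of the second record.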
This seems to work, but it's a bit ugly with the loop (I'm sure you could replace the loop with "apply").

asc2mat <- function(fname) {
  # Read the non-blank lines, then split each one into single characters
  x <- strsplit(scan(fname, "character", sep = "\n", quiet = TRUE), "")
  rlen <- sapply(x, length)
  # One row per line; rows shorter than the longest line are padded with NA
  res <- matrix(NA_character_, nrow = length(x), ncol = max(rlen))
  for (i in seq_len(nrow(res))) {
    res[i, seq_len(rlen[i])] <- x[[i]]
  }
  return(res)
}

Norm Olsen

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Berton Gunter
Sent: Tuesday, May 10, 2005 9:07 AM
To: 'Michael Graber'; r-help at stat.math.ethz.ch
Subject: RE: [R] converting an ASCII file to a matrix
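One possible way to drop the loop from asc2mat is sketched below. This is a variant under assumptions, not Norm's code: the helper name chars_to_matrix is hypothetical, and it takes a character vector of lines (for example from readLines()) instead of a file name. It pads every row to the longest line with NA and binds the rows together.

```r
# Hypothetical helper: character vector of lines -> padded character matrix
chars_to_matrix <- function(txt) {
  x <- strsplit(txt, "")            # one vector of single characters per line
  width <- max(lengths(x))          # length of the longest line
  # `length<-` extends a vector with NA up to the requested length
  padded <- lapply(x, `length<-`, width)
  do.call(rbind, padded)            # stack the padded rows into a matrix
}

m <- chars_to_matrix(c("name: xxxxx xxxx", "age: 9.9.99"))
m[1, 1]   # first character of the first line
```

Positions past the end of a short line come back as NA, so is.na(m[i, j]) tells whether line i actually has a j-th character.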