Eva Nordstrom
2010-Sep-12 17:58 UTC
[R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )
I am using read.table to import a text file within R. There are several "errors" in my text file: an "extra" quotation mark has inadvertently been included within a few text fields.

For example, for a pipe (|) delimited text file, I have something similar to this:

1|7|30| "dog"
2|6|25| ""cat"
3|4|20|""
4|5| 56| "mouse"
5|3|56| ""horse"
6|56| ""

In the above example, there are extra quotation marks within the fields for cat and horse (rows 2 and 5), e.g. "cat, "horse.

One solution is to simply edit the text file and remove the extra quotation marks. Is there a "good solution" I can implement from within R?

I am OK with just importing the extra quotation marks and having them show up as part of the text field within R, e.g.

"cat
"horse

Thanks.
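One way to import the fields with their quote marks intact is to switch off quote processing in read.table with `quote = ""`, so every `"` is read as an ordinary character and the unbalanced `""cat"` no longer trips the parser. The sketch below is an illustration under assumptions: it reads the sample from a character vector through the `text =` argument (added to R after this thread was written), and it pads the sixth row with `NA`, because the original row 6 has only three fields. Stripping one outer pair of quotes afterwards leaves exactly the `"cat` / `"horse` form asked for.

```r
# Sample rows from the question; row 6 padded with NA (assumption, the
# original sixth row is missing a numeric field)
lines <- c('1|7|30| "dog"',
           '2|6|25| ""cat"',
           '3|4|20|""',
           '4|5| 56| "mouse"',
           '5|3|56| ""horse"',
           '6|56|NA| ""')

# quote = "" disables quoting, so all quote marks are kept as literal text;
# strip.white = TRUE drops the stray leading spaces around fields
d <- read.table(text = lines, sep = "|", quote = "", strip.white = TRUE,
                stringsAsFactors = FALSE)

# Remove one enclosing pair of quotes: ""cat" becomes "cat, "dog" becomes dog
d$V4 <- sub('^"(.*)"$', '\\1', d$V4)
d
```

Using `gsub('"', '', d$V4)` instead would drop every quote mark if the stray ones are not wanted at all.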
Hi, I have a list of several hundred two-dimensional matrices, where each matrix is n x m. For each (n, m) position I need the average over all the matrices in the list, which would collapse the list down to a single n x m matrix. Is there an easy way to do that? As always, I'd like to avoid a for loop to keep computational time low! Thanks again everyone! Cheers, G
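A loop-free way to get the elementwise mean is to sum the list with `Reduce("+", ...)`, the approach Phil Spector suggests in a later reply, and divide by the number of matrices. This minimal sketch assumes every matrix has the same dimensions and contains no NAs; the three small matrices are made up for illustration.

```r
# Hypothetical stand-ins for the real list: three 2 x 2 matrices
mats <- list(matrix(1:4,  nrow = 2),
             matrix(5:8,  nrow = 2),
             matrix(9:12, nrow = 2))

# Elementwise sum across the list, then divide by the number of matrices
avg <- Reduce(`+`, mats) / length(mats)
avg
```

Here `avg` is the 2 x 2 matrix with first column 5, 6 and second column 7, 8.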
jim holtman
2010-Sep-12 19:58 UTC
[R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )
You can use the 'gsub' command to remove the quote marks. You could readLines/writeLines the file to clean it up with gsub before using read.table on it, so it can all be done within R.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
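The readLines/gsub/writeLines round trip described above could look like the sketch below. Assumptions: the input and output files are temporary files created just for the example, the sixth row is padded with `NA` so every row has four fields, and every double quote is simply deleted, which is one reading of "remove the quote marks".

```r
# Hypothetical input file built from the question's sample (row 6 padded)
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c('1|7|30| "dog"', '2|6|25| ""cat"', '3|4|20|""',
             '4|5| 56| "mouse"', '5|3|56| ""horse"', '6|56|NA| ""'), infile)

# Clean the raw lines by deleting every double quote, then save them
x <- readLines(infile)
writeLines(gsub('"', '', x), outfile)

# The cleaned file parses without any quote problems
d <- read.table(outfile, sep = "|", strip.white = TRUE,
                stringsAsFactors = FALSE)
d
```

Because the quotes are gone, the formerly empty `""` fields come back as empty strings in the character column `V4`.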
Hi,

Thanks, I was using square brackets instead of "(". The "(" makes it work. However, some of my matrices have NA for some values. I need those NAs to basically not be counted, and if all the matrices have NA for a specific (n,m), I want that entry to remain NA. With Reduce('+', mymats), any entry that is NA in at least one of the matrices gives me back an NA for the sum (rather than NA only when every matrix has an NA for that element).

Thanks again for everyone's help!

Kind regards,
Greg

On Sep 12, 2010, at 4:54 PM, Dennis Murphy wrote:
> Hi:
>
> Here's a more concrete example of Phil's point.
>
> mymats <- vector('list', 5)
> set.seed(246)
>
> # Generate a list of five 3 x 3 matrices
> for(i in 1:5) mymats[[i]] <- matrix(sample(1:9), nrow = 3)
> # Sum them elementwise
> Reduce('+', mymats)
>      [,1] [,2] [,3]
> [1,]   23   33   15
> [2,]   25   36   26
> [3,]   20   24   23
>
> HTH,
> Dennis
>
> On Sun, Sep 12, 2010 at 12:52 PM, Gregory Ryslik <rsaber@comcast.net> wrote:
> Hi,
>
> Doing that I get the following:
>
> Browse[2]> Reduce["+",results]
> Error in Reduce["+", results] :
>   object of type 'closure' is not subsettable
>
> You want parentheses there, not brackets; you're asking R to subset a function, Reduce, which is an object of type closure. Hopefully the error message makes more sense now.
>
> Thanks again!
>
> Kind regards,
> Greg
>
> On Sep 12, 2010, at 3:49 PM, Phil Spector wrote:
>
> > Gregory -
> >    Suppose your list is called "mymats". Then
> >
> >    Reduce("+",mymats)
> >
> > does what you want.
> >                                  - Phil
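To skip NAs while averaging, and still get NA where every matrix is missing, one option is to stack the list into a three-dimensional array and average over the third dimension with `rowMeans(..., dims = 2)`. This is a sketch, not the thread's own answer: it assumes all matrices share one shape, and the two small matrices are invented for illustration. A cell that is NA everywhere averages to `NaN` (0/0) under `na.rm = TRUE`, so those cells are reset to NA at the end.

```r
# Hypothetical 2 x 2 matrices with missing values
m1 <- matrix(c(1, NA,  3, NA), nrow = 2)
m2 <- matrix(c(3,  4, NA, NA), nrow = 2)
mats <- list(m1, m2)

# Stack the list into an n x m x k array (assumes identical dimensions)
arr <- array(unlist(mats), dim = c(dim(mats[[1]]), length(mats)))

# Mean over the third dimension, ignoring NAs
avg <- rowMeans(arr, na.rm = TRUE, dims = 2)

# Cells that were NA in every matrix come back as NaN; mark them NA
avg[is.nan(avg)] <- NA
avg
```

The result has 2 and 4 in the first column and 3 and NA in the second, because the bottom-right cell is missing in both inputs.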
Dennis Murphy
2010-Sep-13 01:42 UTC
[R] using read.table, removing extra quotation mark from a text field? (e.g. ""cat" )
Hi:

Thanks to Jakson Aquino, who showed me how to do a proper text substitution, we have a way out. It also turns out that in the last line, the last numeric field was missing, so I inserted an NA| in the last line of the data file before calling readLines(). His (correct) code is at the bottom of the mail. The first two lines of code below are courtesy of Jakson. Afterward, I tried to shape the result into a data frame for export as a flat file. There's an interesting lesson to be (re)learned in the process, so bear with me.

Input file file1.txt (revised):

1|7|30| "dog"
2|6|25| ""cat"
3|4|20|""
4|5| 56| "mouse"
5|3|56| ""horse"
6|56|NA| ""

x <- readLines("file1.txt")
y <- sub('""(.)', '"\\1', x)
d <- do.call(rbind, strsplit(y, split = '\\|'))
d <- as.data.frame(d)
d
  V1 V2  V3       V4
1  1  7  30    "dog"
2  2  6  25    "cat"
3  3  4  20       ""
4  4  5  56  "mouse"
5  5  3  56  "horse"
6  6 56  NA       ""

> str(d)
'data.frame':   6 obs. of  4 variables:
 $ V1: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6
 $ V2: Factor w/ 6 levels "3","4","5","56",..: 6 5 2 3 1 4
 $ V3: Factor w/ 6 levels " 56","20","25",..: 4 3 2 1 5 6
 $ V4: Factor w/ 6 levels " \"\""," \"cat\"",..: 3 2 6 5 4 1

Everything is a factor, as it should be since we converted a character matrix into a data frame. Now convert the factors to numeric and character and write out to a file.

d$V1 <- as.numeric(d$V1)
d$V2 <- as.numeric(d$V2)
d$V3 <- as.numeric(d$V3)
d$V4 <- as.character(d$V4)
d
  V1 V2 V3       V4
1  1  6  4    "dog"
2  2  5  3    "cat"
3  3  2  2       ""
4  4  3  1  "mouse"
5  5  1  5  "horse"
6  6  4  6       ""

Oopsie. We got the numeric factor codes back in V2 and V3. The FAQ 7.10 trap...

# Back to the drawing board.
d <- do.call(rbind, strsplit(y, split = '\\|'))
d <- as.data.frame(d)
d1 <- d
d1$V1 <- as.numeric(as.character(d1$V1))
d1$V2 <- as.numeric(as.character(d1$V2))
d1$V3 <- as.numeric(as.character(d1$V3))
d1$V4 <- as.character(d1$V4)
> d1
  V1 V2 V3       V4
1  1  7 30    "dog"
2  2  6 25    "cat"
3  3  4 20       ""
4  4  5 56  "mouse"
5  5  3 56  "horse"
6  6 56 NA       ""

Much better.
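The trap above can be reproduced on a single factor. `as.numeric()` on a factor returns the internal level codes, while going through the labels recovers the values; R FAQ 7.10 recommends `as.numeric(levels(f))[f]`. The factor below simply copies the values of column V2 from the example. Note that since R 4.0.0 `stringsAsFactors` defaults to `FALSE`, so `as.data.frame()` on a character matrix now keeps character columns and the trap only appears when factors are created explicitly.

```r
# A factor whose labels are numbers, like column V2 above
f <- factor(c("7", "6", "4", "5", "3", "56"))

as.numeric(f)                 # level codes: 6 5 2 3 1 4
as.numeric(as.character(f))   # values via the labels: 7 6 4 5 3 56
as.numeric(levels(f))[f]      # FAQ 7.10 idiom: 7 6 4 5 3 56
```

The codes follow the alphabetical order of the levels ("3", "4", "5", "56", "6", "7"), which is why 7 comes back as 6.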
Let's double check that we're OK.

str(d1)
'data.frame':   6 obs. of  4 variables:
 $ V1: num  1 2 3 4 5 6
 $ V2: num  7 6 4 5 3 56
 $ V3: num  30 25 20 56 56 NA
 $ V4: chr  " \"dog\"" " \"cat\"" "\"\"" " \"mouse\"" ...

# NOW write it out...
write.table(d1, file = 'file3.dat', quote = FALSE)   # looks good

And that's why FAQ 7.10 is written the way it is.

If one is happy with y (just the paired double quotes removed), then Jakson's final line is sufficient:

writeLines(y, "file2.txt")

Dennis

On Sun, Sep 12, 2010 at 5:05 PM, Jakson A. Aquino <jaksonaquino@gmail.com> wrote:
> On Sun, Sep 12, 2010 at 7:27 PM, Dennis Murphy <djmuser@gmail.com> wrote:
> > Hi:
> >
> > On Sun, Sep 12, 2010 at 1:05 PM, Wil M Contreras Arbaje <wil.contreras@gmail.com> wrote:
> >
> >> While you are looking for a solution within R, it might be simpler to open
> >> your text file in almost any free text editor (Notepad++, TextWrangler,
> >> Smultron, vim come to mind), and do Replace all "" for ".
> >
> > There's one problem with that solution: if the character string at the end
> > of the line is blank (i.e., ""), then your suggestion will leave one double
> > quote at the end of a line. Not good. What is needed is a gsub that takes
> > two double quotes plus a wild card character and replaces it with one double
> > quote and a wild card character. If you have an editor that can do that, let
> > me know...seriously. I suspect emacs can do this, but none of the basic
> > editors I know have that capability.
> >
> > Dennis
>
> x <- readLines("file1.txt")
> y <- sub('""(.)', '"\\1', x)
> writeLines(y, "file2.txt")
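Dennis's objection to a plain "Replace all" can be checked directly on three of the sample lines. A blind `gsub('""', '"', x)` also turns the empty field `""` into a lone quote, whereas Jakson's pattern only rewrites `""` when some character follows it and puts that character back through the `\\1` backreference, so an empty field at the end of a line is left alone. The three input lines are copied from the thread; `sub()` changes only the first match per line, which is enough here because each line has one text field.

```r
# Three of the problem lines from the thread
x <- c('2|6|25| ""cat"', '3|4|20|""', '6|56|NA| ""')

# Naive "Replace all": every "" becomes ", breaking the empty fields
naive <- gsub('""', '"', x)

# Jakson's pattern: "" followed by any character becomes " plus that character
fixed <- sub('""(.)', '"\\1', x)

naive
fixed
```

`naive` ends lines 2 and 3 with a single unmatched quote, while `fixed` repairs `""cat"` and keeps both `""` fields intact.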