Hi all,

I have to filter a tab-delimited text file like the one below:

"GeneNames"  "value1"  "value2"  "log2(Fold_change)"  "log2(Fold_change) normalized"  "Signature(abs(log2(Fold_change) normalized) > 4)"
ENSG00000209350  4    35    -3.81131293562629  -4.14357714689656  TRUE
ENSG00000177133  142  2     5.46771720082336   5.13545298955309   FALSE
ENSG00000116285  115  1669  -4.54130810709955  -4.87357231836982  TRUE
ENSG00000009724  10   162   -4.69995182667858  -5.03221603794886  FALSE
ENSG00000162460  3    31    -4.05126372834704  -4.38352793961731  TRUE

I want to keep only the rows where the last column is TRUE and write them to a new text file, so I should get something like this:

"GeneNames"  "value1"  "value2"  "log2(Fold_change)"  "log2(Fold_change) normalized"  "Signature(abs(log2(Fold_change) normalized) > 4)"
ENSG00000209350  4    35    -3.81131293562629  -4.14357714689656  TRUE
ENSG00000116285  115  1669  -4.54130810709955  -4.87357231836982  TRUE
ENSG00000162460  3    31    -4.05126372834704  -4.38352793961731  TRUE

I used read.table and write.table, but I am still not very satisfied with the results. Here is what I did:

    expFC <- read.table( "test.txt", header=T, sep="\t" )
    expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
    write.table (expFC.TRUE, file="test_TRUE.txt", row.names=FALSE, sep="\t" )

Result:

"GeneNames"  "value1"  "value2"  "log2.Fold_change."  "log2.Fold_change..normalized"  "Signature.abs.log2.Fold_change..normalized....4."
"ENSG00000209350"  4    35    -3.81131293562629  -4.14357714689656  TRUE
"ENSG00000116285"  115  1669  -4.54130810709955  -4.87357231836982  TRUE
"ENSG00000162460"  3    31    -4.05126372834704  -4.38352793961731  TRUE

As you can see, there are two problems:

1. The headers were altered. All the special characters were converted to dots (.).
2. The gene names (first column) are quoted, which they were not in the original file.

The second point is not very annoying, but the first one is. How do I get exactly the same headers as in the original file?

Thanks,

D.
Duke -
   One possibility is to check the help files for the functions
involved to see if there are options to control this behaviour.
For example, the check.names= argument to read.table, or the quote=
argument to write.table. How about

    expFC <- read.table("test.txt", header=TRUE, sep="\t", check.names=FALSE)
    expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
    write.table(expFC.TRUE, file="test_TRUE.txt", row.names=FALSE,
                sep="\t", quote=FALSE )

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spector at stat.berkeley.edu

On Fri, 10 Sep 2010, Duke wrote:

> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
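The renamed headers reported in the original post are consistent with what make.names() produces, which is the function read.table applies to column names when check.names is left at its default of TRUE. A minimal sketch, using the header fields from the post (the variable names hdr and fixed are only for illustration):

```r
# Header fields as they appear in the file from the original post
hdr <- c("GeneNames", "value1", "value2", "log2(Fold_change)",
         "log2(Fold_change) normalized",
         "Signature(abs(log2(Fold_change) normalized) > 4)")

# With check.names = TRUE, read.table passes the names through make.names(),
# which replaces every character that is invalid in an R name with "."
fixed <- make.names(hdr, unique = TRUE)
print(fixed)
```

Passing check.names=FALSE skips this step, so the names are kept exactly as they appear in the file.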
Hi Phil,

On 9/10/10 1:45 PM, Phil Spector wrote:
> Duke -
>    One possibility is to check the help files for the functions
> involved to see if there are options to control this behaviour.
> For example, the check.names= argument to read.table, or the quote=
> argument to write.table. How about

Yes, I did check them before posting the question to the list, but somehow I missed (or misunderstood) the check.names option. As for the quote=FALSE option of write.table, it does not work the way I want, since all the headers are unquoted too.

> expFC <- read.table("test.txt", header=TRUE, sep="\t", check.names=FALSE)
> expFC.TRUE <- expFC[expFC[dim(expFC)[2]]=="TRUE",]
> write.table(expFC.TRUE, file="test_TRUE.txt", row.names=FALSE,
>             sep="\t", quote=FALSE )

This works perfectly and solves the first issue. Thanks so much Phil.

D.
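One possible way to get quoted headers together with unquoted data is to write the header line by hand and then append the rows with quote=FALSE and col.names=FALSE. The following is only a sketch under assumptions: it uses a shortened three-column toy file in temporary locations rather than the real test.txt, and it assumes the last column is read in as logical.

```r
# Toy input (assumed, shortened version of the file in the thread)
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c('"GeneNames"\t"value1"\t"Signature(abs(x) > 4)"',
             "ENSG00000209350\t4\tTRUE",
             "ENSG00000177133\t142\tFALSE"), infile)

# check.names = FALSE keeps the header text exactly as in the file
expFC <- read.table(infile, header = TRUE, sep = "\t", check.names = FALSE)

# Keep the rows whose last column is TRUE (which() also drops any NA)
keep <- expFC[which(expFC[[ncol(expFC)]]), , drop = FALSE]

# Write the header with quotes, then append the unquoted data rows
writeLines(paste0('"', names(keep), '"', collapse = "\t"), outfile)
write.table(keep, file = outfile, append = TRUE, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

result <- readLines(outfile)
print(result)
```

Numeric columns are re-formatted by write.table, so values with many digits are not guaranteed to come back character for character; a line-based filter avoids that.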
On Fri, Sep 10, 2010 at 1:24 PM, Duke <duke.lists at gmx.com> wrote:
> [...]
>
> As you can see, there are two points:
>
> 1. The headers were altered. All the special characters were converted to
> dot (.).
> 2. The gene names (first column) were quoted (which were not in the original
> file).
>

This will copy input lines matching pattern, as well as the header, to the output verbatim, preserving all quotes, spacing, etc.

    myFilter <- function(infile, outfile, pattern = "TRUE$") {
       L <- readLines(infile)
       # write the header line unchanged; sep = "" avoids a trailing space
       cat(L[1], "\n", file = outfile, sep = "")
       # keep only the data lines that match the pattern
       L2 <- grep(pattern, L[-1], value = TRUE)
       for(el in L2) cat(el, "\n", file = outfile, append = TRUE, sep = "")
    }

    # e.g.
    myFilter("infile.txt", "outfile.txt")

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
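The line-based approach can also be written with a single writeLines() call. This is a sketch rather than the function posted in the thread: the name filter_lines and the toy file are made up for illustration, and the default pattern assumes the flag is the last tab-separated field on the line.

```r
# Hypothetical variant: keep the header plus every data line whose last
# tab-separated field is exactly TRUE, and write them out unchanged.
filter_lines <- function(infile, outfile, pattern = "\tTRUE$") {
  lines <- readLines(infile)
  keep  <- c(lines[1], grep(pattern, lines[-1], value = TRUE))
  writeLines(keep, outfile)
  invisible(length(keep) - 1L)   # number of data lines kept
}

# Toy input in temporary files (assumed data, not the real test.txt)
infile  <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c('"GeneNames"\t"value1"\t"Signature"',
             "ENSG00000209350\t4\tTRUE",
             "ENSG00000177133\t142\tFALSE",
             "ENSG00000116285\t115\tTRUE"), infile)

n <- filter_lines(infile, outfile)
print(readLines(outfile))
```

Anchoring the pattern on the preceding tab means a field that merely ends in the letters TRUE would not match by accident.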
On 9/10/10 2:49 PM, Gabor Grothendieck wrote:
> This will copy input lines matching pattern as well as the header to
> the output verbatim preserving all quotes, spacing, etc.
>
> [...]

I like this one the best! Even though it is not as simple as the bash one-liner (system( "cat infile.txt | grep -v FALSE > outfile.txt", wait=TRUE )), I am very happy to learn that R has functions similar to those in bash. If there is a document or a list of all such functions, that would be excellent.

Thanks Gabor,

D.
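For comparison, the effect of grep -v can be had inside R with the invert argument of grep(). A small sketch on made-up lines (the vector lines below is illustrative, not the real file):

```r
# Made-up lines standing in for the contents of infile.txt
lines <- c('"GeneNames"\t"Signature"',
           "ENSG00000209350\tTRUE",
           "ENSG00000177133\tFALSE",
           "ENSG00000116285\tTRUE")

# invert = TRUE returns the lines that do NOT match, like grep -v;
# fixed = TRUE treats "FALSE" as a literal string rather than a regex
kept <- grep("FALSE", lines, value = TRUE, invert = TRUE, fixed = TRUE)
print(kept)
```

As with the shell version, the header survives only because it does not contain the string FALSE, so this is less robust than matching TRUE at the end of each data line.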
On Fri, Sep 10, 2010 at 4:20 PM, Duke <duke.lists at gmx.com> wrote:
> [...]
> If there is a document or a list of all such functions, that would be
> excellent.
>
> Thanks Gabor,
>

Check out these help files:

    help.search(keyword = "character", package = "base")

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
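As a rough orientation, a few base R functions cover tasks that are often done with shell tools. The pairings in the comments are loose analogies, not exact equivalents, and the sample line is made up from the data in the thread:

```r
# One made-up data line
x <- "ENSG00000209350\t4\t35\tTRUE"

# strsplit() splits on a delimiter, loosely like cut
fields <- strsplit(x, "\t", fixed = TRUE)[[1]]

# grepl() tests for a match, loosely like grep -q
is_hit <- grepl("TRUE$", x)

# sub() replaces the first match, loosely like sed 's/.../.../'
gene_number <- sub("^ENSG0*", "", fields[1])

# nchar() counts characters, loosely like wc -c on a single string
id_width <- nchar(fields[1])

print(list(fields = fields, is_hit = is_hit,
           gene_number = gene_number, id_width = id_width))
```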
On 9/10/10 4:24 PM, Gabor Grothendieck wrote:
> Check out these help files:
>
> help.search(keyword = "character", package = "base")
>

Great! Thanks so much Gabor.

D.