I'm looking over some good code a post-doc in my lab wrote and trying to learn how it works. I came across the following:

    rel.abundance <- as.matrix(read.delim("rel.abundance.csv", row.names=1, as.is=TRUE))
    rel.abundance <- log2(rel.abundance - min(rel.abundance) + 1)

I'm not sure what the second line is doing. I ran each line in R and couldn't see a noticeable difference in the output. I assume log2() takes the log base 2 of the values? I'm not clear what -min(rel.abundance) is doing either... my hunch is that it takes the smallest value in each row?

I'd really like to figure out:
1) What's actually going on?
2) Is there a good way to run a command over a large dataset in R and be better able to tell what is going on?

More specifically, when I run each line in R the output looks something like this (with different values per row):

Archaea|Euryarchaeota|Methanobacteria|Methanobacteriales|Methanobacteriaceae|Methanobrevibacter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,23,0,3,0,0,0

There are a lot of cells with values per row, which is one reason why I think it is difficult to detect a pattern....

Thanks in advance!

Ben
The second line rescales the data onto a log2 scale. It subtracts the minimum of the entire matrix (not the minimum of each row) and adds 1, so that the smallest value becomes exactly 1 and nothing passed to log2() is zero, since log2(0) is -Inf rather than a usable number. Here is an example with sample data:

> x <- matrix(runif(25, -50, 50), 5)
> x
          [,1]      [,2]       [,3]       [,4]       [,5]
[1,] 29.730883  15.47239 -28.679186  47.617069 -48.692242
[2,] -4.472555 -14.68027 -37.062765  23.179251  21.556607
[3,] -8.991592 -22.97399  -2.188197 -14.327309 -39.681576
[4,] 31.087024  49.26841  42.407447  -6.852631  -5.371565
[5,] 10.493329  13.34933   9.876097 -35.178844  14.010105
> # scale to log2
> x <- log2(x - min(x) + 1)
> x
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 6.311487 6.026017 4.393214 6.604506 0.000000
[2,] 5.498879 5.129776 3.658723 6.187283 6.154795
[3,] 5.346980 4.739754 5.569978 5.144248 3.323466
[4,] 6.335913 6.628783 6.525124 5.420873 5.469908
[5,] 5.911346 5.978232 5.896474 3.859313 5.993275

You should see a noticeable change between the data read in and the result of the second statement.

On Mon, Jun 13, 2011 at 11:59 AM, Ben Ganzfried <ben.ganzfried at gmail.com> wrote:
> I'm looking over good-code a post-doc in my lab wrote and trying to learn
> how it works. I came across the following:
> rel.abundance <- as.matrix(read.delim("rel.abundance.csv",row.names=1,as.is=TRUE))
> rel.abundance <- log2(rel.abundance-min(rel.abundance)+1)
> [...]

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
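To make the overall-versus-per-row distinction concrete, here is a minimal sketch in R. The matrix `counts` is made up for illustration (it is not the poster's rel.abundance.csv) and assumes non-negative count data whose smallest value is 0, as the sample row in the question suggests. Under that assumption the transform reduces to log2(x + 1), so 0 stays 0 and 1 stays 1, which may be why the output looked unchanged at a glance.

```r
# Hypothetical count matrix standing in for rel.abundance (an assumption)
counts <- matrix(c(0, 1, 0, 3,
                   2, 5, 23, 2), nrow = 2, byrow = TRUE)

# min() on a matrix returns one number: the smallest cell overall
overall_min <- min(counts)

# Per-row minima need an explicit apply() over rows (MARGIN = 1)
row_mins <- apply(counts, 1, min)

# The transform from the thread: shift so the minimum becomes 1, then log2
shifted <- log2(counts - min(counts) + 1)

# When the overall minimum is 0 this is the same as log2(counts + 1)
same_as_plus_one <- isTRUE(all.equal(shifted, log2(counts + 1)))

print(overall_min)
print(row_mins)
print(shifted[1, ])   # first row: 0, 1, 0, 2 (log2 of 1, 2, 1, 4)
```

If the real data did contain negative values, `min(counts)` would be negative and the shift would move every cell up by that amount before adding 1.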
Hi

r-help-bounces at r-project.org wrote on 13.06.2011 17:59:03:

> Ben Ganzfried <ben.ganzfried at gmail.com>
> Sent by: r-help-bounces at r-project.org
> 13.06.2011 17:59
>
> To: r-help at r-project.org
> Cc:
> Subject: [R] log2() and -min() very quick question
>
> I'm looking over good-code a post-doc in my lab wrote and trying to learn
> how it works. I came across the following:
> rel.abundance <- as.matrix(read.delim("rel.abundance.csv",row.names=1,as.is=TRUE))
> rel.abundance <- log2(rel.abundance-min(rel.abundance)+1)
>
> I'm not sure what the second line is doing. I ran each line in R and
> couldn't see a noticeable difference in the output. I assume log2() takes
> the log base 2 of the values? I'm not clear what -min(rel.abundance) is
> doing either...my hunch would be that it would take the smallest value in
> each row?

No. If rel.abundance is a matrix, min(rel.abundance) is the overall minimum:

> mat <- matrix(1:12, 3, 4)
> min(mat)
[1] 1

So log2(rel.abundance-min(rel.abundance)+1) subtracts the minimum value from all numbers, then adds 1 to all numbers, takes the log base 2 of each number, and returns a matrix with the same dimensions as the input matrix.

> I'd really like to figure out:
> 1) What's actually going on?
> 2) Is there a good way to run a command over a large dataset in R and better
> be able to tell what is going on?
> [...]
> There are a lot of cells w/ values per row, which is one reason why I think
> it is difficult to detect a pattern....

There are some summary and structure commands, summary(data) or str(data), which can tell you some overall information about your data.

Regards
Petr
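For the second question, one possible way to check what a transform did to a large matrix is to compare summaries before and after instead of reading raw rows. The sketch below uses simulated data: the names `before` and `after`, the matrix size, and the Poisson counts are all assumptions for illustration, not the poster's file. It relies on the summary() and str() calls suggested above plus a few other base R functions.

```r
set.seed(42)

# Simulated sparse count matrix: 50 rows x 200 columns, mostly zeros (assumed)
before <- matrix(rpois(50 * 200, lambda = 0.3), nrow = 50)
after  <- log2(before - min(before) + 1)

# Structure and size: storage type and dimensions
str(before)
dim(after)

# Value ranges before and after the transform
range(before)
range(after)

# Distribution summaries over all cells, ignoring the matrix shape
summary(as.vector(before))
summary(as.vector(after))

# How sparse is the data, and how many cells did the transform change?
frac_zero <- mean(before == 0)
n_changed <- sum(before != after)
print(frac_zero)
print(n_changed)

# Look at a small corner instead of printing whole rows
before[1:3, 1:6]
after[1:3, 1:6]
```

With sparse counts like these, most cells are 0 and map to 0, so only the less common larger values change, and a side-by-side look at a small corner or at `range()` makes that easier to spot than scanning a full row.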