Hello,

I'm trying to write a function to calculate the relative entropy between
two distributions. The data I have is in table format, for example:

> t1 <- prop.table(table(c(0,0,2,4,4)))
> t2 <- prop.table(table(c(0,2,2,2,3)))
> t1

  0   2   4
0.4 0.2 0.4

> t2

  0   2   3
0.2 0.6 0.2

The relative entropy is given by

  H[P||Q] = sum(p * log2(p/q))

with the conventions that 0*log2(0/q) = 0 and p*log2(p/0) = Inf.

I'm not sure about what is the best way to achieve that. Is there a way
to test if a table has a value for a given level, so that I can detect
that, for example, t1 is missing levels 1 and 3 and t2 is missing levels
1 and 4 (is "level" the correct terminology here?)? Simply trying to
access t1[["1"]], for example, gives a "subscript out of bounds" error.

Another option would be to "expand" the tables, so that, for example, t1
becomes

  0   1   2   3   4
0.4 0.0 0.2 0.0 0.4

Is there a way to do that?

Thanks,
Andre
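A minimal sketch of the level test asked about above, assuming the table is one-dimensional so that names() returns its level labels. The helper name has_level is hypothetical and only for illustration.

```r
t1 <- prop.table(table(c(0, 0, 2, 4, 4)))

# Hypothetical helper: TRUE if the 1-D table has an entry for `level`.
# Table names are character strings, so the level is converted first.
has_level <- function(tab, level) as.character(level) %in% names(tab)

has_level(t1, 1)   # level 1 never occurs in the data
has_level(t1, 2)   # level 2 does occur

# Levels in 0:4 that are absent from t1
setdiff(as.character(0:4), names(t1))
```

Testing membership in names() avoids the "subscript out of bounds" error that t1[["1"]] raises for an absent level.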
Hi Andre,

Just about expanding the table: the way you could do this is by using
factors, for example:

t1 <- prop.table(table(factor(c(0,0,2,4,4))))
t2 <- prop.table(table(factor(c(0,2,2,2,3))))

The rest is for more knowledgeable people than me to say...

Tal Galili

On Mon, Jul 27, 2009 at 10:21 PM, Andre Nathan <andre@digirati.com.br> wrote:
> Another option would be to "expand" the tables, so that, for example, t1
> becomes
>
>   0   1   2   3   4
> 0.4 0.0 0.2 0.0 0.4
>
> Is there a way to do that?
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Try this:

t1 <- prop.table(table(factor(c(0,0,2,4,4), levels = 0:4)))
t2 <- prop.table(table(factor(c(0,2,2,2,3), levels = 0:4)))

Henrique Dallazuanna
Curitiba-Paraná-Brasil

On Mon, Jul 27, 2009 at 4:21 PM, Andre Nathan <andre@digirati.com.br> wrote:
> Another option would be to "expand" the tables, so that, for example, t1
> becomes
>
>   0   1   2   3   4
> 0.4 0.0 0.2 0.0 0.4
>
> Is there a way to do that?
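Once both tables share the same levels, the formula and the two conventions from the original question could be sketched as follows. The function name rel_entropy is hypothetical, and the sketch assumes p and q are probability vectors over the same levels in the same order.

```r
# Hypothetical helper: relative entropy H[P||Q] in bits.
# Assumes p and q cover the same levels in the same order.
rel_entropy <- function(p, q) {
  p <- as.numeric(p)
  q <- as.numeric(q)
  stopifnot(length(p) == length(q))
  # Convention: p * log2(p/0) = Inf when p > 0
  if (any(p > 0 & q == 0)) return(Inf)
  # Convention: 0 * log2(0/q) = 0, so terms with p == 0 are dropped
  keep <- p > 0
  sum(p[keep] * log2(p[keep] / q[keep]))
}

t1 <- prop.table(table(factor(c(0,0,2,4,4), levels = 0:4)))
t2 <- prop.table(table(factor(c(0,2,2,2,3), levels = 0:4)))

# Inf here: t1 puts mass on level 4, where t2 has none
rel_entropy(t1, t2)
```

Handling the two conventions explicitly avoids the NaN that 0 * log2(0) would otherwise produce in R.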
On Mon, 2009-07-27 at 16:34 -0300, Henrique Dallazuanna wrote:
> Try this:
>
> t1 <- prop.table(table(factor(c(0,0,2,4,4), levels = 0:4)))
> t2 <- prop.table(table(factor(c(0,2,2,2,3), levels = 0:4)))

Is there a way to do this given an already existing table? The problem
is that I actually build the distributions as I read data from files,
something like

  distr <- NULL
  for (file in files) {
    x <- as.matrix(read.table(file))
    t <- c(distr, table(x))
    distr <- tapply(t, names(t), sum)
  }
  distr <- prop.table(distr)

So I only know the maximum level after the distributions are created.

Thanks,
Andre
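One possible way to expand a table that already exists is to re-index it by name onto the full set of levels, filling the gaps with zero. This is only a sketch: the helper name expand_table is hypothetical, and it assumes the levels are non-negative integers whose labels are available through names().

```r
# Hypothetical helper: re-index a 1-D table (or named vector) onto `levels`,
# filling levels that are absent from `tab` with 0.
expand_table <- function(tab, levels) {
  levels <- as.character(levels)
  out <- setNames(numeric(length(levels)), levels)
  present <- intersect(names(tab), levels)
  out[present] <- as.numeric(tab[present])
  out
}

t1 <- prop.table(table(c(0, 0, 2, 4, 4)))
t2 <- prop.table(table(c(0, 2, 2, 2, 3)))

# The maximum level is only known once both tables exist
all_levels <- 0:max(as.numeric(c(names(t1), names(t2))))

expand_table(t1, all_levels)
expand_table(t2, all_levels)
```

The same call should work on the distr object built in the loop above, since tapply() also returns a named result.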