Hello,

I have a dataframe (tab separated file) which looks like the example below - two values separated by a comma, and tab separation between each of these.

     [,1]  [,2]  [,3]    [,4]
[1,] 0,1   1,3   40,10   0,0
[2,] 20,5  4,2   10,40   10,0
[3,] 0,11  1,2   120,10  0,0

I would like to calculate the percentage of the smallest number separated by the comma by:
1) summing the values e.g. for [1,3] where 40,10, 40+10 = 50
2) taking the first value and dividing it by the total e.g. for [1,3], 40/50 = 0.8
3) where the value generated by 2) is >0.5, print 1-value, otherwise, leave value e.g. for [1,3], where value is 0.8, print 1-0.8 = 0.2

plan to generate file like:

     [,1]  [,2]  [,3]  [,4]
[1,] 1     0.25  0.2   0
[2,] 0.2   0.33  0.2   1
[3,] 1     0.33  0.08  0

Apologies, I know this is very complex. Any help, even just some pointers on how to write a general function where values are separated by a comma, is really very much appreciated!

Thank you

--
View this message in context: http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2967870.html
Sent from the R help mailing list archive at Nabble.com.
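The three steps in the question can be traced for a single cell. This is a minimal sketch in base R; the function name fold_ratio and the argument name cell are illustrative assumptions and do not come from the original post.

```r
# Worked example for one cell, following steps 1) to 3) of the question.
# fold_ratio and cell are hypothetical names used only for illustration.
fold_ratio <- function(cell) {
  # "40,10" -> c(40, 10)
  parts <- as.numeric(strsplit(cell, ",", fixed = TRUE)[[1]])
  total <- sum(parts)          # step 1: 40 + 10 = 50
  value <- parts[1] / total    # step 2: 40 / 50 = 0.8
  # step 3: fold values above 0.5; isTRUE() keeps NaN (from 0/0) unchanged
  if (isTRUE(value > 0.5)) 1 - value else value
}

fold_ratio("40,10")   # 1 - 0.8
fold_ratio("1,3")     # 1 / 4
fold_ratio("0,0")     # 0 / 0 gives NaN
```

Note that a cell such as "0,0" has no defined ratio, so a decision is needed on what to report there.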
Hi,

It is not the most elegant thing ever, but this does what you want. I am *fairly* certain it generalizes to different sized matrices, but I'd double check. When you divide by 0, it returns NaN, but this is pretty easy to fix if you really want 0s using is.nan().

My general process was: split data by commas, convert to numeric, define a function that does your calculations, apply this function, convert results back from a list to a matrix with the same number of columns as the original data, add any column/rownames from original matrix, return results.

# Define a function
my.fun <- function(dat) {
  # split data by commas, and convert to numeric
  # with commas, it would have been character
  # so something like this is necessary
  temp <- lapply(strsplit(dat, ","), as.numeric)

  # Define summary function
  my.summary <- function(x) {
    ## This combines your first and second steps
    value <- x[1]/sum(x)
    ## if value > .5, return 1 - value
    ## otherwise, just return the value
    if(isTRUE(value > 0.5)) {
      return(1 - value)
    } else {return(value)}
  }

  temp2 <- lapply(temp, my.summary)
  output <- matrix(unlist(temp2), ncol = ncol(dat),
                   dimnames = dimnames(dat))
  return(output)
}

# Create your data
dat <- c("0,1", "1,3", "40,10", "0,0",
         "20,5", "4,2", "10,40", "10,0",
         "0,11", "1,2", "120,10", "0,0")
dat <- matrix(dat, ncol = 4, byrow = TRUE)

# Test it out
my.fun(dat)

HTH,

Josh

On Thu, Oct 7, 2010 at 10:19 PM, burgundy <sauburn at yahoo.com> wrote:
> [...]

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
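The process described in Josh's reply (split, convert, compute, reshape) can also be sketched in vectorised form and tried on a matrix of a different shape. This is only an illustration under stated assumptions: the input is a character matrix, every cell holds exactly two comma-separated numbers, and the name fold_matrix is hypothetical. It swaps the lapply-based summary for rowSums() and ifelse(), so it is a variant and not Josh's code.

```r
# Vectorised variant (hypothetical helper, not from the thread).
# Assumes a character matrix with exactly two numbers per cell.
fold_matrix <- function(dat) {
  parts <- strsplit(as.vector(dat), ",", fixed = TRUE)
  # One row per cell: column 1 = first number, column 2 = second number
  nums <- matrix(as.numeric(unlist(parts)), ncol = 2, byrow = TRUE)
  value <- nums[, 1] / rowSums(nums)
  # Fold values above 0.5; NaN cells (0/0) fail the test and stay NaN
  value <- ifelse(!is.nan(value) & value > 0.5, 1 - value, value)
  # as.vector() and matrix() both work column by column, so positions match
  matrix(value, nrow = nrow(dat), dimnames = dimnames(dat))
}

# A 2 x 3 example, to check a shape other than the 3 x 4 data
dat2 <- matrix(c("0,1", "20,5", "1,3", "4,2", "40,10", "0,0"), nrow = 2)
res <- fold_matrix(dat2)
res
```

If zeros are preferred over NaN, res[is.nan(res)] <- 0 can be applied afterwards, as the reply suggests.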
On Fri, Oct 8, 2010 at 1:19 AM, burgundy <sauburn at yahoo.com> wrote:
> [...]

Try using gsubfn in the gsubfn package (http://gsubfn.googlecode.com). Use it to match a regular expression consisting of digits, a comma and digits. The two captured strings of digits are passed to function f, and each match is replaced with the output of f. Then read the resulting text into a data frame.

library(gsubfn)
L <- c(" 0,1  1,3   40,10  0,0", " 20,5  4,2  10,40  10,0",
       " 0,11  1,2  120,10  0,0")

f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
L2 <- gsubfn("(\\d+),(\\d+)", f, L)

DF <- read.table(textConnection(L2))

which gives:

> DF
   V1        V2         V3  V4
1 0.0 0.2500000 0.20000000 NaN
2 0.2 0.3333333 0.20000000   0
3 0.0 0.3333333 0.07692308 NaN

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
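Note that f computes min(x)/sum(x) rather than the question's three-step rule. For two non-negative values a and b with a positive sum the two agree, because 1 - a/(a+b) = b/(a+b), so folding at 0.5 always selects the smaller share. A quick check, offered as a sketch; step_rule and min_rule are illustrative names, not part of the thread:

```r
# The question's rule: first value over total, folded at 0.5
step_rule <- function(a, b) {
  value <- a / (a + b)
  if (isTRUE(value > 0.5)) 1 - value else value
}

# The rule used in f: smaller value over total
min_rule <- function(a, b) min(a, b) / (a + b)

# Pairs taken from the example data, plus a tie
pairs <- list(c(0, 1), c(1, 3), c(40, 10), c(10, 0), c(120, 10), c(5, 5))
for (p in pairs) {
  # stopifnot() is silent when both rules give the same number
  stopifnot(isTRUE(all.equal(step_rule(p[1], p[2]), min_rule(p[1], p[2]))))
}
```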
On Fri, Oct 8, 2010 at 10:18 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> [...]

A further simplification would be to use strapply from the same package. It eliminates the need for read.table at the end:

> strapply(L, "(\\d+),(\\d+)", f, simplify = rbind)
     [,1]      [,2]       [,3] [,4]
[1,]  0.0 0.2500000 0.20000000  NaN
[2,]  0.2 0.3333333 0.20000000    0
[3,]  0.0 0.3333333 0.07692308  NaN

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
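For readers without gsubfn installed, roughly the same extract-and-apply idea can be sketched in base R with gregexpr() and regmatches(). This approximates what the strapply call does and is not its implementation; L and f are repeated from the earlier message so the block runs on its own, and the names cells and rows are illustrative.

```r
L <- c(" 0,1  1,3   40,10  0,0",
       " 20,5  4,2  10,40  10,0",
       " 0,11  1,2  120,10  0,0")
f <- function(a, b) { x <- as.numeric(c(a, b)); min(x) / sum(x) }

# One character vector of "digits,digits" matches per input line
cells <- regmatches(L, gregexpr("[0-9]+,[0-9]+", L))

# Apply f to the two halves of every match, giving one numeric vector per line
rows <- lapply(cells, function(line) {
  vapply(strsplit(line, ",", fixed = TRUE),
         function(ab) f(ab[1], ab[2]), numeric(1))
})

# Stack the per-line results into a matrix, one row per input line
res <- do.call(rbind, rows)
res
```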
Here's another method without using any external regular expression libraries:

dat <- read.table(tc <- textConnection(
'0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0'), sep="")
close(tc)  # release the text connection once the data are read

mat <- apply(dat, c(1,2), function(x){
  # split each cell on the comma and convert both parts to numeric
  temp <- as.numeric(unlist(strsplit(x, ',')))
  min(temp)/sum(temp)
})

For mat[2,4], I get 0 (as did the other solutions), and you get 1, so check on that.

If you want the divide-by-0 NaNs to be 0, you can do that by replacing min(temp)/sum(temp) with:

ifelse(is.nan(val<-min(temp)/sum(temp)), 0, val)

This has an advantage over:

mat[is.na(mat)] <- 0

in that you might have true missingness in your data, and is.na won't be able to distinguish it from the NaNs.

Cheers,

Jeff.

On Fri, Oct 8, 2010 at 1:19 AM, burgundy <sauburn at yahoo.com> wrote:
> [...]
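The distinction Jeff draws can be seen on a short vector. A minimal sketch, where x is a made-up example in which NA stands for a value that was genuinely missing in the input and NaN for a 0/0 result:

```r
x <- c(0.2, NA, NaN)

is.na(x)    # FALSE TRUE TRUE: NaN also counts as missing here
is.nan(x)   # FALSE FALSE TRUE: only the 0/0 result is flagged

# Zeroing with is.na() also overwrites the true missing value
by_na <- x
by_na[is.na(by_na)] <- 0

# Zeroing with is.nan() leaves the true missing value in place
by_nan <- x
by_nan[is.nan(by_nan)] <- 0
```

After this, by_na no longer records that the second value was missing, while by_nan still does.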