Hello,
I have a data frame (read from a tab-separated file) that looks like the
example below: each cell holds two values separated by a comma, and the cells
are separated by tabs.
     [,1]  [,2]  [,3]    [,4]
[1,] 0,1   1,3   40,10   0,0
[2,] 20,5  4,2   10,40   10,0
[3,] 0,11  1,2   120,10  0,0
I would like to calculate the proportion made up by the smaller of the two
comma-separated numbers by:
1) summing the values, e.g. for [1,3], where the cell is 40,10: 40+10 = 50
2) taking the first value and dividing it by the total, e.g. for [1,3]:
40/50 = 0.8
3) where the value generated by 2) is >0.5, printing 1-value; otherwise
leaving the value as is, e.g. for [1,3], where the value is 0.8, print
1-0.8 = 0.2
I plan to generate a file like:

     [,1]  [,2]  [,3]  [,4]
[1,] 1     0.25  0.2   0
[2,] 0.2   0.33  0.2   1
[3,] 1     0.33  0.08  0
Apologies, I know this is very complex. Any help, even just some pointers on
how to write a general function where values are separated by a comma, would
be very much appreciated!
Thank you
-- 
View this message in context:
http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2967870.html
Sent from the R help mailing list archive at Nabble.com.
Hi,
It is not the most elegant thing ever, but this does what you want.  I
am *fairly* certain it generalizes to different sized matrices, but
I'd double check.  When you divide by 0, it returns NaN, but this is
pretty easy to fix if you really want 0s using is.nan().  My general
process was: split data by commas, convert to numeric, define a
function that does your calculations, apply this function, convert
results back from a list to a matrix with the same number of columns
as the original data, add any column/rownames from original matrix,
return results.
# Define a function
my.fun <- function(dat) {
  # split data by commas, and convert to numeric
  # with commas, it would have been character
  # so something like this is necessary
  temp <- lapply(strsplit(dat, ","), as.numeric)
  # Define summary function
  my.summary <- function(x) {
    ## This combines your first and second steps
    value <- x[1]/sum(x)
    ## if value > .5, return 1 - value
    ## otherwise, just return the value
    if(isTRUE(value > 0.5)) {
      return(1 - value)
    } else {return(value)}
  }
  temp2 <- lapply(temp, my.summary)
  output <- matrix(unlist(temp2), ncol = ncol(dat),
    dimnames = dimnames(dat))
  return(output)
}
# Create your data
dat <- c("0,1", "1,3", "40,10", "0,0",
         "20,5", "4,2", "10,40", "10,0",
         "0,11", "1,2", "120,10", "0,0")
dat <- matrix(dat, ncol = 4, byrow = TRUE)
# Test it out
my.fun(dat)
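If you do want 0s instead of NaN, here is a minimal sketch of the is.nan()
fix mentioned above. The matrix `res` is a made-up stand-in for the output of
my.fun(dat), so treat the name and the values as assumptions for illustration
only.

```r
# Hypothetical stand-in for the result matrix; 0 / 0 evaluates to NaN in R,
# which is what a "0,0" cell produces
res <- matrix(c(0 / 1, 0 / 0,
                10 / 10, 0 / 0), ncol = 2, byrow = TRUE)

# is.nan() returns a logical matrix of the same shape, so it can be used
# directly as an index to overwrite only the NaN cells
res[is.nan(res)] <- 0
res
```

The same one-line replacement should work on a matrix of any size, because
the logical index always has the shape of the matrix it was computed from.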
HTH,
Josh
On Thu, Oct 7, 2010 at 10:19 PM, burgundy <sauburn at yahoo.com> wrote:
> [...]
-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
On Fri, Oct 8, 2010 at 1:19 AM, burgundy <sauburn at yahoo.com> wrote:
> [...]
Try using gsubfn from the gsubfn package (http://gsubfn.googlecode.com). Use
it to match a regular expression consisting of digits, a comma and digits,
capturing the two strings of digits and passing them to a function f, whose
output replaces the matched expression. Then read the resulting text into a
data frame.
library(gsubfn)
L <- c(" 0,1  1,3   40,10  0,0", " 20,5  4,2  10,40  10,0",
  " 0,11  1,2  120,10  0,0")
f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
L2 <- gsubfn("(\\d+),(\\d+)", f, L)
DF <- read.table(textConnection(L2))
which gives:
> DF
   V1        V2         V3  V4
1 0.0 0.2500000 0.20000000 NaN
2 0.2 0.3333333 0.20000000   0
3 0.0 0.3333333 0.07692308 NaN
-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
On Fri, Oct 8, 2010 at 10:18 AM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> [...]
A further simplification would be to use strapply from the same package. It
eliminates the need for read.table at the end:
> strapply(L, "(\\d+),(\\d+)", f, simplify = rbind)
     [,1]      [,2]       [,3] [,4]
[1,]  0.0 0.2500000 0.20000000  NaN
[2,]  0.2 0.3333333 0.20000000    0
[3,]  0.0 0.3333333 0.07692308  NaN
Here's another method without using any external regular expression
libraries:
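dat <- read.table(tc <- textConnection(
'0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0'), sep="")
close(tc)
# each cell is a string such as "40,10": split it on the comma, convert to
# numeric, and divide the smaller value by the total
mat <- apply(dat, c(1,2), function(x){
  temp <- as.numeric(unlist(strsplit(x, ',')))
  min(temp)/sum(temp)
})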
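The min(temp)/sum(temp) shortcut gives the same result as the three-step rule
in the original post, because 1 - a/(a+b) = b/(a+b): whenever the first
value's share is above 0.5, the rule switches to the second value's share,
which is then the smaller one. A small sketch to check this on the example
pairs; the function names `three_step` and `shortcut` are made up for
illustration.

```r
# Three-step rule from the original post: first/total, then 1 - value if > 0.5
three_step <- function(a, b) {
  value <- a / (a + b)
  if (isTRUE(value > 0.5)) 1 - value else value
}

# Shortcut used in the replies: the smaller value divided by the total
shortcut <- function(a, b) min(a, b) / (a + b)

# Pairs taken from the example data ("0,0" is left out: both versions give NaN)
pairs <- list(c(0, 1), c(1, 3), c(40, 10), c(20, 5), c(4, 2),
              c(10, 40), c(10, 0), c(0, 11), c(1, 2), c(120, 10))

# Compare with all.equal() rather than == to allow for floating-point rounding
same <- vapply(pairs,
               function(p) isTRUE(all.equal(three_step(p[1], p[2]),
                                            shortcut(p[1], p[2]))),
               logical(1))
all(same)
```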
For mat[2,4], I get 0 (as did the other solutions), whereas your expected
output shows 1, so check on that. If you want the divide-by-0 NaNs to be 0,
you can do that by replacing
min(temp)/sum(temp)
with:
ifelse(is.nan(val<-min(temp)/sum(temp)), 0, val)
This has an advantage over:
mat[is.na(mat)] <- 0
in that you might have true missingness in your data, and is.na is TRUE for
both NA and NaN, so it won't be able to distinguish the two.
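A short sketch of that difference, using a made-up vector `vals` that holds
one genuinely missing value and one divide-by-0 result (both the name and the
values are assumptions for illustration):

```r
# Hypothetical results: one genuine missing value (NA) and one 0 / 0 (NaN)
vals <- c(0.25, NA, 0 / 0, 0.2)

is.na(vals)   # flags both the NA and the NaN
is.nan(vals)  # flags only the NaN

# Replacing via is.nan() turns the NaN into 0 and leaves the genuine NA alone
fixed <- vals
fixed[is.nan(fixed)] <- 0
fixed
```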
Cheers,
Jeff.