thr3ads.net - R help - [R] function using values separated by a comma [Oct 2010]


burgundy

2010-Oct-08 05:19 UTC

[R] function using values separated by a comma

Hello,

I have a data frame (read from a tab-separated file) that looks like the example below: each cell holds two values separated by a comma, and the cells are separated by tabs.

     [,1]  [,2]  [,3]    [,4]
[1,] 0,1   1,3   40,10   0,0
[2,] 20,5  4,2   10,40   10,0
[3,] 0,11  1,2   120,10  0,0
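A minimal sketch of reading such a file so that the comma-separated pairs stay intact, assuming tab-separated input with no header row. The thread does not give a file name, so the data is supplied inline via `text =`; for a real file you would pass its path to read.delim() instead.

```r
# Tab-separated rows as in the example above (inline stand-in for the real file)
txt <- "0,1\t1,3\t40,10\t0,0
20,5\t4,2\t10,40\t10,0
0,11\t1,2\t120,10\t0,0"

# colClasses = "character" keeps each "a,b" cell as text instead of letting
# read.delim() guess a type (or turn the strings into factors in older R)
dat <- as.matrix(read.delim(text = txt, header = FALSE,
                            colClasses = "character"))
dat[1, 3]  # the cell "40,10"
```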

For each cell, I would like to calculate the proportion contributed by the smaller of the two comma-separated numbers, as follows:
1) sum the two values, e.g. for [1,3] (40,10): 40 + 10 = 50
2) divide the first value by that total, e.g. for [1,3]: 40/50 = 0.8
3) if the value from step 2) is > 0.5, output 1 - value; otherwise keep the value, e.g. for [1,3] the value is 0.8, so output 1 - 0.8 = 0.2
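The three steps can be sketched for a single cell as below. The helper name smaller_share() is hypothetical (not from the thread), and the sketch assumes each cell holds exactly two non-negative numbers.

```r
# Hypothetical helper: applies steps 1-3 to a single "a,b" string
smaller_share <- function(cell) {
  x <- as.numeric(strsplit(cell, ",", fixed = TRUE)[[1]])
  value <- x[1] / sum(x)            # steps 1 and 2: first value / total
  if (!is.nan(value) && value > 0.5) {
    1 - value                       # step 3: report the smaller share
  } else {
    value                           # value <= 0.5 is kept; "0,0" gives 0/0 = NaN
  }
}

smaller_share("40,10")  # 1 - 40/50, printed as 0.2
smaller_share("1,3")    # 1/4
```

Because the two shares add up to 1, steps 2 and 3 together amount to min(x)/sum(x) for non-negative values, which is the shortcut used in several of the replies.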

I plan to generate a file like:

     [,1]  [,2]  [,3]  [,4]
[1,] 1     0.25  0.2   0
[2,] 0.2   0.33  0.2   1
[3,] 1     0.33  0.08  0

Apologies, I know this is very complex. Any help, even just some pointers on
how to write a general function for values separated by a comma, would be
really very much appreciated!

Thank you

-- 
View this message in context:
http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2967870.html
Sent from the R help mailing list archive at Nabble.com.

Joshua Wiley

2010-Oct-08 07:54 UTC


[R] function using values separated by a comma

Hi,

It is not the most elegant thing ever, but this does what you want. I
am *fairly* certain it generalizes to matrices of different sizes, but
I'd double-check. Cells where both values are 0 give 0/0, which returns
NaN; if you really want 0s there, that is easy to fix with is.nan().
My general process was: split the data by commas, convert to numeric,
define a function that does your calculations, apply this function,
convert the results back from a list to a matrix with the same number
of columns as the original data, add any column/row names from the
original matrix, and return the results.


# Define a function
my.fun <- function(dat) {
  # The cells contain commas, so they are character strings;
  # split each cell at the comma and convert the pieces to numeric
  temp <- lapply(strsplit(dat, ","), as.numeric)
  # Define summary function
  my.summary <- function(x) {
    ## This combines your first and second steps
    value <- x[1]/sum(x)
    ## if value > .5, return 1 - value
    ## otherwise, just return the value
    if(isTRUE(value > 0.5)) {
      return(1 - value)
    } else {return(value)}
  }
  temp2 <- lapply(temp, my.summary)
  output <- matrix(unlist(temp2), ncol = ncol(dat),
    dimnames = dimnames(dat))
  return(output)
}

# Create your data
dat <- c("0,1",  "1,3", "40,10",  "0,0",
         "20,5", "4,2", "10,40",  "10,0",
         "0,11", "1,2", "120,10", "0,0")
dat <- matrix(dat, ncol = 4, byrow = TRUE)

# Test it out
my.fun(dat)
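The function relies on strsplit() walking the matrix in column-major order and on matrix() refilling it in the same order, which is why it should generalize to other sizes. A small check of that assumption, using a made-up 2 x 2 matrix rather than data from the thread:

```r
# Made-up 2 x 2 matrix of "a,b" cells, entered row by row
m <- matrix(c("1,2", "3,4",
              "5,6", "7,8"), nrow = 2, byrow = TRUE)

# strsplit() treats m as a plain vector, i.e. column by column:
# "1,2", "5,6", "3,4", "7,8"
parts <- strsplit(m, ",", fixed = TRUE)
firsts <- vapply(parts, function(s) as.numeric(s[1]), numeric(1))

# matrix() also fills column by column, so each result lands
# back in the position of the cell it came from
back <- matrix(firsts, ncol = ncol(m), dimnames = dimnames(m))
back
```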

HTH,

Josh



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Gabor Grothendieck

2010-Oct-08 14:18 UTC


[R] function using values separated by a comma

Try using the gsubfn function in the gsubfn package (http://gsubfn.googlecode.com).
With it, match a regular expression consisting of digits, a comma and digits,
capture the two strings of digits, and pass them to the function f, whose
output replaces the matched expression. Then read the resulting text into a
data frame.

library(gsubfn)
L <- c(" 0,1  1,3   40,10  0,0", " 20,5  4,2  10,40  10,0",
   " 0,11  1,2  120,10  0,0")

f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
L2 <- gsubfn("(\\d+),(\\d+)", f, L)

DF <- read.table(textConnection(L2))

which gives:

> DF
   V1        V2         V3  V4
1 0.0 0.2500000 0.20000000 NaN
2 0.2 0.3333333 0.20000000   0
3 0.0 0.3333333 0.07692308 NaN
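For readers who would rather not install gsubfn, current versions of base R can do a similar match-and-replace with gregexpr() and the `regmatches<-` replacement function. This is a swapped-in technique, not how gsubfn works internally; it is a sketch assuming the same input lines L and the same min/sum rule as f above.

```r
# Same input lines as above
L <- c(" 0,1  1,3   40,10  0,0",
       " 20,5  4,2  10,40  10,0",
       " 0,11  1,2  120,10  0,0")

# Locate every "digits,digits" pair in each line
m <- gregexpr("[0-9]+,[0-9]+", L)

# Replace each pair with min/sum, written back as text
regmatches(L, m) <- lapply(regmatches(L, m), function(pairs) {
  vapply(strsplit(pairs, ",", fixed = TRUE), function(s) {
    x <- as.numeric(s)
    as.character(min(x) / sum(x))   # 0/0 becomes the string "NaN"
  }, character(1))
})

# read.table() converts the "NaN" strings back to numeric NaN
DF <- read.table(text = L)
DF
```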

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Gabor Grothendieck

2010-Oct-08 14:38 UTC


[R] function using values separated by a comma

A further simplification would be to use strapply from the same package
(gsubfn). It eliminates the need for read.table at the end:

> strapply(L, "(\\d+),(\\d+)", f, simplify = rbind)
     [,1]      [,2]       [,3] [,4]
[1,]  0.0 0.2500000 0.20000000  NaN
[2,]  0.2 0.3333333 0.20000000    0
[3,]  0.0 0.3333333 0.07692308  NaN
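Along the same lines, a result shaped like the strapply() output can be approximated in base R by extracting the pairs with regmatches() and binding one numeric row per input line. Again this is a sketch of an alternative, not how strapply() itself works, and it assumes the same input lines L.

```r
L <- c(" 0,1  1,3   40,10  0,0",
       " 20,5  4,2  10,40  10,0",
       " 0,11  1,2  120,10  0,0")

# Extract the "digits,digits" pairs from each line (a list, one element per line)
pairs <- regmatches(L, gregexpr("[0-9]+,[0-9]+", L))

# Turn each line's pairs into a numeric row, then stack the rows like simplify = rbind
mat <- do.call(rbind, lapply(pairs, function(row) {
  vapply(strsplit(row, ",", fixed = TRUE), function(s) {
    x <- as.numeric(s)
    min(x) / sum(x)
  }, numeric(1))
}))
mat
```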


Jeffrey Spies

2010-Oct-08 15:47 UTC


[R] function using values separated by a comma

Here's another method that doesn't need any add-on regular expression
package:

dat <- read.table(tc <- textConnection(
'0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0'), sep="")
close(tc)

mat <- apply(dat, c(1,2), function(x){
  temp <- as.numeric(unlist(strsplit(x, ',')))
  min(temp)/sum(temp)
})

For mat[2,4], I get 0 (as did the other solutions), whereas your expected
output has 1, so check on that. If you want the divide-by-0 NaNs to be 0,
you can handle that by replacing

min(temp)/sum(temp)

with:

ifelse(is.nan(val<-min(temp)/sum(temp)), 0, val)

This has an advantage over:

mat[is.na(mat)] <- 0

in that you might have true missingness (NA) in your data, and is.na() is
TRUE for both NA and NaN, so it can't distinguish the two.
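The distinction can be seen on a small made-up vector: is.na() flags both NaN and NA, while is.nan() flags only NaN, so replacing with is.nan() leaves genuinely missing values alone.

```r
v <- c(0.25, NaN, NA)   # a ratio, a 0/0 result, a truly missing value

is.na(v)    # FALSE TRUE TRUE
is.nan(v)   # FALSE TRUE FALSE

v[is.nan(v)] <- 0       # only the 0/0 result becomes 0; NA stays NA
v
```

The same indexing works on a whole matrix, e.g. mat[is.nan(mat)] <- 0.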

Cheers,

Jeff.

