thr3ads.net - R help - [R] Applying function to multiple data [Mar 2011]

Akshata Rao

2011-Mar-03 10:06 UTC

[R] Applying function to multiple data

Dear R helpers,

I know the R language at a preliminary level, and this is my first post to this R
forum. I have recently learned how to use functions and have succeeded
in writing a few of my own. However, I cannot figure out how to apply
a function to multiple sets of data.

# MY QUERY

Suppose I have the following data.frame:

df = data.frame(k = c(1:8),
                ratings = c("A", "B", "C", "D", "E", "F", "G", "H"),
                default_frequency = c(0.00229,0.01296,0.01794,0.04303,0.04641,0.06630,0.06862,0.06936))

# -------------------------------

DP = function(k, ODF, ratings)

{

  # least-squares fit of log(ODF) on k, written out with explicit sums
  n          <- length(ODF)
  tot_klnODF <- sum(k * log(ODF))
  tot_k      <- sum(k)
  tot_lnODF  <- sum(log(ODF))
  tot_k2     <- sum(k^2)
  slope      <- exp((n * tot_klnODF - tot_k * tot_lnODF) / (n * tot_k2 - tot_k^2))
  intercept  <- exp((tot_lnODF - log(slope) * tot_k) / n)
  # fitted default probability for each rating
  IPD        <- intercept * slope^k

  return(data.frame(ratings = ratings, default_probability = round(IPD, digits = 4)))

}

result = DP(k = df$k, ODF = df$default_frequency, ratings = df$ratings)

# ________________________________________________________________________________________

The above code gives me the following result. However, it handles only
one set of data, the one defined in 'df'.
> result
  ratings default_probability
1       A              0.0061
2       B              0.0094
3       C              0.0145
4       D              0.0222
5       E              0.0342
6       F              0.0527
7       G              0.0810
8       H              0.1247
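
The sums inside DP appear to be the closed-form ordinary least-squares fit of log(ODF) on k, so the fitted values should agree with what lm() returns after exponentiating. The following is only a sketch to check that reading; the variable names a, b, ipd_closed and ipd_lm are made up for illustration and are not part of the original post.

```r
# Data from the post
k   <- 1:8
ODF <- c(0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630, 0.06862, 0.06936)

# Closed-form least-squares coefficients on the log scale (same sums as in DP)
n <- length(ODF)
b <- (n * sum(k * log(ODF)) - sum(k) * sum(log(ODF))) / (n * sum(k^2) - sum(k)^2)
a <- (sum(log(ODF)) - b * sum(k)) / n
ipd_closed <- exp(a + b * k)   # same as intercept * slope^k in DP

# The same fit via lm(), back-transformed from the log scale
fit    <- lm(log(ODF) ~ k)
ipd_lm <- unname(exp(fitted(fit)))

all.equal(ipd_closed, ipd_lm)
```

If the two agree, lm() could replace the hand-written sums, though the explicit version keeps every step visible.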


# MY PROBLEM

Suppose I have data as given below

Class       k   rating   default_frequency
Bank        1   A        0.00229
            2   B        0.01296
            3   C        0.01794
            4   D        0.04303
            5   E        0.04641
            6   F        0.06630
            7   G        0.06862
            8   H        0.06936
Corporate   1   A        0.00101
            2   B        0.01433
            3   C        0.02711
            4   D        0.03701
            5   E        0.04313
            6   F        0.05600
            7   G        0.06041
            8   H        0.07112
Sovereign   1   A        0.00210
            2   B        0.01014
            3   C        0.02001
            4   D        0.04312
            5   E        0.05114
            6   F        0.06801
            7   G        0.06997
            8   H        0.07404

So I need to use the function "DP" defined above to generate three sets of
results, viz. for Bank, Corporate and Sovereign, and save each of these results
as a different csv file, say bank.csv, corporate.csv, etc. Please note
that there could be, say, 'm' classes. I was trying to use the apply
function, but things are not working for me. I would really appreciate your
guidance. I hope I have put my query clearly.

Regards and thanking you all in advance.

Akshata Rao


Ivan Calandra

2011-Mar-03 15:14 UTC

[R] Applying function to multiple data

Hi,

It might not be the best approach, but here is what I would do.

##########

1) If you have your data in 3 different data.frames:

#create a named list where each element is one of your data.frames
list_df <- vector(mode="list", length=3)
names(list_df) <- c("Bank", "Corporate", "Sovereign")

list_df[[1]] <- data.frame(k = c(1:8),
  ratings = c("A", "B", "C", "D", "E", "F", "G", "H"),
  default_frequency = c(0.00229,0.01296,0.01794,0.04303,0.04641,0.06630,0.06862,0.06936))
list_df[[2]] <- data.frame(k = c(1:8),
  ratings = c("A", "B", "C", "D", "E", "F", "G", "H"),
  default_frequency = c(0.00101,0.01433,0.02711,0.03701,0.04313,0.05600,0.06041,0.07112))
list_df[[3]] <- data.frame(k = c(1:8),
  ratings = c("A", "B", "C", "D", "E", "F", "G", "H"),
  default_frequency = c(0.00210,0.01014,0.02001,0.04312,0.05114,0.06801,0.06997,0.07404))

#apply your function DP to each element of the list, i.e. to each data.frame:
out1 <- lapply(list_df, FUN=function(x) DP(k=x$k,
  ODF=x$default_frequency, ratings=x$ratings))

##########

2) If you have your data in a single data.frame, as it looks from your 
example, I would first fill all the cells, so that it looks like this:

df2 <- structure(list(
  Class = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
    .Label = c("Bank", "Corporate", "Sovereign"), class = "factor"),
  k = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
    1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
  rating = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
    5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
    .Label = c("A", "B", "C", "D", "E", "F", "G", "H"), class = "factor"),
  default_frequency = c(0.00229, 0.01296, 0.01794, 0.04303, 0.04641,
    0.0663, 0.06862, 0.06936, 0.00101, 0.01433, 0.02711, 0.03701,
    0.04313, 0.056, 0.06041, 0.07112, 0.0021, 0.01014, 0.02001, 0.04312,
    0.05114, 0.06801, 0.06997, 0.07404)),
  .Names = c("Class", "k", "ratings", "default_frequency"),
  class = "data.frame", row.names = c(NA, -24L))
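
If the data arrive with the Class column blank on continuation rows, as in the table in the original post, the blanks need to be filled before splitting. One possible way in base R is sketched below. The small data frame raw is a hypothetical stand-in for the imported data, and the approach assumes the first row always carries a class name and that blanks are empty strings.

```r
# Hypothetical import result: Class is given only on the first row of each block
raw <- data.frame(Class = c("Bank", "", "", "Corporate", "", ""),
                  k = c(1:3, 1:3),
                  stringsAsFactors = FALSE)

# cumsum() counts how many non-blank labels have been seen so far,
# which is exactly the index of the label each row belongs to
is_label  <- raw$Class != ""
raw$Class <- raw$Class[is_label][cumsum(is_label)]

raw
```

After this step every row carries its own class name, so the data can be split by Class.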

#then split by Class:
list_df2 <- split(df2, df2$Class)
#and apply as before:
out2 <- lapply(list_df2, FUN=function(x) DP(k=x$k, 
ODF=x$default_frequency, ratings=x$ratings))

#or in one step using plyr:
library(plyr)
out3 <- dlply(.data=df2, .variables="Class", .fun=function(x)
  DP(k=x$k, ODF=x$default_frequency, ratings=x$ratings))


##########

3) all solutions give the same results:

all.equal(out1, out2, check.attributes=FALSE)
[1] TRUE
all.equal(out1, out3, check.attributes=FALSE)
[1] TRUE
all.equal(out2, out3, check.attributes=FALSE)
[1] TRUE
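
The original question also asked for one csv file per class. A minimal sketch of that last step follows, assuming the split/lapply route from 2). The output folder out_dir (here a temporary directory) and the lower-case file names are assumptions chosen for illustration, and the data frame is rebuilt with rep() only to keep the example self-contained.

```r
# DP as defined in the original post
DP <- function(k, ODF, ratings) {
  n          <- length(ODF)
  tot_klnODF <- sum(k * log(ODF))
  tot_k      <- sum(k)
  tot_lnODF  <- sum(log(ODF))
  tot_k2     <- sum(k^2)
  slope      <- exp((n * tot_klnODF - tot_k * tot_lnODF) / (n * tot_k2 - tot_k^2))
  intercept  <- exp((tot_lnODF - log(slope) * tot_k) / n)
  IPD        <- intercept * slope^k
  data.frame(ratings = ratings, default_probability = round(IPD, digits = 4))
}

# Same data as df2, built more compactly
df2 <- data.frame(
  Class   = rep(c("Bank", "Corporate", "Sovereign"), each = 8),
  k       = rep(1:8, times = 3),
  ratings = rep(LETTERS[1:8], times = 3),
  default_frequency = c(
    0.00229, 0.01296, 0.01794, 0.04303, 0.04641, 0.06630, 0.06862, 0.06936,
    0.00101, 0.01433, 0.02711, 0.03701, 0.04313, 0.05600, 0.06041, 0.07112,
    0.00210, 0.01014, 0.02001, 0.04312, 0.05114, 0.06801, 0.06997, 0.07404),
  stringsAsFactors = FALSE)

# One result per class, in a list named by class
out <- lapply(split(df2, df2$Class),
              function(x) DP(k = x$k, ODF = x$default_frequency, ratings = x$ratings))

# Write each element to <class>.csv; out_dir is an assumed location
out_dir <- tempdir()
for (cls in names(out)) {
  write.csv(out[[cls]],
            file = file.path(out_dir, paste0(tolower(cls), ".csv")),
            row.names = FALSE)
}
```

Because the loop runs over names(out), it works unchanged for any number 'm' of classes.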


HTH,
Ivan




-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
