On May 8, 2014, at 9:49 AM, Abhinaba Roy wrote:
> Hi R helpers,
>
> I have a dataframe like
>
> ID Yr_Mnth AMT_PAID AMT_DUE paidToDue
> CS00000026A 201301 320.48 1904 0.168319328
> CS00000026A 201302 4881.31 15708 0.310753119
> CS00000026A 201303 7609.04 25585 0.297402384
> CS00000026A 201304 9782.70 21896 0.446780234
> CS00000026A 201305 6482.01 22015 0.294436066
> CS00000026A 201306 5226.28 14280 0.365985994
> CS00000026A 201307 9078.47 19040 0.476810399
> CS00000026A 201308 7060.33 23800 0.296652521
> CS00000026A 201309 7595.57 17136 0.443252218
> CS00000026A 201310 5388.64 24752 0.217705236
>
> The problem I am facing is to capture the change in 'paidToDue', which is
> defined as follows:
>
> Let 'm' be the value of 'Yr_Mnth' in the current row (except the 1st row)
> and 'm-1' be that in the previous row.
>
> I am trying to add a column 'Change' to the dataframe, which will have
> the values 'Improve', 'Deteriorate' and 'No change', defined as follows:
>
>
> if (AMT_PAID(m) != AMT_PAID(m-1)) & sign(paidToDue(m)-paidToDue(m-1)) == 1
> & abs(paidToDue(m)-paidToDue(m-1)) > 0.1 then 'Change' = 'Improve'
There is a `diff` function that may make this all much simpler.
You could translate (AMT_PAID[m] != AMT_PAID[m-1]) to
diff(AMT_PAID) != 0   # length is 1 shorter than the input vector
and sign(paidToDue[m] - paidToDue[m-1]) == 1 to
diff(paidToDue) > 0   # can pad with c(NA, ...)
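A quick check of these translations on the sample rows from the question (values typed in by hand; the data frame name `dat` is an assumption, and AMT_DUE is left out because it is not needed here):

```r
# Sample rows from the question (one ID only; AMT_DUE omitted)
dat <- data.frame(
  ID        = rep("CS00000026A", 10),
  Yr_Mnth   = 201301:201310,
  AMT_PAID  = c(320.48, 4881.31, 7609.04, 9782.70, 6482.01,
                5226.28, 9078.47, 7060.33, 7595.57, 5388.64),
  paidToDue = c(0.168319328, 0.310753119, 0.297402384, 0.446780234,
                0.294436066, 0.365985994, 0.476810399, 0.296652521,
                0.443252218, 0.217705236)
)

# AMT_PAID[m] != AMT_PAID[m-1] for m = 2..n: one element shorter than the input
paid_changed <- diff(dat$AMT_PAID) != 0

# sign(paidToDue[m] - paidToDue[m-1]) == 1, padded with NA so that it lines
# up with the rows of dat (the first row has no previous month)
went_up <- c(NA, diff(dat$paidToDue) > 0)

length(paid_changed)  # 9
length(went_up)       # 10
```

The same c(NA, ...) padding can be applied to paid_changed if it has to be stored as a column of dat.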
From your incorrect use of parentheses for indexing, I'm guessing you are
very new to R programming. You also attempted to attach a CSV file, and it was
rejected by the mail server, which only accepts attachments with a text/plain
MIME type. Although most CSV files really are text files, posters' mail
clients often label them with a different MIME type.
> if (AMT_PAID(m) != AMT_PAID(m-1)) & sign(paidToDue(m)-paidToDue(m-1)) == -1
> & abs(paidToDue(m)-paidToDue(m-1)) > 0.1 then 'Change' = 'Deteriorate'
>
> else 'Change' = 'No change'
If this were just a matter of differences in 'paidToDue' within values
of ID (ignoring the AMT_PAID condition), then it would be as simple as:
dat$Change <- with(dat, ave(paidToDue, ID, FUN = function(x) {
    # label each month-to-month difference; the first month of each ID gets NA
    c(NA, c('Deteriorate', 'No change', 'Improve')[
        findInterval(diff(x), c(-Inf, -0.1, 0.1, Inf))])
}))
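To see what findInterval() does with the differences, here is a small sketch on the sample paidToDue values typed in from the question. Each difference is mapped to interval 1, 2 or 3 of the breaks, and that number then indexes the label vector (`brks` and `labs` are just illustrative names):

```r
brks <- c(-Inf, -0.1, 0.1, Inf)
labs <- c("Deteriorate", "No change", "Improve")

# paidToDue for the question's sample ID, in Yr_Mnth order
ptd <- c(0.168319328, 0.310753119, 0.297402384, 0.446780234, 0.294436066,
         0.365985994, 0.476810399, 0.296652521, 0.443252218, 0.217705236)
d <- diff(ptd)

# findInterval() returns i such that brks[i] <= d < brks[i + 1]
bins <- findInterval(d, brks)
labs[bins]

# Intervals are closed on the left: exactly -0.1 lands in "No change",
# exactly 0.1 lands in "Improve"
edge <- labs[findInterval(c(-0.1, 0.1), brks)]
edge
```

Because the intervals are closed on the left, a difference of exactly 0.1 is labeled 'Improve', whereas the strict `> 0.1` in the question would call it 'No change'; a difference of exactly -0.1 comes out as 'No change' either way.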
>
> Note: I have 5000 unique IDs in the data, this has to be done for each ID,
> and the data is sorted by Yr_Mnth.
When you need to use multiple columns as input and work across rows, I generally
use an lapply(split(), fun) strategy.
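A minimal sketch of that lapply(split(), fun) strategy for the full rule, including the AMT_PAID condition. It assumes the data frame is called `dat`, that rows within each ID are sorted by Yr_Mnth, and that the first month of each ID should get NA; `add_change` is a hypothetical helper name. Only the question's sample rows are typed in here:

```r
# Sample rows from the question (AMT_DUE omitted; not used by the rule)
dat <- data.frame(
  ID        = rep("CS00000026A", 10),
  Yr_Mnth   = 201301:201310,
  AMT_PAID  = c(320.48, 4881.31, 7609.04, 9782.70, 6482.01,
                5226.28, 9078.47, 7060.33, 7595.57, 5388.64),
  paidToDue = c(0.168319328, 0.310753119, 0.297402384, 0.446780234,
                0.294436066, 0.365985994, 0.476810399, 0.296652521,
                0.443252218, 0.217705236)
)

# Hypothetical per-ID worker: gets all rows of one ID (sorted by Yr_Mnth)
# and returns them with a Change column added
add_change <- function(d, threshold = 0.1) {
  d_ratio      <- diff(d$paidToDue)       # paidToDue(m) - paidToDue(m-1)
  paid_changed <- diff(d$AMT_PAID) != 0   # AMT_PAID(m) != AMT_PAID(m-1)
  big          <- abs(d_ratio) > threshold
  change <- ifelse(paid_changed & big & d_ratio > 0, "Improve",
            ifelse(paid_changed & big & d_ratio < 0, "Deteriorate",
                   "No change"))
  d$Change <- c(NA, change)               # first month has no predecessor
  d
}

# split() gives one data frame per ID, lapply() runs the worker on each,
# and do.call(rbind, ...) stacks the pieces back into one data frame
res <- do.call(rbind, lapply(split(dat, dat$ID), add_change))
rownames(res) <- NULL
res
```

split() orders the groups by ID, so the stacked result is grouped by ID in sorted order rather than necessarily in the original row order; reorder afterwards if that matters.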
>
> Please find attached the csv file for reference.
>
> How can it be done in R?
It's not going to be terribly difficult, but I'm concerned this is
homework, so I'm not trying for a complete solution. You have not done very much in
the way of setting the context.
> --
> Regards
> Abhinaba Roy
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA