Dear list,
I've searched the archives but found nothing, although I may have
used the wrong search terms. I've also double-checked the
upData function in Hmisc, but it does something else.
I'm wondering if one can update a dataframe by "forcing into" it a
shorter dataframe containing the corrections, like the "update"
provided in SAS data steps.
In this simple example:
a <- data.frame(id=c(1:5),x=rnorm(5))
b <- data.frame(id=4,x=rnorm(1))
> a
  id          x
1  1  0.6557921
2  2  0.1897523
3  3  0.7976721
4  4  0.2107103
5  5 -0.8855786
> b
  id         x
1  4 0.8369147
I would like the "updated" dataframe to look like (row names are not
important to me)
  id          x
1  1  0.6557921
2  2  0.1897523
3  3  0.7976721
4  4  0.8369147
5  5 -0.8855786
I thought this could be done with merge, but merge never removes the
old version of a row; it just gives me two rows with id==4.
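
A minimal sketch of the merge behaviour described above, assuming the call was merge(a, b, all = TRUE). Fixed values stand in for rnorm() so the result is reproducible; the second call (joining on id only) is an added illustration, not something tried in the original attempt.

```r
# Fixed values replace rnorm() so the example is reproducible (assumption).
a <- data.frame(id = 1:5, x = c(0.66, 0.19, 0.80, 0.21, -0.89))
b <- data.frame(id = 4, x = 0.84)

# With no 'by' argument, merge() joins on all shared columns (id and x),
# so the old and new rows for id 4 do not match each other and both are kept.
m <- merge(a, b, all = TRUE)
m[m$id == 4, ]   # two rows with id == 4

# Joining on id alone keeps one row per id, but x is split into x.x and x.y
# rather than being overwritten.
m2 <- merge(a, b, by = "id", all.x = TRUE)
names(m2)
```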
I thought of this solution:
reject <- a$id %in% b$id
a2 <- a[!reject,]
a3 <- rbind(a2,b)
> a3
   id          x
1   1  0.6557921
2   2  0.1897523
3   3  0.7976721
5   5 -0.8855786
11  4  0.8369147
This works, although it is obviously not the best way to make the
correction in a simple case like this. Providing a few lines of
corrected data can be an effective method with large dataframes,
especially if many identifier (grouping) variables are needed to
identify each line that needs updating. In that context, however, my
solution above rapidly becomes ugly.
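
To illustrate the multi-identifier case, here is a rough sketch of the same reject-and-rbind idea with two identifier variables. The column names site and year and all values are hypothetical, and pasting the identifiers into one composite key is only one possible approach.

```r
# Hypothetical data with two identifier variables, site and year.
a <- data.frame(site = rep(c("A", "B"), each = 3),
                year = rep(2001:2003, times = 2),
                x    = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3))
b <- data.frame(site = "B", year = 2002L, x = 9.9)

keys <- c("site", "year")

# Build one composite key per row from all identifier columns.
key_a <- do.call(paste, c(a[keys], sep = "\r"))
key_b <- do.call(paste, c(b[keys], sep = "\r"))

# Drop the rows of a that are superseded by b, then append b.
a3 <- rbind(a[!(key_a %in% key_b), ], b)

# Optional: restore the original ordering and reset the row names.
a3 <- a3[order(a3$site, a3$year), ]
rownames(a3) <- NULL
a3
```

Only the keys vector changes as identifier variables are added, which keeps the code the same length however many identifiers there are.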
Furthermore (but I can live with this constraint) this method removes
entire rows, so I need to make sure the dataframe used to make
corrections contains all the Y variables in the original dataframe,
even those that do not need correcting.
If a method exists to just change one variable in 5 lines for a
dataframe of 5000 lines and 30 variables, I'd appreciate learning
about it. But I'll already be thrilled if I can update whole lines at
a time.
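
For the one-variable case, a possible sketch uses match() to locate the rows to correct and then assigns into that column only. The extra column y and all values here are made up for illustration, and the approach assumes each id occurs only once in a.

```r
# Hypothetical data: y stands for the columns that need no correction.
a <- data.frame(id = 1:5,
                x  = c(0.66, 0.19, 0.80, 0.21, -0.89),
                y  = letters[1:5])
# The corrections carry only the identifier and the variable to fix.
b <- data.frame(id = c(4, 2), x = c(0.84, 0.55))

# For each correction, find the row of a with the same id.
idx <- match(b$id, a$id)

# Guard against corrections whose id does not exist in a.
stopifnot(!anyNA(idx))

# Overwrite only x in the matched rows; y and any other columns are untouched.
a$x[idx] <- b$x
a
```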
Sincerely,
Denis Chabot