On Sat, 2007-02-17 at 17:34 +0100, Johannes Graumann wrote:
> Hi all,
>
> My current project brought forth the snippet below, which modifies in each
> row of a data frame a certain field depending on another field in the same
> row. Dealing with data of some 30000 entries this works, but is horribly
> slow. Can anyone show this newbie how to do this properly (faster ;0)?
>
> for (i in 1:nrow(dataframe)){
>   if (any(grep('^yes$',dataframe[i,][['Field1']]))){
>     dataframe[i,]['Field1'] <- dataframe[i,]['Field2']
>   } else {
>     dataframe[i,]['Field1'] <- NA
>   }
> }
>
> Thanks for your insights, Joh
Beyond the for() loop issue, you are doing a lot of unnecessary
subsetting.
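For example:
  dataframe[i,][['Field1']]
first builds a one-row data frame and then extracts the field from it.
The whole column is available as a vector with:
  dataframe[['Field1']]
or, if you have to loop, a single element with:
  dataframe[i, 'Field1']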
See ?Extract
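As a rough illustration of the extraction forms above, here is a minimal sketch on a made-up three-row data frame. The name df and its contents are hypothetical, and character columns (not factors) are assumed.

```r
# Hypothetical toy data; only the column names come from the original post.
df <- data.frame(Field1 = c("yes", "no", "yes"),
                 Field2 = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

i <- 2
x1 <- df[i, ][["Field1"]]  # builds a one-row data frame, then extracts from it
x2 <- df[i, "Field1"]      # extracts the single element directly
x3 <- df[["Field1"]][i]    # takes the column as a vector, then indexes it
col <- df[["Field1"]]      # the whole column as a plain vector
```

All three element-wise forms should return the same value; the first simply does extra work on every iteration, which adds up over some 30000 rows.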
One clarification question on your use of grep(): do you need a regular
expression here (for example, entries where 'yes' is only part of the
field), or are you just looking for field entries == 'yes'? If the
latter, you don't need to use grep(), of course.
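One potential approach is the following:
# grepl() returns one TRUE/FALSE per row; wrapping grep() in any()
# would collapse the test to a single value for the whole column.
dataframe[["Field1"]] <- with(dataframe,
                              ifelse(grepl("^yes$", Field1),
                                     Field2, NA))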
If you are just looking for an entry == "yes", as in your loop, then:
dataframe[["Field1"]] <- with(dataframe,
                              ifelse(Field1 == "yes",
                                     Field2, NA))
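To check that a vectorized ifelse() reproduces the loop from the original post, a small sketch follows. The toy data and the names looped, vectorized and by_equality are made up for illustration, and character columns (not factors) are assumed. Note that the condition mirroring the loop is a match on "yes", tested once per row.

```r
# Hypothetical toy data, including a missing value in Field1.
df <- data.frame(Field1 = c("yes", "no", "yes", NA),
                 Field2 = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

# Reference result: the loop from the post, using direct indexing.
looped <- df
for (i in seq_len(nrow(looped))) {
  if (any(grep("^yes$", looped[i, "Field1"]))) {
    looped[i, "Field1"] <- looped[i, "Field2"]
  } else {
    looped[i, "Field1"] <- NA
  }
}

# Vectorized with a regular expression: one logical per row, no loop.
vectorized <- df
vectorized[["Field1"]] <- with(vectorized,
                               ifelse(grepl("^yes$", Field1), Field2, NA))

# Vectorized with a plain equality test.
by_equality <- df
by_equality[["Field1"]] <- with(by_equality,
                                ifelse(Field1 == "yes", Field2, NA))
```

On this data all three versions should agree, including on the row where Field1 is NA. If Field2 were a factor, ifelse() would return its integer codes, so converting with as.character() first would be advisable.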
See ?ifelse and ?with. Also look at ?replace for an alternative way to
replace() values based upon conditions.
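For the replace() route, one possible sketch is shown below. The toy data and the names is_yes and out are hypothetical, and character columns are assumed. It starts from an all-NA vector and fills in Field2 only where Field1 is "yes".

```r
# Hypothetical toy data; only the column names come from the original post.
df <- data.frame(Field1 = c("yes", "no", "yes"),
                 Field2 = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# %in% never returns NA, so it is safe to use as a logical index.
is_yes <- df[["Field1"]] %in% "yes"

# replace(x, list, values): put Field2 into the positions where is_yes is TRUE.
out <- replace(rep(NA_character_, nrow(df)), is_yes, df[["Field2"]][is_yes])
df[["Field1"]] <- out
```

Whether this reads better than ifelse() is largely a matter of taste; both avoid the row-by-row loop.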
HTH,
Marc Schwartz