Camilo Mora
2013-Mar-23 11:08 UTC
[R] Error with custom function in apply: Error in FUN(newX[, i], ...) : unused argument(s) (newX[, i])
Hi everyone, I wonder if I can get your help using a custom function in apply. Imagine the following dataframe called "data": LowLim HighLim A1 A2 A3 A4 4 6 3 4 5 6 4 6 4 5 5 6 2 3 1 4 2 3 2 3 NA NA NA NA We have created a custom function (see below) that takes the values in a given row between columns A1 to A4 to see if they are outside the limits in the same row set by columns LowLim and HighLim, if at least x consecutive values are outside, the function returns the column name of the first column in that series. If no value is outside, the function returns the name of the last column and if there are NAs, the function returns NA. So in the example above, the function return the following results: Considering 2 consecutive values outside the limits: A4 A4 A1 NA Considering 1 value outside the limits: A1 A4 A1 NA The problem we have is that our dataframe has over 1.2 million rows. So right now we are using a loop (see below), which work fine by entering the values of each row in our function but it will take several days to complete. The idea is to see if we can use our function with apply. Basically, data$Results<-apply(data,1, ColChange, LowLim=data[1], HighLim=data[2], Vals=data[3:6], NumConsecOut=2) But we get the following error: ?Error in FUN(newX[, i], ...) : unused argument(s) (newX[, i])? Any idea about this error or an alternative way to obtain the results we look for. Thank you very much, Camilo ################# Our function is: ####################### ColChange <- function(LowLim, HighLim, Vals, NumConsecOut) { cols <- ncol(Vals) yr_init = 0 k = 0 for (i in 1:cols){ val = Vals[i] if (is.na(val)){ Result="NA" next } else { if (val<LowLim||val>HighLim){ if (yr_init==0) { yr_init = colnames(Vals)[i] k = k + 1 } else { k = k + 1 } if (k==NumConsecOut){ Result=yr_init break } } else { yr_init=0 k=0 } if (yr_init==0){ Result=colnames(Vals)[cols] } } } return(Result) } ################################################# #######Our loop is:############################### nrows=nrow(data) a=c() for(b in 1:nrows) { c= ColChange(data[b,1],data[b,2],data[b,3:6],2) a=rbind(a,c) } ################################################# Camilo Mora, Ph.D. Department of Geography, University of Hawaii Currently available in Colombia Phone: Country code: 57 Provider code: 313 Phone 776 2282 From the USA or Canada you have to dial 011 57 313 776 2282 http://www.soc.hawaii.edu/mora/
Rui Barradas
2013-Mar-23 11:46 UTC
[R] Error with custom function in apply: ”Error in FUN(newX[, i], ...) : unused argument(s) (newX[, i])”
Hello, The following should be faster. It preallocates a vector of length nrows instead of extending 'a' on every iteration. a2 <- character(nrows) for(b in 1:nrows) { c= ColChange(data[b,1],data[b,2],data[b,3:6],2) a2[b]=c } all.equal(c(a), a2) As for the use of apply, I'm getting errors but the way to call it would be, after changing ColChange in order to correct some of the errors, (changed: ncol to NCOL and colnames to names) ColChange2 <- function(LowLim, HighLim, Vals, NumConsecOut) { cols <- NCOL(Vals) yr_init = 0 k = 0 for (i in 1:cols){ val = Vals[i] if (is.na(val)){ Result="NA" next } else { if (val<LowLim||val>HighLim){ if (yr_init==0) { yr_init = names(Vals)[i] k = k + 1 } else { k = k + 1 } if (k==NumConsecOut){ Result=yr_init break } } else { yr_init=0 k=0 } if (yr_init==0){ Result = names(Vals)[cols] } } } return(Result) } ################################################# apply(data, 1, function(x) ColChange2(x[1], x[2], x[3:6], 2) ) Error in ColChange2(x[1], x[2], x[3:6], 2) : object 'Result' not found So there's some debugging to be done. Anyway, the revised loop should be much faster. Hope this helps, Rui Barradas Em 23-03-2013 11:08, Camilo Mora escreveu:> Hi everyone, > > I wonder if I can get your help using a custom function in apply. > > Imagine the following dataframe called "data": > > LowLim HighLim A1 A2 A3 A4 > 4 6 3 4 5 6 > 4 6 4 5 5 6 > 2 3 1 4 2 3 > 2 3 NA NA NA NA > > We have created a custom function (see below) that takes the values in a > given row between columns A1 to A4 to see if they are outside the limits > in the same row set by columns LowLim and HighLim, if at least x > consecutive values are outside, the function returns the column name of > the first column in that series. If no value is outside, the function > returns the name of the last column and if there are NAs, the function > returns NA. > > So in the example above, the function return the following results: > > Considering 2 consecutive values outside the limits: > A4 > A4 > A1 > NA > > Considering 1 value outside the limits: > A1 > A4 > A1 > NA > > The problem we have is that our dataframe has over 1.2 million rows. So > right now we are using a loop (see below), which work fine by entering > the values of each row in our function but it will take several days to > complete. The idea is to see if we can use our function with apply. > Basically, > > data$Results<-apply(data,1, ColChange, LowLim=data[1], HighLim=data[2], > Vals=data[3:6], NumConsecOut=2) > > But we get the following error: ?Error in FUN(newX[, i], ...) : unused > argument(s) (newX[, i])? > > Any idea about this error or an alternative way to obtain the results we > look for. > > Thank you very much, > > Camilo > > ################# Our function is: ####################### > ColChange <- function(LowLim, HighLim, Vals, NumConsecOut) { > cols <- ncol(Vals) > yr_init = 0 > k = 0 > for (i in 1:cols){ > val = Vals[i] > if (is.na(val)){ > Result="NA" > next > } else { > if (val<LowLim||val>HighLim){ > if (yr_init==0) { > yr_init = colnames(Vals)[i] > k = k + 1 > } else { > k = k + 1 > } > if (k==NumConsecOut){ > Result=yr_init > break > } > } else { > yr_init=0 > k=0 > } > if (yr_init==0){ > Result=colnames(Vals)[cols] > } > } > } > return(Result) > } > ################################################# > > > > #######Our loop is:############################### > nrows=nrow(data) > > a=c() > > for(b in 1:nrows) > { > c= ColChange(data[b,1],data[b,2],data[b,3:6],2) > a=rbind(a,c) > } > ################################################# > > > Camilo Mora, Ph.D. > Department of Geography, University of Hawaii > Currently available in Colombia > Phone: Country code: 57 > Provider code: 313 > Phone 776 2282 > From the USA or Canada you have to dial 011 57 313 776 2282 > http://www.soc.hawaii.edu/mora/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.