Hello,

I'm doing some analysis on a rather large data set. In this case, some simple commands are failing. For example, this one:

> x$eventtype <- factor(x$eventtype)
Error in unique.default(x) : length 1093574297 is too large for hashing

...I think this is a bug, because "hashing" should not be required for the "factor" function. Am I right? The whole column does not need to be hashed, only the unique keys. Sure, there is the potential to overflow the key register, but this error should be thrown only if that occurs, no?

Cordially,
Adam D. I. Kramer, Ph.D.
Data Scientist, Facebook, Inc.
akramer at fb.com

On 05/04/2012 2:03 PM, Adam D. I. Kramer wrote:
> Hello,
>
> I'm doing some analysis on a rather large data set. In this case,
> some simple commands are failing. For example, this one:
>
> > x$eventtype<- factor(x$eventtype)
> Error in unique.default(x) : length 1093574297 is too large for hashing
>
> ...I think this is a bug, because "hashing" should not be required for the
> "factor" function. Am I right? The whole column does not need to be hashed,
> only the unique keys. Sure, there is the potential to overflow the key
> register, but this error should be thrown only if that occurs, no?

It looks as though the error is coming when unique() tries to determine the unique levels in the argument, but really there's no way to answer your question without more information. What type of object is x$eventtype? Is it really 1093574297 elements long? How many unique values does it have?

Duncan Murdoch
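
The three questions above can be checked directly at the R prompt. What follows is only a minimal sketch, not something taken from the thread: the data frame is a small hypothetical stand-in for the real x, and chunk_size is an arbitrary assumption. Since calling unique() on the full column is what triggers the error, the sketch collects unique values one chunk at a time, then passes them to factor() as explicit levels so that factor() does not have to derive them itself. Whether this gets past the limit on the real data is untested.

```r
# Hypothetical stand-in for the real data; the actual x is not shown in the thread.
x <- data.frame(
  eventtype = rep(c("click", "view", "share"), length.out = 1e5),
  stringsAsFactors = FALSE
)

# 1. What type of object is the column?
col_type  <- typeof(x$eventtype)
col_class <- class(x$eventtype)

# 2. How long is it really?
n <- length(x$eventtype)

# 3. How many unique values does it have?
#    Work chunk by chunk so unique() never sees the whole column at once.
chunk_size <- 1e4  # arbitrary assumption; tune to available memory
lev <- character(0)
for (s in seq(1, n, by = chunk_size)) {
  idx <- s:min(s + chunk_size - 1, n)
  lev <- unique(c(lev, as.character(x$eventtype[idx])))
}
n_unique <- length(lev)

# With the levels known in advance, supply them explicitly.
f <- factor(x$eventtype, levels = sort(lev))

cat("type:", col_type, " class:", col_class,
    " length:", n, " unique values:", n_unique, "\n")
```

Reporting the output of the first two steps (type, class and length) would already answer most of what was asked.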