thr3ads.net - R help - AW: AW: [R] constructing specially ordered factor [Oct 2004]

If this information is useful, please help other people find it:
Share via:

Khamenia, Valery

2004-Oct-05 07:02 UTC

AW: AW: [R] constructing specially ordered factor

> Please follow the posting guide and do your homework before 
> posting, 
1. my last homework in university was done a lot of years ago.

2. I always try to follow posting guide.

>      An object of the same type of 'x'. but if an element is equal
to
>      one with a smaller index, it is removed.
> 
> so the order is preserved, by definition.
Here stated when element is removed. There is no explicit statement,
that the order is preserved. If one writes its own implementation
with reodered output, it still matches the docs. Or?
 
> BTW it uses hashing for `acceleration', not something as slow 
> as sorting.
BTW, do you mean that current hash-based implementation brings *clearly* 
better performance than any O(n*log(n)) sort based algorithm? 
If I have correctly understood src/main/unique.c then current 
hash function is niether minimal perfect hash function nor even 
minimal hash function. In addition, as might be expected, 
current hash function uses full pass through the string to get 
a hash key. 

So, in particular, can anyone clearly show that the current 
hash-based algorithm will be quicker then sort-based 
algorithm if the input has:

1. a lot of strings;
2. strings are very long; 
3. strings are quite unsimilar

?

Hm, I don't believe you are ready to prove smth. like that. 

P.S. Sorry for broken email-reference. I am bound to Outlook 
Express now.

--
Valery

Thomas Lumley

2004-Oct-05 14:38 UTC

head link

AW: AW: [R] constructing specially ordered factor

On Tue, 5 Oct 2004, Khamenia, Valery wrote:> BTW, do you mean that current hash-based implementation brings *clearly*
> better performance than any O(n*log(n)) sort based algorithm?
> If I have correctly understood src/main/unique.c then current
> hash function is niether minimal perfect hash function nor even
> minimal hash function. In addition, as might be expected,
> current hash function uses full pass through the string to get
> a hash key.
>
> So, in particular, can anyone clearly show that the current
> hash-based algorithm will be quicker then sort-based
> algorithm if the input has:
>
> 1. a lot of strings;
> 2. strings are very long;
> 3. strings are quite unsimilar
>
> ?
>
> Hm, I don't believe you are ready to prove smth. like that.
Clearly the algorithm is not optimal for all possible sets of 
inputs (eg if the inputs are already known to be unique then an even 
faster implementation is just to do nothing).

As R's string comparison function use C's strcmp, for which the C
standard
makes no performance guarantees whatsoever, it is not possible to prove 
*anything* about the relative performance of algorithms without making 
some additional assumptions.

However, if this was a serious question, it is known that
a) in some versions of R on some byte-orderings the hash was broken and 
the performance was far inferior, suggesting that it is ordinarily quite 
effective.
b) Switching the rowsum function from a sort-based implementation to 
hashing produced a substantial speed increase.

 	-thomas

Apparently Analagous Threads

Search for more maybe matching threads

R help - Oct 2004 - AW: AW: constructing specially ordered factor

AW: AW: [R] constructing specially ordered factor

AW: AW: [R] constructing specially ordered factor

Apparently Analagous Threads