On Tue, Dec 14, 2004 at 12:36:53PM -0300, Georges Dupret wrote:
> I need to iterate over pairs of terms to compute the term correlation
> matrix. My first attempt was:
>
> for(term1 = db.allterms_begin(); term1 != db.allterms_end(); ++term1){
> for(term2 = term1 + 1; term2 != db.allterms_end(); ++term2){
> ...
> }
> }
>
> this doesn't work because term1 + 1 is not defined, so I did
>
> for(term1 = db.allterms_begin(); term1 != db.allterms_end(); ++term1){
> term2 = term1;
> ++term2;
> for(; term2 != db.allterms_end(); ++term2){
> ...
> }
> }
>
> and to my surprise, incrementing term2 also incremented term1. Is
> this really what is intended?
Yes. TermIterators have the semantics of STL input iterators. If you
copy and increment, using the old iterator gives undefined behaviour
(at present I believe you'll always get both incremented, but that
might change in future).
> Finally, I solved the problem with
>
> for(term1 = db.allterms_begin(); term1 != db.allterms_end(); ++term1){
> term2 = db.allterms_begin();
> term2.skip_to(*term1);
> if(term2 == db.allterms_end()){
> cerr << "term2 end of list while term1 is '" << *term1 << "'\n";
> exit(1);
> }
> else
> ++term2;
> for(; term2 != db.allterms_end(); ++term2){
> ...
> }
> }
That looks about right. Perhaps we should offer a "clone" method which
creates a separate iterator using allterms_begin() and skip_to().
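
As a rough sketch of what such a helper might look like: clone_term_iterator
is a hypothetical name (nothing like it exists in the library), and
FakeTermIterator and FakeDb are stand-ins so the example compiles without
Xapian. With the real library, Db would be Xapian::Database and It would be
Xapian::TermIterator. Note that the stand-in iterator copies independently,
which the real one does not.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build an independent iterator positioned at the same
// term as `it` by starting from the beginning and skipping forward.
template <typename Db, typename It>
It clone_term_iterator(const Db& db, const It& it) {
    if (it == db.allterms_end()) return db.allterms_end();
    It copy = db.allterms_begin();
    copy.skip_to(*it);  // lands on the first term >= *it, i.e. on *it itself
    return copy;
}

// Stand-in for a term iterator over a sorted term list (assumption, for
// illustration only).
struct FakeTermIterator {
    const std::vector<std::string>* terms;
    std::size_t pos;
    const std::string& operator*() const { return (*terms)[pos]; }
    FakeTermIterator& operator++() { ++pos; return *this; }
    void skip_to(const std::string& t) {
        pos = static_cast<std::size_t>(
            std::lower_bound(terms->begin(), terms->end(), t) - terms->begin());
    }
    bool operator==(const FakeTermIterator& o) const { return pos == o.pos; }
    bool operator!=(const FakeTermIterator& o) const { return pos != o.pos; }
};

// Stand-in for the database (assumption, for illustration only).
struct FakeDb {
    std::vector<std::string> terms;  // must be sorted, like a term list
    FakeTermIterator allterms_begin() const { return {&terms, 0}; }
    FakeTermIterator allterms_end() const { return {&terms, terms.size()}; }
};
```

The inner loop would then start from clone_term_iterator(db, term1) followed
by one ++, which is the same thing your code does by hand.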
> Note that if term2 is not set to db.allterms_begin(), the code crashes.
>
> Is there a more elegant way to iterate over pairs of terms?
If you're manipulating the terms a lot, you could pull them out into a
vector or something first, then use that. For just iterating twice, I
suspect it's not worthwhile, and sucking everything into memory works
less well if the database is too big, whereas iterating from the disk
table probably degrades more gracefully.
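
If you do go the in-memory route, something along these lines might do. This
is only a sketch: collect_terms and for_each_pair are names made up for
illustration, and it assumes the iterator dereferences to something
convertible to std::string (as a TermIterator does). You would call it as
collect_terms(db.allterms_begin(), db.allterms_end()).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Copy a single-pass range of terms into a vector so that it can be
// traversed as many times as needed.
template <typename InputIt>
std::vector<std::string> collect_terms(InputIt first, InputIt last) {
    std::vector<std::string> terms;
    for (; first != last; ++first) terms.push_back(*first);
    return terms;
}

// Call fn(a, b) once for every pair of terms where a comes before b.
// Returns the number of pairs visited.
template <typename Fn>
std::size_t for_each_pair(const std::vector<std::string>& terms, Fn fn) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        for (std::size_t j = i + 1; j < terms.size(); ++j) {
            fn(terms[i], terms[j]);
            ++count;
        }
    }
    return count;
}
```

The vector costs memory proportional to the total length of all terms, so
the caveat above about large databases still applies.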
Cheers,
Olly