thr3ads.net - R devel - [Rd] stats::fft produces inconsistent results [Oct 2021]

If this information is useful, please help other people find it:
Share via:

Dipterix Wang

2021-Oct-20 20:00 UTC

[Rd] stats::fft produces inconsistent results

Wow, you guys are amazing!
>>> as part of its pipeline, ravetools::mvfftw computes the mean of the
>>> input vector **and then centers it to a mean of zero**
(intentionally or
>>> accidentally?)
> 
>>> because variables are passed to compiled code by reference (someone
>>> can feel free to correct my terminology), this means that the
original
>>> vector in R now has a mean of zero

I didn?t center the input vector in my code. The data was fed ?as-is? into
FFTW3. My guess is FFTW3 internally center the data. It could be that FFTW3
library behave differently on different platforms, which could explain the
reproducibility issue.
>>> "Indeed, R vectors are passed "by reference" to C
code, but the semantic must be "by value", i.e. the C function must
NOT change the contents of the vector, except in very specific cases.?

CRAN has already had fftw and fftwtools, the issue is the data I?m targeting at
are at GB-level, copying the vectors can be memory inefficient or even use up
memories. The strategy of ravetools is to import signals from local files, fft,
then directly write to disk. So only one reference will be used and modifying
in-place is on purpose. In fact, and the fft functions I created are not
intended to be used directly by users.

However, I should've been very cautious when using these functions. This is
my fault. I?ll check the whole package to make sure only one reference is used
or otherwise the vectors will be copied.
>>> This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.

Nice to learn! I?ll add it to my code.
>>> Properly using R vectors in C code is tricky. You have to
understand.
>>> 1) When you are allowed or not to modify vectors
>>> 2) When to PROTECT()vectors
>>> 3) How the garbage collector works and when it can trigger (answer
: basically, when you call any internal R function)
>>> Chapter 5 of "Writing R Extensions" documentation is
quite extensive:
>>> 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C
<https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C>
Indeed, I found myself often confused about when to PROTECT and when not. 
>>>> ... but then ravetools  is not even a CRAN package, so why
should you dare to use it for anything serious ?
Haha, thanks : ) I guess I will probably be grouchy too if seeing so many people
making the same mistakes again and again. It just happened to be me.
But it?s good to be rigorous. Sooner or later I'll have to face these
problems. It?s better to make mistakes before having made many.

Thanks y?all!

Best,
- Dipterix Wang

> On Oct 20, 2021, at 5:32 AM, Martin Maechler <maechler at
stat.math.ethz.ch> wrote:
> 
>>>>>> Martin Maechler 
>>>>>>    on Wed, 20 Oct 2021 11:26:21 +0200 writes:
> 
> [............]
> 
>> Thank you, Andr? , that's very good.
> 
>> Just to state the obvious conclusion:
> 
>> If Ben's suggestion is correct (and Andr? has explained *how*
>> that could happen) this would mean  a
>> SEVERE BUG in package ravetools's  mvfftw() function.
> 
>> and it would have been (yet another) case of gaining speed by
>> killing correctness...
> 
>> ... but then ravetools  is not even a CRAN package, so why
>> should you dare to use it for anything serious ?
> 
>> ... yes, being grouchy ..
> 
> which I should rather not be.
> 
> Dipterix Wang *did* say initially that he is currently
> developing ravetools so it's very reasonabl this is not yet a
> CRAN package..
> 
> Best,
> Martin
> 

	[[alternative HTML version deleted]]

GILLIBERT, Andre

2021-Oct-21 10:32 UTC

head link

[Rd] stats::fft produces inconsistent results

> Haha, thanks : ) I guess I will probably be grouchy too if seeing so many
people making the same mistakes again and again. It just happened to be me.

Fortunately, you did not get offensed. :)

This is nice to have a large community of developers for R packages, even if,
sometimes, buggy packages are annoying R developers because any small change in
R may "break" them even though they were actually broken from the
begining.

> Indeed, I found myself often confused about when to PROTECT and when not.

A (relatively) quick explanation.

There are several ?pools? of data objects that have different rules. The most
common ?pool? is the pool of garbage collectable R objects, that can be
allocated with allocVector and is passed from R to C code and vice versa.
Another pool is the malloc/free pool, that works with explicit
allocation/deallocation. R does not modify the malloc/free implementation in any
way, and memory leaks may happen. Operating systems may have other pools of
memory (e.g. mmap'ed memory) that are not handled by R either. There is also
a transient storage (R_alloc/vmaxset/vmaxget) that is automatically freed when
returning from C to R, and should be used for temporary storage but not for
objects returned to R code.

The PROTECT system is needed for garbage collectable objects.

The garbage collector may trigger whenever a R internal function is called.
Typically, when some memory is internally allocated.

The garbage collector frees objects that are neither referenced directly nor
indirectly from R code and from the PROTECT stack.

The PROTECT stack is used by C code to make sure objects that are not yet (or
will never be) referenced by R code, are not destroyed when the garbage
collector runs.

The functions allocating new R objects, such as allocVector(), but also
coerceVector(), duplicate(),return unprotected objects, that may be destroyed
the next time an internal R function is called, unless it is explicitly
PROTECT'ed before. Indeed, such objects would have no reference from R code
and so, would be deleted.

The PROTECT stack must be balanced on a call from R to a C function. There must
be as many UNPROTECT'ions than PROTECT'ions.

The typical C code PROTECTs any object allocated as soon as it is allocated
(e.g. call to allocVector or coerceVector). It UNPROTECTs temporary objects to
"free" them (the actual memory release may be delayed to the next
garbage collection). It UNPROTECTs the object it returns to R code. Indeed, in
pure C code, there will be no garbage collection between the time the object is
UNPROTECTed and the time R grabs the object. You must be very careful if you are
using C++, because destructors must not call any R internal function that may
trigger a garbage collection.
The arguments to the C code, do not have to be PROTECT'ed, unless they are
re-allocated. For instance, it is frequent to call coerceVector or arguments and
re-assign them to the C variable that represents the argument. The new object
must be PROTECT'ed.

Actually, you do not need to *directly* PROTECT all objects that are allocated
in the C function, but you must make sure that all objects are *indirectly*
PROTECT'ed. For instance, you may allocate a VECSXP (a "list" in
R) and fill the slots with newly allocated objects. You only need to PROTECT the
VECSXP, since its slots are indirectly protected.

If you have any doubt, it is not a bug to over-PROTECT objects. It may slightly
slow down garbage collection and use space on the PROTECTion stack, but that is
rarely a big deal. You should only avoid that when that would lead to thousands
or millions of protections.

As I said, the PROTECT stack must be balanced between the entry and exit of the
C code. This is not a problem for 99% of functions that free all the memory they
use internally except the object that is returned. Sometimes, some
"background" memory, hidden to R code, may have to be allocated for
more time. A call to R_PreserveObject protects the object, even after the C code
returns to R, until R_ReleaseObject is called. Without an explicit call to
R_ReleaseObject, memory is leaked!

There is another mechanism in R that must be known. If you call any R function
from C code, or any internal R function that may fail with an error, or any
internal R function that can be stopped by the user (see R_CheckUserInterrupt),
then, R may call a longjmp to exit all the C code. This is very much
incompatible with C++ exceptions or constructors/destructors. Rcpp can avoid, to
some extent, that problem.

With C code, this means that some malloc'ed memory or allocated resources
(file descriptors, sockets, etc.) may be leaked unless something is done to
prevent that. All PROTECT'ed objects are automatically unprotected, so there
is no problem with memory leak of garbage collectable objects. There is a
R_UnwindProtect() mechanism to free temporary resources (e.g. a socket you
allocated) when a longjmp is triggered. Non-memory resources (e.g. a socket)
returned to R should use the R_MakeExternalPtr() mechanism to make sure that,
when the memory is freed by the garbage collector, the resource is also freed.

"Writing R extensions" contains more extensive documentation, but I
hope that my quick description of the system will make it easier to understand
the extensive documentation.

 --

Sinc?rement

Andr? GILLIBERT

De : Dipterix Wang <dipterix.wang at gmail.com>
Envoy? : mercredi 20 octobre 2021 22:01
? : Martin Maechler <maechler at stat.math.ethz.ch>; GILLIBERT, Andre
<Andre.Gillibert at chu-rouen.fr>; bbolker at gmail.com
Cc : r-devel at r-project.org
Objet : Re: [Rd] stats::fft produces inconsistent results

ATTENTION: Cet e-mail provient d?une adresse mail ext?rieure au CHU de Rouen. Ne
cliquez pas sur les liens ou n'ouvrez pas les pi?ces jointes ? moins de
conna?tre l'exp?diteur et de savoir que le contenu est s?r. En cas de doute,
transf?rer le mail ? ? DSI, S?curit? ? pour analyse. Merci de votre vigilance

Wow, you guys are amazing!

as part of its pipeline, ravetools::mvfftw computes the mean of the
input vector **and then centers it to a mean of zero** (intentionally or
accidentally?)

because variables are passed to compiled code by reference (someone
can feel free to correct my terminology), this means that the original
vector in R now has a mean of zero

I didn?t center the input vector in my code. The data was fed ?as-is? into
FFTW3. My guess is FFTW3 internally center the data. It could be that FFTW3
library behave differently on different platforms, which could explain the
reproducibility issue.

"Indeed, R vectors are passed "by reference" to C code, but the
semantic must be "by value", i.e. the C function must NOT change the
contents of the vector, except in very specific cases.?

CRAN has already had fftw and fftwtools, the issue is the data I?m targeting at
are at GB-level, copying the vectors can be memory inefficient or even use up
memories. The strategy of ravetools is to import signals from local files, fft,
then directly write to disk. So only one reference will be used and modifying
in-place is on purpose. In fact, and the fft functions I created are not
intended to be used directly by users.

However, I should've been very cautious when using these functions. This is
my fault. I?ll check the whole package to make sure only one reference is used
or otherwise the vectors will be copied.

This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.

Nice to learn! I?ll add it to my code.

Properly using R vectors in C code is tricky. You have to understand.

1) When you are allowed or not to modify vectors

2) When to PROTECT()vectors

3) How the garbage collector works and when it can trigger (answer : basically,
when you call any internal R function)

Chapter 5 of "Writing R Extensions" documentation is quite extensive:

 https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C

Indeed, I found myself often confused about when to PROTECT and when not.

... but then ravetools  is not even a CRAN package, so why should you dare to
use it for anything serious ?

Haha, thanks : ) I guess I will probably be grouchy too if seeing so many people
making the same mistakes again and again. It just happened to be me.

But it?s good to be rigorous. Sooner or later I'll have to face these
problems. It?s better to make mistakes before having made many.

Thanks y?all!

Best,

- Dipterix Wang

On Oct 20, 2021, at 5:32 AM, Martin Maechler <maechler at
stat.math.ethz.ch<mailto:maechler at stat.math.ethz.ch>> wrote:

Martin Maechler
   on Wed, 20 Oct 2021 11:26:21 +0200 writes:

[............]

Thank you, Andr? , that's very good.

Just to state the obvious conclusion:

If Ben's suggestion is correct (and Andr? has explained *how*
that could happen) this would mean  a
SEVERE BUG in package ravetools's  mvfftw() function.

and it would have been (yet another) case of gaining speed by
killing correctness...

... but then ravetools  is not even a CRAN package, so why
should you dare to use it for anything serious ?

... yes, being grouchy ..

which I should rather not be.

Dipterix Wang *did* say initially that he is currently
developing ravetools so it's very reasonabl this is not yet a
CRAN package..

Best,
Martin

	[[alternative HTML version deleted]]

R devel - Oct 2021 - stats::fft produces inconsistent results

[Rd] stats::fft produces inconsistent results

[Rd] stats::fft produces inconsistent results