André, I'm not an R core member, but happen to have looked a little bit at
this issue myself. I've seen similar things on Skylake and Coffee Lake 2
(9700, one generation past your latest) too. I think it would make sense to
have some handling of this, although I would want to show the trade-off with
performance impacts on CPUs that are not affected by this, and on vectors
that don't actually have NAs and similar. I think the performance impact is
likely to be small so long as branch prediction is active, but since branch
prediction is involved you might need to check with different ratios of NAs
(not for your NA bailout branch, but for e.g. interaction of what you add
and the existing `na.rm=TRUE` logic).

You'll also need to think of cases such as c(Inf, NA), c(NaN, NA), etc.,
which might complicate the logic a fair bit.

Presumably the x87 FPU will remain common for a long time, but if there was
reason to think otherwise, then the value of this becomes questionable.

Either way, I would probably wait to see what R Core says.

For reference, this 2012 blog post[1] discusses some aspects of the issue,
including that at least "historically" AMD was not affected.

Since we're on the topic I want to point out that the default NA in R
starts off as a signaling NA:

    example(numToBits)   # for `bitC`
    bitC(NA_real_)
    ## [1] 0 11111111111 | 0000000000000000000000000000000000000000011110100010
    bitC(NA_real_ + 0)
    ## [1] 0 11111111111 | 1000000000000000000000000000000000000000011110100010

Notice the leading bit of the significand starts off as zero, which marks it
as a signaling NA, but becomes 1, i.e. non-signaling, after any operation[2].

This is meaningful because the mere act of loading a signaling NA into the
x87 FPU is sufficient to trigger the slowdowns, even if the NA is not
actually used in arithmetic operations. This happens under some optimization
levels. I don't know of any benefit to starting off with a signaling NA,
especially since the encoding is lost pretty much as soon as it is used. If
folks are interested I can provide a patch to turn the NA quiet by default.

Best,

B.

[1]: https://randomascii.wordpress.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/
[2]: https://en.wikipedia.org/wiki/NaN#Encoding

> On Thursday, September 30, 2021, 06:52:59 AM EDT, GILLIBERT, Andre <andre.gillibert at chu-rouen.fr> wrote:
>
> Dear R developers,
>
> By default, R uses the "long double" data type to get extra precision for
> intermediate computations, with a small performance tradeoff.
>
> Unfortunately, on all Intel x86 computers I have ever seen, long doubles
> (implemented in the x87 FPU) are extremely slow whenever a special
> representation (NA, NaN or infinities) is used, probably because it
> triggers poorly optimized microcode in the CPU firmware. A function such
> as sum() becomes more than a hundred times slower!
> Test code:
> a=runif(1e7);system.time(for(i in 1:100)sum(a))
> b=a;b[1]=NA;system.time(sum(b))
>
> The slowdown factors are as follows on a few Intel CPUs:
>
> 1) Pentium Gold G5400 (Coffee Lake, 8th generation) with R 64 bits: 140 times slower with NA
>
> 2) Pentium G4400 (Skylake, 6th generation) with R 64 bits: 150 times slower with NA
>
> 3) Pentium G3220 (Haswell, 4th generation) with R 64 bits: 130 times slower with NA
>
> 4) Celeron J1900 (Atom Silvermont) with R 64 bits: 45 times slower with NA
>
> I do not have access to more recent Intel CPUs, but I doubt that it has
> improved much.
>
> Recent AMD CPUs have no significant slowdown.
> There is no significant slowdown on Intel CPUs (more recent than Sandy
> Bridge) for 64-bit floating point calculations based on SSE2. Therefore,
> operators using doubles, such as '+', are unaffected.
>
> I do not know whether recent ARM CPUs have slowdowns on FP64... Maybe
> somebody can test.
>
> Since NAs are not rare in real life, I think that it would be worth an
> extra check in functions based on long doubles, such as sum(). The check
> for special representations does not necessarily have to be done at each
> iteration for cumulative functions.
> If you are interested, I can write a bunch of patches to fix the main
> functions using long doubles: cumsum, cumprod, sum, prod, rowSums,
> colSums, matrix multiplication (matprod="internal").
>
> What do you think of that?
>
> --
> Sincerely
> André GILLIBERT
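A small standalone C sketch of the bit patterns discussed above (illustrative
only, not R's own source; the constants simply mirror the bitC() output): it
builds the signaling form of R's NA, with payload 1954 and the quiet bit
(bit 51) clear, and shows that a single arithmetic operation sets the quiet
bit while preserving the payload.

    /* Standalone illustration (not R source): R's NA as a NaN with low-word
     * payload 1954; bit 51 is the IEEE-754 quiet bit.  One arithmetic
     * operation turns the signaling form into the quiet form. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    static uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
    static double   of_bits(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

    int main(void) {
        /* exponent all ones, quiet bit clear, payload 1954 -> signaling NA */
        double na = of_bits(0x7FF0000000000000ULL | 1954);
        printf("NA_real_    : %016llx  quiet bit = %d\n",
               (unsigned long long) bits_of(na), (int) ((bits_of(na) >> 51) & 1));
        printf("NA_real_ + 0: %016llx  quiet bit = %d\n",
               (unsigned long long) bits_of(na + 0), (int) ((bits_of(na + 0) >> 51) & 1));
        return 0;
    }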
GILLIBERT, Andre
2021-Sep-30 16:06 UTC
[Rd] Workaround very slow NAN/Infinities arithmetic?
Brodie Gaslam wrote:

> André,
>
> I'm not an R core member, but happen to have looked a little bit at this
> issue myself. I've seen similar things on Skylake and Coffee Lake 2
> (9700, one generation past your latest) too. I think it would make sense
> to have some handling of this, although I would want to show the trade-off
> with performance impacts on CPUs that are not affected by this, and on
> vectors that don't actually have NAs and similar. I think the performance
> impact is likely to be small so long as branch prediction is active, but
> since branch prediction is involved you might need to check with different
> ratios of NAs (not for your NA bailout branch, but for e.g. interaction
> of what you add and the existing `na.rm=TRUE` logic).

For operators such as '+', randomly placed NAs could slow down AMD
processors due to many branch mispredictions. However, the functions using
long doubles are mainly "cumulative" functions such as sum, mean, cumsum,
etc. They can stop at the first NA found.

> You'll also need to think of cases such as c(Inf, NA), c(NaN, NA), etc.,
> which might complicate the logic a fair bit.

When an infinity is found, the rest of the vector must be searched for
infinities of the opposite sign or NAs. Mixing NaNs and NAs has always been
platform-dependent in R, but, on x87, it seems that the first NA/NaN met
wins. Being consistent with that behavior is easy.

I wrote a first patch for the sum() function, taking into account all those
special +Inf/-Inf/NA/NaN cases.

Without any NA, the sum() function was 1.6% slower with my first patch on a
Celeron J1900. Not a big deal, in my opinion. However, I could easily
compensate for that loss (and much more) by improving the code. By splitting
the na.rm=TRUE and na.rm=FALSE code paths, I could save a useless FP
comparison (for NaN), a useless CMOV (for the 'updated' variable), and
useless SSE -> memory -> x87 moves (high latency).

My new code is 1.75 times faster for sum(big_vector_without_NAs,
na.rm=FALSE) and 1.35 times faster for sum(big_vector_without_NAs,
na.rm=TRUE).

Of course, it is much faster if there are any NAs, because it stops at the
first NA found. For infinities on AMD CPUs, it may not necessarily be
faster.

--
Sincerely
André GILLIBERT
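A rough sketch of the shape André describes (this is not his actual patch;
the function names are invented, and the infinity handling he mentions,
scanning the remainder for an opposite-signed infinity or an NA, is omitted
for brevity):

    /* Rough sketch only (not the submitted patch): split na.rm paths and
     * bail out of the long-double accumulation at the first NA/NaN. */
    #include <math.h>
    #include <stddef.h>

    double sum_keep_na(const double *x, size_t n)   /* na.rm = FALSE */
    {
        long double s = 0.0L;
        for (size_t i = 0; i < n; i++) {
            if (isnan(x[i]))
                return x[i];      /* first NA/NaN wins; avoids the slow x87 path */
            s += x[i];
        }
        return (double) s;
    }

    double sum_rm_na(const double *x, size_t n)     /* na.rm = TRUE */
    {
        long double s = 0.0L;
        for (size_t i = 0; i < n; i++)
            if (!isnan(x[i])) s += x[i];            /* the NaN test is needed here anyway */
        return (double) s;
    }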
luke-tierney at uiowa.edu
2021-Sep-30 17:24 UTC
[Rd] [External] Re: Workaround very slow NAN/Infinities arithmetic?
On Thu, 30 Sep 2021, brodie gaslam via R-devel wrote:

> André,
>
> I'm not an R core member, but happen to have looked a little bit at this
> issue myself. I've seen similar things on Skylake and Coffee Lake 2
> (9700, one generation past your latest) too. I think it would make sense
> to have some handling of this, although I would want to show the trade-off
> with performance impacts on CPUs that are not affected by this, and on
> vectors that don't actually have NAs and similar. I think the performance
> impact is likely to be small so long as branch prediction is active, but
> since branch prediction is involved you might need to check with different
> ratios of NAs (not for your NA bailout branch, but for e.g. interaction
> of what you add and the existing `na.rm=TRUE` logic).

I would want to see realistic examples where this matters, not
microbenchmarks, before thinking about complicating the code. Not all but
most cases where sum(x) returns NaN/NA would eventually result in an error;
getting to the error faster is not likely to be useful.

My understanding is that arm64 does not support proper long doubles (they
are the same as regular doubles). So code using long doubles isn't getting
the hoped-for improved precision. Since that architecture is becoming more
common we should probably be looking at replacing uses of long doubles with
better algorithms that can work with regular doubles, e.g. Kahan summation
or variants for sum.

> You'll also need to think of cases such as c(Inf, NA), c(NaN, NA), etc.,
> which might complicate the logic a fair bit.
>
> Presumably the x87 FPU will remain common for a long time, but if there
> was reason to think otherwise, then the value of this becomes
> questionable.
>
> Either way, I would probably wait to see what R Core says.
>
> For reference, this 2012 blog post[1] discusses some aspects of the issue,
> including that at least "historically" AMD was not affected.
>
> Since we're on the topic I want to point out that the default NA in R
> starts off as a signaling NA:
>
>     example(numToBits)   # for `bitC`
>     bitC(NA_real_)
>     ## [1] 0 11111111111 | 0000000000000000000000000000000000000000011110100010
>     bitC(NA_real_ + 0)
>     ## [1] 0 11111111111 | 1000000000000000000000000000000000000000011110100010
>
> Notice the leading bit of the significand starts off as zero, which marks
> it as a signaling NA, but becomes 1, i.e. non-signaling, after any
> operation[2].
>
> This is meaningful because the mere act of loading a signaling NA into the
> x87 FPU is sufficient to trigger the slowdowns, even if the NA is not
> actually used in arithmetic operations. This happens under some
> optimization levels. I don't know of any benefit to starting off with a
> signaling NA, especially since the encoding is lost pretty much as soon as
> it is used. If folks are interested I can provide a patch to turn the NA
> quiet by default.

In principle this might be a good idea, but the current bit pattern is
unfortunately baked into a number of packages and documents on internals,
as well as serialized objects. The work needed to sort that out is probably
not worth the effort.

It also doesn't seem to affect the performance issue here since setting
b[1] <- NA_real_ + 0 produces the same slowdown (at least on my current
Intel machine).

Best,

luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:  luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
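Kahan (compensated) summation, as mentioned above, needs only regular
doubles; a textbook sketch (not R's code) is below. Note it must be compiled
without -ffast-math, or the compensation term is optimized away; Neumaier's
variant additionally handles terms larger than the running sum.

    /* Textbook Kahan compensated summation in plain double precision
     * (a sketch, not R's code): c accumulates the low-order bits that
     * would otherwise be lost when adding each element to the sum. */
    #include <stddef.h>

    double kahan_sum(const double *x, size_t n)
    {
        double sum = 0.0, c = 0.0;        /* c is the running compensation */
        for (size_t i = 0; i < n; i++) {
            double y = x[i] - c;          /* corrected next term */
            double t = sum + y;           /* low-order bits of y may be lost here */
            c = (t - sum) - y;            /* recover what was lost */
            sum = t;
        }
        return sum;
    }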
Brodie Gaslam
2021-Oct-01 16:07 UTC
[Rd] [External] Re: Workaround very slow NAN/Infinities arithmetic?
> On Thursday, September 30, 2021, 01:25:02 PM EDT, <luke-tierney at uiowa.edu> wrote:
>
>> On Thu, 30 Sep 2021, brodie gaslam via R-devel wrote:
>>
>> André,
>>
>> I'm not an R core member, but happen to have looked a little bit at this
>> issue myself. I've seen similar things on Skylake and Coffee Lake 2
>> (9700, one generation past your latest) too. I think it would make sense
>> to have some handling of this, although I would want to show the trade-off
>> with performance impacts on CPUs that are not affected by this, and on
>> vectors that don't actually have NAs and similar. I think the performance
>> impact is likely to be small so long as branch prediction is active, but
>> since branch prediction is involved you might need to check with different
>> ratios of NAs (not for your NA bailout branch, but for e.g. interaction
>> of what you add and the existing `na.rm=TRUE` logic).
>
> I would want to see realistic examples where this matters, not
> microbenchmarks, before thinking about complicating the code. Not all
> but most cases where sum(x) returns NaN/NA would eventually result in
> an error; getting to the error faster is not likely to be useful.

That's a very good point, and I'll admit I did not consider it
sufficiently. There are examples such as `rowSums`/`colSums` where some
rows/columns evaluate to NA but the result still contains meaningful data.
By extension, any loop that applies `sum` to list elements where some might
contain NAs and others not. `tapply` or any other group-based aggregation
comes to mind.

> My understanding is that arm64 does not support proper long doubles
> (they are the same as regular doubles).

Mine is the same.

> So code using long doubles isn't getting the hoped-for improved
> precision. Since that architecture is becoming more common we should
> probably be looking at replacing uses of long doubles with better
> algorithms that can work with regular doubles, e.g. Kahan summation or
> variants for sum.

This is probably the bigger issue. If the future is ARM/AMD, the value of
Intel x87-only optimizations becomes questionable.

More generally, there is the question of whether to completely replace long
doubles with algorithmic precision methods, at a cost of performance on
systems that do support hardware long doubles (Intel or otherwise), or
whether both code pathways are kept and selected at compile time. Or maybe
the aggregation functions gain a low-precision flag for simple
double-precision accumulation.

I'm curious to look at the performance and precision implications of e.g.
Kahan summation if no one has done that yet. Some quick poking around shows
people using processor-specific intrinsics to take advantage of advanced
multi-element instructions, but I imagine R would not want to do that.
Assuming others have not done this already, I will have a look and report
back.

>> Since we're on the topic I want to point out that the default NA in R
>> starts off as a signaling NA:
>>
>>     example(numToBits)   # for `bitC`
>>     bitC(NA_real_)
>>     ## [1] 0 11111111111 | 0000000000000000000000000000000000000000011110100010
>>     bitC(NA_real_ + 0)
>>     ## [1] 0 11111111111 | 1000000000000000000000000000000000000000011110100010
>>
>> Notice the leading bit of the significand starts off as zero, which marks
>> it as a signaling NA, but becomes 1, i.e. non-signaling, after any
>> operation[2].
>>
>> This is meaningful because the mere act of loading a signaling NA into the
>> x87 FPU is sufficient to trigger the slowdowns, even if the NA is not
>> actually used in arithmetic operations. This happens under some
>> optimization levels. I don't know of any benefit to starting off with a
>> signaling NA, especially since the encoding is lost pretty much as soon as
>> it is used. If folks are interested I can provide a patch to turn the NA
>> quiet by default.
>
> In principle this might be a good idea, but the current bit pattern is
> unfortunately baked into a number of packages and documents on
> internals, as well as serialized objects. The work needed to sort that
> out is probably not worth the effort.

One reason why we might not need to sort this out is precisely the
instability shown above. Anything that relies on the signaling bit being
set to a particular value will behave differently with `NA_real_` vs
`NA_real_ + x`. `R_IsNA` only checks the lower word, so it doesn't care
about the signaling bit or the 19 subsequent ones. Anything that does
likely has unpredictable behavior **currently**.

Similarly, the documentation[2] only specifies the low word:

> On such platforms NA is represented by the NaN value with low-word 0x7a2
> (1954 in decimal).

This is consistent with the semantics of `R_IsNA`.

> It also doesn't seem to affect the performance issue here since
> setting b[1] <- NA_real_ + 0 produces the same slowdown (at least on
> my current Intel machine).

The subtlety here is that the slowdown happens by merely loading the
signaling NaN onto the x87 FPU. Consider the following:

    x <- t(1:1e7 + 1L)
    system.time(rowSums(x))
    ##    user  system elapsed
    ##   1.685   0.007   1.693

If we apply the following patch to make NA_REAL quiet:

    Index: src/main/arithmetic.c
    ===================================================================
    --- src/main/arithmetic.c	(revision 80996)
    +++ src/main/arithmetic.c	(working copy)
    @@ -117,7 +117,7 @@
         /* The gcc shipping with Fedora 9 gets this wrong without
          * the volatile declaration. Thanks to Marc Schwartz. */
         volatile ieee_double x;
    -    x.word[hw] = 0x7ff00000;
    +    x.word[hw] = 0x7ff80000;
         x.word[lw] = 1954;
         return x.value;
     }

we get a ~25x speedup:

    x <- t(1:1e7 + 1L)
    system.time(rowSums(x))
    ##    user  system elapsed
    ##   0.068   0.000   0.068

This despite no NAs anywhere in sight. I observe the slow behavior on
Skylake, Coffee Lake 2, and others, across different OSes, so long as the
-O2 compilation flag is used.

This likely happens because the compiler tries to prepare for the `keepNA`
branch in[1]:

        case INTSXP:
        {
            int *ix = INTEGER(x) + (R_xlen_t)n*j;
            for (cnt = 0, sum = 0., i = 0; i < n; i++, ix++)
                if (*ix != NA_INTEGER) {cnt++; sum += *ix;}
                else if (keepNA) {sum = NA_REAL; break;}
            break;
        }

by loading an NA_REAL into the FPU. At least that was my interpretation of
the following machine code (dumped from src/main/array.o). I barely
understand ASM, but it looks like 9bd4 loads R_NaReal, and this happens
before the test for NA at 9be8:

            case LGLSXP:   # looks like the INTSXP/LGLSXP branches collapsed
            {
                int *ix = LOGICAL(x) + (R_xlen_t)n * j;
    9bc8:       4d 01 e3                add    r11,r12
                for (R_xlen_t i = 0; i < n; i++, ra++, ix++)
    9bcb:       48 85 db                test   rbx,rbx
    9bce:       0f 8e fc fe ff ff       jle    9ad0 <do_colsum+0x260>
                    if (keepNA) {
                        if (*ix != NA_LOGICAL) *ra += *ix;
                        else *ra = NA_REAL;
    9bd4:       dd 05 00 00 00 00       fld    QWORD PTR [rip+0x0]        # 9bda <do_colsum+0x36a>
                        9bd6: R_X86_64_PC32     R_NaReal-0x4
    9bda:       48 8b 95 60 ff ff ff    mov    rdx,QWORD PTR [rbp-0xa0]
                for (R_xlen_t i = 0; i < n; i++, ra++, ix++)
    9be1:       31 c0                   xor    eax,eax
    9be3:       eb 2c                   jmp    9c11 <do_colsum+0x3a1>
    9be5:       0f 1f 00                nop    DWORD PTR [rax]
                        if (*ix != NA_LOGICAL) *ra += *ix;
    9be8:       45 39 d0                cmp    r8d,r10d
    9beb:       74 5b                   je     9c48 <do_colsum+0x3d8>
    9bed:       44 89 85 78 ff ff ff    mov    DWORD PTR [rbp-0x88],r8d
    9bf4:       db 85 78 ff ff ff       fild   DWORD PTR [rbp-0x88]
    9bfa:       db 2a                   fld    TBYTE PTR [rdx]
    9bfc:       de c1                   faddp  st(1),st
    9bfe:       db 3a                   fstp   TBYTE PTR [rdx]

    ... SNIP

    9c48:       db 3a                   fstp   TBYTE PTR [rdx]
    9c4a:       db 2a                   fld    TBYTE PTR [rdx]
    9c4c:       eb b2                   jmp    9c00 <do_colsum+0x390>

If no NA_LOGICAL (NA_INTEGER) values are encountered, the NaN is not used.
However, it is always loaded, and for signaling NaNs this alone appears to
switch the FPU to turtle mode.

But more important than whether I can interpret ASM correctly (dubious):
simply changing the NA_REAL value to be of the quiet variety dramatically
improves performance of `rowSums` with integers.

Ironically, this doesn't happen with the REALSXP branch, because that one
relies on NaN propagation in the FPU and so doesn't load the NA_REAL for
the early-break case when it encounters one. Of course, if it does
encounter a NaN, quiet or not, we get a slowdown.

Compiling with -O3 also fixes this.

Bad performance of `rowSums` on integers alone is not really that big a
deal given the output is numeric, and that's why I never reported this
despite having known about it for a while. But André's e-mail pushed me
into saying something about it. There is a risk that code relies on the
full bit pattern of NA_REAL, but I think such code is probably broken
already, since the signaling bit is even more unstable than the general
payload bits.

Best,

B.

[1]: https://github.com/r-devel/r-svn/blob/6891db49680629427e3d5927053531b8fa5d8ee3/src/main/array.c#L1939
[2]: https://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
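Independent of the quiet-NA patch, one conceivable way to keep the compiler
from touching R_NaReal inside the hot integer loop would be to defer the NA
assignment until after the loop. The following is a simplified sketch, not
R's actual do_colsum code, with NA_INTEGER/NA_REAL replaced by stand-ins so
it compiles on its own; whether a given compiler still hoists the load would
of course have to be checked against the generated assembly, as above.

    /* Simplified sketch (not R's do_colsum): record "saw an NA" in an int
     * flag so no floating-point NA constant is referenced inside the loop;
     * NA_INTEGER / NA_REAL below are stand-ins for R's internal constants. */
    #include <math.h>
    #include <stddef.h>

    #define NA_INTEGER (-2147483647 - 1)   /* INT_MIN, as in R */
    #define NA_REAL    ((double) NAN)      /* stand-in; R's NA_REAL carries payload 1954 */

    double col_sum_int(const int *ix, size_t n, int keepNA)
    {
        long double sum = 0.0L;
        int saw_na = 0;
        for (size_t i = 0; i < n; i++) {
            if (ix[i] != NA_INTEGER)
                sum += ix[i];
            else if (keepNA) { saw_na = 1; break; }  /* no FP constant loaded here */
        }
        return saw_na ? NA_REAL : (double) sum;      /* NA produced once, after the loop */
    }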