Hello,
How can I run a simulation from a Weibull distribution after I have
generated data from it?
For example, if I generate x <- rweibull(50, shape=0.8, scale=2), I want to
simulate such data 1000 times so that I can use the replicates to estimate
the parameters.
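I imagine something along the lines of this sketch (using MASS::fitdistr
for the estimation, if that is an appropriate choice):
library(MASS)
# 1000 replicate samples of size 50 from the same Weibull
sims <- replicate(1000, rweibull(50, shape = 0.8, scale = 2),
                  simplify = FALSE)
# fit each replicate and collect the (shape, scale) estimates
fits <- t(sapply(sims, function(x) fitdistr(x, "weibull")$estimate))
colMeans(fits)   # average estimates over the 1000 replicates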
Thank you.
________________________________
From: "r-help-request@r-project.org"
<r-help-request@r-project.org>
To: r-help@r-project.org
Sent: Friday, January 27, 2012 7:00 PM
Subject: R-help Digest, Vol 107, Issue 27
Send R-help mailing list submissions to
r-help@r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-help
or, via email, send a message with subject or body 'help' to
r-help-request@r-project.org
You can reach the person managing the list at
r-help-owner@r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of R-help digest..."
Today's Topics:
1. extracting from data.frames for survival analysis
(Philip Robinson)
2. Re: extracting from data.frames for survival analysis
(Gerrit Eichner)
3. Finding suspicious data points? (Fredrik Karlsson)
4. Re: (no subject) (peter dalgaard)
5. Re: Finding suspicious data points? (Carl Witthoft)
6. Re: How do I use the cut function to assign specific cut
points? (Frank Harrell)
7. Re: Extracting data from trellis object (Jean V Adams)
8. Conditional Random Fields (Ana)
9. Request for help on manipulation large data sets (Chee Chen)
10. Re: R.lnk shortcut: Warning message when opening Microsoft
Word 2003 (Duncan Murdoch)
11. Re: R extracting regression coefficients from multiple
regressions using lapply command (Jean V Adams)
12. Inserting a character into a character string XXXX (Dan Abner)
13. null distribution of binom.test p values (Chris Wallace)
14. eRm package - Rasch simulation (James Holland)
15. eRm - Rasch modeling - First item missing from estimation
(James Holland)
16. How do I compare a multiple staged response with
multivariables to a Development Index? (Jhope)
17. Re: Inserting a character into a character string XXXX
(Gerrit Eichner)
18. Re: R-help Digest, Vol 107, Issue 25 (Paul Miller)
19. Re: Inserting a character into a character string XXXX
(William Dunlap)
20. Checking for invalid dates: Code works but needs improvement
(Paul Miller)
21. Re: Coloring Canada provinces (package maps?)
(Dimitri Liakhovitski)
22. Re: null distribution of binom.test p values (Greg Snow)
23. Error in ifelse(append, "a", "w") : (list) object cannot be
coerced to type 'logical' (Juan Antonio Balbuena)
24. adding additional information to histogram (Raphael Bauduin)
25. Re: function for grouping (yan)
26. Re: Error in ifelse(append, "a", "w") : (list) object
cannot be coerced to type 'logical' (David Winsemius)
27. lattice panels with grouped extra data as text? (Rainer Hurling)
28. Re: eRm package - Rasch simulation (Achim Zeileis)
29. Re: List to Array: How to establish the dimension of the
array (Brian Diggs)
30. Re: eRm - Rasch modeling - First item missing from estimation
(Achim Zeileis)
31. Re: null distribution of binom.test p values (Greg Snow)
32. Re: null distribution of binom.test p values (Chris Wallace)
33. Re: Issues with PearsonIV distribution
(gaurang.jitendrabhai@aviva.com)
34. (no subject) (Harish Eswaran)
35. Re: Inserting a character into a character string XXXX (Dan Abner)
36. Re: apply recursive function on a list (Brian Diggs)
37. Re: Coloring Canada provinces (package maps?) (Barry Rowlingson)
38. Re: (no subject) (R. Michael Weylandt)
39. Re: Inserting a character into a character string XXXX
(William Dunlap)
40. Re: Coloring Canada provinces (package maps?)
(Dimitri Liakhovitski)
41. Re: How do I compare a multiple staged response with
multivariables to a Development Index? (R. Michael Weylandt)
42. Re: How do I compare 47 GLM models with 1 to 5 interactions
and unique combinations? (Frank Harrell)
43. Re: function for grouping (Petr Savicky)
44. Re: Inserting a character into a character string XXXX (Dan Abner)
45. Re: Checking for invalid dates: Code works but needs
improvement (Marc Schwartz)
46. Re: Checking for invalid dates: Code works but needs
improvement (Gabor Grothendieck)
47. Re: apply recursive function on a list (Berend Hasselman)
48. What does [[1]] mean? (Ajay Askoolum)
49. Re: apply recursive function on a list (Berend Hasselman)
50. Re: What does [[1]] mean? (Greg Snow)
51. merge multiple data frames (maxbre)
52. 3-parametric Weibull regression (Pfrengle Andreas (GS-SI/ENX7))
53. Re: merge multiple data frames (R. Michael Weylandt)
54. Re: What does [[1]] mean? (Patrick Burns)
55. Re: null distribution of binom.test p values (Thomas Lumley)
56. Re: apply recursive function on a list (Brian Diggs)
57. Calculate a function repeatedly over sections of a ts object
(Jorge Molinos)
58. Quality of fit statistics for NLS? (Max Brondfield)
59. Re: Why was the 'doSMP' package removed from CRAN? (Tal Galili)
60. Re: Calculate a function repeatedly over sections of a ts
object (R. Michael Weylandt)
61. Re: Calculate a function repeatedly over sections of a ts
object (Gabor Grothendieck)
62. Re: merge multiple data frames (maxbre)
63. Re: How do I compare 47 GLM models with 1 to 5 interactions
and unique combinations? (Jhope)
64. Re: How to remove rows representing concurrent sessions from
data.frame? (Jean V Adams)
65. Re: Quality of fit statistics for NLS? (Bert Gunter)
66. Re: How do I compare 47 GLM models with 1 to 5 interactions
and unique combinations? (Bert Gunter)
67. Re: warning In fun(...) : no DISPLAY variable so Tk is not
available (sjkimble)
68. Re: function for grouping (Petr Savicky)
69. Conditional cumulative sum (Axel Urbiz)
70. Re: Conditional cumulative sum (Pete Brecknock)
71. Re: Checking for invalid dates: Code works but needs
improvement (Rui Barradas)
72. Is there a R command for testing the difference of two liear
regressions? (Michael)
73. Re: Is there a R command for testing the difference of two
liear regressions? (Mark Leeds)
74. multiple column comparison (ryanfuller)
75. Re: Clusters: How to build an "Amalgamation Schedule"
(sequence of joining)? (womak)
76. Placing a Shaded Box on a Plot (Stephanie Cooke)
77. Re: How do I compare 47 GLM models with 1 to 5 interactions
and unique combinations? (Rubén Roa)
78. Re: merge multiple data frames (Massimo Bressan)
79. Re: adding additional information to histogram (Jim Lemon)
80. Call dynamic functions (deivit)
81. Re: Quality of fit statistics for NLS? (peter dalgaard)
82. load multichannel-audio-files with readWave (tuneR) (Alex Hofmann)
83. Re: Call dynamic functions (deivit)
84. generate a random number with rexp ? (Adel ESSAFI)
----------------------------------------------------------------------
Message: 1
Date: Thu, 26 Jan 2012 21:47:13 +1000
From: Philip Robinson <philip.c.robinson@gmail.com>
To: r-help@r-project.org
Subject: [R] extracting from data.frames for survival analysis
Message-ID:
<CAMMiRhzuAStTd1EsZy7ecnxUui8PuBWngh=PTbwnXekUH-TiJA@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
I have a data frame:
> class(B27.vec)
[1] "data.frame"
> head(B27.vec)
AGE Gend B27 AgeOn DD uveitis psoriasis IBD CD UC InI BASDAI BASFI Smok UV
1 57 1 1 19 38 2 1 1 1 1 1 5.40 8.08 NA 1
2 35 1 1 33 2 2 1 1 1 1 1 1.69 2.28 NA 1
3 49 2 1 40 9 1 1 1 1 1 1 8.30 9.40 NA 0
4 32 1 1 21 11 1 1 1 1 1 1 5.10 9.10 NA 0
5 31 1 1 24 7 1 1 1 1 1 1 6.63 6.52 NA 0
6 27 1 1 23 4 1 2 1 1 1 1 7.19 6.51 NA 0
I am trying to perform survival analysis but continually get errors
when extracting from this data.frame:
attempt 1:
> X <- Surv(B27.vec$AgeOn,B27.vec$UV)
> survdiff(X,rho=0,data=uvf)
Error in x$terms : $ operator is invalid for atomic vectors
attempt 2:
> X <- Surv(B27.vec[,4],B27.vec[,15])
> survdiff(X,rho=0,data=uvf)
Error in x$terms : $ operator is invalid for atomic vector
attempt 3:
> AO <- B27.vec[["AgeOn", exact = TRUE]]
> UV <- B27.vec[["UV",exact=TRUE]]
> X <- Surv(AO,UV)
> survdiff(X,rho=0,data=uvf)
Error in x$terms : $ operator is invalid for atomic vectors
I have read ?data.frame & ?Extract.data.frame but I cannot understand
how I might structure this differently so it extracts the required
columns from this data frame. For the second and third attempts I am not
using the $ operator. Sorry if this seems basic, but I cannot understand
why these attempts don't work.
thanks
Philip
------------------------------
Message: 2
Date: Thu, 26 Jan 2012 12:58:39 +0100 (MET)
From: Gerrit Eichner <Gerrit.Eichner@math.uni-giessen.de>
To: Philip Robinson <philip.c.robinson@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] extracting from data.frames for survival analysis
Message-ID:
<Pine.SOC.4.64.1201261253340.26881@solcom.hrz.uni-giessen.de>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Hi, Philip,
counter-questions:
1. Which/where is the grouping variable for the test of differences in
survival?
2. Assume the grouping variable is Gend in B27.vec. Then, why aren't you
using
survdiff( Surv( AgeOn, UV) ~ Gend, rho = 0, data = B27.vec)
?
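For instance, the same pattern with the survival package's built-in lung
data (a sketch, since we don't have B27.vec here):
library(survival)
survdiff(Surv(time, status) ~ sex, rho = 0, data = lung)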
Hth -- Gerrit
On Thu, 26 Jan 2012, Philip Robinson wrote:
> Hi,
>
> I have a data frame:
>
>> class(B27.vec)
> [1] "data.frame"
>
>> head(B27.vec)
>
> AGE Gend B27 AgeOn DD uveitis psoriasis IBD CD UC InI BASDAI BASFI Smok UV
> 1 57 1 1 19 38 2 1 1 1 1 1 5.40 8.08 NA 1
> 2 35 1 1 33 2 2 1 1 1 1 1 1.69 2.28 NA 1
> 3 49 2 1 40 9 1 1 1 1 1 1 8.30 9.40 NA 0
> 4 32 1 1 21 11 1 1 1 1 1 1 5.10 9.10 NA 0
> 5 31 1 1 24 7 1 1 1 1 1 1 6.63 6.52 NA 0
> 6 27 1 1 23 4 1 2 1 1 1 1 7.19 6.51 NA 0
>
> I am trying to perform survival analysis but continually get errors
> when extracting from this data.frame:
>
> attempt 1:
>> X <- Surv(B27.vec$AgeOn,B27.vec$UV)
>> survdiff(X,rho=0,data=uvf)
> Error in x$terms : $ operator is invalid for atomic vectors
>
> attempt 2:
>> X <- Surv(B27.vec[,4],B27.vec[,15])
>> survdiff(X,rho=0,data=uvf)
> Error in x$terms : $ operator is invalid for atomic vector
>
> attempt 3:
>> AO <- B27.vec[["AgeOn", exact = TRUE]]
>> UV <- B27.vec[["UV",exact=TRUE]]
>> X <- Surv(AO,UV)
>> survdiff(X,rho=0,data=uvf)
> Error in x$terms : $ operator is invalid for atomic vectors
>
> I have read ?data.frame & extract.data.frame but I cannot understand
> how I might structure this differently so it extracts the required
> columns from this dataframe. For the second 2 attempts I am not using
> the $ term. Sorry if this seems basic but cannot understand why
> attempt 1 or 2 doesn't work.
>
> thanks
> Philip
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
------------------------------
Message: 3
Date: Thu, 26 Jan 2012 13:27:49 +0100
From: Fredrik Karlsson <dargosch@gmail.com>
To: R-Help List <r-help@stat.math.ethz.ch>
Subject: [R] Finding suspicious data points?
Message-ID:
<CANO=ohJMe1qGoWA59cQJ5KNwkMunGbHqXE_pcEwU5f7YZyA-dA@mail.gmail.com>
Content-Type: text/plain
Dear list,
I would like to find data points that at least should be checked one more
time before I process them further.
I've had a look at the outliers package for this, and the outliers function
in that package, but it appears to only return one value.
An example:
> outlier(c(1:3,rnorm(1000,mean=100000,sd=300)))
[1] 1
I think at least 1,2 and 3 should be checked in this case.
Any ideas on how to achieve this in R?
Actually, the real data I will be investigating consist of vector norms and
angles (in an attempt to identify either very short, very long vectors, or
vectors pointing in an odd angle for the category to which it has been
assigned) so a 2D method would be even better.
I would much appreciate any help I can get on this,
/Fredrik
--
"Life is like a trumpet - if you don't put anything into it, you
don't get
anything out of it."
------------------------------
Message: 4
Date: Thu, 26 Jan 2012 14:33:19 +0100
From: peter dalgaard <pdalgd@gmail.com>
To: Paul Johnston <paul.johnston@manchester.ac.uk>
Cc: "R-help@lists.R-project.org" <R-help@r-project.org>
Subject: Re: [R] (no subject)
Message-ID: <11D29C69-EFE5-4C01-8C6C-B231E13B19BF@gmail.com>
Content-Type: text/plain; charset=us-ascii
That's not how to do it. Please follow the link in the footer and the
correct way to subscribe should become clear.
-pd
On Jan 25, 2012, at 21:49 , Paul Johnston wrote:
> subscribe
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes@cbs.dk Priv: PDalgd@gmail.com
------------------------------
Message: 5
Date: Thu, 26 Jan 2012 09:00:13 -0500
From: Carl Witthoft <carl@witthoft.com>
To: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] Finding suspicious data points?
Message-ID: <4F215C6D.8070003@witthoft.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
According to the help file for 'outlier', (quoting)
x: a data sample, vector in most cases. If argument is a dataframe, then
outlier is calculated for each column by sapply. The same behavior is
applied by apply when the matrix is given. (endquote)
Looks like you could create a matrix that looks like an "upper
triangular" like
1 1 1
NA 2 2
NA NA 3
and see the results. However, since 'outlier' just returns the value
furthest from the mean, this doesn't really provide much information.
If I were to write a function to find "genuine" outliers, I would do
something like
x[ abs(x-mean(x)) > 3*sd(x)] , thus returning all values more than
3-sigma from the mean.
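Wrapped up as a function, that might look like this (a sketch):
find_suspects <- function(x, k = 3) {
  # values more than k standard deviations from the mean
  x[abs(x - mean(x)) > k * sd(x)]
}
find_suspects(c(1:3, rnorm(1000, mean = 100000, sd = 300)))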
<quote>
I would like to find data points that at least should be checked one
more time before I process them further.
I've had a look at the outliers package for this, and the outliers
function in that package, but it appears to only return one value.
An example:
> outlier(c(1:3,rnorm(1000,mean=100000,sd=300)))
[1] 1
I think at least 1,2 and 3 should be checked in this case.
Any ideas on how to achieve this in R?
Actually, the real data I will be investigating consist of vector norms
and angles (in an attempt to identify either very short, very long
vectors, or vectors pointing in an odd angle for the category to which
it has been assigned) so a 2D method would be even better.
I would much appreciate any help I can get on this,
--
Sent from my Cray XK6
"Pendeo-navem mei anguillae plena est."
------------------------------
Message: 6
Date: Thu, 26 Jan 2012 06:02:12 -0800 (PST)
From: Frank Harrell <f.harrell@vanderbilt.edu>
To: r-help@r-project.org
Subject: Re: [R] How do I use the cut function to assign specific cut
points?
Message-ID: <1327586532134-4330380.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
It is not valid to categorize BMI. This will result in major loss of
information and residual confounding. Plus there is huge heterogeneity in
the BMI >= 30 group. Details are at
http://biostat.mc.vanderbilt.edu/CatContinuous and see these articles:
@Article{fil07cat,
author = {Filardo, Giovanni and Hamilton, Cody and Hamman, Baron and
Ng, Hon K. T. and Grayburn, Paul},
title = {Categorizing {BMI} may lead to biased results in studies
investigating in-hospital mortality after isolated {CABG}},
journal = J Clin Epi,
year = 2007,
volume = 60,
pages = {1132-1139},
annote = {BMI;CABG;surgical adverse events;hospital
mortality;epidemiology;smoothing methods;categorization;categorizing
continuous variables;investigators should waive categorization entirely and
use smoothed functions for continuous variables;examples of non-monotonic
relationships}
}
@Article{roy06dic,
author = {Royston, Patrick and Altman, Douglas G. and
Sauerbrei, Willi},
title = {Dichotomizing continuous predictors in multiple
regression: a bad idea},
journal = Stat in Med,
year = 2006,
volume = 25,
pages = {127-141},
annote = {continuous
covariates;dichotomization;categorization;regression;efficiency;clinical
research;residual confounding;destruction of statistical inference
when cutpoints are chosen using the response variable;varying effect
estimates when change cutpoints;difficult to interpret effects
when dichotomize;nice plot showing effect of categorization;PBC data}
}
If you work with colleagues who tell you "this is the way it's done",
don't go down without a fight. In general, good statistical practice dictates
that categorization is only done for producing certain tables (for which
case you might use the cut2 function in the Hmisc package). Even that will
change as we incorporate more micrographics (think of loess plots with BMI
on the x-axis) within table cells as is now done in the Hmisc
summary.formula function for purely categorical variables.
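For the tables case, a sketch of what cut2 might look like on made-up BMI
values (it keeps the open-ended groups the original poster asked about):
library(Hmisc)
BMI <- c(18, 22, 27, 31, 35)      # made-up values
cut2(BMI, cuts = c(20, 25, 30))   # groups <20, [20,25), [25,30), >=30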
Frank
citadel wrote:
> I am new to R, and I am trying to cut a continuous variable BMI into
> different categories and can't figure out how to use it. I would like to
> cut it into four groups: <20, 20-25, 25-30 and >=30. I am having
> difficulty figuring the code for <20 and >=30. Please help. Thank you.
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context:
http://r.789695.n4.nabble.com/How-do-I-use-the-cut-function-to-assign-specific-cut-points-tp4329788p4330380.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 7
Date: Thu, 26 Jan 2012 08:19:26 -0600
From: Jean V Adams <jvadams@usgs.gov>
To: tsippel <tsippel@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Extracting data from trellis object
Message-ID:
<OF9FA6B00F.6DDAE8DD-ON86257991.004E2213-86257991.004EB0C1@usgs.gov>
Content-Type: text/plain
tsippel wrote on 01/25/2012 01:23:05 PM:
> Simple question (I hope?). How does one extract the data from a trellis
> object? I need to get the data from a call to histogram().
>
> A simple example below...
>
> dat <- data.frame(year=as.factor(round(runif(500, 1990, 2010))),
>                   x=rnorm(500, 75, 35),
>                   y=rep(seq(1,50,by=1), times=10))
> histogram(data=dat, y~x | year, type='count')
>
> How do I get the actual numbers plotted in the histograms from this
> call to histogram()?
>
> Thanks,
>
> Tim
hist.stuff <- histogram(data=dat, y~x | year, type='count')
lapply(hist.stuff, print)
Jean
------------------------------
Message: 8
Date: Thu, 26 Jan 2012 15:19:44 +0100
From: Ana <rrasterr@gmail.com>
To: "r-help@r-project.org" <r-help@r-project.org>
Subject: [R] Conditional Random Fields
Message-ID:
<CAErHZW25xoztTboq8Q=R6-O5=g5qz27Aux98=yj-BZ4wJq037w@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Are there other packages besides CRF available in R for Conditional
Random Fields?
Thanks in advance.
------------------------------
Message: 9
Date: Thu, 26 Jan 2012 09:23:53 -0500
To: "R-ORG" <r-help@r-project.org>
Subject: [R] Request for help on manipulation large data sets
Message-ID: <E975FF9C525843E0A09995344CD68314@XbiT>
Content-Type: text/plain
Dear All,
I would like to ask for help on how to read different files automatically and do
analysis using scripts.
1. Description of the data
1.1. There are 5 text files, each of which contains cleaned data for the same
100 SNPs. Observations (e.g., position on genome, allele type, ...) for SNPs
are rows ordered by the SNP numbers.
1.2. There is 1 text file containing the expression levels of mRNAs (and
other information), with rows ordered by mRNA numbers.
So for each SNP the sample size is 5.
2. Description of aim
Take SNP 1 and mRNA 1 for example. Write a script that can
2.1 extract row 1 from each of the 5 text files for SNP data
2.2 extract row 1 from the mRNA text file
2.3 regress mRNA 1 counts on SNP 1
2.4 store regression results, SNP number, mRNA number and their positions on
the genome
3. Key thing I need help with
Automatic extraction of observations for one particular SNP from the SNP data
files, since there are actually 400 SNP data files; or, more generally,
extracting information automatically from different files and storing it
properly for later operations (see the sketch below).
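I imagine something along the lines of this sketch (the directory name,
file pattern and layout are hypothetical):
# read all SNP files in one go
snp.files <- list.files("snp_data", pattern = "\\.txt$", full.names = TRUE)
snp.list <- lapply(snp.files, read.table, header = TRUE)
# the observations for SNP i are row i of each file
snp.i <- function(i) do.call(rbind, lapply(snp.list, function(d) d[i, ]))
snp.i(1)   # all observations for SNP 1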
Thanks a lot!
Chee
------------------------------
Message: 10
Date: Thu, 26 Jan 2012 09:36:20 -0500
From: Duncan Murdoch <murdoch.duncan@gmail.com>
To: Grant Anderson <gndrsn@comcast.net>
Cc: r-help@r-project.org
Subject: Re: [R] R.lnk shortcut: Warning message when opening
Microsoft Word 2003
Message-ID: <4F2164E4.6070705@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 12-01-25 3:46 PM, Grant Anderson wrote:
>
> Whenever I open Microsoft Word (2003), I get this warning two times---and
> then must manually 'OK' it twice before I can proceed in Word (a pain to do):
>
> "The drive or network connection that the shortcut 'R.lnk' refers to is
> unavailable. Make sure that the disk is properly inserted or the network
> resource is available, and then try again."
>
> Perhaps I linked Word with R, somehow, when I recently installed R and Word
> on my new computer?
>
> Anyhow, since I don't want any Word-R interaction, how can I get rid of
> these two annoying warnings? I've already searched my entire hard drive for
> 'R.lnk' (using X1.com) and found only those R-links I understand and want.
This is clearly a Microsoft bug, so you should ask them.
Duncan Murdoch
------------------------------
Message: 11
Date: Thu, 26 Jan 2012 08:42:36 -0600
From: Jean V Adams <jvadams@usgs.gov>
To: Michael Wither <michael.j.wither@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] R extracting regression coefficients from multiple
regressions using lapply command
Message-ID:
<OF202DC839.87458C92-ON86257991.005066C4-86257991.0050CF71@usgs.gov>
Content-Type: text/plain
Michael Wither wrote on 01/26/2012 12:08:19 AM:
> Hi, I have a question about running multiple regressions in R and then
> storing the coefficients. I have a large dataset with several variables,
> one of which is a state variable, coded 1-50 for each state. I'd like to
> run a regression of 28 select variables on the remaining 27 variables of
> the dataset (there are 55 variables total), and specific for each state,
> i.e. run a regression of variable1 on covariate1, covariate2, ...,
> covariate27 for observations where state==1. I'd then like to repeat this
> for variable1 for states 2-50, and then repeat the whole process for
> variable2, variable3, ..., variable28. I think I've written the correct R
> code to do this, but the next thing I'd like to do is extract the
> coefficients, ideally into a coefficient matrix. Could someone please
> help me with this? Here's the code I've written so far; I'm not sure if
> this is the best way to do this. Please help me.
>
> for (num in 1:50) {
>
> #PUF is the data set I'm using
>
> #Subset the data by states
> PUFnum <- subset(PUF, state==num)
>
> #Attach data set with state specific data
> attach(PUFnum)
>
> #Run our prediction regression
> #the variables class1 through e19700 are the 27 covariates I want to use
> regression <- lapply(PUFnum, function(z) lm(z ~
> class1+class2+class3+class4+class5+class6+class7+xtot+e00200+e00300
> +e00600+e00900+e01000+p04470+e04800+e09600+e07180+e07220+e07260
> +e06500+e10300+e59720+e11900+e18425+e18450+e18500+e19700))
>
> Beta <- lapply(regression, function(d) d<- coef(regression$d))
>
> detach(PUFnum)
>
> }
> Thanks,
> Mike
This should help you get started.
# You don't provide any sample data, so I made some up myself
nstates <- 5
nobs <- 30
nys <- 3
nxs <- 4
PUF <- data.frame(matrix(rnorm(nstates*nobs*(nys+nxs)), nrow=nstates*nobs,
                         dimnames=list(NULL, c(paste("Y", 1:nys, sep=""),
                                               paste("X", 1:nxs, sep="")))))
PUF$state <- rep(1:nstates, nobs)
head(PUF)
# create a character vector of all your covariate names
# separated by a plus sign
# this will serve as the right half of your regression equations
covariates <- paste(names(PUF)[nys + (1:nxs)], collapse=" + ")
# create an empty array to be filled with coefficients
coefs <- array(NA, dim=c(nstates, nys, nxs+1))
# fill the array with coefficients
# this will work for you if the first 28 columns of your PUF
# data frame are the response variables
for(i in 1:nstates) {
  for(j in 1:nys) {
    coefs[i, j, ] <- lm(formula(paste(names(PUF)[j], covariates, sep=" ~ ")),
                        data=PUF[PUF$state==i, ])$coef
  }
}
coefs
Jean
------------------------------
Message: 12
Date: Thu, 26 Jan 2012 09:49:49 -0500
From: Dan Abner <dan.abner99@gmail.com>
To: r-help@r-project.org
Subject: [R] Inserting a character into a character string XXXX
Message-ID:
<CAPRGo-nnMXA1UQOnBoyfhmeaoUyH2oUDp00PmiXOQK2wF2-WAQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hello everyone,
I have a character vector of 24 hour time values in the format hm
without the delimiting ":". How can I insert the ":"
immediately to
the left of the second digit from the right?
mytimes<-scan(what="")
1457
1457
1310
1158
137
1855
Thanks!
Dan
------------------------------
Message: 13
Date: Thu, 26 Jan 2012 12:43:47 +0000
From: Chris Wallace <chris.wallace@cimr.cam.ac.uk>
To: r-help@r-project.org
Subject: [R] null distribution of binom.test p values
Message-ID: <4F214A83.3090406@cimr.cam.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear R-help,
I must be missing something very obvious, but I am confused as to why
the null distribution for p values generated by binom.test() appears to
be non-uniform. The histogram generated below has a trend towards
values closer to 1 than 0. I expected it to be flat.
hist(sapply(1:1000, function(i,n=100)
binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value))
This trend is more pronounced for small n, and the distribution appears
uniform for larger n, say n=1000. I had expected the distribution to be
discrete for small n, but not skewed. Can anyone explain why?
Many thanks,
Chris.
------------------------------
Message: 14
Date: Thu, 26 Jan 2012 07:27:46 -0600
From: James Holland <holland.aggie@gmail.com>
To: R-help@r-project.org
Subject: [R] eRm package - Rasch simulation
Message-ID:
<CABdaQ+t9MCRtxOf=OZ5H64wuPQkArckLGz_1y5-r_D7RSPRZMQ@mail.gmail.com>
Content-Type: text/plain
When I try to create a Rasch simulation of data using the sim.rasch
function, I get more items than I intend
#My code
library(eRm)
#Number of items
k <- 20
#Number of participants
n <- 100
#Create Rasch Data
#sim.rasch(persons, items, seed = NULL, cutpoint = "randomized")
r.simulation <- sim.rasch(n,k)
I end up with 20 participants and 100 items, but the instructions say
otherwise (and I believe I have the most up-to-date version). If I reverse
the n and k, sim.rasch(k,n), it works out as I intended: 100 participants
and 20 items.
Can someone explain what I'm missing? I've read the package instructions
and didn't see what I might be doing wrong (such as inputting vectors
instead of an integer).
James
------------------------------
Message: 15
Date: Thu, 26 Jan 2012 07:51:24 -0600
From: James Holland <holland.aggie@gmail.com>
To: r-help@r-project.org
Subject: [R] eRm - Rasch modeling - First item missing from estimation
Message-ID:
<CABdaQ+sjMe=rFHc5xrV9GK=K2jTQLBuJiK+f==T3HCk-MnjMnA@mail.gmail.com>
Content-Type: text/plain
I'm trying to kick off some of the rust and learn some of the R packages
for Rasch modeling. When I tried using the eRm package, I get item
difficulty estimates for all my items except the first (in terms of
order) item.
#Begin code
library(eRm)
r.simulation <- sim.rasch(20,100)
r.data <- r.simulation$items
#eRm results
erm.rasch <- RM(r.data)
names(erm.rasch)
erm.items <- erm.rasch$etapar
erm.items
#end code
All items except V1 (the first item in the list) show up.
James
------------------------------
Message: 16
Date: Thu, 26 Jan 2012 01:37:25 -0800 (PST)
From: Jhope <jeanwaijang@gmail.com>
To: r-help@r-project.org
Subject: [R] How do I compare a multiple staged response with
multivariables to a Development Index?
Message-ID: <1327570645667-4329909.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Hi R- listeners,
I should add that I would like also to compare my field data to an index
model. The index was created by using the following script:
devel.index <- function(values, weights=c(1, 2, 3, 4, 5, 6)) {
foo <- values*weights
return(apply(foo, 1, sum) / apply(values, 1, sum))
}
Background:
Surveyed turtle egg embryos have been categorized into 6 stages of
development in the field. The stages in the field data are named ST0, ST1,
ST2, ST3, ST4, and Shells, in the data set data.to.analyze.
Q?
1. What is the best way to analyze the field data on embryonic development
of 6 stages?
2. Doing this while considering, testing the variables: Veg, HTL,
Aeventexhumed, Sector, Rayos, TotalEggs?
3. And then compare the results to a development index.
The goal is to determine hatching success in various areas of the beach,
and to try to create a development index of these microenvironments.
Seasonality would play a key role. Is this possible?
Many thanks!
Saludos, Jean
--
View this message in context:
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4329909.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 17
Date: Thu, 26 Jan 2012 16:00:41 +0100 (MET)
From: Gerrit Eichner <Gerrit.Eichner@math.uni-giessen.de>
To: Dan Abner <dan.abner99@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Inserting a character into a character string XXXX
Message-ID:
<Pine.SOC.4.64.1201261558280.28059@solcom.hrz.uni-giessen.de>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Hello, Dan,
you could probably use a combination of nchar(), substr() (or substring())
and paste(). Take a look at their online help pages.
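For instance, a rough sketch of that route:
n <- nchar(mytimes)
paste(substr(mytimes, 1, n - 2), substr(mytimes, n - 1, n), sep = ":")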
Hth -- Gerrit
> Hello everyone,
>
> I have a character vector of 24 hour time values in the format hm
> without the delimiting ":". How can I insert the ":"
immediately to
> the left of the second digit from the right?
>
> mytimes<-scan(what="")
> 1457
> 1457
> 1310
> 1158
> 137
> 1855
>
>
> Thanks!
>
> Dan
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 18
Date: Thu, 26 Jan 2012 06:34:37 -0800 (PST)
To: r-help@r-project.org
Cc: ruipbarradas@sapo.pt
Subject: Re: [R] R-help Digest, Vol 107, Issue 25
Message-ID:
<1327588477.31560.YahooMailClassic@web161603.mail.bf1.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Hi Rui,
Thanks for your reply to my post. My code still has various shortcomings but at
least now it is fully functional.
It may be that, as I transition to using R, I'll have to live with some less
than ideal code, at least at the outset. I'll just have to write and
re-write my code as I improve.
Appreciate your help.
Paul
Message: 66
Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST)
From: Rui Barradas <ruipbarradas@sapo.pt>
To: r-help@r-project.org
Subject: Re: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID: <1327427697928-4324533.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Hello,
Point 3 is very simple, instead of 'print' use 'cat'.
Unlike 'print' it allows for several arguments and (very) simple
formatting.
{ cat("Error: Invalid date values in", DateNames[[i]], "\n",
      TestDates[DateNames][[i]][TestDates$Invalid==1], "\n") }
Rui Barradas
Message: 53
Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST)
To: r-help@r-project.org
Subject: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID:
<1327424089.1149.YahooMailClassic@web161604.mail.bf1.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Hello Everyone,
Still new to R. Wrote some code that finds and prints invalid dates (see below).
This code works but I suspect it's not very good. If someone could show me a
better way, I'd greatly appreciate it.
Here is some information about what I'm trying to accomplish. My sense is
that the R date functions are best at identifying invalid dates when fed
character data in their default format. So my code converts the input dates to
character, breaks them apart using strsplit, and then reformats them. It then
identifies which dates are "missing" in the sense that the month or
year are unknown and prints out any remaining invalid date values.
As I see it, the code has at least 4 shortcomings.
1. It's too long. My understanding is that skilled programmers can usually
or often complete tasks like this in a few lines.
2. It's not vectorized. I started out trying to do something that was
vectorized but ran into problems with the strsplit function. I looked at the
help file and it appears this function will only accept a single character
vector.
3. It prints out the incorrect dates but doesn't indicate which date
variable they belong to. I tried various things with paste but never came up
with anything that worked. Ideally, I'd like to get something that looks
roughly like:
Error: Invalid date values in birthDT
"21931-11-23"
"1933-06-31"
Error: Invalid date values in diagnosisDT
"2010-02-30"
4. There's no way to specify names for input and output data. I imagine this
would be fairly easy to specify this in the arguments to a function but am not
sure how to incorporate it into a for loop.
Thanks,
Paul
##########################################
#### Code for detecting invalid dates ####
##########################################
#### Test Data ####
connection <- textConnection("
1 11/23/21931 05/23/2009 un/17/2011
2 06/20/1940 02/30/2010 03/17/2011
3 06/17/1935 12/20/2008 07/un/2011
4 05/31/1937 01/18/2007 04/30/2011
5 06/31/1933 05/16/2009 11/20/un
")
TestDates <- data.frame(scan(connection,
list(Patient=0, birthDT="", diagnosisDT="",
metastaticDT="")))
close(connection)
TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)
#### List of Date Variables ####
DateNames <- c("birthDT", "diagnosisDT",
"metastaticDT")
#### Read Dates ####
for (i in seq(TestDates[DateNames])){
TestDates[DateNames][[i]] <- as.character(TestDates[DateNames][[i]])
TestDates$ParsedDT <- strsplit(TestDates[DateNames][[i]],"/")
TestDates$Month <- sapply(TestDates$ParsedDT,function(x)x[1])
TestDates$Day <- sapply(TestDates$ParsedDT,function(x)x[2])
TestDates$Year <- sapply(TestDates$ParsedDT,function(x)x[3])
TestDates$Day[TestDates$Day=="un"] <- "15"
TestDates[DateNames][[i]] <- with(TestDates, paste(Year, Month, Day, sep = "-"))
is.na( TestDates[DateNames][[i]] [TestDates$Month=="un"] ) <- TRUE
is.na( TestDates[DateNames][[i]] [TestDates$Year=="un"] ) <- TRUE
TestDates$Date <- as.Date(TestDates[DateNames][[i]], format="%Y-%m-%d")
TestDates$Invalid <- ifelse(is.na(TestDates$Date) &
                            !is.na(TestDates[DateNames][[i]]), 1, 0)
if( sum(TestDates$Invalid)==0 )
{ TestDates[DateNames][[i]] <- TestDates$Date } else
{ print ( TestDates[DateNames][[i]][TestDates$Invalid==1]) }
TestDates <- subset(TestDates, select = -c(ParsedDT, Month, Day, Year, Date,
Invalid))
}
TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)
------------------------------
Message: 19
Date: Thu, 26 Jan 2012 15:41:42 +0000
From: William Dunlap <wdunlap@tibco.com>
To: Dan Abner <dan.abner99@gmail.com>, "r-help@r-project.org"
<r-help@r-project.org>
Subject: Re: [R] Inserting a character into a character string XXXX
Message-ID:
<E66794E69CFDE04D9A70842786030B93260388@PA-MBX03.na.tibco.com>
Content-Type: text/plain; charset="us-ascii"
> sub("([[:digit:]]{2,2})$", ":\\1", mytimes)
[1] "14:57" "14:57" "13:10" "11:58"
"1:37" "18:55"
That will convert "05" to ":05" and will do nothing
to "5". Pad with 0's before calling sub if that is
required.
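That padding step might look like this (a sketch, assuming the values
are purely numeric strings):
mytimes <- sprintf("%04d", as.integer(mytimes))
sub("([[:digit:]]{2,2})$", ":\\1", mytimes)   # "0137" -> "01:37"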
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces@r-project.org [mailto:r-help-bounces@r-project.org] On
Behalf Of Dan Abner
> Sent: Thursday, January 26, 2012 6:50 AM
> To: r-help@r-project.org
> Subject: [R] Inserting a character into a character string XXXX
>
> Hello everyone,
>
> I have a character vector of 24 hour time values in the format hm
> without the delimiting ":". How can I insert the ":"
immediately to
> the left of the second digit from the right?
>
> mytimes<-scan(what="")
> 1457
> 1457
> 1310
> 1158
> 137
> 1855
>
>
> Thanks!
>
> Dan
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 20
Date: Thu, 26 Jan 2012 07:54:43 -0800 (PST)
To: r-help@r-project.org
Subject: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID:
<1327593283.18477.YahooMailClassic@web161603.mail.bf1.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Sorry, sent this earlier but forgot to add an informative subject line. Am
resending, in the hopes of getting further replies. My apologies. Hope this is
OK.
Paul
------------------------------
Message: 21
Date: Thu, 26 Jan 2012 11:03:43 -0500
From: Dimitri Liakhovitski <dimitri.liakhovitski@gmail.com>
To: Barry Rowlingson <b.rowlingson@lancaster.ac.uk>
Cc: r-help <r-help@r-project.org>
Subject: Re: [R] Coloring Canada provinces (package maps?)
Message-ID:
<CAN2xGJbDZi1K1gbxu0kfu_wxrG-zDKJm=X0HnuEJkycd3LD=UQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Barry, thanks a lot!
I was able to read in the Canadian data set from gadm:
library(raster)
# Finding ISO3 code for Canada
getData('ISO3') # Canada's code is "CAN"
# Reading in data at different levels
can0<-getData('GADM', country="CAN", level=0)
can1<-getData('GADM', country="CAN", level=1)
can2<-getData('GADM', country="CAN", level=2)
class(can0)
str(can0)
class(can1)
str(can1)
Apologies for a novice question (I've never worked with maps and
raster before): what is the way in raster to see which geographic
units are within each data set (can0, can1, can2)?
And what function allows me to plot and color them?
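For concreteness, is it something like this sketch (I am guessing at the
slot and column names)?
library(sp)
head(can1@data)    # attribute table; are the units in NAME_1?
plot(can1, col = rainbow(nrow(can1@data)))   # one color per unit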
[[elided Yahoo spam]]
Dimitri
On Wed, Jan 25, 2012 at 4:37 PM, Barry Rowlingson
<b.rowlingson@lancaster.ac.uk> wrote:
> On Wed, Jan 25, 2012 at 9:25 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski@gmail.com> wrote:
>> Dear R'ers,
>>
>> I am wondering what is the smallest geographic/territorial unit
>> available for formatting in Canada. Provinces?
>>
>>
>> I know that in the US it is the county so that I can color US
>> counties any way I want, for example:
>>
>> ### Example for coloring US counties
>> ### Creating an ARTIFICIAL criterion for coloring US counties:
>> library(maps)
>
> If you want to extend your skills beyond the map package then you can
> plot anything that you can get a shapefile, or other common geospatial
> data set of, using the sp packages and friends such as maptools and
> rgdal.
>
> gadm has four levels of Canadian boundaries, at first glance -
> country, province (black), something smaller than province (blue) and
> then red which looks like urban divisions.
>
> The province upper-left doesn't seem to have any blue subdivisions,
> but that's possibly because there would be more subdivisions than
> people who actually live there.
>
> http://www.gadm.org/download
>
> Gadm also has a facility to download the data as .Rdata objects that
> can load straight into R.
>
> You might want to ask questions about spatial data programming on
> R-sig-geo or even on www.stackoverflow.com with the R tag.
>
> Barry
--
Dimitri Liakhovitski
marketfusionanalytics.com
------------------------------
Message: 22
Date: Thu, 26 Jan 2012 09:08:59 -0700
From: Greg Snow <Greg.Snow@imail.org>
To: Chris Wallace <chris.wallace@cimr.cam.ac.uk>,
"r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] null distribution of binom.test p values
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC637F1386DC@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset="us-ascii"
I believe that what you are seeing is due to the discrete nature of the binomial
test. When I run your code below I see the bar between 0.9 and 1.0 is about
twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 0.9 is
not there (height 0), if you average the top 2 bars (0.8-0.9 and 0.9-1.0) then
the average height is similar to that of the lowest bar. The bar between 0.5
and 0.6 is also 0, if you average that one with the next 2 (0.6-0.7 and 0.7-0.8)
then they are also similar to the bars near 0.
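One way to see that discreteness directly is to tabulate all achievable
two-sided p-values at n = 100 (a sketch):
p.all <- sapply(0:100, function(k) binom.test(k, 100, p = 0.5)$p.value)
table(cut(p.all, breaks = seq(0, 1, by = 0.1)))   # note the empty bins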
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow@imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Chris Wallace
> Sent: Thursday, January 26, 2012 5:44 AM
> To: r-help@r-project.org
> Subject: [R] null distribution of binom.test p values
>
> Dear R-help,
>
> I must be missing something very obvious, but I am confused as to why
> the null distribution for p values generated by binom.test() appears to
> be non-uniform. The histogram generated below has a trend towards
> values closer to 1 than 0. I expected it to be flat.
>
> hist(sapply(1:1000, function(i,n=100)
> binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value))
>
> This trend is more pronounced for small n, and the distribution appears
> uniform for larger n, say n=1000. I had expected the distribution to
> be
> discrete for small n, but not skewed. Can anyone explain why?
>
> Many thanks,
>
> Chris.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 23
Date: Thu, 26 Jan 2012 16:17:53 +0100
From: Juan Antonio Balbuena <j.a.balbuena@uv.es>
To: r-help@r-project.org
Subject: [R] Error in ifelse(append, "a", "w") : (list) object cannot
be coerced to type 'logical'
Message-ID: <4F216EA1.4030503@uv.es>
Content-Type: text/plain; charset="ISO-8859-1"
Hello
I will appreciate your help with the following.
Running a script in R 2.14.1 under windows vista I get the following error
message:
Error in ifelse(append, "a", "w") :
(list) object cannot be coerced to type 'logical'
However, the very same script runs perfectly well under Ubuntu.
My understanding is that this type of error is associated to writing data to
disk. So the problem is likely caused by the following statements, which are
inside a loop:
P.vector <- c(P1, P2, P3)
write(P.vector, file = "Type2_simul3_2012.txt", sep = "\t", append = T)
I have read that including "\" in such statements can be problematic,
because it is an escape character in R, but trying sep=" " instead of
"\t" did not solve the problem.
Any help will be much appreciated. Thanks in advance.
Juan A. Balbuena
--
Dr. Juan A. Balbuena
Marine Zoology Unit
Cavanilles Institute of Biodiversity and Evolutionary Biology
University of
Valencia
http://www.uv.es/~balbuena
P.O. Box 22085
http://www.uv.es/cavanilles/zoomarin/index.htm
46071 Valencia, Spain
http://cetus.uv.es/mullpardb/index.html
e-mail: j.a.balbuena@uv.es  tel. +34 963 543 658  fax +34 963 543 733
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTE! For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
------------------------------
Message: 24
Date: Thu, 26 Jan 2012 17:12:50 +0100
From: Raphael Bauduin <rblists@gmail.com>
To: r-help@r-project.org
Subject: [R] adding additional information to histogram
Message-ID:
<CAONrwUHnU9LdN6m5MFaS7bJ41r6U8Pb2yM14e_uGDtTwOmH=GQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
I am a beginner with R, and I think the answer to my question will
seem obvious, but after searching and trying without success I've
decided to post to the list.
I am working with data loaded from a csv file with these fields:
order_id, item_value
As an order can have multiple items, an order_id may be present
multiple times in the CSV.
I managed to compute the total value and the number of items for each order:
oli <- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
orders_values <- tapply(oli[[2]], oli[[1]], sum)
items_per_order <- tapply(oli[[2]], oli[[1]], length)
I then can display the histogram of the order values:
hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)
Now on this histogram, I would like to display the average number of
items of the orders in each group (defined with the breaks).
So for the bar of orders with value 0 to 10, I'd like to display the
average number of items of these orders.
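Something like this sketch is roughly what I have in mind, though I do
not know if it is idiomatic:
h <- hist(orders_values, breaks = c(10*0:20, 800), plot = FALSE)
grp <- cut(orders_values, breaks = h$breaks)
avg_items <- tapply(items_per_order, grp, mean)   # mean items per value bin
# ...but how do I display avg_items on top of the histogram bars?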
Thanks in advance
Raph
------------------------------
Message: 25
Date: Thu, 26 Jan 2012 08:29:22 -0800 (PST)
From: yan <y.jiao@ucl.ac.uk>
To: r-help@r-project.org
Subject: Re: [R] function for grouping
Message-ID: <1327595362680-4330877.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
What if I don't need to store the combination results, and I just want to
get the combination result for a given row?
E.g., for the 5 elements divided into 3 groups, if I give another
parameter which is the row number of the results (in Petr's example, say
I set the row number to 1), then I get 1,2,3,1,1.
So I need a systematic way of generating the combinations, but I don't
need all the combinations in one go.
Many thanks!!!!
yan
--
View this message in context:
http://r.789695.n4.nabble.com/function-for-grouping-tp4324436p4330877.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 26
Date: Thu, 26 Jan 2012 11:32:28 -0500
From: David Winsemius <dwinsemius@comcast.net>
To: Juan Antonio Balbuena <j.a.balbuena@uv.es>
Cc: r-help@r-project.org
Subject: Re: [R] Error in ifelse(append, "a", "w") : (list) object
cannot be coerced to type 'logical'
Message-ID: <653C3206-D1C1-4B71-A263-3374B72A367A@comcast.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed; delsp=yes
On Jan 26, 2012, at 10:17 AM, Juan Antonio Balbuena wrote:
>
> Hello
> I will appreciate your help with the following.
> Running a script in R 2.14.1 under windows vista I get the
> following error
> message:
> Error in ifelse(append, "a", "w") :
> (list) object cannot be coerced to type 'logical'
> However, the very same script runs perfectly well under Ubuntu.
> My understanding is that this type of error is associated to
> writing data to
> disk. So the problem is likely caused by the following statements,
> which are
> inside a loop:
> P.vector <- c(P1, P2, P3)
> write(P.vector, file = "Type2_simul3_2012.txt", sep = "\t", append = T)
> I have read that including "\" in such statements can be problematic,
> because it is an escape character in R, but trying sep=" " instead of
> "\t" did not solve the problem.
Isn't it much more likely that the root of the problem is your failure to
use TRUE, instead taking the lazy and failure-prone shortcut of using
only "T"?
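A sketch of how that can produce exactly this error (hypothetical:
something earlier in the script rebinds T):
T <- as.list(1:3)    # T is no longer TRUE
write(1:3, file = "out.txt", append = T)
# Error in ifelse(append, "a", "w") :
#   (list) object cannot be coerced to type 'logical'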
> Any help will be much appreciated. Thanks in advance.
> Juan A. Balbuena
>
> --
>
> Dr. Juan A. Balbuena
> Marine Zoology Unit
> Cavanilles Institute of Biodiversity and Evolutionary Biology
> University of
> Valencia
> http://www.uv.es/~balbuena
> P.O. Box 22085
> http://www.uv.es/cavanilles/zoomarin/index.htm
> 46071 Valencia, Spain
> http://cetus.uv.es/mullpardb/index.html
> e-mail: j.a.balbuena@uv.es  tel. +34 963 543 658  fax +34 963 543 733
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> NOTE! For shipments by EXPRESS COURIER use the following street
> address:
> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
------------------------------
Message: 27
Date: Thu, 26 Jan 2012 17:33:51 +0100
From: Rainer Hurling <rhurlin@gwdg.de>
To: r-help@r-project.org
Subject: [R] lattice panels with grouped extra data as text?
Message-ID: <4F21806F.7090603@gwdg.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
I have a problem with including extra data in a lattice graphic. I am
trying to put some calculated values in an extra column of a boxplot. A
minimal example should show what I am trying to do:
foo <- data.frame(
Treatment=rnorm(1:12,2),
Variant=c("A","A","B","C","D","C","B","D","D","B","B","A"),
Szenario=c("Dry","Wet","Dry","Dry","Wet","Dry","Wet","Wet","Dry","Wet","Dry","Dry"),
Area=c(sample(100, 12))
)
require(lattice)
require(latticeExtra)
bwplot(Treatment ~ Variant | Szenario,
data=foo,
panel = function(...) {
# Marking the extra column yellow
panel.xblocks(c(4.5,5.0),
c(rgb(255,255,0, alpha=127, max=255),
rgb(255,255,0, alpha=127, max=255)))
# Put in the extra values (instead of a string)
panel.text(5, min(foo$Treatment), "sum of foo$Area per panel",
           srt=90, adj=c(0,0.5), cex=2)
panel.bwplot(...)
},
# Create extra space for a column
xlim=range(0.5,5.5),
scales = list(x = list(labels=c(NA,"A","B","C","D","")))
)
I would like to put summarized area values (from foo$Area) in the yellow
coloured columns of each panel. So afterwards there should be one sum in
the first panel and another sum in the next panel (and so on, if more
than two panels).
I tried a little bit with groups and group.number but without success.
Is there any easy solution for this?
Any help is really appreciated.
Rainer Hurling
------------------------------
Message: 28
Date: Thu, 26 Jan 2012 17:46:38 +0100 (CET)
From: Achim Zeileis <Achim.Zeileis@uibk.ac.at>
To: James Holland <holland.aggie@gmail.com>
Cc: R-help@r-project.org
Subject: Re: [R] eRm package - Rasch simulation
Message-ID: <alpine.DEB.2.02.1201261744350.29391@paninaro.uibk.ac.at>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Thu, 26 Jan 2012, James Holland wrote:
> When I try to create a Rasch simulation of data using the sim.rasch
> function, I get more items than I intend
>
> #My code
>
> library(eRm)
>
> #Number of items
> k <- 20
>
> #Number of participants
> n <- 100
>
>
> #Create Rasch Data
> #sim.rasch(persons, items, seed = NULL, cutpoint = "randomized")
>
> r.simulation <- sim.rasch(n,k)
>
> I end up with 20 participants and 100 items, but the instructions say
> otherwise (and I believe I have the most up to date). I reverse the n and
> k
If I use your code above, I get
R> dim(r.simulation)
[1] 100 20
which is exactly as expected: 100 rows = persons and 20 columns = items.
This is with the current CRAN version of eRm: 0.14-0.
Z
> sim.rasch(k,n); it works out to how I intended; 100 participants and 20
> items.
>
>
> Can someone explain to me what I'm missing? I've read the package
> instructions and didn't see myself missing something (inputting vectors
> instead of an integer).
>
> James
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
------------------------------
Message: 29
Date: Thu, 26 Jan 2012 08:45:50 -0800
From: Brian Diggs <diggsb@ohsu.edu>
To: <R-help@r-project.org>
Subject: Re: [R] List to Array: How to establish the dimension of the
array
Message-ID: <4F21833E.5080704@ohsu.edu>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Please include the context of the discussion in your responses. See
inline below.
On 1/24/2012 11:33 PM, Ajay Askoolum wrote:
> Thank you,
>
> I can get the length of aa with length(unlist(aa)). If aa has 4
> dimensions, I imagine I'd need to do
>
> max(sapply(aa, sapply, sapply, length))
Correct.
> How can I do this in a generic way? That is in a loop. I am clear
> about the exit condition for the loop.
By generic, I assume that you mean without knowing the depth of the
lists (that is, "dimensions") to begin with?
> d<-1
>
>
> start loop
>
> if d = length(unlist(aa)) then exit loop
Note that length(unlist(aa)) will only equal the product of the
dimensions if the data is "regular." That is, there is an entry for
every combination of indices (within the dimensions).
> else
> d<-d *<expr>
>
>
> How do I construct <expr> such that it does
>
>> d<- d * length(aa) # first pass
>
>> d<- d * max(sapply(aa, length)) # second pass
>
>> d<- d * max(sapply(aa, sapply, length)) # third pass
>
>
>> # ? # # fourth path etc
>
>
> (Apologies for the pseudo code describing the loop; I am not near a
> machine with R)
I don't really understand this.
> One way I can think of is to create a loop counter variable, say
> lc<-1, and to increment it within the loop and then use a switch
> statement to execute the appropriate expression. This seems like a
> kludge to me.
>
> Is there a neater way?
If you are trying to get back a vector of the dimensions, then this
would work:
dimRecursiveList <- function(l) {
  if (class(l) == "list") {
    c(length(l), dimRecursiveList(l[[1]]))
  } else {
    NULL
  }
}
From previous context:
aa <- list(list(list(37531.52, 62787.32, 5503.184, 33832.8),
list(20469.60, 27057.27, 51160.25, 45165.24),
list(957.932, 21902.94, 37531.52, 62787.32)),
list(list(5503.184, 33832.8, 20469.6, 27057.27),
list(51160.25, 45165.24, 957.932, 21902.94),
list(37531.52, 62787.32, 5503.184, 33832.8)))
Which then gives
> dimRecursiveList(aa)
[1] 2 3 4
In this case, the data is regular, so
> Reduce(`*`, dimRecursiveList(aa)) == length(unlist(aa))
[1] TRUE
If the data are not regular, and you want the dimension to be the
largest, then it is more complicated (due to bookkeeping)
dimRecursiveListIrregular <- function(l) {
  if (class(l) == "list") {
    dims <- sapply(l, dimRecursiveListIrregular)
    if (is.null(dim(dims))) {
      if (all(is.na(dims))) {
        length(l)
      } else {
        c(length(l), max(dims, na.rm=TRUE))
      }
    } else {
      c(length(l), apply(dims, 1, max, na.rm=TRUE))
    }
  } else {
    NA
  }
}
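For instance, with a (hypothetical) ragged list it reports the largest
extent found at each depth:
bb <- list(list(list(1, 2, 3)),
           list(list(1, 2), list(3, 4)))
dimRecursiveListIrregular(bb)
# [1] 2 2 3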
If the data are regular, then it is better to convert this to an array.
This function will do it for arbitrary depth (laply() comes from the plyr
package, so it must be loaded first):
library(plyr)
RecursiveListToArray <- function(l) {
  if (class(l) == "list") {
    laply(l, RecursiveListToArray)
  } else {
    l
  }
}
> aaa <- RecursiveListToArray(aa)
> dim(aaa)
[1] 2 3 4
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
------------------------------
Message: 30
Date: Thu, 26 Jan 2012 17:51:34 +0100 (CET)
From: Achim Zeileis <Achim.Zeileis@uibk.ac.at>
To: James Holland <holland.aggie@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] eRm - Rasch modeling - First item missing from
estimation
Message-ID: <alpine.DEB.2.02.1201261746470.29391@paninaro.uibk.ac.at>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Thu, 26 Jan 2012, James Holland wrote:
> I'm trying to kick off some of the rust and learn some of the R packages
> for Rasch modeling. When I tried using the eRm package, I get item
> difficulty estimates for all my items except the first (in terms of order)
> item.
>
> #Begin code
>
> library(eRm)
>
> r.simulation <- sim.rasch(20,100)
>
>
> r.data <- r.simulation$items
>
> #eRm results
>
> erm.rasch <- RM(r.data)
> names(erm.rasch)
>
> erm.items <- erm.rasch$etapar
>
> erm.items
>
>
> #end code
>
>
> All items except V1 (the first item in the list) show up.
As explained on the manual page ?RM, one of the parameters is not
identified (as the Rasch scale is latent, with an arbitrary zero point). Either you can
restrict the sum of all item parameters to zero (default) or the first
item parameter. In both cases, the print output of the model omits the
first item parameter. But coef(erm.rasch) will give you the appropriate
expanded parameter vector.
See also vignette("eRm", package = "eRm") for more details.
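A minimal sketch, continuing from the erm.rasch object fitted in the quoted
code:
erm.rasch$etapar   # eta parameters as printed: first item omitted
coef(erm.rasch)    # expanded item parameter vector, first item included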
Z
>
> James
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
------------------------------
Message: 31
Date: Thu, 26 Jan 2012 09:51:36 -0700
From: Greg Snow <Greg.Snow@imail.org>
To: Chris Wallace <chris.wallace@cimr.cam.ac.uk>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] null distribution of binom.test p values
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC637F138727@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset="us-ascii"
Yes that is due to the discreteness of the distribution, consider the following:
> binom.test(39,100,.5)
Exact binomial test
data: 39 and 100
number of successes = 39, number of trials = 100, p-value = 0.0352
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.2940104 0.4926855
sample estimates:
probability of success
0.39
> binom.test(40,100,.5)
Exact binomial test
data: 40 and 100
number of successes = 40, number of trials = 100, p-value = 0.05689
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.3032948 0.5027908
sample estimates:
probability of success
0.4
(you can do the same for 60 and 61)
So notice that the probability of getting 39 or more extreme is 0.0352, but
anything less extreme will result in not rejecting the null hypothesis (because
the probability of getting a 40 or a 60 (dbinom(40,100,.5)) is about 1% each, so
we see a 2% jump there). So the size/probability of a type I error will
generally not be equal to alpha unless n is huge or alpha is chosen to
correspond to a jump in the distribution rather than using common round values.
I have seen suggestions that instead of the standard test we use a test that
rejects the null for values 39 and more extreme, don't reject the null for
41 and less extreme, and if you see a 40 or 60 then you generate a uniform
random number and reject if it is below a certain value (that value chosen to
give an overall probability of type I error of 0.05). This will correctly size
the test, but becomes less reproducible (and makes clients nervous when they
present their data and you pull out a coin, flip it, and tell them if they have
significant results based on your coin flip (or more realistically a die
roll)). I think it is better in this case, if you know your final sample size is
going to be 100, to explicitly state that alpha will be 0.0352 (but then you need
to justify to reviewers why you are not using the common 0.05).
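The jump described above is quick to verify directly:
# exact size of the test rejecting at 39 or fewer / 61 or more successes
pbinom(39, 100, 0.5) + pbinom(60, 100, 0.5, lower.tail = FALSE)  # about 0.0352
# probability mass sitting on 40 and (by symmetry) on 60: the ~2% jump
2 * dbinom(40, 100, 0.5)                                         # about 0.0217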
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow@imail.org
801.408.8111
> -----Original Message-----
> From: Chris Wallace [mailto:chris.wallace@cimr.cam.ac.uk]
> Sent: Thursday, January 26, 2012 9:36 AM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] null distribution of binom.test p values
>
> Greg, thanks for the reply.
>
[[elided Yahoo spam]]
>
> I ran a longer simulation, 100,000 reps. The size of the test is
> consistently too small (see below) and the histogram shows increasing
> bars even within the parts of the histogram with even bar spacing. See
> https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png
>
> y<-sapply(1:100000, function(i,n=100)
>
binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
> mean(y<0.01)
> # [1] 0.00584
> mean(y<0.05)
> # [1] 0.03431
> mean(y<0.1)
> # [1] 0.08646
>
> Can that really be due to the discreteness of the distribution?
>
> C.
>
> On 26/01/12 16:08, Greg Snow wrote:
> > I believe that what you are seeing is due to the discrete nature of
> the binomial test. When I run your code below I see the bar between
> 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but
> the bar between 0.8 and 0.9 is not there (height 0), if you average the
> top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to
> that of the lowest bar. The bar between 0.5 and 0.6 is also 0, if you
> average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are
> also similar to the bars near 0.
> >
> >
> >
------------------------------
Message: 32
Date: Thu, 26 Jan 2012 16:36:00 +0000
From: Chris Wallace <chris.wallace@cimr.cam.ac.uk>
To: Greg Snow <Greg.Snow@imail.org>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] null distribution of binom.test p values
Message-ID: <4F2180F0.1080900@cimr.cam.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Greg, thanks for the reply.
[[elided Yahoo spam]]
I ran a longer simulation, 100,000 reps. The size of the test is
consistently too small (see below) and the histogram shows increasing
bars even within the parts of the histogram with even bar spacing. See
https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png
y<-sapply(1:100000, function(i,n=100)
binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
mean(y<0.01)
# [1] 0.00584
mean(y<0.05)
# [1] 0.03431
mean(y<0.1)
# [1] 0.08646
Can that really be due to the discreteness of the distribution?
C.
On 26/01/12 16:08, Greg Snow wrote:
> I believe that what you are seeing is due to the discrete nature of the
> binomial test. When I run your code below I see the bar between 0.9 and 1.0 is
> about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and
> 0.9 is not there (height 0), if you average the top 2 bars (0.8-0.9 and 0.9-1.0)
> then the average height is similar to that of the lowest bar. The bar between
> 0.5 and 0.6 is also 0, if you average that one with the next 2 (0.6-0.7 and
> 0.7-0.8) then they are also similar to the bars near 0.
>
>
>
------------------------------
Message: 33
Date: Thu, 26 Jan 2012 09:16:48 +0000
From: <gaurang.jitendrabhai@aviva.com>
To: <michael.weylandt@gmail.com>
Cc: R-help@r-project.org
Subject: Re: [R] Issues with PearsonIV distribution
Message-ID:
<42C99F8A87E4414BB4C297AB63B932380112F7312ACA@SWVLONCUEXDP02.im.root-domain.net>
Content-Type: text/plain; charset="us-ascii"
Hi Michael,
I am using the same version contributed by Martin Backer. I have been in touch
with him via email as well. He said yesterday that he can help me with it but is
busy for some time.
This distribution function does not behave properly as can be seen in the
attached spreadsheet. Check sheet "Rcode&Output" for Pearson
Distribution fits in comparison to hyperbolic and normal fits on the same sample
data.
Regards,
Gaurang
-----Original Message-----
From: R. Michael Weylandt [mailto:michael.weylandt@gmail.com]
Sent: 25 January 2012 20:00
To: Jitendrabhai Gaurang (AGC)
Cc: R-help@r-project.org
Subject: Re: [R] Issues with PearsonIV distribution
Which implementation of Pearson IV are you using?
If it's trouble with a homebrewed one, you might want to use the p/q
functions here:
http://cran.r-project.org/web/packages/PearsonDS/index.html
Otherwise, you likely will have to share some code for any concrete help.
Michael
On Wed, Jan 25, 2012 at 12:16 PM, <gaurang.jitendrabhai@aviva.com>
wrote:
> Hi team,
> I am facing issues with PearsonIV distribution fitting in R.
> I am applying Hyperbolic and PearsonIV distributions on the equity returns
> in the UK over a period of 30 years.
> For the same data set I am getting strikingly different results, under which
> the Hyperbolic distribution does produce negative percentiles of the return
> after fitting but the PearsonIV distribution does not.
> I think there is an issue with the PearsonIV distribution in R; also there
> are no plots available for the PearsonIV distribution.
> Can you help? I am ready to share my code.
>
> Gaurang Mehta
> Risk Calibration | Group Economic Capital | Aviva Group
> Email: gaurang.jitendrabhai@aviva.com
> 14th Floor, St Helen's
> 1 Undershaft
> London, EC3P 3DQ
>
>
> Aviva plc, registered Office: St. Helen's, 1 Undershaft, London EC3P
3DQ. Registered in England No. 02468686. www.aviva.com
> This message and any attachments may be confidential or ...{{dropped:10}}
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Aviva plc, registered Office: St. Helen's, 1 Undershaft, London EC3P 3DQ.
Registered in England No. 02468686. www.aviva.com
This message and any attachments may be confidential or legally privileged. If
you are not the intended recipient, please telephone or e-mail the sender and
delete this message and any attachments from your system. Also, if you are not
the intended recipient you must not copy this message or attachments or disclose
the contents to any other person. Any views or opinions expressed are solely
those of the author and do not necessarily represent those of Aviva.
------------------------------
Message: 34
Date: Thu, 26 Jan 2012 08:32:17 -0800 (PST)
From: Harish Eswaran <hran25@att.net>
To: r-help@r-project.org
Subject: [R] (no subject)
Message-ID:
<1327595537.33989.YahooMailClassic@web83107.mail.mud.yahoo.com>
Content-Type: text/plain
Dear Rhelp,
When I try to plot a barplot, I get the following message:
Error in plot.new() : figure margins too large
What does this mean and how can I fix it?
I appreciate your help,
Horatio Gates
[[alternative HTML version deleted]]
------------------------------
Message: 35
Date: Thu, 26 Jan 2012 11:54:55 -0500
From: Dan Abner <dan.abner99@gmail.com>
To: William Dunlap <wdunlap@tibco.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] Inserting a character into a character string XXXX
Message-ID:
<CAPRGo-k+w4uj-PgqRFyWpTdQPcaTExgjt41is=mBiZQpUL8vaQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi Bill,
Thanks very much for your response.
Can you suggest an approach for the "pre"-padding? Here is a more
representative sample of the values:
mytimes<-scan(what="")
1334
2310
39
2300
1556
3
404
37
1320
4
211
2320
Thanks!
Dan
On Thu, Jan 26, 2012 at 10:41 AM, William Dunlap <wdunlap@tibco.com> wrote:
> > sub("([[:digit:]]{2,2})$", ":\\1", mytimes)
> [1] "14:57" "14:57" "13:10" "11:58" "1:37"  "18:55"
>
> That will convert "05" to ":05" and will do nothing
> to "5". Pad with 0's before calling sub if that is
> required.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces@r-project.org
[mailto:r-help-bounces@r-project.org] On Behalf Of Dan Abner
>> Sent: Thursday, January 26, 2012 6:50 AM
>> To: r-help@r-project.org
>> Subject: [R] Inserting a character into a character string XXXX
>>
>> Hello everyone,
>>
>> I have a character vector of 24 hour time values in the format hm
>> without the delimiting ":". How can I insert the
":" immediately to
>> the left of the second digit from the right?
>>
>> mytimes<-scan(what="")
>> 1457
>> 1457
>> 1310
>> 1158
>> 137
>> 1855
>>
>>
>> Thanks!
>>
>> Dan
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 36
Date: Thu, 26 Jan 2012 08:58:29 -0800
From: Brian Diggs <diggsb@ohsu.edu>
To: <R-help@r-project.org>
Subject: Re: [R] aplpy recursive function on a list
Message-ID: <4F218635.8000208@ohsu.edu>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
On 1/25/2012 10:09 AM, patzoul wrote:
> I have 2 series of data a,b and I would like to calculate a new series which
> is z[t] = z[t-1]*a[t] + b[t], z[1] = b[1].
> How can I do that without using a loop?
A combination of Reduce and Map will work. Map to stitch together the a
and b vectors, Reduce to step along them, allowing for accumulation.
a <- c(2,4,3,5)
b <- c(1,3,5,7)
z <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
Map(c, a, b),
init = b[1], accumulate = TRUE)
> z
[1] 1 3 15 50 257
> --
> View this message in context:
http://r.789695.n4.nabble.com/aplpy-recursive-function-on-a-list-tp4328017p4328017.html
> Sent from the R help mailing list archive at Nabble.com.
>
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
------------------------------
Message: 37
Date: Thu, 26 Jan 2012 17:00:08 +0000
From: Barry Rowlingson <b.rowlingson@lancaster.ac.uk>
To: Dimitri Liakhovitski <dimitri.liakhovitski@gmail.com>
Cc: r-help <r-help@r-project.org>
Subject: Re: [R] Coloring Canada provinces (package maps?)
Message-ID:
<CANVKczM6awX4iByVLD+v05d+3wiY2kH6nvmPG4EvFPBo7-Qw0A@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Thu, Jan 26, 2012 at 4:03 PM, Dimitri Liakhovitski
<dimitri.liakhovitski@gmail.com> wrote:
[[elided Yahoo spam]]
> I was able to read in the Canadian data set from gadm:
>
> library(raster)
> # Finding ISO3 code for Canada
> getData('ISO3')  # Canada's code is "CAN"
> # Reading in data at different levels
> can0<-getData('GADM', country="CAN", level=0)
> can1<-getData('GADM', country="CAN", level=1)
> can2<-getData('GADM', country="CAN", level=2)
> class(can0)
> str(can0)
> class(can1)
> str(can1)
>
> Apologies for a novice question (I've never worked with maps and
> raster before): what is the way in raster to see what are the
> geographic units within each data set (can0, can1, can2)?
> And what function allows to plot them and color them?
These are now SpatialPolygonsDataFrame objects - like data frames, but
each row has an associated polygonal geometry. These actually come
from the sp package which the raster package has loaded for you.
class(can0) will tell you this.
names(can1) will tell you the names of the attributes of the polygons
which you can use like columns of a data frame.
plot(can1) will plot it
spplot(can1, "columnname") will do a coloured plot.
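A minimal sketch tying these together (the level-1 name column is assumed
here to be "NAME_1", as in GADM downloads; check names(can1) first):
library(raster)
can1 <- getData('GADM', country = "CAN", level = 1)
names(can1)             # attribute columns of the polygons
plot(can1)              # plain outline map
spplot(can1, "NAME_1")  # provinces coloured by the assumed name column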
For more info, check the help for sp or read the R-Spatial Task View
on CRAN which has everything you need to know about maps and such in
R.
Ask any more questions like this on the R-sig-geo mailing list where
the mapping R people hang out...
Barry
------------------------------
Message: 38
Date: Thu, 26 Jan 2012 11:59:54 -0500
From: "R. Michael Weylandt" <michael.weylandt@gmail.com>
To: Harish Eswaran <hran25@att.net>
Cc: r-help@r-project.org
Subject: Re: [R] (no subject)
Message-ID:
<CAAmySGOJ1QGpS2i4oJ5h6bAw_ymQnnyrmBbcOQGdmT0vfpL=HQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
I usually get that error when I'm replotting on a window/device that's
already really small:
e.g., on my Mac
plot(1:5)
## Make window really small
plot(1:6) # Throws error
Michael
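If resizing the device is not an option, shrinking the plot margins sometimes
helps too; a quick sketch (the default is par(mar = c(5, 4, 4, 2) + 0.1)):
par(mar = c(2, 2, 1, 1))  # smaller bottom, left, top, right margins
barplot(1:5)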
------------------------------
Message: 39
Date: Thu, 26 Jan 2012 17:03:37 +0000
From: William Dunlap <wdunlap@tibco.com>
To: Dan Abner <dan.abner99@gmail.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] Inserting a character into a character string XXXX
Message-ID:
<E66794E69CFDE04D9A70842786030B9326046D@PA-MBX03.na.tibco.com>
Content-Type: text/plain; charset="iso-8859-1"
One way to pad with initial zeros is to convert your
strings to integers and format the integers:
> sprintf("%04d", as.integer(mytimes))
 [1] "1334" "2310" "0039" "2300" "1556" "0003" "0404"
 [8] "0037" "1320" "0004" "0211" "2320"
It has the added benefit that the call to as.integer
will warn you if any supposed integers don't look like
integers. If you use this approach you don't need the
call to sub():
> tmp <- as.integer(mytimes)
> sprintf("%02d:%02d", tmp%/%100, tmp%%100)
 [1] "13:34" "23:10" "00:39" "23:00" "15:56" "00:03"
 [7] "04:04" "00:37" "13:20" "00:04" "02:11" "23:20"
You could also check that tmp%/%100 (the hour) and tmp%%100
(the minute) are in their expected ranges before calling
sprintf.
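That check could be sketched like this:
tmp <- as.integer(mytimes)
# stop with an informative error if any hour or minute is out of range
stopifnot(tmp >= 0, tmp %/% 100 <= 23, tmp %% 100 <= 59)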
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
------------------------------
Message: 40
Date: Thu, 26 Jan 2012 12:04:38 -0500
From: Dimitri Liakhovitski <dimitri.liakhovitski@gmail.com>
To: Barry Rowlingson <b.rowlingson@lancaster.ac.uk>
Cc: r-help <r-help@r-project.org>
Subject: Re: [R] Coloring Canada provinces (package maps?)
Message-ID:
<CAN2xGJaUhmioy1p7kQzLH3G4mmqEHML30zH26iL4kYURSQHMkg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
[[elided Yahoo spam]]
Dimitri
--
Dimitri Liakhovitski
marketfusionanalytics.com
------------------------------
Message: 41
Date: Thu, 26 Jan 2012 12:05:22 -0500
From: "R. Michael Weylandt" <michael.weylandt@gmail.com>
To: Jhope <jeanwaijang@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] How do I compare a multiple staged response with
multivariables to a Development Index?
Message-ID:
<CAAmySGPhgTqtYgTuYgggFhm0jxJiat+SEgckX2TSONYRh3_4-Q@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
This might get more traction on the R-SIG-Ecology lists.
And best of luck to you; I quite like turtles.
Michael
On Thu, Jan 26, 2012 at 4:37 AM, Jhope <jeanwaijang@gmail.com>
wrote:
> Hi R-listeners,
>
> I should add that I would like also to compare my field data to an index
> model. The index was created by using the following script:
>
> devel.index <- function(values, weights=c(1, 2, 3, 4, 5, 6)) {
>   foo <- values*weights
>   return(apply(foo, 1, sum) / apply(values, 1, sum))
> }
>
> Background:
> Surveyed turtle egg embryos have been categorized into 6 stages of
> development in the field. The stages in the field data are named ST0, ST1,
> ST2, ST3, ST4, Shells. from the data = data.to.analyze.
>
> Q?
> 1. What is the best way to analyze the field data on embryonic development
> of 6 stages?
> 2. Doing this while considering, testing the variables: Veg, HTL,
> Aeventexhumed, Sector, Rayos, TotalEggs?
> 3. And then compare the results to a development index.
>
> The goal is to determine hatching success in various areas of the beach. And
> try to create a development index of these microenvironments. Seasonality
> would play a key role. Is this possible?
>
> Many thanks!
> Saludos, Jean
>
>
>
> --
> View this message in context:
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4329909.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 42
Date: Thu, 26 Jan 2012 09:10:08 -0800 (PST)
From: Frank Harrell <f.harrell@vanderbilt.edu>
To: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
interactions and unique combinations?
Message-ID: <1327597808059-4331016.post@n4.nabble.com>
Content-Type: text/plain; charset=UTF-8
To pretend that AIC solves this problem is to ignore that AIC is just a
restatement of P-values.
Frank
Rubén Roa wrote:
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at
> stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model
> outcompetes which in terms of predictive ability by -2AIC or more. Just
> sorting them by the AIC does not give you that if no model is better than
> the next best by less than 2AIC.
> Third, I was not implying that AIC differences play the role of
> significance tests. I agree with you that model selection is better not
> understood as a proxy or as a relative of significance tests procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If
> one is bent on selecting just _the model_ then I check numerical
> optimization diagnostics such as size of gradients, KKT criteria, and
> other issues such as standard errors of parameter estimates and the
> correlation matrix of parameter estimates. For some reason I don't find
> model averaging appealing. I guess deep in my heart I expect more from my
> model than just the best predictive ability.
>
> Rubén
>
> -----Original Message-----
> From: r-help-bounces@ [mailto:r-help-bounces@] On Behalf Of Ben Bolker
> Sent: Wednesday, 25 January 2012 15:41
> To: r-help@.ethz
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and
> unique combinations?
>
Rubén Roa <rroa <at> azti.es> writes:
>
>> A more 'manual' way to do it is this.
>
>> If you have all your models named properly and well organized, say
>> Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces
>> a list with one component named 'aic') then something like
>> this:
>
>> x <- matrix(0,1081,3)
>> x[,1:2] <- t(combn(47,2))
>> for(i in 1:n){x[,3] <-
>>   abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[1,]) -
>>   unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}
>
>
>> will calculate all the 1081 AIC differences between paired comparisons
>> of the 47 models. It may not be pretty but works for me.
>
>
>> An AIC difference equal or less than 2 is a tie, anything higher is
>> evidence for the model with the less AIC (Sakamoto et al., Akaike
>> Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
>
>
> I wouldn't go quite as far as Frank Harrell did in his answer, but
>
> (1) it seems silly to do all pairwise comparisons of models; one of the
> big advantages of information-theoretic approaches is that you can just
> list the models in order of AIC ...
>
> (2) the dredge() function from the MuMIn package may be useful for
> automating this sort of thing. There is an also an AICtab function in the
> emdbook package.
>
> (3) If you're really just interested in picking the single model with
the
> best expected predictive capability (and all of the other assumptions of
> the AIC approach are met -- very large data set, good fit to the data,
> etc.), then just picking the model with the best AIC will work. It's
when
> you start to make inferences on the individual parameters within that
> model, without taking account of the model selection process, that you run
> into trouble. If you are really concerned about good predictions then it
> may be better to do multi-model averaging *or* use some form of shrinkage
> estimator.
>
> (4) The "delta-AIC<2 means pretty much the same" rule of thumb is fine,
> but it drives me nuts when people use it as a quasi-significance-testing
> rule, as in "the simpler model is adequate if dAIC<2". If you're going to
> work in the model selection arena, then don't follow hypothesis-testing
> procedures! A smaller AIC means lower expected K-L distance, period.
>
> For the record, Brian Ripley has often expressed the (minority) opinion
> that AIC is *not* appropriate for comparing non-nested models (e.g.
> <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).
>
>
>>
>> -----Original Message-----
>> From: r-help-bounces <at> r-project.org on behalf of Milan
>> Bouchet-Valat
>> Sent: Wed 1/25/2012 10:32 AM
>> To: Jhope
>> Cc: r-help <at> r-project.org
>
>> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
>> interactions and unique combinations?
>
>
>> On Tuesday, 24 January 2012 at 20:41 -0800, Jhope wrote:
>> > Hi R-listers,
>> >
>> > I have developed 47 GLM models with different combinations of
>> > interactions from 1 variable to 5 variables. I have manually made
>> > each model separately and put them into individual tables (organized
>> > by the number of variables) showing the AIC score. I want to compare
>> > all of these models.
>> >
>> > 1) What is the best way to compare various models with unique
>> > combinations and different number of variables?
>> See ?step or ?stepAIC (from package MASS) if you want an automated way
>> of doing this.
>>
>> > 2) I am trying to develop the most simplest model ideally. Even
>> > though adding another variable would lower the AIC,
>
> No, not necessarily.
>
>> > how do I interpret it is worth
>> > it to include another variable in the model? How do I know when to
>> > stop?
>
> When the AIC stops decreasing.
>
>> This is a general statistical question, not specific to R. As a
>> general rule, if adding a variable lowers the AIC by a significant
>> margin, then it's worth including it.
>
> If the variable lowers the AIC *at all* then your best estimate is that
> you should include it.
>
>> You should only stop when a variable increases the AIC. But this is
>> assuming you consider it a good indicator and you know what you're
>> doing. There's plenty of literature on this subject.
>>
>> > Definitions of Variables:
>> > HTL - distance to high tide line (continuous) Veg - distance to
>> > vegetation Aeventexhumed - Event of exhumation Sector - number
>> > measurements along the beach Rayos - major sections of beach
>> > (grouped sectors) TotalEggs - nest egg density
>> >
>> > Example of how all models were created:
>> > Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
>> >   data=data.to.analyze, family=binomial)
>> > Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg,
>> >   family = binomial, data.to.analyze)
>> > Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
>> >   data.to.analyze, family = binomial)
>> > Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
>> >   HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
>> To extract the AICs of all these models, it's easier to put them in a
>> list and get their AICs like this:
>> m <- list()
>> m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
>> data=data.to.analyze, family=binomial)
>> m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family =
>> binomial, data.to.analyze)
>>
>> sapply(m, extractAIC)
>>
>> Cheers
>>
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context:
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331016.html
Sent from the R help mailing list archive at Nabble.com.
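For reference, a minimal sketch of the ranking approach in point (1) of the
quoted reply, reusing the list m built at the end of the quoted message:
aics <- sapply(m, AIC)    # one AIC per fitted model
sort(aics)                # models ranked, smallest (best) first
sort(aics) - min(aics)    # delta-AIC relative to the best model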
------------------------------
Message: 43
Date: Thu, 26 Jan 2012 18:13:32 +0100
From: Petr Savicky <savicky@cs.cas.cz>
To: r-help@r-project.org
Subject: Re: [R] function for grouping
Message-ID: <20120126171332.GA20653@cs.cas.cz>
Content-Type: text/plain; charset=utf-8
On Thu, Jan 26, 2012 at 08:29:22AM -0800, yan wrote:
> what if I don't need to store the combination results, I just want to get the
> combination result for a given row.
> e.g. for the 5 elements divided into 3 groups, and if I give another
> parameter which is the row number of the results, in Petr's example, say if
> I set row number to 1, then I get 1,2,3,1,1.
>
> So I need a systematic way of generating the combination, but I don't need
> all the combinations in one go.
>
> Many thanks!!!!
Hi.
I suppose that your question is meant for some larger
example than 5 elements into 3 groups, since for this
small case, we can store the whole table.
Considering, for example, 200 elements and 3 groups, which
you mentioned previously, this can probably be done using
some implementation of large integers. The reason is that
the final table has approximately 3^200/6 rows. So, the index
of the last row has approx log2(3^200/6) = 314.4 bits.
The standard numbers have 53 bits. For large integers,
one can use packages "gmp", "Rmpfr" or some package
for symbolic algebra, for example "Ryacas".
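The sizes quoted above are easy to check with plain doubles:
200 * log2(3) - log2(6)   # log2(3^200 / 6), about 314.4 bits
.Machine$double.digits    # the 53-bit significand of a standard double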
Do you want to use a large integer m for a search of
the exactly m-th partitioning in lexicographic order?
Let me point out that sampling random partitionings can
be done much easier.
Petr.
------------------------------
Message: 44
Date: Thu, 26 Jan 2012 12:15:07 -0500
From: Dan Abner <dan.abner99@gmail.com>
To: William Dunlap <wdunlap@tibco.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] Inserting a character into a character string XXXX
Message-ID:
<CAPRGo-=v-A2_+vhmpfLY7KFzza+w-QozEce7XZknEJh7JaZEUg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Cheers Bill!!
------------------------------
Message: 45
Date: Thu, 26 Jan 2012 11:59:09 -0600
From: Marc Schwartz <marc_schwartz@me.com>
Cc: r-help@r-project.org
Subject: Re: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID: <1D0C307E-84DA-48FD-9919-03D6D8E42BFA@me.com>
Content-Type: text/plain; CHARSET=US-ASCII
Paul,
I have a partial solution for you. It is partial in that I have not quite
figured out the correct incantation to convert a 5 digit year (eg. 11/23/21931)
properly using the R date functions. According to various sources (eg. man
strptime and man strftime) as well as the R help for both functions, there are
extended formats available, but I am having a bout of cerebral flatulence in
getting them to work correctly and a search has not been fruitful. Perhaps
someone else can offer some insights.
That being said, with the exception of correctly handling that one situation,
which arguably IS a valid date a long time in the future and which would
otherwise result in a truncated year (first four digits only):
> as.Date("11/23/21931", format = "%m/%d/%Y")
[1] "2193-11-23"
Here is one approach:
# Check the date. If as.Date() fails or the input is > 10 characters, return it
checkDate <- function(x)
  as.character(x[is.na(as.Date(x, format = "%m/%d/%Y")) |
                 nchar(as.character(x)) > 10])
> lapply(TestDates[, -1], checkDate)
$birthDT
[1] "11/23/21931" "06/31/1933"
$diagnosisDT
[1] "02/30/2010"
$metastaticDT
[1] "un/17/2011" "07/un/2011" "11/20/un"
You could fine tune the checkDate() function to handle other formats, etc.
HTH,
Marc Schwartz
On Jan 26, 2012, at 9:54 AM, Paul Miller wrote:
> Sorry, sent this earlier but forgot to add an informative subject line. Am
> resending, in the hopes of getting further replies. My apologies. Hope this
> is OK.
>
> Paul
>
>
> Hi Rui,
>
> Thanks for your reply to my post. My code still has various shortcomings
> but at least now it is fully functional.
>
> It may be that, as I transition to using R, I'll have to live with some
> less than ideal code, at least at the outset. I'll just have to write and
> re-write my code as I improve.
>
> Appreciate your help.
>
> Paul
>
>
> Message: 66
> Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST)
> From: Rui Barradas <ruipbarradas@sapo.pt>
> To: r-help@r-project.org
> Subject: Re: [R] Checking for invalid dates: Code works but needs
> improvement
> Message-ID: <1327427697928-4324533.post@n4.nabble.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hello,
>
> Point 3 is very simple: instead of 'print' use 'cat'.
> Unlike 'print' it allows for several arguments and (very) simple formatting.
>
> { cat("Error: Invalid date values in", DateNames[[i]], "\n",
>       TestDates[DateNames][[i]][TestDates$Invalid==1], "\n") }
>
> Rui Barradas
>
> Message: 53
> Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST)
> To: r-help@r-project.org
> Subject: [R] Checking for invalid dates: Code works but needs
> improvement
> Message-ID:
> <1327424089.1149.YahooMailClassic@web161604.mail.bf1.yahoo.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hello Everyone,
>
> Still new to R. Wrote some code that finds and prints invalid dates (see
> below). This code works but I suspect it's not very good. If someone could
> show me a better way, I'd greatly appreciate it.
>
> Here is some information about what I'm trying to accomplish. My sense
> is that the R date functions are best at identifying invalid dates when fed
> character data in their default format. So my code converts the input dates
> to character, breaks them apart using strsplit, and then reformats them. It
> then identifies which dates are "missing" in the sense that the month or
> year are unknown and prints out any remaining invalid date values.
>
> As I see it, the code has at least 4 shortcomings.
>
> 1. It's too long. My understanding is that skilled programmers can
> usually or often complete tasks like this in a few lines.
>
> 2. It's not vectorized. I started out trying to do something that was
> vectorized but ran into problems with the strsplit function. I looked at the
> help file and it appears this function will only accept a single character
> vector.
>
> 3. It prints out the incorrect dates but doesn't indicate which date
> variable they belong to. I tried various things with paste but never came up
> with anything that worked. Ideally, I'd like to get something that looks
> roughly like:
>
> Error: Invalid date values in birthDT
>
> "21931-11-23"
> "1933-06-31"
>
> Error: Invalid date values in diagnosisDT
>
> "2010-02-30"
>
> 4. There's no way to specify names for input and output data. I imagine
> this would be fairly easy to specify in the arguments to a function but am
> not sure how to incorporate it into a for loop.
>
> Thanks,
>
> Paul
>
> ##########################################
> #### Code for detecting invalid dates ####
> ##########################################
>
> #### Test Data ####
>
> connection <- textConnection("
> 1 11/23/21931 05/23/2009 un/17/2011
> 2 06/20/1940 02/30/2010 03/17/2011
> 3 06/17/1935 12/20/2008 07/un/2011
> 4 05/31/1937 01/18/2007 04/30/2011
> 5 06/31/1933 05/16/2009 11/20/un
> ")
>
> TestDates <- data.frame(scan(connection,
>   list(Patient=0, birthDT="", diagnosisDT="", metastaticDT="")))
>
> close(connection)
>
> TestDates
>
> class(TestDates$birthDT)
> class(TestDates$diagnosisDT)
> class(TestDates$metastaticDT)
>
> #### List of Date Variables ####
>
> DateNames <- c("birthDT", "diagnosisDT",
"metastaticDT")
>
> #### Read Dates ####
>
> for (i in seq(TestDates[DateNames])){
>   TestDates[DateNames][[i]] <- as.character(TestDates[DateNames][[i]])
>   TestDates$ParsedDT <- strsplit(TestDates[DateNames][[i]],"/")
>   TestDates$Month <- sapply(TestDates$ParsedDT,function(x)x[1])
>   TestDates$Day <- sapply(TestDates$ParsedDT,function(x)x[2])
>   TestDates$Year <- sapply(TestDates$ParsedDT,function(x)x[3])
>   TestDates$Day[TestDates$Day=="un"] <- "15"
>   TestDates[DateNames][[i]] <- with(TestDates, paste(Year, Month, Day, sep = "-"))
>   is.na( TestDates[DateNames][[i]] [TestDates$Month=="un"] ) <- T
>   is.na( TestDates[DateNames][[i]] [TestDates$Year=="un"] ) <- T
>   TestDates$Date <- as.Date(TestDates[DateNames][[i]], format="%Y-%m-%d")
>   TestDates$Invalid <- ifelse(is.na(TestDates$Date) &
>                               !is.na(TestDates[DateNames][[i]]), 1, 0)
>   if( sum(TestDates$Invalid)==0 )
>     { TestDates[DateNames][[i]] <- TestDates$Date } else
>     { print( TestDates[DateNames][[i]][TestDates$Invalid==1]) }
>   TestDates <- subset(TestDates, select = -c(ParsedDT, Month, Day, Year,
>                                              Date, Invalid))
> }
>
> TestDates
>
> class(TestDates$birthDT)
> class(TestDates$diagnosisDT)
> class(TestDates$metastaticDT)
------------------------------
Message: 46
Date: Thu, 26 Jan 2012 13:07:10 -0500
From: Gabor Grothendieck <ggrothendieck@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID:
<CAP01uRmdqSU2EqovySJeahNNgDYQrSzsYv7oP8QSiAW5RoAd+g@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
If s is a vector of character strings representing dates then bad is a
logical vector which is TRUE for the bad ones and FALSE for the good
ones (adjust as needed if a different date range is valid) so s[bad]
is the bad inputs and the output d is a "Date" vector with NAs for the
bad ones:
x <- gsub("un", 15, s)
d <- as.Date(x, "%m/%d/%Y")
bad <- is.na(d) | d < as.Date("1900-01-01") | d > Sys.Date()
d[bad] <- NA
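A quick sketch applying this to a few of the test values from the earlier
post:
s <- c("11/23/21931", "02/30/2010", "un/17/2011", "04/30/2011")
x <- gsub("un", 15, s)
d <- as.Date(x, "%m/%d/%Y")
bad <- is.na(d) | d < as.Date("1900-01-01") | d > Sys.Date()
s[bad]   # flags everything except "04/30/2011"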
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
------------------------------
Message: 47
Date: Thu, 26 Jan 2012 19:10:44 +0100
From: Berend Hasselman <bhh@xs4all.nl>
To: Brian Diggs <diggsb@ohsu.edu>
Cc: R-help@r-project.org
Subject: Re: [R] aplpy recursive function on a list
Message-ID: <30532837-ED1A-4DF5-8E5B-13658EC049EE@xs4all.nl>
Content-Type: text/plain; charset=us-ascii
On 26-01-2012, at 17:58, Brian Diggs wrote:
> On 1/25/2012 10:09 AM, patzoul wrote:
>> I have 2 series of data a,b and I would like to calculate a new series which
>> is z[t] = z[t-1]*a[t] + b[t], z[1] = b[1].
>> How can I do that without using a loop?
>
> A combination of Reduce and Map will work. Map to stitch together the a
> and b vectors, Reduce to step along them, allowing for accumulation.
>
> a <- c(2,4,3,5)
> b <- c(1,3,5,7)
>
> z <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
> Map(c, a, b),
> init = b[1], accumulate = TRUE)
>
> > z
> [1] 1 3 15 50 257
I don't think so.
a <- c(2,4,3,5)
b <- c(1,3,5,7)
z <- rep(0,length(a))
z[1] <- b[1]
for( t in 2:length(a) ) { z[t] <- a[t] * z[t-1] + b[t] }
z
gives
[1] 1 7 26 137
and this agrees with a manual calculation.
You get a vector of length 5 as result. It should be of length 4 with your data.
If you change the Reduce expression to this
u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
Map(c, a[-1], b[-1]),
init = b[1], accumulate = TRUE)
then you get the correct result.
> u
[1] 1 7 26 137
Berend
------------------------------
Message: 48
Date: Thu, 26 Jan 2012 18:27:27 +0000 (GMT)
From: Ajay Askoolum <aa2e72e@yahoo.co.uk>
To: R General Forum <r-help@r-project.org>
Subject: [R] What does [[1]] mean?
Message-ID:
<1327602447.36212.YahooMailNeo@web27004.mail.ukl.yahoo.com>
Content-Type: text/plain
I know that [] is used for indexing.
I know that [[]] is used for reference to a property of a COM object.
I cannot find any explanation of what [[1]] does or, more pertinently, where it
should be used.
Thank you.
[[alternative HTML version deleted]]
------------------------------
Message: 49
Date: Thu, 26 Jan 2012 19:33:49 +0100
From: Berend Hasselman <bhh@xs4all.nl>
To: Berend Hasselman <bhh@xs4all.nl>
Cc: R-help@r-project.org, Brian Diggs <diggsb@ohsu.edu>
Subject: Re: [R] aplpy recursive function on a list
Message-ID: <50BC693B-1B51-4E19-8B28-4082495282E9@xs4all.nl>
Content-Type: text/plain; charset=us-ascii
And the loop, especially if byte-compiled with cmpfun, appears to be quite a
bit quicker.
Nrep <- 1000
tfrml.loop <- function(a,b) {
  z <- rep(0,length(a))
  z[1] <- b[1]
  for( t in 2:length(a) ) {
    z[t] <- a[t] * z[t-1] + b[t]
  }
  z
}
tfrml.rdce <- function(a,b) {
  u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
              Map(c, a[-1], b[-1]),
              init = b[1], accumulate = TRUE)
  u
}
library(compiler)
tfrml.loop.c <- cmpfun(tfrml.loop)
tfrml.rdce.c <- cmpfun(tfrml.rdce)
z.loop <- tfrml.loop(a,b)
z.rdce <- tfrml.rdce(a,b)
all.equal(z.loop, z.rdce)
library(rbenchmark)
N <- 500
set.seed(1)
a <- runif(N)
b <- runif(N)
benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b), tfrml.rdce.c(a,b),
          replications=Nrep, columns=c("test", "replications", "elapsed"))
                 test replications elapsed
1    tfrml.loop(a, b)         1000   2.665
3  tfrml.loop.c(a, b)         1000   0.554
2    tfrml.rdce(a, b)         1000   4.082
4  tfrml.rdce.c(a, b)         1000   3.143
Berend
R-2.14.1 (32-bits), Mac OS X 10.6.8.
------------------------------
Message: 50
Date: Thu, 26 Jan 2012 11:55:08 -0700
From: Greg Snow <Greg.Snow@imail.org>
To: Ajay Askoolum <aa2e72e@yahoo.co.uk>, R General Forum
<r-help@r-project.org>
Subject: Re: [R] What does [[1]] mean?
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC637F1387F8@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset="us-ascii"
Have you read ?"[[" ?
The short answer is that you can use both [] and [[]] on lists, the [] construct
will return a subset of the list (which will be a list) while [[]] will return
a single element of the list (which could be a list or a vector or whatever that
element may be): compare:
> tmp <- list( a=1, b=letters )
> tmp[1]
$a
[1] 1

> tmp[1] + 1
Error in tmp[1] + 1 : non-numeric argument to binary operator
> tmp[[1]]
[1] 1
> tmp[[1]] + 1
[1] 2
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow@imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Ajay Askoolum
> Sent: Thursday, January 26, 2012 11:27 AM
> To: R General Forum
> Subject: [R] What does [[1]] mean?
>
> I know that [] is used for indexing.
> I know that [[]] is used for reference to a property of a COM object.
>
> I cannot find any explanation of what [[1]] does or, more pertinently,
> where it should be used.
>
> Thank you.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 51
Date: Thu, 26 Jan 2012 09:29:18 -0800 (PST)
From: maxbre <mbressan@arpa.veneto.it>
To: r-help@r-project.org
Subject: [R] merge multiple data frames
Message-ID: <1327598958777-4331089.post@n4.nabble.com>
Content-Type: text/plain; charset=UTF-8
This is my reproducible example (three data frames: a, b, c)
a<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06",
"2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11",
"2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16",
"2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21",
"2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0.799401398190476, 0, 0,
0.0100453950434783,
0.200154920565217, 0.473866969181818), nox = c(111.716109973913,
178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435,
110.425185027727), no = c(48.8543691516522, 88.7197448817391,
93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364
), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783,
49.2986070321739, 46.5978461731818), co = c(0.618856168125,
0.996593475083333,
0.666987416083333, 0.383437311166667, 0.281604928875, 0.155383408913043
), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043,
26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667,
252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667,
173.868599272609), ws = c(1.47191016429167, 0.765781205208333,
0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652
), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214,
319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.91972825883333,
3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722,
3.09864120704348), umr = c(86.11566638875, 94.5034087491667,
94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087
)), .Names = c("date", "so2", "nox",
"no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp",
"umr"), row.names = c(NA, 6L), class "data.frame")
b<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06",
"2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11",
"2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16",
"2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21",
"2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511,
105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222
), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972,
16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381,
28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917,
0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897,
9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971
), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998,
66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995,
0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031,
221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465,
215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993,
0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736,
88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874
)), .Names = c("date", "so2", "nox",
"no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp",
"umr"), row.names = c(NA, 6L), class "data.frame")
c<-structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06",
"2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11",
"2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16",
"2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21",
"2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086,
0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142,
178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498,
221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447,
59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816
), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745,
0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989,
37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056,
432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128,
1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057
), wd = c(58.15639877, 64.5657153143848, 39.9754269501381, 24.0739884380921,
55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029,
4.33258009, 13.91609955, 8.762220383, 11.04300022), umr = c(97.60900116,
96.91899872, 96.20649719, 94.74620056, 82.04550171, 89.41320038
)), .Names = c("date", "so2", "nox",
"no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp",
"umr"), row.names = c(NA, 6L), class "data.frame")
Given the three data frames 'a', 'b' and 'c', I need to merge them all by
the common field 'date'.
The following attempt is working fine but...
# code start
all<-merge(a,b,by="date",suffixes=c(".a",".b"),all=T)
all<-merge(all,c,by="date",all=T)
# code end
...I would like to possibly do it in just "one shot" (a generalisation of the
code for a more complex case of handling many data frames), also assigning
proper suffixes to each variable (not possible with the previous code
snippet).
Then I also tried a different approach, using the reshape library and
the function "merge_all", but...
# code start
library("reshape")
all.new<-merge_all(a, b, c, by="date", all=T,
suffixes=c(".a",".b",".c"))
# code end
...I got the following error message:
Error in merge.data.frame(as.data.frame(x), as.data.frame(y), ...) :
formal argument "all" matched by multiple actual arguments
(a free translation from Italian)
My question is:
how can I merge multiple data frames that all have the same variable names,
by a single id field, while "keeping track" of the original data frame
through suffixes appended to the new variables?
Any help much appreciated
Thank you
--
View this message in context:
http://r.789695.n4.nabble.com/merge-multiple-data-frames-tp4331089p4331089.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 52
Date: Thu, 26 Jan 2012 18:46:20 +0100
From: "Pfrengle Andreas (GS-SI/ENX7)"
<Andreas.Pfrengle@de.bosch.com>
To: "r-help@r-project.org" <r-help@r-project.org>
Subject: [R] 3-parametric Weibull regression
Message-ID:
<FB30B323E68F834FB7D813F4EAD13E9933A6B487B0@SI-MBX03.de.bosch.com>
Content-Type: text/plain
Hello,
I'm quite new to R and want to make a Weibull-regression with the survival
package. I know how to build my "Surv"-object and how to make a
standard-weibull regression with "survreg".
However, I want to fit a translated or 3-parametric weibull dist to account for
a failure-free time.
I think I would need a new object in survreg.distributions, but I don't know
how to create that correctly. I've tried to inherit it from the
"extreme" value dist, analogous to the implemented weibull-dist like
so:
survreg.distributions$weibull3p$name <- "weibull"
survreg.distributions$weibull3p$dist <- "extreme"
survreg.distributions$weibull3p$trans <- function(y, x0) log(y) + x0
survreg.distributions$weibull3p$dtrans <- function(y) 1/y
survreg.distributions$weibull3p$itrans <- function(x, x0) exp(x-x0)
I then get a failure by survregDtest(survreg.distributions$weibull3p) that x0 is
missing (since the transformation function is expected to take only one
argument).
Any ideas? Or is there maybe a package somewhere that supports 3-parametric
weibull regression which I missed?
Regards,
Andreas
[[alternative HTML version deleted]]
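A minimal sketch of one possible workaround (not a fix to
survreg.distributions, and the data below are made up): treat the
failure-free time x0 as a profiled parameter, shift the times, and refit
a standard two-parameter Weibull over a grid of x0 values.
library(survival)
# log-likelihood of a two-parameter Weibull fit after shifting by x0
ll.shift <- function(x0, time, status) {
  survreg(Surv(time - x0, status) ~ 1, dist = "weibull")$loglik[1]
}
# hypothetical data: Weibull times with a failure-free period of 5
set.seed(1)
y <- 5 + rweibull(100, shape = 1.5, scale = 10)
status <- rep(1, 100)                      # all events, no censoring
x0.grid <- seq(0, min(y) * 0.999, length.out = 50)
ll <- sapply(x0.grid, ll.shift, time = y, status = status)
x0.hat <- x0.grid[which.max(ll)]           # profiled threshold estimate
Note the profile can misbehave as x0 approaches min(time) when the shape
parameter is below 1, so the upper end of the grid deserves care.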
------------------------------
Message: 53
Date: Thu, 26 Jan 2012 14:19:10 -0500
From: "R. Michael Weylandt" <michael.weylandt@gmail.com>
To: maxbre <mbressan@arpa.veneto.it>
Cc: r-help@r-project.org
Subject: Re: [R] merge multiple data frames
Message-ID:
<CAAmySGM1XT3Wm3dK44Yb4N=ceCCXOP-DDaWZkox7Mj8ZSkyETw@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252
I might do something like this:
mergeAll <- function(..., by = "date", all = TRUE) {
dotArgs <- list(...)
Reduce(function(x, y)
merge(x, y, by = by, all = all, suffixes=paste(".", names(dotArgs),
sep = "")),
dotArgs)}
mergeAll(a = a, b = b, c = c)
str(.Last.value)
You also might be able to set it up to capture names without you
having to put "a = a" etc. using substitute.
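One caveat worth flagging here (an aside, with a function name of my own
choosing): merge() takes only two suffixes per call, so in a Reduce chain
the last data frame's columns can be left without a suffix. A minimal
sketch that sidesteps this by renaming the non-key columns of each data
frame before merging:
mergeAllNamed <- function(lst, by = "date", all = TRUE) {
  # append each list element's name to its non-key columns up front,
  # so every merge in the chain is unambiguous
  lst <- mapply(function(df, nm) {
    keep <- names(df) != by
    names(df)[keep] <- paste(names(df)[keep], nm, sep = ".")
    df
  }, lst, names(lst), SIMPLIFY = FALSE)
  Reduce(function(x, y) merge(x, y, by = by, all = all), lst)
}
# usage: str(mergeAllNamed(list(a = a, b = b, c = c)))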
On Thu, Jan 26, 2012 at 12:29 PM, maxbre <mbressan@arpa.veneto.it>
wrote:
> This is my reproducible example (three data frames: a, b, c)
> [rest of quoted message snipped -- see Message 51 above]
------------------------------
Message: 54
Date: Thu, 26 Jan 2012 19:20:57 +0000
From: Patrick Burns <pburns@pburns.seanet.com>
To: Ajay Askoolum <aa2e72e@yahoo.co.uk>
Cc: R General Forum <r-help@r-project.org>
Subject: Re: [R] What does [[1]] mean?
Message-ID: <4F21A799.8090407@pburns.seanet.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Here is a page that should help:
http://www.burns-stat.com/pages/Tutor/more_R_subscript.html
Also Circle 8.1.54 of 'The R Inferno'
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
On 26/01/2012 18:27, Ajay Askoolum wrote:
> I know that [] is used for indexing.
> I know that [[]] is used for reference to a property of a COM object.
>
> I cannot find any explanation of what [[1]] does or, more pertinently,
> where it should be used.
>
> Thank you.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Patrick Burns
pburns@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')
------------------------------
Message: 55
Date: Fri, 27 Jan 2012 09:03:51 +1300
From: Thomas Lumley <tlumley@uw.edu>
To: Chris Wallace <chris.wallace@cimr.cam.ac.uk>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] null distribution of binom.test p values
Message-ID:
<CAJ55+dJrgSMridC3o0+fa=b9BpT8hzqt6kXThBy7Y2JvAfeNcg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Jan 27, 2012 at 5:36 AM, Chris Wallace
<chris.wallace@cimr.cam.ac.uk> wrote:
> Greg, thanks for the reply.
>
[[elided Yahoo spam]]
>
> I ran a longer simulation, 100,000 reps. The size of the test is
> consistently too small (see below) and the histogram shows increasing bars
> even within the parts of the histogram with even bar spacing. See
> https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png
>
> y<-sapply(1:100000, function(i,n=100)
>   binom.test(sum(rnorm(n)>0),n,p=0.5,alternative="two")$p.value)
> mean(y<0.01)
> # [1] 0.00584
> mean(y<0.05)
> # [1] 0.03431
> mean(y<0.1)
> # [1] 0.08646
>
> Can that really be due to the discreteness of the distribution?
Yes. All so-called exact tests tend to be conservative due to
discreteness, and there's quite a lot of discreteness in the tails.
The problem is far worse for Fisher's exact test, and worse still for
Fisher's other exact test (of Hardy-Weinberg equilibrium --
http://www.genetics.org/content/180/3/1609.full).
You don't need to rely on finite-sample simulations here: you can
evaluate the level exactly. Using binom.test() you find that the
rejection regions are y<=39 and y>=61, so the level at nominal 0.05
is:
> pbinom(39,100,0.5)+pbinom(60,100,0.5,lower.tail=FALSE)
[1] 0.0352002
agreeing very well with your 0.03431.
At nominal 0.01 the exact level is
> pbinom(36,100,0.5)+pbinom(63,100,0.5,lower.tail=FALSE)
[1] 0.006637121
and at 0.1 it is
> pbinom(41,100,0.5)+pbinom(58,100,0.5,lower.tail=FALSE)
[1] 0.08862608
Your result at nominal 0.01 is a bit low, but I think that's bad luck.
When I ran your code I got 0.00659 for the estimated level at nominal
0.01, which matches the exact calculations very well.
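The same calculation can be automated for any nominal level (a small
sketch, with a helper name of my own choosing): sum the point
probabilities of every outcome whose two-sided p-value is at or below
the nominal level.
exact.size <- function(n = 100, p = 0.5, alpha = 0.05) {
  pv <- sapply(0:n, function(y) binom.test(y, n, p)$p.value)
  sum(dbinom((0:n)[pv <= alpha], n, p))   # true rejection probability
}
exact.size(alpha = 0.05)   # 0.0352002, as computed above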
Theoreticians sweep this under the carpet by inventing randomized
tests, where you interpolate a random p-value between the upper and
lower values from a discrete distribution. It's a very elegant idea
that I'm glad to say I haven't seen used in practice.
-thomas
--
Thomas Lumley
Professor of Biostatistics
University of Auckland
------------------------------
Message: 56
Date: Thu, 26 Jan 2012 12:27:54 -0800
From: Brian Diggs <diggsb@ohsu.edu>
To: Berend Hasselman <bhh@xs4all.nl>
Cc: R-help@r-project.org
Subject: Re: [R] aplpy recursive function on a list
Message-ID: <4F21B74A.9000102@ohsu.edu>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
On 1/26/2012 10:33 AM, Berend Hasselman wrote:
>
> On 26-01-2012, at 19:10, Berend Hasselman wrote:
>
>>
>> On 26-01-2012, at 17:58, Brian Diggs wrote:
>>
>>> On 1/25/2012 10:09 AM, patzoul wrote:
>>>> I have 2 series of data a,b and I would like to calculate a new
series which
>>>> is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1].
>>>> How can I do that without using a loop?
>>>
>>> ........
>>
>> I don't think so.
>>
>> a<- c(2,4,3,5)
>> b<- c(1,3,5,7)
>>
>> z<- rep(0,length(a))
>> z[1]<- b[1]
>> for( t in 2:length(a) ) { z[t]<- a[t] * z[t-1] + b[t] }
>> z
>>
>> gives
>>
>> [1] 1 7 26 137
>>
>> and this agrees with a manual calculation.
>>
>> You get a vector of length 5 as result. It should be of length 4 with
your data.
>> If you change the Reduce expression to this
>>
>> u<- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
>> Map(c, a[-1], b[-1]),
>> init = b[1], accumulate = TRUE)
>>
>> then you get the correct result.
>>
>>> u
>> [1] 1 7 26 137
You are correct; I had an off-by-one error. It agreed with my manual
calculation, which also had the same error.
> And the loop especially if byte compiled with cmpfun appears to be quite a
bit quicker.
>
> [quoted benchmark code snipped -- see Message 49 above]
>
> test replications elapsed
> 1 tfrml.loop(a, b) 1000 2.665
> 3 tfrml.loop.c(a, b) 1000 0.554
> 2 tfrml.rdce(a, b) 1000 4.082
> 4 tfrml.rdce.c(a, b) 1000 3.143
>
> Berend
>
> R-2.14.1 (32-bits), Mac OS X 10.6.8.
The timings are interesting; I would not have expected the loop to have
outperformed Reduce, or at least not by that much. The loop also
benefits much more from compiling, which is not as surprising since
Reduce and Map are already compiled. I would guess the difference is due
to the overhead of changing the format of the a/b data, and to the
compiler being able to specialize the code.
I also ran timings for comparison and got qualitatively the same thing:
benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b),
tfrml.rdce.c(a,b),
replications=Nrep,
columns=c("test", "replications",
"elapsed", "relative"),
order="relative")
test replications elapsed relative
3 tfrml.loop.c(a, b) 1000 0.34 1.000000
1 tfrml.loop(a, b) 1000 1.89 5.558824
4 tfrml.rdce.c(a, b) 1000 2.12 6.235294
2 tfrml.rdce(a, b) 1000 2.79 8.205882
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
(Windows 7)
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
------------------------------
Message: 57
Date: Thu, 26 Jan 2012 19:24:04 +0000
From: Jorge Molinos <JGARCIAM@tcd.ie>
To: "r-help@R-project.org" <r-help@R-project.org>
Subject: [R] Calculate a function repeatedly over sections of a ts
object
Message-ID:
<45898BAA78D93940AD0B50066D38635B02EA6F091F6E@GOMAIL.college.tcd.ie>
Content-Type: text/plain; charset="Windows-1252"
Hi,
I want to apply a function (in my case SDF; package 'sapa') repeatedly over
discrete sections of a daily time series object by sliding a time window of
constant length (e.g. 5 consecutive years, or 1825 days) over the entire ts at
increments of 1 time unit (e.g. 1 year or 365 days). So, for example, the first
SDF would be calculated for the daily values of my variable recorded between
years 1 to 5, SDF2 for those for years 2 to 6, and so on until the total length
of the series is covered. How can I implement this in an R script? Any help is
much appreciated.
Jorge
------------------------------
Message: 58
Date: Thu, 26 Jan 2012 15:16:07 -0500
From: Max Brondfield <max.brondfield@gmail.com>
To: r-help@r-project.org
Subject: [R] Quality of fit statistics for NLS?
Message-ID:
<CADu+jDr4bub0kDy=Bzftq+H-CCxd8DTh1yQYVuAsu=+phR9q3g@mail.gmail.com>
Content-Type: text/plain
Dear all,
I am trying to analyze some non-linear data to which I have fit a curve of
the following form:
dum <- nls(y~(A + (B*x)/(C+x)), start = list(A=370,B=100,C=23000))
I am wondering if there is any way to determine meaningful quality of fit
statistics from the nls function?
A summary yields highly significant p-values, but it is my impression that
these are questionable at best given the iterative nature of the fit:
> summary(dum)
Formula: y ~ (A + (B * x)/(C + x))
Parameters:
Estimate Std. Error t value Pr(>|t|)
A 388.753 4.794 81.090 < 2e-16 ***
B 115.215 5.006 23.015 < 2e-16 ***
C 20843.832 4646.937 4.485 1.12e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 18.25 on 245 degrees of freedom
Number of iterations to convergence: 4
Achieved convergence tolerance: 2.244e-06
Is there any other means of determining the quality of the curve fit? I
have tried applying confidence intervals using confint(dum), but these
[[elided Yahoo spam]]
-Max
[[alternative HTML version deleted]]
------------------------------
Message: 59
Date: Thu, 26 Jan 2012 22:49:13 +0200
From: Tal Galili <tal.galili@gmail.com>
To: Uwe Ligges <ligges@statistik.tu-dortmund.de>
Cc: r-help@r-project.org, Barry Rowlingson
<b.rowlingson@lancaster.ac.uk>
Subject: Re: [R] Why was the ?doSMP? package removed from CRAN?
Message-ID:
<CANdJ3dVuJCKRgrec+5-piyOGMQH-KXiBXMVOrR3Z29pCyE1sXg@mail.gmail.com>
Content-Type: text/plain
Thank you for the explanation Uwe.
With regards,
Tal
----------------Contact
Details:-------------------------------------------------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------
On Thu, Jan 26, 2012 at 10:42 AM, Uwe Ligges <
ligges@statistik.tu-dortmund.de> wrote:
>
>
> On 26.01.2012 09:39, Barry Rowlingson wrote:
>
>> On Thu, Jan 26, 2012 at 8:02 AM, Uwe Ligges
>> <ligges@statistik.tu-dortmund.de>
>> wrote:
>>
>>>
>>>
>>> On 25.01.2012 22:20, Tal Galili wrote:
>>>
>>
>> Does any one know the reason for this?
>>>> Is this a technical or a legal (e.g: license) issue?
>>>>
>>>
>>>
>>>
>>> If legal issues were the reason, you had not found it in the
archives
>>> anymore.
>>>
>>>
>> Licensing can change - as a copyright holder I could decide the next
>> release of my package is going to be under a proprietary license that
>> doesn't allow redistribution. Any code already released under
>> something like the GPL can't be forcibly removed, and people can
make
>> new forks from those, but if I'm the only person writing it and I
>> decide to change the license and the latest free version isn't
>> compatible with the current version of R then I'd expect to see the
>> old versions in the archives and no version for the latest version of
>> R.
>>
>> Last checkin at R-forge was only six weeks ago, and 1.0-3 installs
>> fine on my latest R:
>>
>>
https://r-forge.r-project.org/scm/?group_id=950
>>
>> I suspect they just haven't pushed it to R-forge yet. Cockup before
>> conspiracy.
>>
>> Barry
>>
>
> Before people start with even more speculations: Reason is that the
> revoIPC package does not compile on modern versions of gcc and hence has
> been archived, doSMP depends on it and therefore went to the archives as
> well.
>
> Uwe Ligges
>
[[alternative HTML version deleted]]
------------------------------
Message: 60
Date: Thu, 26 Jan 2012 16:00:21 -0500
From: "R. Michael Weylandt" <michael.weylandt@gmail.com>
To: Jorge Molinos <JGARCIAM@tcd.ie>
Cc: "r-help@R-project.org" <r-help@r-project.org>
Subject: Re: [R] Calculate a function repeatedly over sections of a ts
object
Message-ID:
<CAAmySGMRk9k9dZayQfDxTWpa9B-hga9RRXX_nDr-5im3sOaOKg@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252
I'm not sure if it's easily doable with a ts class, but the rollapply
function in the zoo package will do this easily. (Also, I find zoo to
be a much more natural time-series workflow than ts so it might make
the rest of your life easier as well)
Michael
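A minimal sketch of the rolling call (sd stands in for the SDF
computation, since the sapa specifics are not shown here; window and
step sizes follow the post):
library(zoo)
# hypothetical daily series spanning ten years
x <- zoo(rnorm(3650), as.Date("2000-01-01") + 0:3649)
# 1825-day (5-year) windows advanced 365 days at a time
out <- rollapply(x, width = 1825, by = 365, FUN = sd, align = "left")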
On Thu, Jan 26, 2012 at 2:24 PM, Jorge Molinos <JGARCIAM@tcd.ie>
wrote:
>
> Hi,
>
> [quoted message snipped -- see Message 57 above]
>
> Jorge
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 61
Date: Thu, 26 Jan 2012 16:05:34 -0500
From: Gabor Grothendieck <ggrothendieck@gmail.com>
To: "R. Michael Weylandt" <michael.weylandt@gmail.com>
Cc: "r-help@R-project.org" <r-help@r-project.org>, Jorge
Molinos
<JGARCIAM@tcd.ie>
Subject: Re: [R] Calculate a function repeatedly over sections of a ts
object
Message-ID:
<CAP01uRkVZkvUoH-4u282=2uWvbtoxnki6Zui9qB76gzvjuyHxQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Thu, Jan 26, 2012 at 4:00 PM, R. Michael Weylandt
<michael.weylandt@gmail.com> wrote:
> I'm not sure if it's easily doable with a ts class, but the rollapply
> function in the zoo package will do this easily. (Also, I find zoo to
> be a much more natural time-series workflow than ts so it might make
> the rest of your life easier as well)
>
rollapply does have ts and default methods in addition to a zoo method.
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
------------------------------
Message: 62
Date: Thu, 26 Jan 2012 13:17:52 -0800 (PST)
From: maxbre <mbressan@arpa.veneto.it>
To: r-help@r-project.org
Subject: Re: [R] merge multiple data frames
Message-ID: <1327612672641-4331830.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
thank you for your reply
I'll study and test your code (which is a bit obscure to me for now);
by the way, do you think that "merge_all" is the wrong way to go?
thanks again
m
--
View this message in context:
http://r.789695.n4.nabble.com/merge-multiple-data-frames-tp4331089p4331830.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 63
Date: Thu, 26 Jan 2012 13:25:56 -0800 (PST)
From: Jhope <jeanwaijang@gmail.com>
To: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
interactions and unique combinations?
Message-ID: <1327613156182-4331848.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
I ask about when to stop adding another variable even though it lowers the
AIC, because each time I add a variable the AIC is lower. How do I know when
the model is a good fit? When should I stop adding variables, keeping the
model simple?
Thanks, J
--
View this message in context:
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331848.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 64
Date: Thu, 26 Jan 2012 15:38:38 -0600
From: Jean V Adams <jvadams@usgs.gov>
To: johannes rara <johannesraja@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] How to remove rows representing concurrent sessions
from data.frame?
Message-ID:
<OF287DEF50.0B7E8DFB-ON86257991.0076AAE0-86257991.0076E8CE@usgs.gov>
Content-Type: text/plain
johannes rara wrote on 01/26/2012 02:46:57 AM:
> I have a dataset like this (dput for this below) which represents user
> computer sessions:
>
> username machine start end
> 1 user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27
> 2 user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16
> 3 user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17
> 4 user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15
> 5 user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16
> 6 user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23
> 7 user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22
> 8 user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59
> 9 user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43
> 10 USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53
> 11 USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22
> 12 USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43
> 13 USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38
> 14 USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43
> 15 USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47
> >
>
> There may be serveral concurrent sessions for same username from the
> same computer. How can I remove those rows so that only one sessions
> is left for this data? I have no idea how to do this, maybe using
> difftime?
>
> -J
>
> structure(list(username = c("user1", "user1",
"user1",
> "user1", "user1", "user1", "user1",
"user1",
> "user1", "USER2", "USER2", "USER2",
"USER2", "USER2", "USER2"
> ), machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D5599.domain.com",
"D5645.domain.com",> "D5682.domain.com", "D5686.domain.com",
"D5694.domain.com",
> "D5696.domain.com",
> "D5772.domain.com", "D5772.domain.com",
"D5847.domain.com",
> "D5855.domain.com",
> "D5871.domain.com", "D5927.domain.com",
"D5927.domain.com",
> "D5952.domain.com",
> "D5993.domain.com", "D6012.domain.com",
"D6048.domain.com",
> "D6077.domain.com",
> "D5688.domain.com", "D5815.domain.com",
"D6106.domain.com",
"D6128.domain.com"> ), class = "factor"), start = structure(c(1294040658, 1294040789,
> 1294056456, 1294232597, 1297686819, 1298462070, 1300695018, 1307603561,
> 1294049025, 1294835165, 1295269579, 1295356050, 1295961655, 1297688580,
> 1297688363), class = c("POSIXct", "POSIXt"), tzone = ""),
> end = structure(c(1294040847,
> 1294042156, 1294059377, 1294233795, 1297687216, 1298462303, 1300696342,
> 1307606339, 1294050583, 1294835573, 1295271862, 1295358163, 1295961878,
> 1297688863, 1297689287), class = c("POSIXct", "POSIXt"), tzone = "")),
> .Names = c("username",
> "machine", "start", "end"), row.names = c(NA,
15L), class =
"data.frame")
# rearrange the data, so that there is one "date/time" variable
# and another variable indicates start/end
library(reshape)
df2 <- melt(df)
# sort the data by user, machine, date/time
df3 <- df2[order(df2$username, df2$machine, df2$value), ]
# for each user and machine,
# keep only the first "start" record and the last "end" record
first <- function(x) {
l <- length(x)
c(1, 1-(x[-1]==x[-l]))
}
last <- function(x) {
y <- rev(x)
l <- length(y)
rev(c(1, 1-(y[-1]==y[-l])))
}
df4 <- df3[(df3$variable=="start" & first(df3$variable)) |
(df3$variable=="end" & last(df3$variable)), ]
# combine the results
df5 <- cbind(df4[df4$variable=="start",
c("username", "machine", "value")],
value2=df4$value[df4$variable=="end"])
df5
Jean
[[alternative HTML version deleted]]
------------------------------
Message: 65
Date: Thu, 26 Jan 2012 13:51:36 -0800
From: Bert Gunter <gunter.berton@gene.com>
To: Max Brondfield <max.brondfield@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] Quality of fit statistics for NLS?
Message-ID:
<CACk-te2FQwLfCB0pFNtcOacND7QoXSk2bg4zyy8p_2NPYfuDFw@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252
Inline below.
-- Bert
On Thu, Jan 26, 2012 at 12:16 PM, Max Brondfield
<max.brondfield@gmail.com> wrote:
> Dear all,
> I am trying to analyze some non-linear data to which I have fit a curve of
> the following form:
>
> dum <- nls(y~(A + (B*x)/(C+x)), start = list(A=370,B=100,C=23000))
>
> I am wondering if there is any way to determine meaningful quality of fit
> statistics from the nls function?
>
> A summary yields highly significant p-values, but it is my impression that
> these are questionable at best given the iterative nature of the fit:
No. They are questionable primarily because there is no clear null
model. They are based on profile likelihoods (as ?confint tells you),
which may or may not be what you want for "goodness of fit."
One can always get "goodness of fit" statistics but the question in
nonlinear models is: goodness of fit with respect to what? So the
answer to your question is: if you know what you're doing, certainly.
Otherwise, find someone who does.
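For what it's worth, a sketch of the kind of residual checks that are
usually more informative here than parameter p-values (assuming the fit
object 'dum' and the response 'y' from the post):
plot(fitted(dum), residuals(dum),
     xlab = "fitted", ylab = "residuals")   # look for curvature, funnels
abline(h = 0, lty = 2)
qqnorm(residuals(dum)); qqline(residuals(dum))
# a pseudo-R^2 against the mean-only model; interpret with care for nls
1 - sum(residuals(dum)^2) / sum((y - mean(y))^2)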
>
>> summary(dum)
>
> Formula: y ~ (A + (B * x)/(C + x))
>
> Parameters:
>    Estimate Std. Error t value Pr(>|t|)
> A   388.753      4.794  81.090  < 2e-16 ***
> B   115.215      5.006  23.015  < 2e-16 ***
> C 20843.832   4646.937   4.485 1.12e-05 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 18.25 on 245 degrees of freedom
>
> Number of iterations to convergence: 4
> Achieved convergence tolerance: 2.244e-06
>
>
> Is there any other means of determining the quality of the curve fit? I
> have tried applying confidence intervals using confint(dum), but these
[[elided Yahoo spam]]
> -Max
>
>        [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
------------------------------
Message: 66
Date: Thu, 26 Jan 2012 13:56:22 -0800
From: Bert Gunter <gunter.berton@gene.com>
To: Jhope <jeanwaijang@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
interactions and unique combinations?
Message-ID:
<CACk-te0wQp0Puc9r8aT_nXbKLmNehhEOXCCfBi_8EvWEcmxe=A@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Simple question. 8 million pages of answers in the statistical
literature. What, indeed, is the secret to life?
Post on a statistical help list (e.g. stats.stackexchange.com). This
has almost nothing to do with R. Be prepared for an onslaught of often
conflicting "wisdom."
-- Bert
On Thu, Jan 26, 2012 at 1:25 PM, Jhope <jeanwaijang@gmail.com>
wrote:
> I ask about when to stop adding another variable even though it lowers the
> AIC, because each time I add a variable the AIC is lower. How do I know when
> the model is a good fit? When should I stop adding variables, keeping the
> model simple?
>
> Thanks, J
>
> --
> View this message in context:
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331848.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
------------------------------
Message: 67
Date: Thu, 26 Jan 2012 14:15:10 -0800 (PST)
From: sjkimble <sjkimble@gmail.com>
To: r-help@r-project.org
Subject: Re: [R] warning In fun(...) : no DISPLAY variable so Tk is
not available
Message-ID: <1327616110335-4331992.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Nevil Amos wrote:
> I am getting the above warning following loading of Geneland 3.1.5 on
> unix, while a simple plot sends output to the pdf file (see attached
> code); no output results from Geneland functions, resulting in empty pdf
> files
>
That message is saying it can't find an X11 server for Tcltk to use. If
you do have X11 available, then just set the DISPLAY environment
variable in the normal way; if you don't, then you're not going to be
able to use that package on Unix. (The Windows implementation is
self-contained, so it should work there.)
I am having the same error message for Geneland, but I am implementing on a
Windows machine.
--
View this message in context:
http://r.789695.n4.nabble.com/warning-In-fun-no-DISPLAY-variable-so-Tk-is-not-available-tp2235656p4331992.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 68
Date: Thu, 26 Jan 2012 23:46:11 +0100
From: Petr Savicky <savicky@cs.cas.cz>
To: r-help@r-project.org
Subject: Re: [R] function for grouping
Message-ID: <20120126224611.GA24113@cs.cas.cz>
Content-Type: text/plain; charset=utf-8
On Thu, Jan 26, 2012 at 08:29:22AM -0800, yan wrote:
> what if I don't need to store the combination results, I just want to
> get the combination result for a given row.
> e.g. for the 5 elements divided into 3 groups, and if I give another
> parameter which is the row number of the results, in Petr's example,
> say if I set row number to 1, then I get 1,2,3,1,1.
>
> So I need a systematic way of generating the combination, but I don't
> need all the combinations in one go.
Hi.
The following function partitioning() computes the m-th
partitioning of n elements into "groups" groups in
lexicographic order.
partitioning <- function(n, groups, m)
{
a <- matrix(nrow=n+1, ncol=groups+2)
a[, groups + 2] <- 0
a[1, 0:groups + 1] <- 0
a[1, groups + 1] <- 1
for (i in seq.int(from=2, length=n)) {
for (j in 0:groups) {
a[i, j + 1] <- j*a[i - 1, j + 1] + a[i - 1, j + 2]
}
}
available <- a[n + 1, 1]
stopifnot(m <= available)
x <- rep(NA, times=n)
for (k in seq.int(length=n)) {
free <- max(x[seq.int(length=k-1)], 0)
for (j in seq.int(length=min(free + 1, groups))) {
subtree <- a[n - k + 1, max(free, j) + 1]
if (subtree < m) {
available <- available - subtree
m <- m - subtree
} else {
x[k] <- j
break
}
}
}
x
}
# the previous code for comparison modified to
# produce partitionings in the same order
check.row <- function(x, k)
{
y <- unique(x)
length(y) == k && all(y == 1:k)
}
allPart <- as.matrix(rev(expand.grid(x5=1:3, x4=1:3, x3=1:3, x2=1:2,
x1=1)))
ok <- apply(allPart, 1, check.row, k=3)
allPart <- allPart[ok, ]
# the comparison itself
for (m in seq.int(length=nrow(allPart)+1)) {
x <- partitioning(5, 3, m)
cat(m, x, x - allPart[m, ], "\n")
}
# the output is
1 1 1 1 2 3 0 0 0 0 0
2 1 1 2 1 3 0 0 0 0 0
3 1 1 2 2 3 0 0 0 0 0
4 1 1 2 3 1 0 0 0 0 0
5 1 1 2 3 2 0 0 0 0 0
6 1 1 2 3 3 0 0 0 0 0
7 1 2 1 1 3 0 0 0 0 0
8 1 2 1 2 3 0 0 0 0 0
9 1 2 1 3 1 0 0 0 0 0
10 1 2 1 3 2 0 0 0 0 0
11 1 2 1 3 3 0 0 0 0 0
12 1 2 2 1 3 0 0 0 0 0
13 1 2 2 2 3 0 0 0 0 0
14 1 2 2 3 1 0 0 0 0 0
15 1 2 2 3 2 0 0 0 0 0
16 1 2 2 3 3 0 0 0 0 0
17 1 2 3 1 1 0 0 0 0 0
18 1 2 3 1 2 0 0 0 0 0
19 1 2 3 1 3 0 0 0 0 0
20 1 2 3 2 1 0 0 0 0 0
21 1 2 3 2 2 0 0 0 0 0
22 1 2 3 2 3 0 0 0 0 0
23 1 2 3 3 1 0 0 0 0 0
24 1 2 3 3 2 0 0 0 0 0
25 1 2 3 3 3 0 0 0 0 0
Error: m <= available is not TRUE
The zeros confirm that partitioning(5, 3, m) is exactly the
m-th row of allPart. The error at the end shows that
the function partitioning() recognizes the situation
that m is larger than the number of partitionings with
the given parameters.
The idea of the algorithm is that it searches through
the tree of all encodings of the partitionings and
the precomputed matrix a[,] allows to determine the
number of leaves under any node of the tree. Using
this, the algorithm can choose the right branch at
each node.
Hope this helps.
Petr.
------------------------------
Message: 69
Date: Thu, 26 Jan 2012 19:17:37 -0500
From: Axel Urbiz <axel.urbiz@gmail.com>
To: R-help@r-project.org
Subject: [R] Conditional cumulative sum
Message-ID:
<CAAyVsXJ-p=JqNcdhYoOJKOShBFA1NSkSNj-ZtjkGSTSfhhN8Bw@mail.gmail.com>
Content-Type: text/plain
Dear List,
I'll appreciate your help on this. I'm trying to create a variable as in
"cumsum_y.cond1" below, which should compute the cumulative sum of
"y"
conditional on the value of cond==1.
set.seed(1)
d <- data.frame(y= sample(c(0,1), 10, replace= T),
cond= sample(c(0,1), 10, replace= T))
y cond cumsum_y.cond1
1 0 0 0
2 0 0 0
3 1 1 1
4 1 0 1
5 0 1 1
6 1 0 1
7 1 1 2
8 1 1 3
9 1 0 3
10 0 1 3
Thank you.
Regards,
Axel.
[[alternative HTML version deleted]]
------------------------------
Message: 70
Date: Thu, 26 Jan 2012 17:30:12 -0800 (PST)
From: Pete Brecknock <Peter.Brecknock@bp.com>
To: r-help@r-project.org
Subject: Re: [R] Conditional cumulative sum
Message-ID: <1327627812887-4332344.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Axel Urbiz wrote:
> Dear List,
>
> I'll appreciate your help on this. I'm trying to create a variable
as in
> "cumsum_y.cond1" below, which should compute the cumulative sum
of "y"
> conditional on the value of cond==1.
>
> set.seed(1)
> d <- data.frame(y= sample(c(0,1), 10, replace= T),
> cond= sample(c(0,1), 10, replace= T))
>
>
> y cond cumsum_y.cond1
> 1 0 0 0
> 2 0 0 0
> 3 1 1 1
> 4 1 0 1
> 5 0 1 1
> 6 1 0 1
> 7 1 1 2
> 8 1 1 3
> 9 1 0 3
> 10 0 1 3
>
> Thank you.
>
> Regards,
> Axel.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
is this what you are looking for ...
set.seed(1)
d <- data.frame(y= sample(c(0,1), 10, replace= T),
cond= sample(c(0,1), 10, replace= T))
d$cumsum_y.cond1 = cumsum(d$y & d$cond)
# Output
y cond cumsum_y.cond1
1 0 0 0
2 0 0 0
3 1 1 1
4 1 0 1
5 0 1 1
6 1 0 1
7 1 1 2
8 1 1 3
9 1 0 3
10 0 1 3
HTH
Pete
--
View this message in context:
http://r.789695.n4.nabble.com/Conditional-cumulative-sum-tp4332254p4332344.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 71
Date: Thu, 26 Jan 2012 20:18:04 -0800 (PST)
From: Rui Barradas <ruipbarradas@sapo.pt>
To: r-help@r-project.org
Subject: Re: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID: <1327637884058-4332529.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Hello, again.
I now have a more complete answer to your points.
> 1. It's too long. My understanding is that skilled programmers can
> usually or often complete tasks like this in a few lines.
It's not much shorter, but it's more readable. (The programmer is always
suspect.)
> 2. It's not vectorized. I started out trying to do something that was
> vectorized but ran into problems with the strsplit function. I looked at
> the help file and it appears this function will only accept a single
> character vector.
All but one of the instructions are vectorized, and the one that is not
only loops over a few column names.
Use 'unlist' on the 'strsplit' function's output to get a vector.
> 4. There's no way to specify names for input and output data. I imagine
> this would be fairly easy to specify in the arguments to a function but
> am not sure how to incorporate it into a for loop.
You can now specify any matrix or data.frame, but it will only process the
columns with dates. (Strictly, that is not true: it will process anything
with a '/' in it. Pay attention.)
Near the beginning of your code include the following:
> TestDates <- data.frame(scan(connection,
> list(Patient=0, birthDT="",
diagnosisDT="",
> metastaticDT="")))
>
> close(connection)
TDSaved <- TestDates # to avoid reopening the connection
And then, after all of it,
fun <- function(Dat){
f <- function(jj, DF){
x <- as.character(DF[, jj])
x <- unlist(strsplit(x, "/"))
n <- length(x)
M <- x[seq(1, n, 3)]
D <- x[seq(2, n, 3)]
Y <- x[seq(3, n, 3)]
D[D == "un"] <- "15"
Y <- ifelse(nchar(Y) > 4 | Y < 1900, NA, Y)
x <- as.Date(paste(Y, M, D, sep="-"),
format="%Y-%m-%d")
if(any(is.na(x)))
cat("Warning: Invalid date values in", jj, "\n",
as.character(DF[is.na(x), jj]), "\n")
x
}
colinx <- colnames(as.data.frame(Dat))
Dat <- data.frame(sapply(colinx, function(j) f(j, Dat)))
for(i in colinx) class(Dat[[i]]) <- "Date"
Dat
}
TD <- TDSaved
TD[, DateNames] <- fun(TD[, DateNames])
TD
Had fun writing it.
Good luck.
Rui Barradas
--
View this message in context:
http://r.789695.n4.nabble.com/Checking-for-invalid-dates-Code-works-but-needs-improvement-tp4324356p4332529.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 72
Date: Thu, 26 Jan 2012 22:59:04 -0600
From: Michael <comtech.usa@gmail.com>
To: r-help <R-help@stat.math.ethz.ch>
Subject: [R] Is there an R command for testing the difference of two
linear regressions?
Message-ID:
<CAPNjSFbifNH2k03dtc3H5kfJ=RXrqgrtyaibOaC7-T8eNBtXAA@mail.gmail.com>
Content-Type: text/plain
Hi all,
I am looking for an R command to test the difference of two linear
regression betas.
Let's say I have data x1, x2, ..., x(n+1).
beta1 is obtained from regressing x1 to xn onto 1 to n.
beta2 is obtained from regressing x2 to x(n+1) onto 1 to n.
Is there a way in R to test whether beta1 and beta2 are statistically
different?
Thanks a lot!
[[alternative HTML version deleted]]
------------------------------
Message: 73
Date: Fri, 27 Jan 2012 00:29:47 -0500
From: Mark Leeds <markleeds2@gmail.com>
To: Michael <comtech.usa@gmail.com>
Cc: r-help <R-help@stat.math.ethz.ch>
Subject: Re: [R] Is there an R command for testing the difference of
two linear regressions?
Message-ID:
<CAHz+bWZXiKXhKuXj1iK6pOvpFs_KjvPze3Z_LNv8eVna3_=Zpw@mail.gmail.com>
Content-Type: text/plain
I don't know what is available in R but a relevant paper that provides a
test is by
Hotelling, H ( September, 1940 )
"The Selection of Variates For Use in Prediction With Some Comments On The
General Problem of Nuisance Parameters".
Annals of Mathematical Statistics, 11, 271-283.
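As an illustration of one common device (a sketch with made-up data; it
assumes independent samples, which the overlapping windows in the
question do not satisfy): stack the two samples and test the
time-by-group interaction.
set.seed(42)
n  <- 50
t1 <- 1:n
x1 <- 0.5 * t1 + rnorm(n)   # hypothetical series 1
x2 <- 0.7 * t1 + rnorm(n)   # hypothetical series 2
dat <- data.frame(y = c(x1, x2),
                  t = rep(t1, 2),
                  g = factor(rep(c("s1", "s2"), each = n)))
fit <- lm(y ~ t * g, data = dat)
summary(fit)$coefficients["t:gs2", ]   # t-test of the slope difference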
On Thu, Jan 26, 2012 at 11:59 PM, Michael <comtech.usa@gmail.com> wrote:
> Hi all,
>
> I am looking for an R command to test the difference of two linear
> regression betas.
>
> Let's say I have data x1, x2, ..., x(n+1).
> beta1 is obtained from regressing x1 to xn onto 1 to n.
>
> beta2 is obtained from regressing x2 to x(n+1) onto 1 to n.
>
> Is there a way in R to test whether beta1 and beta2 are statistically
> different?
>
> Thanks a lot!
>
> [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
[[alternative HTML version deleted]]
------------------------------
Message: 74
Date: Thu, 26 Jan 2012 21:00:38 -0800 (PST)
From: ryanfuller <ryanfuller@gmail.com>
To: r-help@r-project.org
Subject: [R] multiple column comparison
Message-ID: <1327640438261-4332604.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Hello,
I have a very large content analysis project, which I've just begun to
collect training data on. I have three coders, who are entering data on up
to 95 measurements. Traditionally, I've used Excel to check coder agreement
(e.g., percentage agreement), by lining up each coder's measurements
side-by-side, creating a new column with the results using if statements.
That is, if (a=b, 1, 0). With this many variables, I am clearly interested
in something that I don't have to create manually every time I check
percentage agreement for coders.
The data are set up like this:
ID CODER V1 V2 V3 V4 ... V95
ID1 C1 y int doc y
ID2 C1 y ext doc y
ID1 C2 n int doc y
ID2 C2 n int doc y
ID1 C3 n int doc y
ID2 C3 n int doc y
I would like to write a script to do the following:
For each variable, compare each pair of coders using if statements (e.g.,
if C1.V1 == C2.V1, 1, 0)
ID C1.V1 C2.V1 C3.V1
ID1 y y y
ID2 y y y
For each coding pair, enter the resulting 1s and 0s into a new column.
The new column name would reflect the results of the comparison, such as
C1.C2.V1
I'd ideally like to create this so that it can handle any number of
variables and any number of coders.
I appreciate any thoughts, help, and pointers on this.
Thanks in advance.
Best,
Ryan Fuller
Doctoral Candidate, Communication
Graduate Student Researcher, Carsey-Wolf Center
http://carseywolf.ucsb.edu
University of California, Santa Barbara
--
View this message in context:
http://r.789695.n4.nabble.com/multiple-column-comparison-tp4332604p4332604.html
Sent from the R help mailing list archive at Nabble.com.
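A minimal sketch of one approach (assuming a data frame 'dat' with
columns ID, CODER, and V1...V95 as described; the object names are
mine):
# one row per ID, with columns V1.C1, V1.C2, ... after reshaping
wide <- reshape(dat, idvar = "ID", timevar = "CODER", direction = "wide")
vars   <- grep("^V", names(dat), value = TRUE)
coders <- as.character(unique(dat$CODER))
prs    <- combn(coders, 2, simplify = FALSE)
# one 0/1 agreement column per coder pair and variable
agree <- do.call(cbind, lapply(prs, function(p) {
  m <- sapply(vars, function(v)
    as.integer(wide[[paste(v, p[1], sep = ".")]] ==
               wide[[paste(v, p[2], sep = ".")]]))
  colnames(m) <- paste(p[1], p[2], vars, sep = ".")
  m
}))
colMeans(agree)   # percentage agreement per pair and variable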
------------------------------
Message: 75
Date: Thu, 26 Jan 2012 22:42:01 -0800 (PST)
From: womak <womak@pochta.ru>
To: r-help@r-project.org
Subject: Re: [R] Clusters: How to build a "Amalgamation Schedule"
(sequence of jointing )?
Message-ID: <1327646521911-4332721.post@n4.nabble.com>
Content-Type: text/plain; charset=UTF-8
http://r.789695.n4.nabble.com/file/n4332721/???????1.jpg
Ultimately, when I set the limits of the height's variation from *min* to
*max*, I want to get into a file a set of clusters (*A, B, C, ...*) in
the form:
l1; 32, 33, 34, 30, 31, 55, 60, 58, 59, 57, 61
l2; 50, 51, 52, 54, 47, 53, 48, 49, 28, 6, 39, 4, 27, 3, 1, 25 2, 26
l3; 41, 44, 43, 45, 40, 46, 8, 13, 9, 15, 11, 35, 7, 14, 10, 12, 36
l4 ... ...
... ...
Lm
Where *min* / l1, l2, l3 ..... lm / *max*
/l1, l2, l3/ are the heights; each /number/ is a case's number in the
original data file
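A minimal sketch of how such a schedule could be written out with
hclust/cutree (USArrests stands in for the original data, which is not
shown):
hc <- hclust(dist(USArrests))
heights <- seq(min(hc$height), max(hc$height), length.out = 4)  # l1..lm
for (h in heights) {
  grp <- cutree(hc, h = h)               # cluster membership at height h
  for (g in sort(unique(grp)))
    cat(sprintf("h=%.2f; %s\n", h,
                paste(which(grp == g), collapse = ", ")))
}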
--
View this message in context:
http://r.789695.n4.nabble.com/Clusters-How-to-build-a-Amalgamation-Schedule-sequence-of-jointing-tp4319741p4332721.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 76
Date: Fri, 27 Jan 2012 03:03:29 -0500
From: Stephanie Cooke <cooke.stephanie@gmail.com>
To: r-help@r-project.org
Subject: [R] Placing a Shaded Box on a Plot
Message-ID:
<CAKp4H6zp=JMQNHvBYQ=msZBOBa99DkgKF0MchBhbEzFKa6CiJQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hello,
I would like to place shaded boxes on different areas of a
phylogenetic tree plot. Since I cannot determine the axes of
the phylogenetic tree plot, I am not able to place the box over certain
areas. Below is example code for the shaded box that I have tried to
use; the first four values specify the position.
rect(110, 400, 135, 450, col="grey", border="transparent")
I would greatly appreciate any suggestions on how to place a shaded box
to highlight certain areas of a plot.
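If the tree is drawn with ape's plot.phylo, one pointer (a sketch with a
made-up tree and box position): ape records the coordinates of the last
tree it plotted, which recovers the axes that rect() needs.
library(ape)
set.seed(1)
tr <- rtree(10)
plot(tr)
# coordinates of the tree just drawn, as recorded by ape
lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)
# shade the band covering tips 3 to 5 (an arbitrary example region);
# in a default plot, tip i sits at vertical position i
rect(min(lastPP$xx), 2.5, max(lastPP$xx), 5.5,
     col = adjustcolor("grey", alpha.f = 0.5), border = NA)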
------------------------------
Message: 77
Date: Fri, 27 Jan 2012 09:03:52 +0100
From: Rubén Roa <rroa@azti.es>
To: Bert Gunter <gunter.berton@gene.com>
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
interactions and unique combinations?
Message-ID:
<5CD78996B8F8844D963C875D3159B94A030EC466@DSRCORREO.azti.local>
Content-Type: text/plain; charset="iso-8859-1"
-----Original Message-----
From: Bert Gunter [mailto:gunter.berton@gene.com]
Sent: Thursday, January 26, 2012 9:20 PM
To: Rubén Roa
Cc: Ben Bolker; Frank Harrell
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and
unique combinations?
On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <rroa@azti.es> wrote:
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at
> stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model
> outcompetes which in terms of predictive ability by -2AIC or more. Just
> sorting them by the AIC does not give you that if no model is better than
> the next best by less than 2AIC.
> Third, I was not implying that AIC differences play the role of
> significance tests. I agree with you that model selection is better not
> understood as a proxy or as a relative of significance tests procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If
> one is bent on selecting just _the model_ then I check numerical
> optimization diagnostics such as size of gradients, KKT criteria, and
> other issues such as standard errors of parameter estimates and the
> correlation matrix of parameter estimates.
-- And the mathematical basis for this claim is .... ??
--
Ruben: In my area of work (building/testing/applying mechanistic nonlinear
models of natural and economic phenomena) the issue of numerical optimization is
a very serious one. It is often the case that a really good looking model does
not converge properly (that's why ADMB is so popular among us). So if the
AIC is inconclusive, but one AIC-tied model yields reasonable-looking standard
errors and low pairwise parameter-estimate correlations, while the other wasn't
even able to produce a positive definite Hessian matrix (though it was able to
maximize the log-likelihood), I think I have good reasons to select the properly
converged model. I assume here that the lack of convergence is a symptom of
model inadequacy to the data that the AIC was not able to detect.
---
Ruben: For some reason I don't find model averaging appealing. I guess deep
in my heart I expect more from my model than just the best predictive ability.
-- This is a religious, not a scientific statement, and has no place in any
scientific discussion.
--
Ruben: Seriously, there is a wide and open place in scientific discussion for
mechanistic model-building. I expect the models I build to be more than able
predictors, I want them to capture some aspect of nature, to teach me something
about nature, so I refrain from model averaging, which is an open admission that
you don't care too much about what's really going on.
-- The belief that certain data analysis practices -- standard or not -- somehow
can be trusted to yield reliable scientific results in the face of clear
theoretical (mathematical) and practical results to the contrary, while
widespread, impedes and often thwarts the progress of science. Evidence-based
medicine and clinical trials came about for a reason. I would encourage you to
reexamine the basis of your scientific practice and the role that "magical
thinking" plays in it.
Best,
Bert
--
Ruben: All right, Bert. I often re-examine the basis of my scientific praxis,
though less often than I did before, I have to confess. I like to think it is
because I am converging on the right praxis, so there are fewer issues to
re-examine. But this problem of model selection is a tough one. Being a
likelihoodist in inference naturally leads you to AIC-based model selection,
Jim Lindsey being an influence too. Wanting your models to say something about
nature is another strong conditioning factor. Put these two together and it
becomes hard to do model averaging. And it has nothing to do with religion
(yuck!).
------------------------------
Message: 78
Date: Fri, 27 Jan 2012 09:19:58 +0100
From: "Massimo Bressan" <mbressan@arpa.veneto.it>
To: "R. Michael Weylandt" <michael.weylandt@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] merge multiple data frames
Message-ID: <82BDDF2BA5C24DD99562DF3579766951@arpa.veneto.it>
Content-Type: text/plain; format=flowed; charset="Windows-1252";
reply-type=original
I tested your code: it's OK, but there is still the problem of the suffixes
for the last data frame (see the sketch below).
Thank you for the support.
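One way around the pairwise-suffix limitation (a sketch, not from the thread;
mergeAllNamed is a made-up name): rename the columns of every data frame up
front, so the last one gets its suffix too, then Reduce over plain merge calls.
mergeAllNamed <- function(lst, by = "date", all = TRUE) {
  stopifnot(!is.null(names(lst)))        # the list names become the suffixes
  lst <- lapply(names(lst), function(nm) {
    d <- lst[[nm]]
    keep <- names(d) != by
    names(d)[keep] <- paste(names(d)[keep], nm, sep = ".")
    d
  })
  Reduce(function(x, y) merge(x, y, by = by, all = all), lst)
}
res <- mergeAllNamed(list(a = a, b = b, c = c))
names(res)    # date, so2.a, ..., umr.a, so2.b, ..., so2.c, ..., umr.c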
----- Original Message -----
From: "R. Michael Weylandt" <michael.weylandt@gmail.com>
To: "maxbre" <mbressan@arpa.veneto.it>
Cc: <r-help@r-project.org>
Sent: Thursday, January 26, 2012 8:19 PM
Subject: Re: [R] merge multiple data frames
I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y)
           merge(x, y, by = by, all = all,
                 suffixes = paste(".", names(dotArgs), sep = "")),
         dotArgs)
}
mergeAll(a = a, b = b, c = c)
str(.Last.value)

You also might be able to set it up to capture names without you
having to put "a = a" etc. using substitute.
On Thu, Jan 26, 2012 at 12:29 PM, maxbre <mbressan@arpa.veneto.it> wrote:
> This is my reproducible example (three data frames: a, b, c)
>
> a<-structure(list(date = structure(1:6, .Label =
c("2012-01-03",
> "2012-01-04", "2012-01-05", "2012-01-06",
"2012-01-07", "2012-01-08",
> "2012-01-09", "2012-01-10", "2012-01-11",
"2012-01-12", "2012-01-13",
> "2012-01-14", "2012-01-15", "2012-01-16",
"2012-01-17", "2012-01-18",
> "2012-01-19", "2012-01-20", "2012-01-21",
"2012-01-22", "2012-01-23"
> ), class = "factor"), so2 = c(0.799401398190476, 0, 0,
0.0100453950434783,
> 0.200154920565217, 0.473866969181818), nox = c(111.716109973913,
> 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435,
> 110.425185027727), no = c(48.8543691516522, 88.7197448817391,
> 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364
> ), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783,
> 49.2986070321739, 46.5978461731818), co = c(0.618856168125,
> 0.996593475083333,
> 0.666987416083333, 0.383437311166667, 0.281604928875, 0.155383408913043
> ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043,
> 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa =
c(167.541954974667,
> 252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667,
> 173.868599272609), ws = c(1.47191016429167, 0.765781205208333,
> 0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652
> ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214,
> 319.753674830936, 33.8713897347193, 228.368119533759), temp =
c(7.91972825883333,
> 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722,
> 3.09864120704348), umr = c(86.11566638875, 94.5034087491667,
> 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087
> )), .Names = c("date", "so2", "nox",
"no", "no2", "co", "o3",
> "ipa", "ws", "wd", "temp",
"umr"), row.names = c(NA, 6L), class > "data.frame")
>
>
> b<-structure(list(date = structure(1:6, .Label =
c("2012-01-03",
> "2012-01-04", "2012-01-05", "2012-01-06",
"2012-01-07", "2012-01-08",
> "2012-01-09", "2012-01-10", "2012-01-11",
"2012-01-12", "2012-01-13",
> "2012-01-14", "2012-01-15", "2012-01-16",
"2012-01-17", "2012-01-18",
> "2012-01-19", "2012-01-20", "2012-01-21",
"2012-01-22", "2012-01-23"
> ), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox =
c(13.74758511,
> 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222
> ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972,
> 16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381,
> 28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917,
> 0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897,
> 9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971
> ), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998,
> 66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995,
> 0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031,
> 221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465,
> 215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993,
> 0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736,
> 88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874
> )), .Names = c("date", "so2", "nox",
"no", "no2", "co", "o3",
> "ipa", "ws", "wd", "temp",
"umr"), row.names = c(NA, 6L), class > "data.frame")
>
>
> c<-structure(list(date = structure(1:6, .Label =
c("2012-01-03",
> "2012-01-04", "2012-01-05", "2012-01-06",
"2012-01-07", "2012-01-08",
> "2012-01-09", "2012-01-10", "2012-01-11",
"2012-01-12", "2012-01-13",
> "2012-01-14", "2012-01-15", "2012-01-16",
"2012-01-17", "2012-01-18",
> "2012-01-19", "2012-01-20", "2012-01-21",
"2012-01-22", "2012-01-23"
> ), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086,
> 0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142,
> 178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498,
> 221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447,
> 59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816
> ), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745,
> 0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989,
> 37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056,
> 432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128,
> 1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057
> ), wd = c(58.15639877, 64.5657153143848, 39.9754269501381,
> 24.0739884380921,
> 55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029,
> 4.33258009, 13.91609955, 8.762220383, 11.04300022), umr = c(97.60900116,
> 96.91899872, 96.20649719, 94.74620056, 82.04550171, 89.41320038
> )), .Names = c("date", "so2", "nox",
"no", "no2", "co", "o3",
> "ipa", "ws", "wd", "temp",
"umr"), row.names = c(NA, 6L), class > "data.frame")
>
>
> Given the three data frames 'a', 'b' and 'c', I need to merge them all by
> the common field 'date'.
> The following attempt is working fine but...
>
> # code start
>
> all<-merge(a,b,by="date",suffixes=c(".a",".b"),all=T)
> all<-merge(all,c,by="date",all=T)
> # code end
>
> ...I would like to possibly do it in just 'one shot' (a generalisation of the
> code for a more complex case of handling many data frames), also by assigning
> proper suffixes to each variable (not possible with the previous code
> snippet).
>
> Then I also tried a different approach with the reshape library and the
> function 'merge_all' but...
>
> # code start
> library("reshape")
> all.new<-merge_all(a, b, c, by="date", all=T, suffixes=c(".a",".b",".c"))
> # code end
>
> ...I got the following error message:
> Error in merge.data.frame(as.data.frame(x), as.data.frame(y), ...) :
> formal argument "all" associated to different arguments passed
> (a free translation from Italian)
>
> My question is:
> how can I merge multiple data frames that all share the same variable
> names, by a single id field, while 'keeping track' of the original data
> frame by means of suffixes appended to the new variables?
>
> Any help much appreciated
>
> Thank you
>
>
>
>
> --
> View this message in context:
>
http://r.789695.n4.nabble.com/merge-multiple-data-frames-tp4331089p4331089.html
> Sent from the R help mailing list archive at Nabble.com.
>
------------------------------
Message: 79
Date: Fri, 27 Jan 2012 19:51:12 +1100
From: Jim Lemon <jim@bitwrit.com.au>
To: Raphael Bauduin <rblists@gmail.com>
Cc: r-help@r-project.org
Subject: Re: [R] adding additional information to histogram
Message-ID: <4F226580.1080707@bitwrit.com.au>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 01/27/2012 03:12 AM, Raphael Bauduin wrote:
> Hi,
>
> I am a beginner with R, and I think the answer to my question will
> seem obvious, but after searching and trying without success I've
> decided to post to the list.
>
> I am working with data loaded from a csv file with these fields:
> order_id, item_value
> As an order can have multiple items, an order_id may be present
> multiple times in the CSV.
>
> I managed to compute the total value and the number of items for each order:
>
> oli <- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
> orders_values<- tapply(oli[[2]], oli[[1]], sum)
> items_per_order<- tapply(oli[[2]], oli[[1]], length)
>
> I then can display the histogram of the order values:
>
> hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)
>
> Now on this histogram, I would like to display the average number of
> items of the orders in each group (defined with the breaks).
> So for the bar of orders with value 0 to 10, I'd like to display the
> average number of items of these orders.
>
Hi Raph,
As this looks a tiny bit like homework, I'll only provide suggestions.
You have the value and number of items for each order. What you need to
do is to match them in groups. In order to do that, you want a factor
that will show the group for each value-items pair. The "cut" function
will give you such a factor, using the breaks above. You seem to
understand the *apply functions, so you can use one of these to return
the mean number of items for each value group. Alternatively, you could
use the factor in the "by" function to get the mean number of items.
You should now have a factor that can be sent to "table" to get the
number of orders in each value range, and a vector of the corresponding
mean numbers of items in each value grouping. Why, you could even use the
same trick to calculate the mean price of the orders in each value
grouping...
I would use "barplot" to display all this information, as it is a bit
easier to place the mean number of items on the bars (if you check the
return value for barplot).
Jim
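A minimal sketch of those suggestions (with simulated order data, since the
original CSV is not available):
set.seed(1)
oli <- data.frame(order_id   = sample(1:100, 300, replace = TRUE),
                  item_value = rexp(300, rate = 1/20))
orders_values   <- tapply(oli$item_value, oli$order_id, sum)
items_per_order <- tapply(oli$item_value, oli$order_id, length)
grp        <- cut(orders_values, breaks = c(10*0:20, 800))  # same breaks as hist()
counts     <- table(grp)                          # orders per value range
mean_items <- tapply(items_per_order, grp, mean)  # mean items per range
bp <- barplot(counts, las = 2, cex.names = 0.6,
              ylim = c(0, max(counts) * 1.2))     # barplot returns bar midpoints
text(bp, counts, labels = round(mean_items, 1), pos = 3, xpd = NA)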
------------------------------
Message: 80
Date: Fri, 27 Jan 2012 01:14:29 -0800 (PST)
From: deivit <david.cassany@transmuralbiotech.com>
To: r-help@r-project.org
Subject: [R] Call dynamic functions
Message-ID: <1327655669482-4332942.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Hi all,
Does anybody know a better way to call functions than using 'eval'?
I need to call functions whose names are stored in a variable. Right now I
create the command string I'd like to run and then I run it using "eval".
Running functions inside "eval" is getting annoying because it leads to
difficult error handling and a difficult scenario to debug, so I am
asking whether someone knows an easier way to do that.
Thanks
--
View this message in context:
http://r.789695.n4.nabble.com/Call-dynamic-functions-tp4332942p4332942.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 81
Date: Fri, 27 Jan 2012 10:58:04 +0100
From: peter dalgaard <pdalgd@gmail.com>
To: Bert Gunter <gunter.berton@gene.com>
Cc: Max Brondfield <max.brondfield@gmail.com>, r-help@r-project.org
Subject: Re: [R] Quality of fit statistics for NLS?
Message-ID: <BDC6D36D-F152-41E0-87DC-38A28CCF3B53@gmail.com>
Content-Type: text/plain; charset=windows-1252
On Jan 26, 2012, at 22:51 , Bert Gunter wrote:
> Inline below.
>
> -- Bert
>
> On Thu, Jan 26, 2012 at 12:16 PM, Max Brondfield
> <max.brondfield@gmail.com> wrote:
>> Dear all,
>> I am trying to analyze some non-linear data to which I have fit a curve of
>> the following form:
>>
>> dum <- nls(y~(A + (B*x)/(C+x)), start = list(A=370,B=100,C=23000))
>>
>> I am wondering if there is any way to determine meaningful quality-of-fit
>> statistics from the nls function?
>>
>> A summary yields highly significant p-values, but it is my impression that
>> these are questionable at best given the iterative nature of the fit:
> No. They are questionable primarily because there is no clear null
> model. They are based on profile likelihoods (as ?confint tells you),
> which may or may not be what you want for "goodness of fit."
>
> One can always get "goodness of fit" statistics, but the question in
> nonlinear models is: goodness of fit with respect to what? So the
> answer to your question is: if you know what you're doing, certainly.
> Otherwise, find someone who does.
...and if you are in the process of learning what you are doing: p-values are
almost _never_ a good measure of goodness-of-fit, whereas the residual standard
error might be, especially if you take a prediction approach to things. For
one-dimensional curve fits, a graph of the data with the fitted curve is often
what is really needed.
Also notice that summaries of fitted models are not useful for detecting
systematic deviations from the model (like systematic over/under-estimation in
some regions); for that you need diagnostic plots and/or comparisons with
extended models.
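For instance (a sketch with simulated data, since the original data were not
posted):
set.seed(42)
x <- seq(0, 2e5, length.out = 250)
y <- 370 + 100 * x / (23000 + x) + rnorm(length(x), sd = 18)
dum <- nls(y ~ A + B * x / (C + x), start = list(A = 370, B = 100, C = 23000))
summary(dum)$sigma                     # residual standard error
plot(x, y, col = "grey")               # data with the fitted curve overlaid
lines(x, fitted(dum), lwd = 2)
plot(fitted(dum), residuals(dum))      # look for systematic structure
abline(h = 0, lty = 2)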
>
>
>
>>
>>> summary(dum)
>>
>> Formula: y ~ (A + (B * x)/(C + x))
>>
>> Parameters:
>> Estimate Std. Error t value Pr(>|t|)
>> A 388.753 4.794 81.090 < 2e-16 ***
>> B 115.215 5.006 23.015 < 2e-16 ***
>> C 20843.832 4646.937 4.485 1.12e-05 ***
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Residual standard error: 18.25 on 245 degrees of freedom
>>
>> Number of iterations to convergence: 4
>> Achieved convergence tolerance: 2.244e-06
>>
>>
>> Is there any other means of determining the quality of the curve fit? I
>> have tried applying confidence intervals using confint(dum), but these
>> [...]
>> -Max
>>
>>
>>
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
>
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes@cbs.dk Priv: PDalgd@gmail.com
------------------------------
Message: 82
Date: Fri, 27 Jan 2012 11:04:20 +0100
From: Alex Hofmann <ahah@gmx.net>
To: R-help@r-project.org
Subject: [R] laod multichannel-audio-files with readWave (tuneR)
Message-ID: <4F2276A4.7070309@gmx.net>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Hi,
I have a huge collection of 6-channel .wav files containing audio and
sensor recordings, which I need to analyse across all 6 channels.
I'm thinking about which will be easier to do:
1. Just split the channels in an external audio editor and then read
each mono channel separately with readWave (from tuneR), or
2. inherit the readWave class and extend it to multichannel reading?
What do you think?
Best,
Alex
--
Alex Hofmann (Dipl.-Musikl.)
Institute of Musical Acoustics (Wiener Klangstil)
University of Music and Performing Arts Vienna
Anton-von-Webern-Platz 1
1030 Vienna, Austria
Tel. +43 1 71155 4322
Fax. +43 1 71155 4399
http://iwk.mdw.ac.at
http://www.isas.tuwien.ac.at/clarinet/index.html
------------------------------
Message: 83
Date: Fri, 27 Jan 2012 02:45:05 -0800 (PST)
From: deivit <david.cassany@transmuralbiotech.com>
To: r-help@r-project.org
Subject: Re: [R] Call dynamic functions
Message-ID: <1327661105835-4333083.post@n4.nabble.com>
Content-Type: text/plain; charset=us-ascii
Wow, Peter!
That worked like a charm. You saved me some hours of my time :)
Thanks a lot!
On Jan 27, 2012, at 10:14 , deivit wrote:
> Hi all,
>
> Does anybody know a better way to call functions than using 'eval'?
>
> I need to call functions whose names are stored in a variable. Right now I
> create the command string I'd like to run and then I run it using "eval".
>
> Running functions inside "eval" is getting annoying because it leads to
> difficult error handling and a difficult scenario to debug, so I am
> asking whether someone knows an easier way to do that.
>>Well,
>> > fortune(106)
>>If the answer is parse() you should usually rethink the question.
>> -- Thomas Lumley
>> R-help (February 2005)
>>
>>
>> so how did you get yourself into that predicament in the first place? That
>> being said, there are options like
>>
>>f <- get(funname, mode="function")
>>f(arguments)
>>
>>or
>>
>>call(funname, arglist)
>>
>>which may or may not do it for you.
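For instance (a small sketch; "mean" stands in for the stored name):
funname <- "mean"
f <- get(funname, mode = "function")   # look the function up by name
f(1:10)                                # 5.5
do.call(funname, list(1:10))           # another common idiom
eval(call(funname, 1:10))              # call() builds the call, eval() runs it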
--
View this message in context:
http://r.789695.n4.nabble.com/Call-dynamic-functions-tp4332942p4333083.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 84
Date: Fri, 27 Jan 2012 11:49:13 +0100
From: Adel ESSAFI <adelessafi@gmail.com>
To: r-help <r-help@r-project.org>
Subject: [R] generate a random number with rexp ?
Message-ID:
<CAF=-T0ijWpfWxx76G-nZkfwkBk18tga2kN2Px1sTAMrFFbDFDw@mail.gmail.com>
Content-Type: text/plain
Dear list,
I use runif to generate a random number between min and max:
runif(n, min=0, max=1)
However, the syntax of rexp does not allow that:
rexp(n, rate = 1)
and it generates a number with the corresponding rate.
The question is: how to generate a number between min and max using rexp().
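One standard approach (a sketch, not from the thread; rtexp is a made-up name):
draw uniforms between pexp(min) and pexp(max), then invert with qexp, which
gives exponential draws truncated to [min, max].
rtexp <- function(n, rate = 1, min = 0, max = Inf) {
  u <- runif(n, pexp(min, rate), pexp(max, rate))
  qexp(u, rate)                        # inverse CDF maps back to [min, max]
}
x <- rtexp(1000, rate = 1, min = 0.5, max = 2)
range(x)                               # stays within [0.5, 2]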
Regards
--
PhD candidate in Computer Science
Address
3 avenue lamine, cité ezzahra, Sousse 4000
Tunisia
tel: +216 97 246 706 (+33640302046 jusqu'au 15/6)
fax: +216 71 391 166
------------------------------
_______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
End of R-help Digest, Vol 107, Issue 27
***************************************