Dear Paul,
Try:
ddTable[ match(unique(ddTable$ID), ddTable$ID), ]
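
A minimal sketch of what that one-liner does, on a made-up table (the column `value` and the data are assumptions for illustration; only `ddTable` and `ID` come from the thread). `match()` returns the position of the first occurrence of each unique ID, so the subset keeps the first row per ID; `!duplicated()` is an equivalent base R idiom.

```r
# Hypothetical example data: three IDs, some of them repeated
ddTable <- data.frame(ID = c(1, 1, 2, 3, 3, 3),
                      value = c("a", "b", "c", "d", "e", "f"))

# First row for every distinct ID, via match() on the unique IDs
firstRows <- ddTable[match(unique(ddTable$ID), ddTable$ID), ]

# Same selection with duplicated(): keep rows whose ID has not been seen yet
sameRows <- ddTable[!duplicated(ddTable$ID), ]

print(firstRows)
```

Both forms keep one row per ID (the first one encountered), so the choice is mostly a matter of taste.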
Regards,
Beat Bapst
-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of
r-help-request at r-project.org
Sent: Saturday, 7 June 2008 12:00
To: r-help at r-project.org
Subject: R-help Digest, Vol 64, Issue 7
Send R-help mailing list submissions to
r-help at r-project.org
To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-help
or, via email, send a message with subject or body 'help' to
r-help-request at r-project.org
You can reach the person managing the list at
r-help-owner at r-project.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of R-help digest..."
Today's Topics:
1. Agreggating data using external aggregation rules
(ANGELO.LINARDI at bancaditalia.it)
2. Re: Lattice: key does not accept German umlaute (Dieter Menne)
3. Re: which question (Dieter Menne)
4. Re: Y values below the X plot (Jim Lemon)
5. Re: Lattice: key does not accept German umlaute
(Prof Brian Ripley)
6. Merging two dataframes (Michael Pearmain)
7. Re: Lattice: key does not accept German umlaute (Bernd Weiss)
8. boxplot changes fontsize of labels (Sebastian Merz)
9. simple data question (stephen sefick)
10. Re: Multiple comment.char under read.table (Daniel Folkinshteyn)
11. Re: simple data question (Daniel Folkinshteyn)
12. Re: R: Securities earning covariance (Gabor Grothendieck)
13. Re: Merging two dataframes (Daniel Folkinshteyn)
14. Re: Agreggating data using external aggregation rules
(Gabor Grothendieck)
15. request: a class having max frequency (Muhammad Azam)
16. Re: request: a class having max frequency (Chuck Cleland)
17. Re: request: a class having max frequency (Michael Conklin)
18. Re: request: a class having max frequency (Daniel Folkinshteyn)
19. Re: Problem in executing R on server (Erik Iverson)
20. Re: request: a class having max frequency (Chuck Cleland)
21. Manipulating DataSets (Neil Gupta)
22. Subsetting to unique values (Emslie, Paul [Ctr])
23. Re: How can I display a characters table ? (Katharine Mullen)
24. Giovanna Jonalasinio è fuori ufficio, I'm away
(Giovanna.Jonalasinio at uniroma1.it)
25. Re: Subsetting to unique values (Chuck Cleland)
26. Re: simple data question (stephen sefick)
27. Re: Subsetting to unique values (John Kane)
28. Re: which question (Eleni Christodoulou)
29. Re: Subsetting to unique values (Adrian Dusa)
30. Startup speed for a lengthy script (Dennis Fisher)
31. Re: Java to R interface (Dumblauskas, Jerry)
32. Re: which question (Richard Pearson)
33. Re: Merging two dataframes (Daniel Folkinshteyn)
34. fit.variogram sgeostat error (Alexys Herleym Rodriguez Avellaneda)
35. lsmeans (Dani Valverde)
36. Re: Improving data processing efficiency (Daniel Folkinshteyn)
37. store filename (DAVID ARTETA GARCIA)
38. Re: label outliers in geom_boxplot (ggplot2) (hadley wickham)
39. Re: Improving data processing efficiency (Patrick Burns)
40. Re: Improving data processing efficiency (Gabor Grothendieck)
41. Store filename (DAVID ARTETA GARCIA)
42. where to download BRugs? (Nanye Long)
43. Re: Improving data processing efficiency (Daniel Folkinshteyn)
44. Re: Improving data processing efficiency (Gabor Grothendieck)
45. Re: Improving data processing efficiency (Gabor Grothendieck)
46. How to force two regression coefficients to be equal but
opposite in sign? (Woolner, Keith)
47. Re: Store filename (Daniel Folkinshteyn)
48. Re: Store filename (Henrique Dallazuanna)
49. fit.contrast error (Dani Valverde)
50. Re: where to download BRugs? (Uwe Ligges)
51. Re: choosing an appropriate linear model (Levi Waldron)
52. reorder breaking by half (avilella)
53. Re: rmeta package: metaplot or forestplot of meta-analysis
under DSL (ramdon) model (Thomas Lumley)
54. Problem with subset (Luca Mortarini)
55. Re: Manipulating DataSets (Charles C. Berry)
56. Re: Improving data processing efficiency (Daniel Folkinshteyn)
57. Re: lsmeans (John Fox)
58. Re: reorder breaking by half (Daniel Folkinshteyn)
59. Re: Improving data processing efficiency (Daniel Folkinshteyn)
60. Re: Improving data processing efficiency (Gabor Grothendieck)
61. Re: Improving data processing efficiency (Daniel Folkinshteyn)
62. Re: How to force two regression coefficients to be equal but
opposite in sign? (Greg Snow)
63. Re: Subsetting to unique values (jim holtman)
64. Re: where to download BRugs? (Prof Brian Ripley)
65. Re: Problem with subset (Charles C. Berry)
66. Re: Improving data processing efficiency (Daniel Folkinshteyn)
67. Re: ggplot questions (Thompson, David (MNR))
68. Re: Improving data processing efficiency (Patrick Burns)
69. Re: ggplot questions (hadley wickham)
70. Re: Improving data processing efficiency (Daniel Folkinshteyn)
71. Re: boxplot changes fontsize of labels (Prof Brian Ripley)
72. Re: Improving data processing efficiency (Greg Snow)
73. Re: Improving data processing efficiency (Gabor Grothendieck)
74. Random Forest (Bertrand Pub Michel)
75. mean (Marco Chiapello)
76. Re: Java to R interface (madhura)
77. R (D)COM Server not working on windows domain account
(Evans_CSHL)
78. Random Forest and for multivariate response data
(Bertrand Pub Michel)
79. Random Forest (Bertrand Pub Michel)
80. R + Linux (steven wilson)
81. Re: Improving data processing efficiency (Greg Snow)
82. Re: mean (ctu at bigred.unl.edu)
83. Re: Improving data processing efficiency (Patrick Burns)
84. Plot matrix as many lines (Alberto Monteiro)
85. Re: mean (Chuck Cleland)
86. col.names ? (tolga.i.uzuner at jpmorgan.com)
87. Re: ggplot questions (Thompson, David (MNR))
88. Re: mean (Douglas Bates)
89. New vocabulary on a Friday afternoon. Was: Improving data
processing efficiency (Greg Snow)
90. Re: R + Linux (Douglas Bates)
91. editing a data.frame (john.polo)
92. Re: Plot matrix as many lines (Henrique Dallazuanna)
93. calling a C function with a struct (John Nolan)
94. Re: col.names ? (Henrique Dallazuanna)
95. Re: Plot matrix as many lines (Chuck Cleland)
96. Re: col.names ? (Chuck Cleland)
97. Re: col.names ? (William Pepe)
98. Re: col.names ? (tolga.i.uzuner at jpmorgan.com)
99. Re: Subsetting to unique values (Jorge Ivan Velez)
100. Re: calling a C function with a struct (Duncan Murdoch)
101. Re: R + Linux (Kevin E. Thorpe)
102. Re: R + Linux (Markus Jäntti)
103. color scale mapped to B/W (Michael Friendly)
104. Re: Random Forest (Yasir Kaheil)
105. Re: R + Linux (Roland Rau)
106. Re: R + Linux (Dirk Eddelbuettel)
107. Re: R + Linux (Prof Brian Ripley)
108. Re: R + Linux (Esmail Bonakdarian)
109. Re: R + Linux (Abhijit Dasgupta)
110. Re: editing a data.frame (Daniel Folkinshteyn)
111. Re: R + Linux (Daniel Folkinshteyn)
112. Re: R + Linux (Jonathan Baron)
113. Re: Improving data processing efficiency (Daniel Folkinshteyn)
114. Re: R + Linux (Esmail Bonakdarian)
115. FW: R + Linux (Horace Tso)
116. Re: Improving data processing efficiency (Don MacQueen)
117. Re: Improving data processing efficiency (hadley wickham)
118. Re: Improving data processing efficiency (Daniel Folkinshteyn)
119. Re: color scale mapped to B/W (hadley wickham)
120. Re: Improving data processing efficiency (Daniel Folkinshteyn)
121. Re: color scale mapped to B/W (Achim Zeileis)
122. Re: color scale mapped to B/W (Greg Snow)
123. Re: Improving data processing efficiency (Daniel Folkinshteyn)
124. Re: color scale mapped to B/W (Greg Snow)
125. Re: Improving data processing efficiency (Esmail Bonakdarian)
126. Re: Improving data processing efficiency (Horace Tso)
127. Re: Improving data processing efficiency (Esmail Bonakdarian)
128. Problem of installing Matrix (ronggui)
129. Re: Improving data processing efficiency (Charles C. Berry)
130. Re: Improving data processing efficiency (hadley wickham)
131. Re: color scale mapped to B/W (hadley wickham)
132. Re: editing a data.frame (john.polo)
133. error message with dat (Paul Adams)
134. Re: Problem of installing Matrix (Prof Brian Ripley)
135. Predicting a single observatio using LME (Rebecca Sela)
136. expected risk from coxph (survival) (Reid Tingley)
137. txt file, 14000+ rows, only last 8000 appear (RobertsLRRI)
138. functions for high dimensional integral (ZT2008)
139. compilation failed on MacOSX.5 / icc 10.1 / ifort 10.1 / R
2.7.0 (Mathieu Prevot)
140. Re: expected risk from coxph (survival) (Dieter Menne)
141. Re: txt file, 14000+ rows, only last 8000 appear (Paul Smith)
142. Re: color scale mapped to B/W (Achim Zeileis)
143. Re: Predicting a single observatio using LME (Dieter Menne)
144. Re: lsmeans (Dieter Menne)
145. Re: functions for high dimensional integral (Prof Brian Ripley)
----------------------------------------------------------------------
Message: 1
Date: Fri, 6 Jun 2008 12:12:36 +0200
From: <ANGELO.LINARDI at bancaditalia.it>
Subject: [R] Agreggating data using external aggregation rules
To: <r-help at R-project.org>
Message-ID:
<C844A6B20A3322429988FDA0E042FFDB01501AEA at SERVPE2.ac.bankit.it>
Content-Type: text/plain; charset="us-ascii"
Dear R experts,
I am currently facing a tricky problem which I have read a lot about in
the various R mailing lists without finding exactly what I need.
I have a big data frame DF (about 2,000,000 rows) with 7 columns being
variables and 1 being a measure (using reshape package nomenclature).
There are no "duplicates" in it.
For each of the variables I have some "rules" to apply, where COD_IN is
the value of the variable in the DF and COD_OUT the one it is to be
transformed to; once the "new codes" are in the DF I have to aggregate
the "new DF" (for example by summing the measure).
Usually the total transformation (merge+aggregate) greatly decreases the
number of rows in the data frame, but sometimes it can grow, depending
on the rule. Just to give an idea, the first "rule" in v1 maps 820
different values into 7.
Using SQL and a database this can be done in a very straightforward way
(for example on the variable v1):
Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
From DF, RULE_v1
Where v1=COD_IN
Group by COD_OUT, v2, v3, v4, v5, v6, v7
So the first choice would be using a database; the second one would be
splitting the data frame and then joining the results.
Is there any other possibility to merge+aggregate caused by the merge ?
Thank you in advance
Angelo Linardi
** E-mails from the Bank of Italy are sent in good faith but they are neither
binding on the Bank nor to be understood as creating any obligation on its
part except where provided for in a written agreement. This e-mail is
confidential. If you have received it by mistake, please inform the sender by
reply e-mail and delete it from your system. Please also note that the
unauthorized disclosure or use of the message or any attachments could be an
offence. Thank you for your cooperation. **
------------------------------
Message: 2
Date: Fri, 6 Jun 2008 10:18:34 +0000 (UTC)
From: Dieter Menne <dieter.menne at menne-biomed.de>
Subject: Re: [R] Lattice: key does not accept German umlaute
To: r-help at stat.math.ethz.ch
Message-ID: <loom.20080606T101648-707 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Bernd Weiss <bernd.weiss <at> uni-koeln.de> writes:
> library(lattice)
> ## gives an error
> xyplot(1~1, key = list(text = list(c("M\344nner"))))
>
> Is this a bug?
You forgot to mention your version, assuming 2.7.0 unpatched.
Corrected by Brian Ripley in developer version (and probably also in patched)
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/129251.html
Dieter
------------------------------
Message: 3
Date: Fri, 6 Jun 2008 10:23:32 +0000 (UTC)
From: Dieter Menne <dieter.menne at menne-biomed.de>
Subject: Re: [R] which question
To: r-help at stat.math.ethz.ch
Message-ID: <loom.20080606T102224-478 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Eleni Christodoulou <elenichri <at> gmail.com> writes:
> I was trying to select a column of a data frame using the *which* command. I
> was actually selecting the rows of the data frame using *which, *and then
> displayed a certain column of it. The command that I was using is:
> sequence=*mydata*[*which*(human[,3] %in% genes.sam.names),*9*]
....
Please provide a running example. The *mydata* are difficult to read.
Dieter
------------------------------
Message: 4
Date: Fri, 06 Jun 2008 20:44:31 +1000
From: Jim Lemon <jim at bitwrit.com.au>
Subject: Re: [R] Y values below the X plot
To: jpardila <bluejp at gmail.com>
Cc: r-help at r-project.org
Message-ID: <4849150F.6000501 at bitwrit.com.au>
Content-Type: text/plain; charset=us-ascii; format=flowed
jpardila wrote:
> Dear List,
> I am creating a plot and I want to insert the tabular data below the X axis.
> I mean for every value of X I want to show the value in Y as a table below
> the plot. I think the attached image gives an idea of what I mean by this.
>
> Below is the code I am using now... but as you see the Y values don't have
> the right location. Maybe I should insert them as a table? Any ideas on
> that. This should be easy to do but I don't have much experience in R.
> Many thanks in advance,
> JP
>
> http://www.nabble.com/file/p17670311/legend.jpg legend.jpg
> -------------------------
>
> img1<-c(-5.28191709,-5.364480081,-4.829456677,-5.325101503,-5.212952356,-5.181171896,-5.211122693,-5.153677663,-5.292961077,-5.151612394,-5.056544559,-5.151457115,-5.332984571,-5.325259917,-5.523870109,-5.429800485,-5.436455325)
> img2<-c(-5.55,-5.56,-5.72,-5.57,-5.34,-5.18,-5.18,-5.36,-5.46,-5.32,-5.29,-5.37,-5.42,-5.45,-5.75,-5.75,-5.77)
> angle<-26:42
> plot(img1~angle, type="o", xlab="Incident angle", ylab="sigma",
> ylim=c(-8,-2),lwd=2,col=8, pch=19,cex=1,axes=FALSE)
> lines(img2~angle,lwd=2,type="o", col=1, pch=19,cex=1)
> legend(38,-2,format(img1,digits=2), cex=0.8)
> legend(40,-2,format(img2,digits=2),cex=0.8)
> legend(26, -2, c("Image 1","Image 2"), cex=0.8,lwd=2,col=c("8","1"), pch=19,
> lty=1:2,bty="n")
> abline(h = -1:-8, v = 25:45, col = "lightgray", lty=3)
>
> axis(1, at=2*0:22)
> axis(2, at=-8:-2)
> -----------------------------------
Hi JP,
I thought I could do this with addtable2plot, but I hadn't coded a
column spacing into it (maybe next version). However, this is almost
what you want, and I'm sure you can work out how to add the lines.
plot(img1~angle, type="o", xlab="Incident angle", ylab="sigma",
ylim=c(-8,-2),lwd=2,col=8, pch=19,cex=1,axes=FALSE)
box()
lines(img2~angle,lwd=2,type="o", col=1, pch=19,cex=1)
tablerownames<-"Angle\nImage1\nImage2"
mtext(c(tablerownames,
paste(angle,round(img1,2),round(img2,2),sep="\n")),
1,line=1,at=c(24.7,angle),cex=0.5)
Jim
------------------------------
Message: 5
Date: Fri, 6 Jun 2008 11:48:56 +0100 (BST)
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Subject: Re: [R] Lattice: key does not accept German umlaute
To: Bernd Weiss <bernd.weiss at uni-koeln.de>
Cc: r-help at stat.math.ethz.ch
Message-ID:
<alpine.LFD.1.10.0806061139280.14980 at gannet.stats.ox.ac.uk>
Content-Type: text/plain; charset="iso-8859-15";
Format="flowed"
Well, you failed to give the 'at a minimum information' asked for in the
posting guide, and \344 is locale-specific. I see 'MingW32' below, so
will guess this is German-language Windows. We don't know what the error
was, either.
It works correctly for me in CP1252 with R-patched, and gives an error in
2.7.0 (and works in 2.6.2). I think it was fixed as side effect of
o Rare string width calculations in package grid were not
interpreting the string encoding correctly.
although it is not the same problem that NEWS item refers to.
My error message in 2.7.0 was
Error in grid.Call.graphics("L_setviewport", pvp, TRUE) :
invalid input 'M?nner' in 'utf8towcs'
which is what makes me think this was to do with sizing the viewport.
So please update to R-patched and try again.
On Fri, 6 Jun 2008, Bernd Weiss wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> library(lattice)
>
> ## works as expected
> xyplot(1~1, key = list(text = list(c("Maenner"))))
>
> ## works as expected
> xyplot(1~1, key = list(text = list(c("Maenner"))), xlab = "M\344nner")
>
> ## gives an error
> xyplot(1~1, key = list(text = list(c("M\344nner"))))
>
> Is this a bug?
>
> TIA,
>
> Bernd
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFISPo6Usbvfbd00+ERArJFAJsEvWq2Cai7chuOADadZHT2pnRJOgCfWLdx
> 3Hs3PnCzd6nuTqt6JwCl+VM> =RVUk
> -----END PGP SIGNATURE-----
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
------------------------------
Message: 6
Date: Fri, 6 Jun 2008 12:30:25 +0100
From: "Michael Pearmain" <mpearmain at google.com>
Subject: [R] Merging two dataframes
To: r-help at r-project.org
Message-ID:
<2763e000806060430k1b16328fw9d5e73e4683a6f13 at mail.gmail.com>
Content-Type: text/plain
Hi All,
Newbie question for you all, but I have been looking at the archives and the
help stuff to get a rough idea of what I want to do.
I would like to merge two dataframes together based on a keyed variable in
one dataframe linking to the other dataframe. Only some of the cases will
match but I would like to keep the others as well.
My dataframes have 67 and 28 cases respectively and I would like to end up
with one file 67 cases long (all 28 are matched cases).
I can use the merge command to merge two datasets together, but I still
get some odd results. I'm using the code below:
ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")
The merged file seems to have 1800 cases while the ETC data file only
has 67 and the SURVEY file only has 28. (Reading the help it looks as if it
merges 1 case with all cases in the other file, which is not what I want.)
The matching variable fields are the 'ord' field and the 'uid' field.
Can anyone advise please?
--
Michael Pearmain
------------------------------
Message: 7
Date: Fri, 06 Jun 2008 14:22:58 +0200
From: Bernd Weiss <bernd.weiss at uni-koeln.de>
Subject: Re: [R] Lattice: key does not accept German umlaute
To: r-help at stat.math.ethz.ch
Message-ID: <48492C22.9060102 at uni-koeln.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Prof Brian Ripley schrieb:
[...]
| It works correctly for me in CP1252 with R-patched, and gives an error
| in 2.7.0 (and works in 2.6.2). I think it was fixed as side effect of
|
| o Rare string width calculations in package grid were not
| interpreting the string encoding correctly.
|
| although it is not the same problem that NEWS item refers to.
|
| My error message in 2.7.0 was
|
| Error in grid.Call.graphics("L_setviewport", pvp, TRUE) :
| invalid input 'M?nner' in 'utf8towcs'
|
| which is what makes me think this was to do with sizing the viewport.
|
|
| So please update to R-patched and try again.
That's it! Thanks for your help.
Bernd
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFISSwiUsbvfbd00+ERAphpAJ9I5vxmzCYIkl52potRXsMG322J1gCgxe4S
BgPTcyWju9A74csTgVPQSi4=urOX
-----END PGP SIGNATURE-----
------------------------------
Message: 8
Date: Fri, 6 Jun 2008 14:37:47 +0200
From: Sebastian Merz <sebastian.merz at web.de>
Subject: [R] boxplot changes fontsize of labels
To: r-help at r-project.org
Message-ID: <20080606143747.5f91ef4d at fred>
Content-Type: text/plain; charset=US-ASCII
Hi all!
So far I have learned some R, but finalizing my plots so they look
publishable seems not to be possible.
I set up some boxplots. Everything works well, but when I put more than
two of them in one plot the labels of the axes appear smaller than the
normal font size.
> x <- rnorm(30)
> y <- rnorm(30)
> par(mfrow=c(1,4))
> boxplot(x,y, names=c("horray", "hurra"))
> mtext("Jubel", side=1, line=2)
In case I take one or two boxplots this does not happen:
> par(mfrow=c(1,2))
> boxplot(x,y, names=c("horray", "hurra"))
> mtext("Jubel", side=1, line=2)
The cex.axis seems not to be changed, as setting it to 1.0 doesn't
change the behaviour. If cex.axis=1.3 in the first example the font
size used by boxplot and by mtext is about the same. But as I use a
function to draw quite a few of these plots this "hack" is not a proper
solution.
I couldn't find anything about this behaviour in the documentation or
on the internet.
Can anybody explain? All hints are appreciated.
Thanks,
S. Merz
------------------------------
Message: 9
Date: Fri, 6 Jun 2008 08:43:01 -0400
From: "stephen sefick" <ssefick at gmail.com>
Subject: [R] simple data question
To: r-help at r-project.org
Message-ID:
<c502a9e10806060543s3203756cj7efbe23f6a517bf6 at mail.gmail.com>
Content-Type: text/plain
If I wanted to use a name for a column with two words, say Dick Cheney and
George Bush, can I put these in quotes, "Dick Cheney" and "George Bush", to
get both read.table and read.zoo to recognize them when reading into R?
thanks
Stephen
--
Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods. We are mammals, and have not exhausted the annoying little
problems of being mammals.
-K. Mullis
------------------------------
Message: 10
Date: Fri, 06 Jun 2008 08:51:54 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Multiple comment.char under read.table
To: Gundala Viswanath <gundalav at gmail.com>
Cc: r-help at stat.math.ethz.ch
Message-ID: <484932EA.9040305 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
according to the helpfile, comment.char only takes one character, so you'll
have to do some 'magic' :)
i'd suggest to first run mydata through sed, and replace one of the
comment chars with another, then run read.table with the one comment
char that remains.
sed -e 's/^\^/!/' mydata.txt > mydata2.txt
alternatively, you could do read.table twice, once with ! and once with
^, and then pull out all the common rows from the two results.
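
A third option that stays inside R, offered as a sketch rather than as what the posters used: read the raw lines, drop every line that starts with either comment character, and hand the rest to read.table through its `text` argument. The sample lines and `header = TRUE` are assumptions for illustration; with a real file the vector would come from `readLines("mydata.txt")`.

```r
# Hypothetical file contents; in practice: lines_in <- readLines("mydata.txt")
lines_in <- c("! a comment line",
              "^ another comment line",
              "a\tb",
              "1\tnull",
              "2\t3")

# Keep only lines that do not start with "!" or "^"
# (inside the bracket expression the "^" is a literal character)
keep <- !grepl("^[!^]", lines_in)

# Parse the remaining lines as a tab-separated table
dat <- read.table(text = lines_in[keep], header = TRUE,
                  sep = "\t", na.strings = "null")
print(dat)
```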
on 06/06/2008 03:47 AM Gundala Viswanath said the following:
> Hi all,
>
> Suppose I want to read a text file with read.table.
> It contains lines to be skipped that begin with "!" and "^".
>
> Is there a way to include these two values in the read.table function?
> I tried this but it doesn't seem to work.
>
> dat <- read.table("mydata.txt", comment.char = c("!","^") , na.strings
> = "null", sep = "\t");
>
> Please advise.
>
------------------------------
Message: 11
Date: Fri, 06 Jun 2008 09:05:02 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] simple data question
To: stephen sefick <ssefick at gmail.com>
Cc: r-help at r-project.org
Message-ID: <484935FE.7020706 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
should work - don't even have to put them in quotes, if your field
separator is not space. why don't you just try it and see what comes out? :)
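
A small sketch of what to expect, assuming a comma-separated file (the data and the `Date` column are made up). By default read.table runs header names through make.names(), which turns the space into a dot; passing `check.names = FALSE` keeps the two-word names as written.

```r
# Hypothetical comma-separated input with two-word column names
csv_text <- "Date,Dick Cheney,George Bush
2008-06-01,1.5,2.5
2008-06-02,3.5,4.5"

# check.names = FALSE keeps the header exactly as it appears in the file
kept <- read.table(text = csv_text, header = TRUE, sep = ",",
                   check.names = FALSE)

# Default behaviour: names are made syntactically valid ("Dick.Cheney")
mangled <- read.table(text = csv_text, header = TRUE, sep = ",")

print(names(kept))
print(names(mangled))

# Columns with spaces in their names need [[ ]] or backticks
print(kept[["Dick Cheney"]])
```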
on 06/06/2008 08:43 AM stephen sefick said the
following:> if I wanted to use a name for a column with two words say Dick Cheney and
> George Bush
> can I put these in quotes "Dick Cheney" and "George
Bush" to get them to
> read into R using both read.table and read.zoo to recognize this.
> thanks
>
> Stephen
>
------------------------------
Message: 12
Date: Fri, 6 Jun 2008 09:05:56 -0400
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Subject: Re: [R] R: Securities earning covariance
To: ANGELO.LINARDI at bancaditalia.it
Cc: r-help at r-project.org
Message-ID:
<971536df0806060605m43e55dffh712255d835c58e63 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Update your version of zoo to the latest one.
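
For readers without zoo at hand, the same idea can be sketched in base R, assuming the SEC_ID / DAY / EARNING layout from the quoted question: reshape to one column per security and let cov() or cor() do the pairwise work. Note that the formula in the question divides by sx*sy, which is closer to what cor() returns than to cov(); that reading is an interpretation, not something the posters state.

```r
# Hypothetical subset of the data from the quoted question
DF <- data.frame(
  SEC_ID  = rep(c("IT0000001", "IT0000002", "IT0000003"), each = 3),
  DAY     = rep(c(20070101, 20070102, 20070103), times = 3),
  EARNING = c(5.467, 5.456, 4.954, 1.456, 1.345, 1.233, 0.345, 0.367, 0.319)
)

# One row per day, one column per security (missing combinations become NA)
wide <- tapply(DF$EARNING, list(DF$DAY, DF$SEC_ID), sum)

# All pairwise covariances and correlations in one call each;
# pairwise.complete.obs tolerates securities with missing days
cv <- cov(wide, use = "pairwise.complete.obs")
cr <- cor(wide, use = "pairwise.complete.obs")
print(cv)
print(cr)
```

This avoids building the self-merged data frame that ran out of memory: the wide matrix has one cell per day and security instead of one row per day and pair.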
On Fri, Jun 6, 2008 at 3:18 AM, <ANGELO.LINARDI at bancaditalia.it> wrote:
> Thank you for your very fast response.
> I just tried to use the zoo package, after having read the vignettes, but I get this error message:
>
> Warning messages:
> 1: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
> 2: In x$EARNINGS :
> $ operator is invalid for atomic vectors, returning NULL
> 3: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
> 4: In x$EARNINGS :
> $ operator is invalid for atomic vectors, returning NULL
> 5: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
> 6: In x$EARNINGS :
> $ operator is invalid for atomic vectors, returning NULL
>
> Am I missing something ?
>
> Thank you again
>
> Angelo Linardi
>
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Thursday, 5 June 2008 17:55
> To: LINARDI ANGELO
> Cc: r-help at r-project.org
> Subject: Re: [R] Securities earning covariance
>
> Check out the three vignettes (i.e. pdf documents in the zoo package). e.g.
>
>
> Lines <- "SEC_ID DAY EARNING
> IT0000001 20070101 5.467
> IT0000001 20070102 5.456
> IT0000001 20070103 4.954
> IT0000001 20070104 3.456
> IT0000002 20070101 1.456
> IT0000002 20070102 1.345
> IT0000002 20070103 1.233
> IT0000003 20070101 0.345
> IT0000003 20070102 0.367
> IT0000003 20070103 0.319
> "
> DF <- read.table(textConnection(Lines), header = TRUE)
> DFs <- split(DF, DF$SEC_ID)
>
> library(zoo)
> f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))
> z <- do.call(merge, lapply(DFs, f))
> cov(z) # uses n-1
>
>
> On Thu, Jun 5, 2008 at 11:41 AM, <ANGELO.LINARDI at bancaditalia.it> wrote:
>> Good morning,
>>
>> I am a new R user and I am trying to learn how to use it.
>> I am trying to solve this problem.
>> I have a dataframe df of daily securities (for a year) earnings as
>> follows:
>>
>> SEC_ID DAY EARNING
>> IT0000001 20070101 5.467
>> IT0000001 20070102 5.456
>> IT0000001 20070103 4.954
>> IT0000001 20070104 3.456
>> ..........................
>> IT0000002 20070101 1.456
>> IT0000002 20070102 1.345
>> IT0000002 20070103 1.233
>> ..........................
>> IT0000003 20070101 0.345
>> IT0000003 20070102 0.367
>> IT0000003 20070103 0.319
>> ..........................
>>
>> And so on: about 800 different SEC_ID and about 180000 rows.
>> I have to calculate the "covariance" for each couple of securities x
>> and y according to the formula:
>>
>> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
>>
>> being x' and y' the mean of securities earning in the year, N the
>> number of observations, sx and sy the standard deviation of x and y.
>> To do this I could build a df2 data frame like this:
>>
>> DAY      SEC_ID.x  SEC_ID.y  EARNING.x EARNING.y x' y' sx sy
>> 20070101 IT0000001 IT0000002 5.467     1.456     a  b  aa bb
>> 20070101 IT0000001 IT0000003 5.467     0.345     a  c  aa cc
>> 20070101 IT0000002 IT0000003 1.456     0.345     b  c  bb cc
>> 20070102 IT0000001 IT0000002 5.456     1.345     a  b  aa bb
>> 20070102 IT0000001 IT0000003 5.456     0.367     a  c  aa cc
>> 20070102 IT0000002 IT0000003 1.345     0.367     b  c  bb cc
>> ..........................
>>
>> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
>> easily calculate the formula; but the dimensions are too big (the
>> process stops with an out-of-memory message).
>> Besides partitioning the input and using a loop, are there any smarter
>> solutions (possibly using split and other ways of "subgroup merging")
>> to solve the problem?
>> Are there any "shortcuts" using statistical built-in functions (e.g.
>> cov, vcov)?
>> Thank you in advance
>>
>> Angelo Linardi
>>
>>
>>
>
>
------------------------------
Message: 13
Date: Fri, 06 Jun 2008 09:07:22 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Merging two dataframes
To: Michael Pearmain <mpearmain at google.com>
Cc: r-help at r-project.org
Message-ID: <4849368A.6080903 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
try this:
FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = TRUE,
all.y = FALSE)
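
A toy illustration of the difference, with made-up data (only the names ETC, SURVEY, ord and uid come from the thread). merge() has no `by.ETC` or `by.SURVEY` arguments, so they are swallowed by `...` and ignored; when the two frames then share no column names, merge() falls back to a Cartesian product, which would explain a row count in the region of 67 * 28.

```r
# Hypothetical stand-ins for the two data frames
ETC    <- data.frame(ord = 1:5, score = c(10, 20, 30, 40, 50))
SURVEY <- data.frame(uid = c(2, 4), answer = c("yes", "no"))

# No common column names and no usable 'by': every row paired with every row
cross <- merge(ETC, SURVEY)
print(nrow(cross))

# Left join: keep all rows of ETC, fill unmatched survey columns with NA
FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = TRUE)
print(FullData)
```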
on 06/06/2008 07:30 AM Michael Pearmain said the following:
> Hi All,
>
> Newbie question for you all but i have been looking at the archieves and
the
> help dtuff to get a rough idea of what i want to do
>
> I would like to merge two dataframes together based on a keyed variable in
> one dataframe linking to the other dataframe. Only some of the cases will
> match, but I would like to keep the others as well.
>
> My dataframes have 67 and 28 cases respectively, and I would like to end up
> with one file 67 cases long (all 28 are matched cases).
>
>
> I can use the merge command to merge the two datasets together, but I still
> get some odd results. I'm using the code below:
>
> ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
> SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
> FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")
>
> The merged file seems to have 1800 cases, while the ETC data file only
> has 67 and the SURVEY file only has 28. (Reading the help, it looks as if it
> merges 1 case with all cases in the other file, which is not what I want.)
>
> The matching variable fields are the 'ord' field and the 'uid' field.
> Can anyone advise please?
>
------------------------------
Message: 14
Date: Fri, 6 Jun 2008 09:10:22 -0400
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Subject: Re: [R] Agreggating data using external aggregation rules
To: ANGELO.LINARDI at bancaditalia.it
Cc: r-help at r-project.org
Message-ID:
<971536df0806060610m4d80d0fbh2db428a7389a6ef at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Use aggregate() for aggregation and use indexing or subset() for selection.
Alternately try the sqldf package: http://sqldf.googlecode.com
which allows one to perform SQL operations on data frames.
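A minimal base-R sketch of the merge-then-aggregate route suggested above. The toy DF, the contents of RULE_v1 and the single extra variable v2 are invented for illustration; only the names DF, RULE_v1, COD_IN, COD_OUT and measure come from the original question.

```r
# Toy data: v1 is the variable to recode, v2 another variable, measure the value to sum.
DF <- data.frame(v1 = c("a", "b", "c", "a"),
                 v2 = c("x", "x", "y", "y"),
                 measure = c(1, 2, 3, 4))

# Hypothetical rule table: maps each COD_IN value of v1 to a coarser COD_OUT.
RULE_v1 <- data.frame(COD_IN = c("a", "b", "c"),
                      COD_OUT = c("A", "A", "C"))

# Step 1: attach the new code to every row (the SQL join "Where v1 = COD_IN").
recoded <- merge(DF, RULE_v1, by.x = "v1", by.y = "COD_IN")

# Step 2: sum the measure within each combination of new code and remaining variables.
result <- aggregate(measure ~ COD_OUT + v2, data = recoded, FUN = sum)
print(result)
```

With the real data the formula would list all remaining variables (v2 through v7) next to COD_OUT. Whether this is fast enough on about 2,000,000 rows is not something this sketch can show.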
On Fri, Jun 6, 2008 at 6:12 AM, <ANGELO.LINARDI at bancaditalia.it> wrote:
> Dear R experts,
>
> I am currently facing a tricky problem which I have read a lot about in
> the various R mailing lists without finding exactly what I need.
> I have a big data frame DF (about 2,000,000 rows) with 7 columns being
> variables and 1 being a measure (using reshape package nomenclature).
> There are no "duplicates" in it.
> For each of the variables I have some "rules" to apply, with COD_IN being the
> value of the variable in the DF and COD_OUT the one to be transformed to;
> once the "new codes" are obtained in the DF, I have to aggregate the "new DF"
> (for example, summing the measure).
> Usually the total transformation (merge+aggregate) greatly decreases the
> number of lines in the data frame, but sometimes it can grow, depending
> on the rule. Just to give an idea, the first "rule" in v1 maps 820
> different values into 7.
> Using SQL and a database this can be done in a very straightforward way
> (for example on the variable v1):
>
> Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
> From DF, RULE_v1
> Where v1=COD_IN
> Group by v2, v3,v4, v5, v6, v7
>
> So the first choice would be using a database; the second one would be
> splitting the data frame and then joining the results.
> Is there any other possibility to merge+aggregate caused by the merge ?
>
> Thank you in advance
>
> Angelo Linardi
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
------------------------------
Message: 15
Date: Fri, 6 Jun 2008 06:14:54 -0700 (PDT)
From: Muhammad Azam <mazam72 at yahoo.com>
Subject: [R] request: a class having max frequency
To: R Help <r-help at r-project.org>, R-help request
<r-help-request at r-project.org>
Message-ID: <302639.59596.qm at web32203.mail.mud.yahoo.com>
Content-Type: text/plain
Dear R users
I have a very basic question. I tried but could not find the required result.
Using
dat <- pima
f <- table(dat[,9])
> f
  0   1
500 268
I want to find the class, say "0", having maximum frequency, i.e. 500. I used
> which.max(f)
which provides
0
1
How can I get only the "0"? Thanks and
best regards
Muhammad Azam
Ph.D. Student
Department of Medical Statistics,
Informatics and Health Economics
University of Innsbruck, Austria
[[alternative HTML version deleted]]
------------------------------
Message: 16
Date: Fri, 06 Jun 2008 09:18:55 -0400
From: Chuck Cleland <ccleland at optonline.net>
Subject: Re: [R] request: a class having max frequency
Cc: R Help <r-help at r-project.org>
Message-ID: <4849393F.8030900 at optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 6/6/2008 9:14 AM, Muhammad Azam wrote:
> Dear R users
> I have a very basic question. I tried but could not find the required
> result. using
> dat <- pima
> f <- table(dat[,9])
>
>> f
>   0   1
> 500 268
> i want to find that class say "0" having maximum frequency i.e 500. I used
>> which.max(f)
> which provide
> 0
> 1
> How can i get only the "0". Thanks and
table(iris$Species)
    setosa versicolor  virginica
        50         50         50
which.max(table(iris$Species))
setosa
     1
names(which.max(table(iris$Species)))
[1] "setosa"
> best regards
>
> Muhammad Azam
> Ph.D. Student
> Department of Medical Statistics,
> Informatics and Health Economics
> University of Innsbruck, Austria
>
>
>
> [[alternative HTML version deleted]]
>
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
------------------------------
Message: 17
Date: Fri, 6 Jun 2008 08:21:41 -0500
From: "Michael Conklin" <michael.conklin at markettools.com>
Subject: Re: [R] request: a class having max frequency
<r-help at r-project.org>, "R-help request"
<r-help-request at r-project.org>
Message-ID:
<8EA061E48306894180DB020B0C6907A1015F37AF at MNMAIL02.markettools.com>
Content-Type: text/plain; charset="US-ASCII"
The 0 is the name of the item and the 1 is the index in f of the maximum
class. (since f is a table, and the first element of the table is the
maximum, which.max returns a 1) So, if you just want to know which class
is maximum you can say
names(which.max(f))
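A short sketch of the point made above, using a table built from made-up 0/1 data with the same counts as in the question (the pima data itself is not needed). It shows that which.max() returns a named integer, where the name is the class and the value is its position in the table.

```r
# Stand-in for table(dat[,9]): 500 zeros and 268 ones.
f <- table(c(rep(0, 500), rep(1, 268)))

pos <- which.max(f)          # named integer: name is "0", value is 1 (first cell)
cls <- names(pos)            # the class label as a character string
cls_num <- as.numeric(cls)   # the class label as a number, if that is preferred
freq <- unname(f[pos])       # the maximum frequency itself

print(cls)
print(as.integer(freq))
```

Note that which.max() reports only the first maximum, so tied classes need a different test such as which(f == max(f)).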
Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170
Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile:612.201.8978
Michael.Conklin at markettools.com
MarketTools(r) http://www.markettools.com
This e-mail and any attachments may contain privileged, confidential or
proprietary information. If you are not the intended recipient, be aware
that any review, copying, or distribution of this e-mail or any
attachment is strictly prohibited. If you have received this e-mail in
error, please return it to the sender immediately, and permanently
delete the original and any copies from your system. Thank you for your
cooperation.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Muhammad Azam
Sent: Friday, June 06, 2008 8:15 AM
To: R Help; R-help request
Subject: [R] request: a class having max frequency
Dear R users
I have a very basic question. I tried but could not find the required
result. using
dat <- pima
f <- table(dat[,9])
> f
0 1
500 268
i want to find that class say "0" having maximum frequency i.e 500. I used
> which.max(f)
which provide
0
1
How can i get only the "0". Thanks and
best regards
Muhammad Azam
Ph.D. Student
Department of Medical Statistics,
Informatics and Health Economics
University of Innsbruck, Austria
[[alternative HTML version deleted]]
------------------------------
Message: 18
Date: Fri, 06 Jun 2008 09:25:44 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] request: a class having max frequency
Cc: R Help <r-help at r-project.org>, R-help request
<r-help-request at r-project.org>
Message-ID: <48493AD8.6050204 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
names(f)[which.max(f)]
on 06/06/2008 09:14 AM Muhammad Azam said the following:
> Dear R users
> I have a very basic question. I tried but could not find the required
> result. using
> dat <- pima
> f <- table(dat[,9])
>
>> f
>   0   1
> 500 268
> i want to find that class say "0" having maximum frequency i.e 500. I used
>> which.max(f)
> which provide
> 0
> 1
> How can i get only the "0". Thanks and
>
>
> best regards
>
> Muhammad Azam
> Ph.D. Student
> Department of Medical Statistics,
> Informatics and Health Economics
> University of Innsbruck, Austria
>
>
>
> [[alternative HTML version deleted]]
>
------------------------------
Message: 19
Date: Fri, 06 Jun 2008 08:27:18 -0500
From: Erik Iverson <iverson at biostat.wisc.edu>
Subject: Re: [R] Problem in executing R on server
To: Jason Lee <huajie.lee at gmail.com>
Cc: r-help at r-project.org
Message-ID: <48493B36.1060605 at biostat.wisc.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
[[elided Yahoo spam]]
Jason Lee wrote:
> Hi,
>
> I am not too sure its what you meant :-
> Below is the closest data for each session from "top"
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 26792 jason 25 0 283m 199m 2620 R 100 0.6 0:00.38 R
>
> The numbers changed as the processes are running. I am actually sharing
> the server with other few people. I dont think this is a problem.
>
> And, for my own pc,
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 6192 jason 25 0 157m 148m 2888 R 100 14.8 1081:21 R
>
> On Fri, Jun 6, 2008 at 12:46 PM, Erik Iverson <iverson at biostat.wisc.edu> wrote:
>
> [[elided Yahoo spam]]
>
> Jason Lee wrote:
>
> Hi,
>
> I query free -m,
>
> On my server it is,
>
>           total   used   free  shared  buffers  cached
> Mem:      32190   8758  23431       0      742    2156
>
> And on my pc,
>
>           total   used   free  shared  buffers  cached
> Mem:       1002    986     16       0      132     255
>
>
> On the server, the above figure is after I exited the R.
> It seems that there are still alot free MB available if I am not
> wrong.
>
> On Fri, Jun 6, 2008 at 12:29 PM, Erik Iverson <iverson at biostat.wisc.edu> wrote:
>
> How much RAM is installed in your Sun Solaris server? How
> much RAM
> is installed on your PC?
>
> Jason Lee wrote:
>
> Hi,
>
> I am actually trying to do some matrix multiplications of large
> datasets of 3000 columns and 150 rows.
>
> And I am running R version 2.7.0.
>
>
>
> I tried setting R --min-vsize=10M --max-vsize=100M
> --min-nsize=500k --max-nsize=1000M
>
> Yet I still get:-
>
> Error: cannot allocate vector of size 17.7 Mb
>
> I am running on Sun Solaris server.
>
> Please advise.
> Thanks.
> On Fri, Jun 6, 2008 at 11:50 AM, Erik Iverson <iverson at biostat.wisc.edu> wrote:
>
>
>
> Jason Lee wrote:
>
> Hi R-listers,
>
> I have a problem executing R on my server. It returns
>
> Error: cannot allocate vector of size 15.8 Mb
>
> each time I execute R on the server. But it doesn't give me
> any problem when I try executing on my own PC (except that it runs
> extremely slowly).
>
> Any pointers on this? I tried to read the FAQ on this issue
> in the archive, but it seems there is no one solution to this.
>
>
> And that is because there is no one cause of this issue. I might
> guess your 'server' has less memory than your 'PC', but you
> didn't say anything about your respective setups, or what you are even
> trying to do with R.
>
>
> I tried to simplify my code, but it seems the problem is still the
> same.
>
>
>
> Please advise. Thanks.
>
> [[alternative HTML version deleted]]
>
>
>
>
>
------------------------------
Message: 20
Date: Fri, 06 Jun 2008 09:28:38 -0400
From: Chuck Cleland <ccleland at optonline.net>
Subject: Re: [R] request: a class having max frequency
Cc: R Help <r-help at r-project.org>
Message-ID: <48493B86.3060409 at optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 6/6/2008 9:18 AM, Chuck Cleland wrote:
> On 6/6/2008 9:14 AM, Muhammad Azam wrote:
>> Dear R users
>> I have a very basic question. I tried but could not find the
>> required result. using
>> dat <- pima
>> f <- table(dat[,9])
>>
>>> f
>>   0   1
>> 500 268
>> i want to find that class say "0" having maximum frequency i.e 500. I
>> used
>>> which.max(f)
>> which provide
>> 0
>> 1
>> How can i get only the "0". Thanks and
>
> table(iris$Species)
>
> setosa versicolor virginica
> 50 50 50
>
> which.max(table(iris$Species))
> setosa
> 1
>
> names(which.max(table(iris$Species)))
> [1] "setosa"
If, as above, more than one category frequency is at the maximum, you
might want something like this:
x <- table(iris$Species)
which(x == max(x))
setosa versicolor virginica
1 2 3
names(which(x == max(x)))
[1] "setosa" "versicolor" "virginica"
>> best regards
>>
>> Muhammad Azam Ph.D. Student Department of Medical Statistics,
>> Informatics and Health Economics University of Innsbruck, Austria
>>
>> [[alternative HTML version deleted]]
>>
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
------------------------------
Message: 21
Date: Fri, 6 Jun 2008 08:40:18 -0500
From: "Neil Gupta" <neil.gup at gmail.com>
Subject: [R] Manipulating DataSets
To: R-help at r-project.org
Message-ID:
<a51fe2df0806060640j4f1677b4h2e3b332ec0c2fd at mail.gmail.com>
Content-Type: text/plain
Hello R-users,
I have a very simple problem I wanted to solve. I have a large dataset as
such:
  Lag  X.Symbol     Time TickType ReferenceNumber  Price Size X.Symbol.1   Time.1 TickType.1 ReferenceNumber.1
1  ES 3:ESZ7.GB 08:30:00        B        74390987 151075   44  3:ESZ7.GB 08:30:00          A          74390988
2  ES 3:YMZ7.EC 08:30:00        B        74390993  13686   17  3:YMZ7.EC 08:30:00          A          74390994
3  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
4  YM 3:YMZ7.EC 08:30:00        B        74390998  13686   17  3:YMZ7.EC 08:30:00          A          74390999
5  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
6  YM 3:YMZ7.EC 08:30:00        B        74391000  13686   14  3:YMZ7.EC 08:30:00          A          74391001
  Price.1 Size.1 LeadTime   MidPoint Spread
1  151100     22 08:30:00 *151087.5*     25
2   13688     27 08:30:00    13687.0      2
3  151100     22 08:30:00 *151087.5*     25
4   13688     27 08:30:00    13687.0      2
5  151100     22 08:30:00   151087.5     25
6   13688     27 08:30:00    13687.0      2
All I wanted to do was take the Log(MidPoint[2]) - Log(MidPoint[1]) for a
symbol "3:ESZ7.GB"
So the first one would be log(151087.5) - log(151087.5). I wanted to do this
throughout the data set and add that in another column. I would appreciate
any help.
Regards,
Neil Gupta
[[alternative HTML version deleted]]
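One possible way to get the per-symbol log differences asked for above, as a sketch only. It assumes the rows are already in time order within each symbol, rebuilds just the two relevant columns from the printed data, and invents the column name LogDiff for the result.

```r
# Only the columns needed for the calculation, copied from the printed rows.
ticks <- data.frame(
  X.Symbol = c("3:ESZ7.GB", "3:YMZ7.EC", "3:ESZ7.GB",
               "3:YMZ7.EC", "3:ESZ7.GB", "3:YMZ7.EC"),
  MidPoint = c(151087.5, 13687, 151087.5, 13687, 151087.5, 13687)
)

# Within each symbol: log(MidPoint[i]) - log(MidPoint[i - 1]).
# The first row of each symbol has no predecessor, so it gets NA.
ticks$LogDiff <- ave(ticks$MidPoint, ticks$X.Symbol,
                     FUN = function(m) c(NA, diff(log(m))))
print(ticks)
```

ave() keeps the original row order, so the new column lines up with the existing data frame without any re-sorting.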
------------------------------
Message: 22
Date: Fri, 6 Jun 2008 09:35:42 -0400
From: "Emslie, Paul [Ctr]" <emsliep at atac.mil>
Subject: [R] Subsetting to unique values
To: <r-help at r-project.org>
Message-ID: <9E17510E40D158498789549DD2D5DE7101F3C30D at nex01.atac.mil>
Content-Type: text/plain; charset="us-ascii"
I want to take the first row of each unique ID value from a data frame.
For instance
> ddTable <- data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
I want a dataset that is
Id Name
1 Paul
2 Bob
> unique(ddTable)
Will give me all 4 rows, and
> unique(ddTable$Id)
Will give me c(1,2), but not accompanied by the name column.
------------------------------
Message: 23
Date: Fri, 6 Jun 2008 15:58:10 +0200 (CEST)
From: Katharine Mullen <kate at few.vu.nl>
Subject: Re: [R] How can I display a characters table ?
To: Maura E Monville <maura.monville at gmail.com>
Cc: r-help <r-help at stat.math.ethz.ch>
Message-ID: <Pine.GSO.4.56.0806061555001.14989 at laurel.few.vu.nl>
Content-Type: TEXT/PLAIN; charset=US-ASCII
Dear Maura,
try the function textplot from the package gplots. you can say
textplot(yourmatrix) and get a plot of a character matrix.
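A hedged sketch of that suggestion. It assumes the gplots package is installed (the call is skipped otherwise), and the small character matrix tab is made up to stand in for the 67x2 table from the question.

```r
# Made-up stand-in for the real table: 5-character codes and 2-character codes.
tab <- matrix(c("ABCDE", "FGHIJ", "KLMNO",
                "aa",    "bb",    "cc"),
              ncol = 2,
              dimnames = list(NULL, c("code5", "code2")))

# Draw the matrix as text on a graphics device, if gplots is available.
if (requireNamespace("gplots", quietly = TRUE)) {
  pdf(NULL)              # null device so the sketch also runs non-interactively
  gplots::textplot(tab)  # in an interactive session, drop pdf(NULL) and dev.off()
  dev.off()
}
```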
On Fri, 6 Jun 2008, Maura E Monville wrote:
> I would like to generate a graphics text. I have a 67x2 table with
> 5-character string in col 1 and 2-character string in col 2.
> Is it possible to make such a table appear on a graphics or a
> message-box pop-up window ?
>
> Thank you so much.
> --
> Maura E.M
>
------------------------------
Message: 24
Date: Fri, 6 Jun 2008 16:01:23 +0200
From: Giovanna.Jonalasinio at uniroma1.it
Subject: [R] Giovanna Jonalasinio è fuori ufficio, I'm away
To: r-help at r-project.org
Message-ID:
<OF43D77A5D.F29EAA64-ONC1257460.004D0831-C1257460.004D0831 at
Uniroma1.it>
Content-Type: text/plain
Automatic reply from 06/06/08 until 14/06/08
I'm going to have limited access to my email until the 14th of June 2008
[[alternative HTML version deleted]]
------------------------------
Message: 25
Date: Fri, 06 Jun 2008 10:08:34 -0400
From: Chuck Cleland <ccleland at optonline.net>
Subject: Re: [R] Subsetting to unique values
To: "Emslie, Paul [Ctr]" <emsliep at atac.mil>
Cc: r-help at r-project.org
Message-ID: <484944E2.1090600 at optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 6/6/2008 9:35 AM, Emslie, Paul [Ctr] wrote:
> I want to take the first row of each unique ID value from a data frame.
> For instance
>> ddTable <- data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
>> unique(ddTable)
> Will give me all 4 rows, and
>> unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
ddTable <- data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
!duplicated(ddTable$Id)
[1] TRUE FALSE TRUE FALSE
ddTable[!duplicated(ddTable$Id),]
Id name
1 1 Paul
3 2 Bob
?duplicated
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
------------------------------
Message: 26
Date: Fri, 6 Jun 2008 10:13:55 -0400
From: "stephen sefick" <ssefick at gmail.com>
Subject: Re: [R] simple data question
To: "Daniel Folkinshteyn" <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<c502a9e10806060713w199e5511xedebb5d3b1cad7bc at mail.gmail.com>
Content-Type: text/plain
Good point. Thanks
On Fri, Jun 6, 2008 at 9:05 AM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
> should work - don't even have to put them in quotes, if your field
> separator is not space. why don't you just try it and see what comes out? :)
>
> on 06/06/2008 08:43 AM stephen sefick said the following:
>
>> if I wanted to use a name for a column with two words say Dick Cheney and
>> George Bush
>> can I put these in quotes "Dick Cheney" and "George Bush" to get them to
>> read into R using both read.table and read.zoo to recognize this.
>> thanks
>>
>> Stephen
>>
>>
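A quick way to try this without a file, as a sketch: the comma-separated text below is invented, and read.table() reads it through its text argument. It also shows one detail not mentioned in the thread: by default read.table() rewrites column names into syntactic names, so the space becomes a dot unless check.names = FALSE is given.

```r
# Invented two-line input with two-word column names and a comma separator.
csv_text <- "Date,Dick Cheney,George Bush\n2008-06-06,1,2\n"

# Default behaviour: names are passed through make.names().
dat_default <- read.table(text = csv_text, header = TRUE, sep = ",")
print(names(dat_default))

# Keep the names exactly as written in the header.
dat_exact <- read.table(text = csv_text, header = TRUE, sep = ",",
                        check.names = FALSE)
print(names(dat_exact))

# Columns with spaces in their names need backticks or [[ ]] to access.
print(dat_exact[["Dick Cheney"]])
```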
--
Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods. We are mammals, and have not exhausted the annoying little
problems of being mammals.
-K. Mullis
[[alternative HTML version deleted]]
------------------------------
Message: 27
Date: Fri, 6 Jun 2008 07:22:23 -0700 (PDT)
From: John Kane <jrkrideau at yahoo.ca>
Subject: Re: [R] Subsetting to unique values
To: r-help at r-project.org, "Emslie, Paul \[Ctr\]" <emsliep at atac.mil>
Message-ID: <217081.40430.qm at web32807.mail.mud.yahoo.com>
Content-Type: text/plain; charset=us-ascii
I don't have R on this machine, but will this work?
myrows <- unique(ddTable[,1])
unis <- ddTable[match(myrows, ddTable[,1]), ]
--- On Fri, 6/6/08, Emslie, Paul [Ctr] <emsliep at atac.mil> wrote:
> From: Emslie, Paul [Ctr] <emsliep at atac.mil>
> Subject: [R] Subsetting to unique values
> To: r-help at r-project.org
> Received: Friday, June 6, 2008, 9:35 AM
> I want to take the first row of each unique ID value from a data frame.
> For instance
> > ddTable <- data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
>
------------------------------
Message: 28
Date: Fri, 6 Jun 2008 17:29:26 +0300
From: "Eleni Christodoulou" <elenichri at gmail.com>
Subject: Re: [R] which question
To: "Dieter Menne" <dieter.menne at menne-biomed.de>
Cc: r-help at stat.math.ethz.ch
Message-ID:
<2293b7660806060729t2d0cf78mefae840b427807f5 at mail.gmail.com>
Content-Type: text/plain
An example is:
symbol=human[which(human[,3] %in% genes.sam.names),8]
The data *human* and *genes.sam.names* are attached. The result of the above
command is:
> symbol
[1] CCL18 MARCO SYT13
[4] FOXC1 CDH3
[7] CA12 CELSR1 NM_018440
[10] MICROTUBULE-ASSOCIATED NM_015529 ESR1
[13] PHGDH GABRP LGMN
[16] MMP9 BMP7 KLF5
[19] RIPK2 GATA3 NM_032023
[22] TRIM2 CCND1 MMP12
[25] LDHB AF493978 SOD2
[28] SOD2 SOD2 NME5
[31] STC2 RBP1 ROPN1
[34] RDH10 KRTHB1 SLPI
[37] BBOX1 FOXA1 NM_005669
[40] MCCC2 CHI3L1 GSTM3
[43] LPIN1 DSC2 FADS2
[46] ELF5 CYP1B1 LMO4
[49] AL035297 NM_152398 AB018342
[52] PIK3R1 NFKBIE MLZE
[55] NFIB NM_052997 NM_006023
[58] CPB1 CXCL13 CBR3
[61] NM_017527 FABP7 DACH
[64] IFI27 ACOX2 CXCL11
[67] UGP2 CLDN4 M12740
[70] IGKC IGKC CLECSF12
[73] AY069977 HOXB2 SOX11
[76] NM_017422 TLR2
[79] CKS1B BC017946 APOBEC3B
[82] HLA-DRB1 HLA-DQB1
[85] CCL13 C4orf7
[88] NM_173552
21345 Levels: (2 (32 (55.11 (AIB-1) (ALU (CAK1) (CAP4) (CASPASE ... ZYX
As you can see, apart from gene symbols, which is the required thing, RefSeq
IDs are also retrieved...
Thanks a lot,
Eleni
On Fri, Jun 6, 2008 at 1:23 PM, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
> Eleni Christodoulou <elenichri <at> gmail.com> writes:
>
> > I was trying to select a column of a data frame using the *which* command. I
> > was actually selecting the rows of the data frame using *which*, and then
> > displayed a certain column of it. The command that I was using is:
> > sequence=*mydata*[*which*(human[,3] %in% genes.sam.names),*9*]
> ....
> Please provide a running example. The *mydata* are difficult to read.
>
>
> Dieter
>
[[alternative HTML version deleted]]
------------------------------
Message: 29
Date: Fri, 6 Jun 2008 14:38:50 +0000 (UTC)
From: Adrian Dusa <dusa.adrian at gmail.com>
Subject: Re: [R] Subsetting to unique values
To: r-help at stat.math.ethz.ch
Message-ID: <loom.20080606T143745-451 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Emslie, Paul [Ctr] <emsliep <at> atac.mil> writes:
>
> I want to take the first row of each unique ID value from a data frame.
> For instance
> > ddTable <-
>
data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
ddTable[-which(duplicated(ddTable$Id)), ]
HTH,
Adrian
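A small comparison of the two subsetting idioms from this thread, as a sketch. The second data frame, noDups, is invented to show a case the thread does not cover: when no ID is duplicated, which() returns an empty vector, and negative indexing with an empty vector selects no rows at all, whereas the logical form is unaffected.

```r
ddTable <- data.frame(Id = c(1, 1, 2, 2),
                      name = c("Paul", "Joe", "Bob", "Larry"))

# Both forms give the first row of each Id when duplicates exist.
by_logical <- ddTable[!duplicated(ddTable$Id), ]
by_which   <- ddTable[-which(duplicated(ddTable$Id)), ]
print(by_logical)

# Invented edge case: no duplicated Ids.
noDups <- data.frame(Id = c(1, 2), name = c("Paul", "Bob"))
print(nrow(noDups[!duplicated(noDups$Id), ]))        # keeps both rows
print(nrow(noDups[-which(duplicated(noDups$Id)), ])) # keeps no rows
```

For that reason the !duplicated() form is usually the safer one to put in a script.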
------------------------------
Message: 30
Date: Fri, 6 Jun 2008 07:39:32 -0700
From: Dennis Fisher <fisher at plessthan.com>
Subject: [R] Startup speed for a lengthy script
To: r-help at stat.math.ethz.ch
Message-ID: <808B5F10-1D7A-4A51-9BC2-548FA9391DEC at plessthan.com>
Content-Type: text/plain
Colleagues,
Several days ago, I wrote to the list about a lengthy delay in startup
of a script. I will start with a brief summary of that email. I
have a 10,000 line script of which the final 3000 lines constitute a
function. The script contains time-markers (cat(date())) so that I can
determine how fast it was read. When I invoke the script from the OS
("R --slave < Script.R"; similar performance with R 2.6.1 or 2.7.0 on
a Mac / Linux / Windows), the first 7000 lines were read in 5 seconds,
then it took 2 minutes to read the remaining 3000 lines. I inquired
as to the cause for the lengthy reading of the final 3000 lines.
Subsequently, I whittled the 3000 lines to ~ 1000 (moving 2000 lines
to smaller functions). Now the first 9000 lines still read in ~ 6
seconds and the final 1000 lines in ~ 15 seconds. Better but not ideal.
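For readers who want finer-grained time-markers than cat(date()), which only resolves whole seconds, here is one possible sketch. The helper name mark and the labels are invented; it measures elapsed time and does not address the cause of the delay.

```r
# Record the start once, near the top of the script.
script_start <- Sys.time()

# Hypothetical helper: print seconds elapsed since script_start with a label.
mark <- function(label) {
  elapsed <- as.numeric(difftime(Sys.time(), script_start, units = "secs"))
  cat(sprintf("[%8.3f s] %s\n", elapsed, label))
  invisible(elapsed)
}

t1 <- mark("small functions defined")
# ... the large function definition would go here ...
t2 <- mark("large function defined")
```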
However, I just encountered a new situation that I don't understand.
The R code is now embedded in a graphical interface built with Real
Basic. When I invoke the script in that environment, the first 9000
lines take the usual 6 seconds. But, to my surprise, the final 1000
[[elided Yahoo spam]]
There is one major difference in the implementation. With the GUI,
the commands are "pushed", i.e., the GUI opens R, then sends a
continuous stream of code.
Does anyone have any idea as to why the delay should be so different
in the two settings?
Dennis
Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-415-564-2220
www.PLessThan.com
[[alternative HTML version deleted]]
------------------------------
Message: 31
Date: Fri, 6 Jun 2008 09:37:59 -0500
From: "Dumblauskas, Jerry" <jerry.dumblauskas at credit-suisse.com>
Subject: Re: [R] Java to R interface
To: r-help at r-project.org
Message-ID:
<6BEE6042FD73A54BB6E88969828C6B06014622CA at ECHI17P30001A.csfb.cs-group.com>
Content-Type: text/plain
Try and make sure that R is in your Windows Path variable.
I got your message when I first did this, but when I did the above it
then worked...
=============================================================================
Please access the attached hyperlink for an important electronic communications
disclaimer:
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
=============================================================================
[[alternative HTML version deleted]]
------------------------------
Message: 32
Date: Fri, 06 Jun 2008 15:41:49 +0100
From: Richard Pearson <richard.pearson at postgrad.manchester.ac.uk>
Subject: Re: [R] which question
To: Eleni Christodoulou <elenichri at gmail.com>
Cc: Dieter Menne <dieter.menne at menne-biomed.de>,
r-help at stat.math.ethz.ch
Message-ID: <48494CAD.9010204 at postgrad.manchester.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
I didn't get any attached data, but my suspicion here is that you have
somehow got RefSeq IDs in column 8 of human, as well as the gene symbols. Did
you read this data in from a text file?
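If the column really does mix gene symbols with RefSeq IDs, one way to inspect and separate them is sketched below. The short factor is invented from a few of the printed values, and the pattern only catches IDs of the NM_/NR_ form; other accession styles seen in the output (for example AF493978) would need their own rule.

```r
# Invented sample of the column, including an empty entry as in the printout.
symbol <- factor(c("CCL18", "NM_018440", "ESR1", "", "NM_015529", "SOD2"))

sym <- as.character(symbol)               # work on strings, not factor codes
is_refseq <- grepl("^N[MR]_[0-9]+$", sym) # TRUE for NM_/NR_ style accessions
is_blank  <- !nzchar(sym)                 # TRUE for empty strings

genes_only  <- sym[!is_refseq & !is_blank]
refseq_only <- sym[is_refseq]
print(genes_only)
print(refseq_only)
```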
Eleni Christodoulou wrote:
> An example is:
>
> symbol=human[which(human[,3] %in% genes.sam.names),8]
>
> The data *human* and *genes.sam.names* are attached. The result of the above
> command is:
>> symbol
> [1] CCL18 MARCO SYT13
> [4] FOXC1 CDH3
> [7] CA12 CELSR1 NM_018440
> [10] MICROTUBULE-ASSOCIATED NM_015529 ESR1
> [13] PHGDH GABRP LGMN
> [16] MMP9 BMP7 KLF5
> [19] RIPK2 GATA3 NM_032023
> [22] TRIM2 CCND1 MMP12
> [25] LDHB AF493978 SOD2
> [28] SOD2 SOD2 NME5
> [31] STC2 RBP1 ROPN1
> [34] RDH10 KRTHB1 SLPI
> [37] BBOX1 FOXA1 NM_005669
> [40] MCCC2 CHI3L1 GSTM3
> [43] LPIN1 DSC2 FADS2
> [46] ELF5 CYP1B1 LMO4
> [49] AL035297 NM_152398 AB018342
> [52] PIK3R1 NFKBIE MLZE
> [55] NFIB NM_052997 NM_006023
> [58] CPB1 CXCL13 CBR3
> [61] NM_017527 FABP7 DACH
> [64] IFI27 ACOX2 CXCL11
> [67] UGP2 CLDN4 M12740
> [70] IGKC IGKC CLECSF12
> [73] AY069977 HOXB2 SOX11
> [76] NM_017422 TLR2
> [79] CKS1B BC017946 APOBEC3B
> [82] HLA-DRB1 HLA-DQB1
> [85] CCL13 C4orf7
> [88] NM_173552
> 21345 Levels: (2 (32 (55.11 (AIB-1) (ALU (CAK1) (CAP4) (CASPASE ... ZYX
>
> As you can see, apart from gene symbols, which is the required thing, RefSeq
> IDs are also retrieved...
>
> Thanks a lot,
> Eleni
>
>
>
>
>
>
> On Fri, Jun 6, 2008 at 1:23 PM, Dieter Menne <dieter.menne at menne-biomed.de> wrote:
>
>> Eleni Christodoulou <elenichri <at> gmail.com> writes:
>>
>>> I was trying to select a column of a data frame using the *which* command. I
>>> was actually selecting the rows of the data frame using *which*, and then
>>> displayed a certain column of it. The command that I was using is:
>>> sequence=*mydata*[*which*(human[,3] %in% genes.sam.names),*9*]
>> ....
>> Please provide a running example. The *mydata* are difficult to read.
>>
>>
>> Dieter
>>
>
> [[alternative HTML version deleted]]
>
--
Richard D. Pearson richard.pearson at postgrad.manchester.ac.uk
School of Computer Science, http://www.cs.man.ac.uk/~pearsonr
University of Manchester, Tel: +44 161 275 6178
Oxford Road, Mob: +44 7971 221181
Manchester M13 9PL, UK. Fax: +44 161 275 6204
------------------------------
Message: 33
Date: Fri, 06 Jun 2008 10:55:53 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Merging two dataframes
To: Michael Pearmain <mpearmain at google.com>, R Help
<r-help at r-project.org>
Message-ID: <48494FF9.6010900 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
cool. :) Yes, the argument names are by.x and by.y, so your by.ETC and
by.SURVEY were ignored in the black hole of "arguments passed to other
methods".
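A minimal sketch of the effect described above, using two small made-up data frames (etc and survey are hypothetical stand-ins for the poster's files, and by.etc/by.survey mimic the misnamed arguments). It assumes the two frames share no column names, in which case merge() has nothing to join on and returns every combination of rows:

```r
# Hypothetical stand-ins for the two data sets; they share no column names.
etc    <- data.frame(ord = 1:5, score = c(10, 20, 30, 40, 50))
survey <- data.frame(uid = c(2, 4), answer = c("yes", "no"))

# Misnamed arguments are swallowed by '...' and ignored. With no common
# column to join on, merge() returns the Cartesian product of the rows.
wrong <- merge(etc, survey, by.etc = "ord", by.survey = "uid")
nrow(wrong)   # nrow(etc) * nrow(survey)

# Correct argument names: join on ord/uid and keep unmatched rows of etc.
right <- merge(etc, survey, by.x = "ord", by.y = "uid", all.x = TRUE)
nrow(right)   # one row per row of etc
right
```

The same arithmetic would account for the size reported in the question: 67 x 28 is 1876 rows, roughly the "1800 cases" mentioned.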
on 06/06/2008 09:11 AM Michael Pearmain said the following:
> Thanks
> Works perfectly.
> Was the problem due to me putting by.survey and by.etc rather than by.y
> and by.x?
>
> I think when i was playing around i tried the all. command in that setup
> as well
>
> Mike
>
>
>
> On Fri, Jun 6, 2008 at 2:07 PM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
>
> try this:
> FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = TRUE, all.y = FALSE)
>
> on 06/06/2008 07:30 AM Michael Pearmain said the following:
>
> Hi All,
>
> Newbie question for you all, but I have been looking at the archives
> and the help pages to get a rough idea of what I want to do.
>
> I would like to merge two dataframes together based on a keyed
> variable in one dataframe linking to the other dataframe. Only some
> of the cases will match, but I would like to keep the others as well.
>
> My dataframes have 67 and 28 cases respectively, and I would like to
> end up with one file 67 cases long (all 28 are matched cases).
>
>
> I can use the merge command to merge the two datasets together, but I
> still get some odd results. I'm using the code below:
>
> ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
> SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
> FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")
>
> The merged file seems to have 1800 cases, while the ETC data file only
> has 67 and the SURVEY file only has 28. (Reading the help, it looks as
> if it merges one case with all cases in the other file, which is not
> what I want.)
>
> The matching variables are the 'ord' field and the 'uid' field.
> Can anyone advise, please?
>
>
>
>
> --
> Michael Pearmain
> Senior Statistical Analyst
>
>
> 1st Floor, 180 Great Portland St. London W1W 5QZ
> t +44 (0) 2032191684
> mpearmain at google.com <mailto:mpearmain at google.com>
> mpearmain at doubleclick.com <mailto:mpearmain at doubleclick.com>
>
>
> Doubleclick is a part of the Google group of companies
------------------------------
Message: 34
Date: Fri, 6 Jun 2008 10:04:33 -0500
From: "Alexys Herleym Rodriguez Avellaneda" <alexyshr at gmail.com>
Subject: [R] fit.variogram sgeostat error
To: r-help at r-project.org
Message-ID:
<ae405c330806060804t787392bewcd1b7e71da69ea0e at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
When I run the following line it works fine:
fit.spherical(var, 0, 2.6, 250, type='c', iterations=10,
tolerance=1e-06, echo=FALSE, plot.it=T, weighted=TRUE, delta=0.1,
verbose=TRUE)
But when I use the next one, it gives an error:
fit.variogram("spherical", var, nugget=0, sill=2.6, range=250,
plot.it=TRUE, iterations=0)
This is the error:
Error in fit.variogram("spherical", var, nugget = 0, sill = 2.6, range = 250, :
unused argument(s) (nugget = 0, sill = 2.6, range = 250, plot.it = TRUE,
iterations = 0)
Any suggestions?
Alexys H
------------------------------
Message: 35
Date: Fri, 06 Jun 2008 17:05:58 +0200
From: Dani Valverde <daniel.valverde at uab.cat>
Subject: [R] lsmeans
To: R Help <r-help at r-project.org>
Message-ID: <48495256.20804 at uab.cat>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello,
I have the following function call:
lme(fixed=Error ~ Temperature * Tumour ,random = ~1|ID, data=error_DB)
which returns an lme object. I am interested in carrying out some kind
of lsmeans on the data returned, but I cannot find any function to do
this in R. I have seen the effect() function, but it does not work with
lme objects. Any ideas?
Best,
Dani
--
Daniel Valverde Saubí
Grup de Biologia Molecular de Llevats
Facultat de Veterinària de la Universitat Autònoma de Barcelona
Edifici V, Campus UAB
08193 Cerdanyola del Vallès- SPAIN
Centro de Investigación Biomédica en Red
en Bioingeniería, Biomateriales y
Nanomedicina (CIBER-BBN)
Grup d'Aplicacions Biomèdiques de la RMN
Facultat de Biociències
Universitat Autònoma de Barcelona
Edifici Cs, Campus UAB
08193 Cerdanyola del Vallès- SPAIN
+34 93 5814126
------------------------------
Message: 36
Date: Fri, 06 Jun 2008 11:12:32 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: r-help at r-project.org
Message-ID: <484953E0.70508 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Anybody have any thoughts on this? Please? :)
on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
> Hi everyone!
>
> I have a question about data processing efficiency.
>
> My data are as follows: I have a data set on quarterly institutional
> ownership of equities; some of them have had recent IPOs, some have not
> (I have a binary flag set). The total dataset size is 700k+ rows.
>
> My goal is this: For every quarter since issue for each IPO, I need to
> find a "matched" firm in the same industry, and close in market cap. So,
> e.g., for firm X, which had an IPO, I need to find a matched non-issuing
> firm in quarter 1 since IPO, then a (possibly different) non-issuing
> firm in quarter 2 since IPO, etc. Repeat for each issuing firm (there
> are about 8300 of these).
>
> Thus it seems to me that I need to be doing a lot of data selection and
> subsetting, and looping (yikes!), but the result appears to be highly
> inefficient and takes ages (well, many hours). What I am doing, in
> pseudocode, is this:
>
> 1. for each quarter of data, getting out all the IPOs and all the
> eligible non-issuing firms.
> 2. for each IPO in a quarter, grab all the non-issuers in the same
> industry, sort them by size, and finally grab a matching firm closest in
> size (the exact procedure is to grab the closest bigger firm if one
> exists, and just the biggest available if all are smaller)
> 3. assign the matched firm-observation the same "quarters since issue"
> as the IPO being matched
> 4. rbind them all into the "matching" dataset.
>
> The function I currently have is pasted below, for your reference. Is
> there any way to make it produce the same result but much faster?
> Specifically, I am guessing eliminating some loops would be very good,
> but I don't see how, since I need to do some fancy footwork for each IPO
> in each quarter to find the matching firm. I'll be doing a few things
> similar to this, so it's somewhat important to up the efficiency of
> this. Maybe some of you R-fu masters can clue me in? :)
>
> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>
> ========== my function below ==========
> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>         quarters_since_issue=40) {
>
>     # rbind for matrix is cheaper, so typecast the result to matrix
>     result = matrix(nrow=0, ncol=ncol(tfdata))
>
>     colnames = names(tfdata)
>
>     quarterends = sort(unique(tfdata$DATE))
>
>     for (aquarter in quarterends) {
>         tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>
>         tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>             (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>             (tfdata_quarter$IPO.Flag == 0), ]
>         tfdata_quarter_ipoissuers = tfdata_quarter[
>             tfdata_quarter$IPO.Flag == 1, ]
>
>         for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>             arow = tfdata_quarter_ipoissuers[i,]
>             industrypeers = tfdata_quarter_fitting_nonissuers[
>                 tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>             industrypeers = industrypeers[
>                 order(industrypeers$Market.Cap.13f), ]
>             if ( nrow(industrypeers) > 0 ) {
>                 if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>                         arow$Market.Cap.13f, ]) > 0 ) {
>                     bestpeer = industrypeers[
>                         industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>                 }
>                 else {
>                     bestpeer = industrypeers[nrow(industrypeers),]
>                 }
>                 bestpeer$Quarters.Since.IPO.Issue =
>                     arow$Quarters.Since.IPO.Issue
>
>                 #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
>                 result = rbind(result, as.matrix(bestpeer))
>             }
>         }
>         #result = rbind(result, tfdata_quarter)
>         print (aquarter)
>     }
>
>     result = as.data.frame(result)
>     names(result) = colnames
>     return(result)
>
> }
>
> ========= end of my function ============
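The matching rule in step 2 of the pseudocode (closest bigger firm if one exists, otherwise the biggest available) can be sketched on plain numeric vectors, without sorting or subsetting a data frame each time. This is only an illustration: pick_peer and its arguments are hypothetical names, not part of the posted function, and ties are resolved by taking the first candidate.

```r
# Hypothetical helper (not from the original post): given the market caps of
# the eligible peers in one industry and the IPO firm's market cap, return
# the position of the matched peer, or NA if there are no peers at all.
pick_peer <- function(peer_caps, ipo_cap) {
  if (length(peer_caps) == 0) return(NA_integer_)
  bigger <- which(peer_caps >= ipo_cap)
  if (length(bigger) > 0) {
    bigger[which.min(peer_caps[bigger])]  # smallest peer at least as large
  } else {
    which.max(peer_caps)                  # all peers smaller: take the largest
  }
}

peer_caps <- c(5, 20, 12, 8)
pick_peer(peer_caps, ipo_cap = 10)   # position of the peer with cap 12
pick_peer(peer_caps, ipo_cap = 50)   # position of the peer with cap 20
```

Working on positions rather than on copies of rows means the full peer row only needs to be extracted once, after the match is known.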
------------------------------
Message: 37
Date: Fri, 6 Jun 2008 17:36:18 +0200
From: DAVID ARTETA GARCIA <darteta001 at ikasle.ehu.es>
Subject: [R] store filename
To: "r-help at r-project.org" <r-help at r-project.org>
Message-ID: <20080606173618.dcn20wksgwsosoo8 at www.ehu.es>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes";
format="flowed"
------------------------------
Message: 38
Date: Fri, 6 Jun 2008 10:36:30 -0500
From: "hadley wickham" <h.wickham at gmail.com>
Subject: Re: [R] label outliers in geom_boxplot (ggplot2)
To: "Mihalicza Péter" <mihalicza.peter at eski.hu>
Cc: r-help at r-project.org
Message-ID:
<f8e6ff050806060836i1de4ecfk4aab39e5bed596af at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> It's too obvious, so I am positive that there is a good reason for not
> doing this, but still: why is it not possible to have an "outlier"
> output in stat_boxplot that can be used at geom_text()?
>
> Something like this, with "upper":
> > dat=data.frame(num=rep(1,20), val=c(runif(18),3,3.5),
> + name=letters[1:20])
> > ggplot(dat, aes(y=val, x=num))+stat_boxplot(outlier.size=4,
> + outlier.colour="green")+geom_text(aes(y=..upper..),
> + label="This is upper hinge")
>
> Unfortunately, this does not work and gives the error message:
> Error in eval(expr, envir, enclos) : object "upper" not found
>
> Is it because you can only use stat outputs within the stat statements?
> Could it be possible to make them available outside the statements too?
You can generally, but it won't work here. The problem is that you
want a different y aesthetic for the statistic (val) than you do for
the geom (upper) and there's no way to get around that with the
current design of ggplot2.
Hadley
--
http://had.co.nz/
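One possible workaround, sketched here as an assumption and not as something proposed in the thread, is to compute the boxplot statistics up front with base R's boxplot.stats() and build a small data frame of the outlying points. The seed and the names bs, upper_hinge and outliers are illustrative only.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable
dat <- data.frame(num = rep(1, 20), val = c(runif(18), 3, 3.5),
                  name = letters[1:20])

# boxplot.stats() returns the five-number summary in $stats and the
# points lying beyond the whiskers in $out.
bs <- boxplot.stats(dat$val)
upper_hinge <- bs$stats[4]

# Rows of the original data that correspond to the outlying values.
outliers <- dat[dat$val %in% bs$out, ]
outliers
```

A data frame like outliers could then be supplied as the data for a separate text layer, so that the labels do not depend on values computed inside the boxplot statistic.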
------------------------------
Message: 39
Date: Fri, 06 Jun 2008 16:44:26 +0100
From: Patrick Burns <pburns at pburns.seanet.com>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID: <48495B5A.3040005 at pburns.seanet.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
One thing that is likely to speed the code significantly
is if you create 'result' to be its final size and then
subscript into it. Something like:
result[i, ] <- bestpeer
(though I'm not sure if 'i' is the proper index).
Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
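A minimal sketch of the preallocation idea, on a toy matrix and not on the poster's data. In the nested loops of the posted function a separate running row counter would presumably be needed in place of i; that detail is an assumption and is not shown here.

```r
n <- 1000

# Growing with rbind: the whole matrix is copied on every iteration.
grow <- matrix(nrow = 0, ncol = 3)
for (i in seq_len(n)) grow <- rbind(grow, c(i, i^2, i^3))

# Preallocating: create the final size once, then fill rows by index.
filled <- matrix(NA_real_, nrow = n, ncol = 3)
for (i in seq_len(n)) filled[i, ] <- c(i, i^2, i^3)

# Both approaches produce the same values.
all(grow == filled)
```

If the final number of rows is not known in advance, an upper bound can be allocated and the unused rows dropped at the end.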
Daniel Folkinshteyn wrote:
> Anybody have any thoughts on this? Please? :)
------------------------------
Message: 40
Date: Fri, 6 Jun 2008 11:45:44 -0400
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<971536df0806060845v66784addo64cf0d3a1bfee377 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Try reading the posting guide before posting.
On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
> Anybody have any thoughts on this? Please? :)
------------------------------
Message: 41
Date: Fri, 6 Jun 2008 17:51:34 +0200
From: DAVID ARTETA GARCIA <darteta001 at ikasle.ehu.es>
Subject: [R] Store filename
To: "r-help at r-project.org" <r-help at r-project.org>
Message-ID: <20080606175134.qh2v6povk0o4co4s at www.ehu.es>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes";
format="flowed"
Hi list,
Is it possible to save the name of a file automatically when
reading it using read.table() or some other function?
My aim is then to create an output table named after the original
table, with a suffix like _out.
Example:
mydata = read.table("Run224_v2_060308.txt", sep = "\t", header = TRUE)
## store name?
myfile = the_name_of_the_file
## do analysis of data and store in a data.frame "myoutput"
## write output in tab format
write.table(myoutput, c(myfile,"_out.txt"),sep="\t")
The name of the new file would be "Run224_v2_060308_out.txt".
Thanks in advance,
David
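One way this could be done, sketched under the assumption that the file name is kept in a variable before reading: strip the extension with tools::file_path_sans_ext() and build the output name with paste0(). The temporary input file and the names infile and outfile are made up for the illustration.

```r
# Create a small tab-separated input file so the sketch is self-contained.
infile <- file.path(tempdir(), "Run224_v2_060308.txt")
write.table(data.frame(a = 1:3, b = c("x", "y", "z")), infile,
            sep = "\t", row.names = FALSE)

# Keep the name in a variable and reuse it for reading and for the output name.
mydata  <- read.table(infile, sep = "\t", header = TRUE)
outfile <- paste0(tools::file_path_sans_ext(infile), "_out.txt")

myoutput <- mydata  # placeholder for the real analysis
write.table(myoutput, outfile, sep = "\t", row.names = FALSE)
basename(outfile)
```

Note that c(myfile, "_out.txt") yields a vector of two strings, whereas write.table() expects a single file name, which is why the pieces are pasted together here.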
------------------------------
Message: 42
Date: Fri, 6 Jun 2008 09:56:27 -0600
From: "Nanye Long" <nanye.long at gmail.com>
Subject: [R] where to download BRugs?
To: r-help at r-project.org
Message-ID:
<d3fc68d40806060856h1ccb5475u40c2ffa08d75ef32 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi all,
Does anyone know where to download the "BRugs" package? I did not find
it on r-project website. Thanks.
NL
------------------------------
Message: 43
Date: Fri, 06 Jun 2008 12:03:13 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck at gmail.com>
Cc: r-help at r-project.org
Message-ID: <48495FC1.5060900 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
i did! what did i miss?
on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
> Try reading the posting guide before posting.
------------------------------
Message: 44
Date: Fri, 6 Jun 2008 12:05:21 -0400
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<971536df0806060905t4ed24ec3nf353155b6e129a9f at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
It's summarized in the last line to r-help. Note reproducible and
minimal.
On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
> i did! what did i miss?
------------------------------
Message: 45
Date: Fri, 6 Jun 2008 12:05:38 -0400
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<971536df0806060905u7198d0f6u19979a2e3b5dedc8 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
That is the last line of every message to r-help.
On Fri, Jun 6, 2008 at 12:05 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> It's summarized in the last line to r-help. Note reproducible and
> minimal.
>
> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
>> i did! what did i miss?
>>
>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>
>>> Try reading the posting guide before posting.
>>>
>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins at gmail.com>
>>> wrote:
>>>>
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>>
>>>>> Hi everyone!
>>>>>
>>>>> I have a question about data processing efficiency.
>>>>>
>>>>> My data are as follows: I have a data set on quarterly institutional
>>>>> ownership of equities; some of them have had recent IPOs, some have
>>>>> not (I have a binary flag set). The total dataset size is 700k+ rows.
>>>>>
>>>>> My goal is this: For every quarter since issue for each IPO, I need
>>>>> to find a "matched" firm in the same industry, and close in market
>>>>> cap. So, e.g., for firm X, which had an IPO, I need to find a matched
>>>>> non-issuing firm in quarter 1 since IPO, then a (possibly different)
>>>>> non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing
>>>>> firm (there are about 8300 of these).
>>>>>
>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>>> highly inefficient and takes ages (well, many hours). What I am
>>>>> doing, in pseudocode, is this:
>>>>>
>>>>> 1. for each quarter of data, get out all the IPOs and all the
>>>>> eligible non-issuing firms.
>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>> industry, sort them by size, and finally grab a matching firm
>>>>> closest in size (the exact procedure is to grab the closest bigger
>>>>> firm if one exists, and just the biggest available if all are
>>>>> smaller)
>>>>> 3. assign the matched firm-observation the same "quarters since
>>>>> issue" as the IPO being matched
>>>>> 4. rbind them all into the "matching" dataset.
>>>>>
>>>>> The function I currently have is pasted below, for your reference.
>>>>> Is there any way to make it produce the same result but much faster?
>>>>> Specifically, I am guessing eliminating some loops would be very
>>>>> good, but I don't see how, since I need to do some fancy footwork for
>>>>> each IPO in each quarter to find the matching firm. I'll be doing a
>>>>> few things similar to this, so it's somewhat important to up the
>>>>> efficiency of this. Maybe some of you R-fu masters can clue me in? :)
>>>>>
>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>
>>>>> ========== my function below ==========
>>>>>
>>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>>>     quarters_since_issue=40) {
>>>>>
>>>>>   # rbind for matrix is cheaper, so typecast the result to matrix
>>>>>   result = matrix(nrow=0, ncol=ncol(tfdata))
>>>>>
>>>>>   colnames = names(tfdata)
>>>>>
>>>>>   quarterends = sort(unique(tfdata$DATE))
>>>>>
>>>>>   for (aquarter in quarterends) {
>>>>>     tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>
>>>>>     tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>>       (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>>>       (tfdata_quarter$IPO.Flag == 0), ]
>>>>>     tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>>       tfdata_quarter$IPO.Flag == 1, ]
>>>>>
>>>>>     for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>>       arow = tfdata_quarter_ipoissuers[i,]
>>>>>       industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>>         tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>>       industrypeers = industrypeers[
>>>>>         order(industrypeers$Market.Cap.13f), ]
>>>>>       if ( nrow(industrypeers) > 0 ) {
>>>>>         if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>>             arow$Market.Cap.13f, ]) > 0 ) {
>>>>>           bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
>>>>>             arow$Market.Cap.13f, ][1,]
>>>>>         }
>>>>>         else {
>>>>>           bestpeer = industrypeers[nrow(industrypeers),]
>>>>>         }
>>>>>         bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
>>>>>
>>>>>         #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
>>>>>         result = rbind(result, as.matrix(bestpeer))
>>>>>       }
>>>>>     }
>>>>>     #result = rbind(result, tfdata_quarter)
>>>>>     print (aquarter)
>>>>>   }
>>>>>
>>>>>   result = as.data.frame(result)
>>>>>   names(result) = colnames
>>>>>   return(result)
>>>>>
>>>>> }
>>>>>
>>>>> ========= end of my function ============
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>
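For reference, the matching step described in the quoted post (closest bigger non-issuer in the same industry, else the biggest available) could perhaps be done for all IPOs of one industry at once instead of row by row. The following is only a sketch under assumptions: the column names HSICIG, Market.Cap.13f and Quarters.Since.IPO.Issue are taken from the posted function, while match_quarter() and the toy data are hypothetical, and the inputs are assumed to have no missing values in the industry or market-cap columns.

```r
# Hypothetical helper: match all IPOs of one quarter against the eligible
# non-issuers of the same quarter, one industry at a time.
match_quarter <- function(ipos, peers) {
  groups <- split(ipos, ipos$HSICIG, drop = TRUE)
  matched <- lapply(groups, function(ipo_grp) {
    p <- peers[peers$HSICIG == ipo_grp$HSICIG[1], , drop = FALSE]
    if (nrow(p) == 0) return(NULL)          # no peers in this industry
    p <- p[order(p$Market.Cap.13f), , drop = FALSE]
    # k = number of peers strictly smaller than each IPO
    k <- findInterval(ipo_grp$Market.Cap.13f, p$Market.Cap.13f,
                      left.open = TRUE)
    # first peer with cap >= IPO cap, or the largest peer if none exists
    best <- p[pmin(k + 1L, nrow(p)), , drop = FALSE]
    best$Quarters.Since.IPO.Issue <- ipo_grp$Quarters.Since.IPO.Issue
    best
  })
  do.call(rbind, Filter(Negate(is.null), matched))
}

# Toy data, made up purely for illustration
peers <- data.frame(HSICIG = c(1, 1, 1, 2),
                    Market.Cap.13f = c(30, 10, 20, 50),
                    PERMNO = 101:104)
ipos <- data.frame(HSICIG = c(1, 1, 2, 3),
                   Market.Cap.13f = c(15, 99, 50, 5),
                   Quarters.Since.IPO.Issue = c(1, 2, 3, 4))
m <- match_quarter(ipos, peers)
m
```

The per-quarter results could then be collected in a list and combined with a single do.call(rbind, ...) at the end, which avoids growing the result inside the loop.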
------------------------------
Message: 46
Date: Fri, 6 Jun 2008 12:06:37 -0400
From: "Woolner, Keith" <kwoolner at indians.com>
Subject: [R] How to force two regression coefficients to be equal but
opposite in sign?
To: <r-help at r-project.org>
Message-ID:
<F8CE5B1510266D46A27FE586C8D78EEC02F5CEAF at WAHOO.indians.com>
Content-Type: text/plain
Is there a way to set up a regression in R that forces two coefficients
to be equal but opposite in sign?
I'm trying to set up a model where a subject appears in a pair of
environments where a measurement X is made. There are a total of 5
environments, one of which is a baseline. But each observation is for
a subject in only two of them, and not all subjects will appear in
each environment.
Each of the environments has an effect on the variable X. I want to
measure the relative effects of each environment E on X with a model.
Xj = Xi * Ei / Ej
Ei of the baseline model is set equal to 1.
With a log transform, a linear-looking regression can be written as:
log(Xj) = log(Xi) + log(Ei) - log(Ej)
My data looks like:
#  E1  X1   E2  X2
1  A   .20  B   .25
What I've tried in R:
env <- c("A","B","C","D","E")
# Note: data is made up just for this example
df <- data.frame(
  X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
  X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
  E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
  E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
)
model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
summary(model)
Call:
lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df)
Residuals:
      1       2       3       4       5       6       7       8
 0.3240  0.2621 -0.5861 -1.0283  0.5861  0.4422  0.3831 -0.2608
      9      10      11      12      13      14      15
-0.1222  0.9002 -0.5802 -0.3200  0.6452 -0.9634  0.3182
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.54563    1.71558   0.318    0.763
log(X1)      1.29745    0.57295   2.265    0.073 .
E1B         -0.23571    0.95738  -0.246    0.815
E1C         -0.57057    1.20490  -0.474    0.656
E1D         -0.22988    0.98274  -0.234    0.824
E1E         -1.17181    1.02918  -1.139    0.306
E2B         -0.16775    0.87803  -0.191    0.856
E2C          0.05952    1.12779   0.053    0.960
E2D          0.43077    1.19485   0.361    0.733
E2E          0.40633    0.98289   0.413    0.696
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.004 on 5 degrees of freedom
Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343
F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721
----
What I need to do is force the corresponding environment coefficients
to be equal in absolute value, but opposite in sign. That is:
E1B = -E2B
E1C = -E2C
E1D = -E2D
E1E = -E2E
In essence, E1 and E2 are the "same" variable, but can play two
different roles in the model depending on whether it's the first part
of the observation or the second part.
I searched the archive, and the closest thing I found to my situation
was:
http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html
But the response to that thread didn't seem to be applicable to my
situation.
Any pointers would be appreciated.
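One possible way to impose this constraint, offered only as a sketch: replace the two factors by a single signed indicator matrix with +1 in the column of the E1 environment and -1 in the column of the E2 environment, dropping the baseline A. Each remaining coefficient then plays the role of log(E) for one environment, and the "equal but opposite" restriction holds by construction. The matrix name D and the two fit objects are hypothetical; the data are the made-up values from the example above.

```r
env <- c("A", "B", "C", "D", "E")
df <- data.frame(
  X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
  X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
  E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
  E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
)

# +1 where an environment appears as E1, -1 where it appears as E2
D <- outer(as.character(df$E1), env, "==") -
     outer(as.character(df$E2), env, "==")
colnames(D) <- env
D <- D[, -1, drop = FALSE]   # drop baseline A, so log(E_A) = 0

# Same structure as the original lm() call, with one coefficient per environment
fit <- lm(log(X2) ~ log(X1) + D, data = df)

# Stricter variant matching log(Xj) = log(Xi) + log(Ei) - log(Ej):
# no intercept, and the coefficient of log(X1) fixed at 1 via an offset
fit_strict <- lm(log(X2) ~ 0 + offset(log(X1)) + D, data = df)

coef(fit)
coef(fit_strict)
```

The coefficients labelled DB, DC, DD and DE are the estimated log effects of environments B to E relative to A under these assumptions.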
Thanks,
Keith
------------------------------
Message: 47
Date: Fri, 06 Jun 2008 12:20:52 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Store filename
To: DAVID ARTETA GARCIA <darteta001 at ikasle.ehu.es>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Message-ID: <484963E4.20709 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
well, where are you getting the filename in the first place? are you
looping over a list of filenames that comes from somewhere?
generally, for concatenating strings, look at function 'paste':
write.table(myoutput, paste(myfilename,"_out.txt", sep=''), sep="\t")
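If the stored name still carries its ".txt" extension, pasting "_out.txt" onto it would give "Run224_v2_060308.txt_out.txt". A small sketch of one way to strip the extension first, assuming the input path is kept in a variable (infile, stem and outfile are hypothetical names):

```r
infile <- "Run224_v2_060308.txt"
# mydata <- read.table(infile, sep = "\t", header = TRUE)

# Drop any directory part and the extension, then append the suffix
stem <- tools::file_path_sans_ext(basename(infile))
outfile <- paste(stem, "_out.txt", sep = "")
outfile

# write.table(myoutput, outfile, sep = "\t")
```

Note that basename() discards the directory, so the output would be written to the current working directory.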
on 06/06/2008 11:51 AM DAVID ARTETA GARCIA said the following:
> Hi list,
>
> Is it possible to save the name of a filename automatically when reading
> it using read.table() or some other function?
> My aim is to create then an output table with the name of the original
> table with a suffix like _out
>
> example:
>
> mydata = read.table("Run224_v2_060308.txt", sep = "\t", header = TRUE)
>
> ## store name?
>
> myfile = the_name_of_the_file
>
> ## do analysis of data and store in a data.frame "myoutput"
> ## write output in tab format
>
> write.table(myoutput, c(myfile,"_out.txt"),sep="\t")
>
> the name of the new file will be
>
> "Run224_v2_060308_out.txt"
>
> Thanks in advance,
>
>
>
> David
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
------------------------------
Message: 48
Date: Fri, 6 Jun 2008 13:23:41 -0300
From: "Henrique Dallazuanna" <wwwhsd at gmail.com>
Subject: Re: [R] Store filename
To: "DAVID ARTETA GARCIA" <darteta001 at ikasle.ehu.es>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Message-ID:
<da79af330806060923r4f7058c1wa27e104b8cb4ee2e at mail.gmail.com>
Content-Type: text/plain
You can write your own function, something like this:
read.table2 <- function(file, ...)
{
  x <- read.table(file, ...)
  # remember where the data came from
  attributes(x)[["file_name"]] <- file
  return(x)
}
mydata <- read.table2("Run224_v2_060308.txt", sep = "\t", header = TRUE)
myfile <- attr(mydata, "file_name")
--
Henrique Dallazuanna
Curitiba-Parana-Brasil
25° 25' 40" S 49° 16' 22" O
------------------------------
Message: 49
Date: Fri, 06 Jun 2008 18:32:14 +0200
From: Dani Valverde <daniel.valverde at uab.cat>
Subject: [R] fit.contrast error
To: R Help <r-help at r-project.org>
Message-ID: <4849668E.7040009 at uab.cat>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hello,
I am trying to perform a fit.contrast() on a lme object with this code:
attach(error_DB)
model_temperature <- lme(Error ~ Temperature, data = error_DB,random=~1|ID)
summary(model_temperature)
fit.contrast(model_temperature, "Temperature", c(-1,1), conf.int=0.95)
detach(error_DB)
but I got this error
Error in `contrasts<-`(`*tmp*`, value = c(-0.5, 0.5)) :
contrasts apply only to factors
My data set is a data frame, very similar to the Orthodont data set. Could
anyone give me some advice on how to solve the problem?
Best,
Dani
--
Daniel Valverde Saubí
Grup de Biologia Molecular de Llevats
Facultat de Veterinària de la Universitat Autònoma de Barcelona
Edifici V, Campus UAB
08193 Cerdanyola del Vallès- SPAIN
Centro de Investigación Biomédica en Red
en Bioingeniería, Biomateriales y
Nanomedicina (CIBER-BBN)
Grup d'Aplicacions Biomèdiques de la RMN
Facultat de Biociències
Universitat Autònoma de Barcelona
Edifici Cs, Campus UAB
08193 Cerdanyola del Vallès- SPAIN
+34 93 5814126
------------------------------
Message: 50
Date: Fri, 06 Jun 2008 18:46:53 +0200
From: Uwe Ligges <ligges at statistik.tu-dortmund.de>
Subject: Re: [R] where to download BRugs?
To: Nanye Long <nanye.long at gmail.com>
Cc: r-help at r-project.org
Message-ID: <484969FD.3060807 at statistik.tu-dortmund.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear NL.
BRugs is available from the CRAN extras repository hosted by Brian Ripley.
install.packages("BRugs")
should install it as before (for R-2.7.x), if you have not changed the
list of default repositories.
Best wishes,
Uwe Ligges
Nanye Long wrote:
> Hi all,
>
> Does anyone know where to download the "BRugs" package? I did not find
> it on r-project website. Thanks.
>
> NL
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 51
Date: Fri, 6 Jun 2008 12:53:38 -0400
From: "Levi Waldron" <leviwaldron at gmail.com>
Subject: Re: [R] choosing an appropriate linear model
To: "R-help mailing list" <R-help at stat.math.ethz.ch>
Message-ID:
<7a09e3940806060953w5fa5bfb2i750a1e48b5561b5a at mail.gmail.com>
Content-Type: text/plain
Perhaps this was too big a question, so I'll ask something shorter:
I have fit a linear model, and want to use its prediction intervals to
calculate the sum of many individual predictions.
1) Some of the lower prediction bounds are negative, which is
nonsensical. Should I just set all negative predictions to zero, or is
there another way to require non-negative predictions only?
2) I am interested in the sum of many predictions based on the lm. How can
I calculate the 95% prediction interval for the sum? Should I calculate a
root mean square of the individual errors, or use a bootstrap method, or
something else?
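For question 2, one textbook option is to treat the sum as a single linear combination of the coefficients. This is only a sketch and it leans on assumptions the diagnostics in this thread call into question (independent, normally distributed errors with constant variance): the variance of the summed prediction error is then xs' V xs + n * sigma^2, where xs is the column sum of the new design matrix, V the coefficient covariance matrix and n the number of new events. The function name sum_prediction_interval and the new rainfall amounts are hypothetical; the data are the first ten rows of the leachdata set posted in this thread.

```r
# First ten rows of the posted leachdata
leach10 <- data.frame(
  rainmm   = c(19.68, 36.168, 18.632, 2.74, 0.822,
               9.864, 7.124, 29.592, 4.384, 11.508),
  leachate = c(0.94, 4.74, 2.84, 3.28, 0.07,
               1.56, 0.48, 9.63, 1.2, 2.55)
)
fit <- lm(leachate ~ rainmm - 1, data = leach10)

# Prediction interval for the SUM of new responses under the usual
# lm assumptions (independent normal errors, constant variance)
sum_prediction_interval <- function(fit, newdata, level = 0.95) {
  X  <- model.matrix(delete.response(terms(fit)), newdata)
  xs <- colSums(X)                       # design row of the sum
  total <- sum(xs * coef(fit))           # point prediction of the sum
  s2 <- summary(fit)$sigma^2             # residual variance
  v  <- drop(t(xs) %*% vcov(fit) %*% xs) + nrow(X) * s2
  half <- qt(1 - (1 - level) / 2, df.residual(fit)) * sqrt(v)
  c(fit = total, lwr = total - half, upr = total + half)
}

new_events <- data.frame(rainmm = c(5, 12, 20))  # hypothetical rainfall amounts
sum_prediction_interval(fit, new_events)
```

For a single new event this reduces to the interval from predict(fit, newdata, interval = "prediction"). It does not enforce non-negativity, so a bootstrap or a model on a transformed scale may still be preferable for these data.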
ps. the data is attached to the end of this email.
On Thu, Jun 5, 2008 at 6:25 PM, Levi Waldron <leviwaldron at gmail.com>
wrote:
> I am trying to model the observed leaching of wood preservative chemicals
> from treated wood during an outdoor experiment where leaching is caused by
> rainfall events. For each rainfall event, the amount of rainfall was
> recorded as well as the amount of preservative chemical leached. A number
> of climatic variables were measured, but the most important is the amount
> of rainfall.
>
> I have tried a simple linear model, with zero intercept because zero
> rainfall cannot cause any leaching (leachdata dataframe is attached to this
> email). The diagnostics show clearly non-normally distributed residuals
> with a simple linear regression, and I am trying to figure out what to do
> about it (see attached diagnostics.png). This dataset contains
> measurements from 57 rainfall events on three replicate samples, for a
> total of 171 measurements.
>
> Part of the problem is that physically, the leaching values can only be
> positive, so for the smaller rainfall amounts the residuals are all
> positive. If I allow an intercept then it is significantly positive,
> possibly since the researcher wouldn't have collected measurements for
> very small rain events, but in terms of the model it doesn't make sense
> physically to have a positive intercept, particularly since lab experiments
> have shown that a certain amount of rain exposure is required to wet the
> wood before leaching begins.
>
> I can get more normally distributed residuals by log-transforming the
> response, or using the optimal box-cox transformation of lambda = 0.21,
> which produces nicer-looking residuals but unsatisfactory prediction which
> is the main goal of the model (also attached).
>
> Any advice on how to create a better predictive model? I presume it has
> something to do with glm, especially since I have repeated rainfalls on
> replicate samples, but any advice on the approach to take would be much
> appreciated. The code I used to produce the attached plots is included
> below.
>
>
> leach.lm <- lm(leachate~rainmm-1,data=leachdata)
>
> png("dianostics.png",height=1200,width=700)
> par(mfrow=c(3,2))
> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
> abline(leach.lm)
> plot(predict(leach.lm)~leachdata$leachate,
>      main="predicted vs. observed leaching amount",
>      xlim=c(0,12),ylim=c(0,12),
>      xlab="observed leaching",ylab="predicted leaching")
> abline(a=0,b=1)
> plot(leach.lm)
> dev.off()
>
> library(MASS)
> boxcox(leach.lm,plotit=T,lambda=seq(0,0.4,by=0.01))
>
> boxtran <- function(y,lambda,inverse=F){
> if(inverse)
> return((lambda*y+1)^(1/lambda))
> else
> return((y^lambda-1)/lambda)
> }
>
> png("boxcox-dianostics.png",height=1200,width=700)
> par(mfrow=c(3,2))
> logleach.lm <- lm(boxtran(leachate,0.21)~rainmm-1,data=leachdata)
> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
> x <- leachdata$rainmm
> y <- boxtran(predict(logleach.lm),0.21,T)
> xy <- cbind(x,y)[order(x),]
> lines(xy)
> plot(y~leachdata$leachate,xlim=c(0,12),ylim=c(0,12),
>      main="predicted vs. observed leaching amount",
>      xlab="observed leaching",ylab="predicted leaching")
> abline(a=0,b=1)
> plot(logleach.lm)
> dev.off()
>
`leachdata` <-
structure(list(rainmm = c(19.68, 36.168, 18.632, 2.74, 0.822,
9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042, 2.74,
18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754, 12.604,
2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548, 1.644,
3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918, 54.2,
43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124, 10.96,
7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632, 2.74,
0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288, 9.042,
2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714, 5.754,
12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21, 0.548,
1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274, 1.918,
54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13, 7.124,
10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168, 18.632,
2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37, 3.288,
9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384, 16.714,
5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93, 5.21,
0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358, 0.274,
1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13,
7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54), leachate = c(0.94,
4.74, 2.84, 3.28, 0.07, 1.56, 0.48, 9.63, 1.2, 2.55, 0.15, 0.67,
0.57, 0.38, 1.81, 0.08, 0.94, 0.79, 0.16, 0.09, 1.2, 0.61, 0.77,
0.02, 1, 0.26, 1.34, 0.81, 0.18, 0.17, 0.005, 0.25, 0.42, 1.45,
0.54, 0.24, 0.41, 0.55, 1.59, 1.09, 3.84, 11.52, 6.21, 3.86,
2.34, 11.02, 2.33, 1.83, 2.4, 0.74, 0.71, 0.55, 0.31, 0.83, 0.29,
0.48, 0.92, 1.33, 4.8, 1.73, 1.87, 0.21, 1.04, 1.08, 6.74, 1.23,
2.5, 0.13, 1.29, 0.75, 0.66, 2.14, 0.17, 0.43, 0.69, 0.47, 0.14,
1.6, 0.56, 1.02, 0.04, 0.75, 0.32, 1.68, 0.58, 0.42, 0.18, 0.1,
0.34, 0.36, 1.54, 0.38, 0.18, 0.26, 0.005, 0.17, 0.18, 0.4, 2.13,
0.87, 0.75, 0.52, 3.21, 0.49, 0.85, 1.24, 0.32, 0.5, 0.37, 0.19,
0.53, 0.3, 0.51, 1.37, 1.25, 3.69, 2.76, 1.82, 0.005, 0.99, 0.87,
6.93, 1.04, 2.26, 0.14, 1.27, 0.62, 0.6, 2.91, 0.19, 0.41, 0.47,
0.38, 0.17, 1.56, 0.41, 0.92, 0.02, 0.51, 0.26, 0.86, 0.47, 0.39,
0.12, 0.08, 0.28, 0.3, 1.16, 0.27, 0.15, 0.22, 0.3, 0.18, 0.16,
0.47, 6, 1.47, 0.67, 0.35, 2.13, 0.51, 0.85, 1.37, 0.23, 0.45,
0.34, 0.17, 0.46, 0.23, 0.43, 1.17)), .Names = c("rainmm",
"leachate"
), row.names = c(NA, -171L), class = "data.frame")
------------------------------
Message: 52
Date: Fri, 6 Jun 2008 10:02:40 -0700 (PDT)
From: avilella <avilella at gmail.com>
Subject: [R] reorder breaking by half
To: r-help at r-project.org
Message-ID:
<89301b84-41f6-48fc-88e0-4087e0a21a4e at c65g2000hsa.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
I want to reorder the colors given by rainbow(7) so that the last four
move to the front and the first three move to the end.
For example:
> ci=rainbow(7)
> ci
[1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF"
[7] "#FF00DBFF"
I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of
ci, and the rest to be at the beginning.
How can I do that?
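One way this could be done with plain vector indexing, as a minimal sketch (k and reordered are hypothetical names):

```r
ci <- rainbow(7)
k <- 3   # number of leading colors to move to the end

# Everything after the first k elements, followed by the first k elements
reordered <- c(ci[-seq_len(k)], ci[seq_len(k)])
reordered
```

The same result can be written as ci[c(4:7, 1:3)] when the split point is fixed.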
------------------------------
Message: 53
Date: Fri, 6 Jun 2008 10:11:42 -0700 (PDT)
From: Thomas Lumley <tlumley at u.washington.edu>
Subject: Re: [R] rmeta package: metaplot or forestplot of
meta-analysis under DSL (ramdon) model
To: "Shi, Jiajun [BSD] - KNP" <jshi1 at bsd.uchicago.edu>
Cc: r-help at r-project.org
Message-ID:
<Pine.LNX.4.64.0806061009100.13806 at homer21.u.washington.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
The package has a plot() method for random-effects meta-analyses as well,
either those produced by meta.DSL or meta.summaries.
There are examples on the help page for meta.DSL.
-thomas
On Tue, 27 May 2008, Shi, Jiajun [BSD] - KNP wrote:
> Dear all,
>
> I could not draw a forest plot for meta-analysis under random models
> using the rmeta package. The rmeta has a default function for MH
> (fixed-effect) model. Has the rmeta package been updated for such a
> function? Or someone revised it and kept a private code?
>
> I would appreciate it if you could provide some information on this
> question.
>
> Thanks,
>
> Andrew
>
>
> This email is intended only for the use of the individua...{{dropped:12}}
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
------------------------------
Message: 54
Date: Fri, 6 Jun 2008 19:12:00 +0200 (CEST)
From: "Luca Mortarini" <l.mortarini at isac.cnr.it>
Subject: [R] Problem with subset
To: r-help at r-project.org
Message-ID: <56855.213.140.22.79.1212772320.squirrel at mail.isac.cnr.it>
Content-Type: text/plain;charset=iso-8859-1
Hi,
I am new to R and I am looking for a way to extract a subset from a
vector.
I have a vector of numbers oscillating around zero (a decreasing
autocorrelation function) and I would like to extract only the first
positive part of the function (from zero lag to the lag where the function
changes sign for the first time).
I have tried
subset(myvector,myvector>0)
but this obviously extracts all the positive intervals, not only the first one.
Is there a logical statement I can use in subset? I prefer not to use an
if statement that would probably slow down the code.
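A vectorized possibility, sketched here with a hypothetical helper name and made-up values: cumsum() over the "not positive" indicator stays at zero only until the first non-positive element, so it can be used directly as a logical index. The sketch assumes the vector contains no NA values.

```r
# Keep the elements before the first value that is <= 0
first_positive_run <- function(x) x[cumsum(x <= 0) == 0]

acf_vals <- c(1, 0.6, 0.2, -0.1, -0.3, 0.1, 0.2)  # hypothetical values
first_positive_run(acf_vals)
```

If the first element is already non-positive, the result is an empty vector.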
Thanks a lot,
Luca
*********************************************************
dr. Luca Mortarini l.mortarini at isac.cnr.it
Università del Piemonte Orientale
Dipartimento di Scienze e Tecnologie Avanzate
------------------------------
Message: 55
Date: Fri, 6 Jun 2008 10:14:12 -0700
From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Subject: Re: [R] Manipulating DataSets
To: Neil Gupta <neil.gup at gmail.com>
Cc: R-help at r-project.org
Message-ID: <Pine.LNX.4.64.0806061008470.28293 at tajo.ucsd.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Fri, 6 Jun 2008, Neil Gupta wrote:
> Hello R-users,
>
> I have a very simple problem I wanted to solve. I have a large dataset as
> such:
>   Lag  X.Symbol     Time TickType ReferenceNumber  Price Size
> 1  ES 3:ESZ7.GB 08:30:00        B        74390987 151075   44
> 2  ES 3:YMZ7.EC 08:30:00        B        74390993  13686   17
> 3  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49
> 4  YM 3:YMZ7.EC 08:30:00        B        74390998  13686   17
> 5  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49
> 6  YM 3:YMZ7.EC 08:30:00        B        74391000  13686   14
>   X.Symbol.1   Time.1 TickType.1 ReferenceNumber.1
> 1  3:ESZ7.GB 08:30:00          A          74390988
> 2  3:YMZ7.EC 08:30:00          A          74390994
> 3  3:ESZ7.GB 08:30:00          A          74391136
> 4  3:YMZ7.EC 08:30:00          A          74390999
> 5  3:ESZ7.GB 08:30:00          A          74391136
> 6  3:YMZ7.EC 08:30:00          A          74391001
>   Price.1 Size.1 LeadTime MidPoint Spread
> 1  151100     22 08:30:00 151087.5     25
> 2   13688     27 08:30:00  13687.0      2
> 3  151100     22 08:30:00 151087.5     25
> 4   13688     27 08:30:00  13687.0      2
> 5  151100     22 08:30:00 151087.5     25
> 6   13688     27 08:30:00  13687.0      2
>
>
> All I wanted to do was take the Log(MidPoint[2]) - Log(MidPoint[1]) for a
> symbol "3:ESZ7.GB"
> So the first one would be log(151087.5) - log(151087.5). I wanted to do
> this throughout the data set and add that in another column. I would appreciate
> any help.
See
example( split )
Note the "### data frame variation", which should serve as a template
for your problem.
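As an alternative sketch of the same split-apply idea, ave() can compute the within-symbol difference of log mid-points and return it in the original row order. The data frame below holds only the X.Symbol and MidPoint columns from the quoted table; quotes and LogDiff are hypothetical names.

```r
quotes <- data.frame(
  X.Symbol = c("3:ESZ7.GB", "3:YMZ7.EC", "3:ESZ7.GB",
               "3:YMZ7.EC", "3:ESZ7.GB", "3:YMZ7.EC"),
  MidPoint = c(151087.5, 13687.0, 151087.5, 13687.0, 151087.5, 13687.0)
)

# log(MidPoint[i]) - log(MidPoint[i - 1]) within each symbol;
# the first observation of each symbol has no predecessor, hence NA
quotes$LogDiff <- ave(log(quotes$MidPoint), quotes$X.Symbol,
                      FUN = function(v) c(NA, diff(v)))
quotes
```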
HTH,
Chuck
>
> Regards,
>
> Neil Gupta
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
------------------------------
Message: 56
Date: Fri, 06 Jun 2008 13:25:05 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck at gmail.com>
Cc: r-help at r-project.org
Message-ID: <484972F1.3070203 at gmail.com>
Content-Type: text/plain; charset="iso-8859-1";
Format="flowed"
i thought since the function code (which i provided in full) was pretty
short, it would be reasonably easy to just read the code and see what
it's doing.
but ok, so... i am attaching a zip file with a small sample of the data
set (tab delimited) and the function code (posting guidelines claim
that "some archive formats" are allowed; i assume zip is one of
them...).
would appreciate your comments! :)
on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
> It's summarized in the last line to r-help. Note reproducible and
> minimal.
>
> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
>> i did! what did i miss?
>>
>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>> Try reading the posting guide before posting.
>>>
>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins at gmail.com>
>>> wrote:
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>
------------------------------
Message: 57
Date: Fri, 06 Jun 2008 13:25:40 -0400
From: "John Fox" <jfox at mcmaster.ca>
Subject: Re: [R] lsmeans
To: Dani Valverde <daniel.valverde at uab.cat>
Cc: R Help <r-help at r-project.org>
Message-ID: <web-214018018 at cgpsrv2.cis.mcmaster.ca>
Content-Type: text/plain; charset="ISO-8859-1"
Dear Dani,
I intend at some point to extend the effects package to linear and
generalized linear mixed-effects models, probably using lmer() rather
than lme(), but as you discovered, it doesn't handle these models now.
It wouldn't be hard, however, to do the computations yourself, using
the coefficient vector for the fixed effects and a suitably constructed
model-matrix to compute the effects; you could also get standard errors
by using the covariance matrix for the fixed effects.
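A minimal sketch of that computation, for illustration only: it uses the Orthodont data shipped with nlme rather than Dani's error_DB, and the choice of grid (each Sex at the mean age) is an assumption made just for this example.

```r
library(nlme)  # recommended package, shipped with R

# Illustrative model on nlme's built-in Orthodont data (not the poster's data)
fit <- lme(distance ~ age * Sex, random = ~ 1 | Subject, data = Orthodont)

# Predictor values at which to compute the adjusted means:
# each level of Sex, evaluated at the mean age
grid <- expand.grid(age = mean(Orthodont$age),
                    Sex = levels(Orthodont$Sex))

# Model matrix for the fixed-effects part, built from the grid
X <- model.matrix(~ age * Sex, data = grid)

beta <- fixef(fit)  # fixed-effects coefficient vector
V    <- vcov(fit)   # covariance matrix of the fixed effects

est <- drop(X %*% beta)              # adjusted means
se  <- sqrt(diag(X %*% V %*% t(X)))  # their standard errors

data.frame(grid, estimate = est, se = se)
```

The columns of X must line up with names(fixef(fit)); checking that before multiplying is a cheap safeguard.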
I hope this helps,
John
On Fri, 06 Jun 2008 17:05:58 +0200
Dani Valverde <daniel.valverde at uab.cat> wrote:
> Hello,
> I have the following function call:
>
> lme(fixed=Error ~ Temperature * Tumour ,random = ~1|ID,
> data=error_DB)
>
> which returns an lme object. I am interested in carrying out some
> kind of lsmeans on the data returned, but I cannot find any function
> to do this in R. I have seen the effect() function, but it does not
> work with lme objects. Any idea?
>
> Best,
>
> Dani
>
> --
> Daniel Valverde Saubí
>
> Grup de Biologia Molecular de Llevats
> Facultat de Veterinària de la Universitat Autònoma de Barcelona
> Edifici V, Campus UAB
> 08193 Cerdanyola del Vallès- SPAIN
>
> Centro de Investigación Biomédica en Red
> en Bioingeniería, Biomateriales y
> Nanomedicina (CIBER-BBN)
>
> Grup d'Aplicacions Biomèdiques de la RMN
> Facultat de Biociències
> Universitat Autònoma de Barcelona
> Edifici Cs, Campus UAB
> 08193 Cerdanyola del Vallès- SPAIN
> +34 93 5814126
>
--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
------------------------------
Message: 58
Date: Fri, 06 Jun 2008 13:27:32 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] reorder breaking by half
To: avilella <avilella at gmail.com>
Cc: r-help at r-project.org
Message-ID: <48497384.5000505 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
ci = rainbow(7)[c(4:7, 1:3)]
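The same rotation can be written without hard-coding the indices. This is only a sketch; the variable names n, k and rotated are made up for the example.

```r
n  <- 7
ci <- rainbow(n)
k  <- 3  # number of leading colours to move to the end

# Drop the first k colours, then append them after the rest
rotated <- c(ci[-seq_len(k)], ci[seq_len(k)])

# Same result as the explicit index vector c(4:7, 1:3)
identical(rotated, rainbow(7)[c(4:7, 1:3)])
```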
on 06/06/2008 01:02 PM avilella said the following:
> Hi,
>
> I want to reorder the colors given by rainbow(7) so that the last half
> move to the first 4.
>
> For example:
>
>> ci=rainbow(7)
>> ci
> [1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF"
> [7] "#FF00DBFF"
>
> I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of
> ci, and the rest to be at the beginning.
>
> How can I do that?
>
>
------------------------------
Message: 59
Date: Fri, 06 Jun 2008 13:29:56 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Patrick Burns <pburns at pburns.seanet.com>
Cc: r-help at r-project.org
Message-ID: <48497414.1060802 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
thanks for the tip! i'll try that and see how big of a difference that
makes... if i am not sure what exactly the size will be, am i better off
making it larger, and then later stripping off the blank rows, or making
it smaller, and appending the missing rows?
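For what it's worth, a rough sketch of the "make it larger, then strip the blank rows" option. In the function quoted below, each IPO row yields at most one match, so the number of IPO rows would be a safe upper bound; the sizes and the "no match" rule here are invented for the example.

```r
n_max  <- 1000L  # assumed upper bound on the number of result rows
result <- matrix(NA_real_, nrow = n_max, ncol = 3)
filled <- 0L     # number of rows actually written so far

for (i in seq_len(n_max)) {
  if (i %% 2L == 0L) next  # stand-in for "no matching firm found"
  filled <- filled + 1L
  result[filled, ] <- c(i, i^2, sqrt(i))  # write in place, no rbind()
}

# Strip the unused rows once, at the end
result <- result[seq_len(filled), , drop = FALSE]
```

Trimming once at the end is a single copy, whereas every rbind() onto a growing object copies everything accumulated so far.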
on 06/06/2008 11:44 AM Patrick Burns said the following:
> One thing that is likely to speed the code significantly
> is if you create 'result' to be its final size and then
> subscript into it. Something like:
>
> result[i, ] <- bestpeer
>
> (though I'm not sure if 'i' is the proper index).
>
> Patrick Burns
> patrick at burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> Daniel Folkinshteyn wrote:
>> Anybody have any thoughts on this? Please? :)
>>
>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>> Hi everyone!
>>>
>>> I have a question about data processing efficiency.
>>>
>>> My data are as follows: I have a data set on quarterly institutional
>>> ownership of equities; some of them have had recent IPOs, some have
>>> not (I have a binary flag set). The total dataset size is 700k+ rows.
>>>
>>> My goal is this: For every quarter since issue for each IPO, I need
>>> to find a "matched" firm in the same industry, and close in market
>>> cap. So, e.g., for firm X, which had an IPO, i need to find a matched
>>> non-issuing firm in quarter 1 since IPO, then a (possibly different)
>>> non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing
>>> firm (there are about 8300 of these).
>>>
>>> Thus it seems to me that I need to be doing a lot of data selection
>>> and subsetting, and looping (yikes!), but the result appears to be
>>> highly inefficient and takes ages (well, many hours). What I am
>>> doing, in pseudocode, is this:
>>>
>>> 1. for each quarter of data, getting out all the IPOs and all the
>>> eligible non-issuing firms.
>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>> industry, sort them by size, and finally grab a matching firm closest
>>> in size (the exact procedure is to grab the closest bigger firm if
>>> one exists, and just the biggest available if all are smaller)
>>> 3. assign the matched firm-observation the same "quarters since
>>> issue" as the IPO being matched
>>> 4. rbind them all into the "matching" dataset.
>>>
>>> The function I currently have is pasted below, for your reference. Is
>>> there any way to make it produce the same result but much faster?
>>> Specifically, I am guessing eliminating some loops would be very
>>> good, but I don't see how, since I need to do some fancy footwork for
>>> each IPO in each quarter to find the matching firm. I'll be doing a
>>> few things similar to this, so it's somewhat important to up the
>>> efficiency of this. Maybe some of you R-fu masters can clue me in? :)
>>>
>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>
>>> ========== my function below ==========
>>>
>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>     quarters_since_issue=40) {
>>>
>>>   # rbind for matrix is cheaper, so typecast the result to matrix
>>>   result = matrix(nrow=0, ncol=ncol(tfdata))
>>>
>>>   colnames = names(tfdata)
>>>
>>>   quarterends = sort(unique(tfdata$DATE))
>>>
>>>   for (aquarter in quarterends) {
>>>     tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>
>>>     tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>       (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>       (tfdata_quarter$IPO.Flag == 0), ]
>>>     tfdata_quarter_ipoissuers = tfdata_quarter[
>>>       tfdata_quarter$IPO.Flag == 1, ]
>>>
>>>     for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>       arow = tfdata_quarter_ipoissuers[i,]
>>>       industrypeers = tfdata_quarter_fitting_nonissuers[
>>>         tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>       industrypeers = industrypeers[
>>>         order(industrypeers$Market.Cap.13f), ]
>>>       if ( nrow(industrypeers) > 0 ) {
>>>         if ( nrow(industrypeers[industrypeers$Market.Cap.13f
>>>                >= arow$Market.Cap.13f, ]) > 0 ) {
>>>           bestpeer =
>>>             industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>>>         }
>>>         else {
>>>           bestpeer = industrypeers[nrow(industrypeers),]
>>>         }
>>>         bestpeer$Quarters.Since.IPO.Issue =
>>>           arow$Quarters.Since.IPO.Issue
>>>
>>>         #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
>>>         result = rbind(result, as.matrix(bestpeer))
>>>       }
>>>     }
>>>     #result = rbind(result, tfdata_quarter)
>>>     print (aquarter)
>>>   }
>>>
>>>   result = as.data.frame(result)
>>>   names(result) = colnames
>>>   return(result)
>>>
>>> }
>>>
>>> ========= end of my function ============
>>>
>>
>>
>>
>
------------------------------
Message: 60
Date: Fri, 6 Jun 2008 13:35:48 -0400
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<971536df0806061035vd6a2941v18303c1ce78bed2d at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
I think the posting guide may not be clear enough and have suggested that
it be clarified. Hopefully this better communicates what is required and why
in a shorter amount of space:
https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html
On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
> i thought since the function code (which i provided in full) was pretty
> short, it would be reasonably easy to just read the code and see what it's
> doing.
>
> but ok, so... i am attaching a zip file, with a small sample of the data set
> (tab delimited), and the function code, in a zip file (posting guidelines
> claim that "some archive formats" are allowed, i assume zip is one of
> them...
>
> would appreciate your comments! :)
>
> on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
>>
>> It's summarized in the last line to r-help. Note reproducible and
>> minimal.
>>
>> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins at gmail.com>
>> wrote:
>>>
>>> i did! what did i miss?
>>>
>>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>>
>>>> Try reading the posting guide before posting.
>>>>
>>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn
>>>> <dfolkins at gmail.com>
>>>> wrote:
>>>>>
>>>>> Anybody have any thoughts on this? Please? :)
>>>>>
>>
>
------------------------------
Message: 61
Date: Fri, 06 Jun 2008 13:36:42 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck at gmail.com>
Cc: r-help at r-project.org
Message-ID: <484975AA.8000800 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
just in case, uploaded it to the server, you can get the zip file i
mentioned here:
http://astro.temple.edu/~dfolkins/helplistfiles.zip
on 06/06/2008 01:25 PM Daniel Folkinshteyn said the following:
> i thought since the function code (which i provided in full) was pretty
> short, it would be reasonably easy to just read the code and see what
> it's doing.
>
> but ok, so... i am attaching a zip file, with a small sample of the data
> set (tab delimited), and the function code, in a zip file (posting
> guidelines claim that "some archive formats" are allowed, i assume zip
> is one of them...
>
> would appreciate your comments! :)
>
> on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
>> It's summarized in the last line to r-help. Note reproducible and
>> minimal.
>>
>> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn
>> <dfolkins at gmail.com> wrote:
>>> i did! what did i miss?
>>>
>>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>> Try reading the posting guide before posting.
>>>>
>>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn
>>>> <dfolkins at gmail.com>
>>>> wrote:
>>>>> Anybody have any thoughts on this? Please? :)
>>>>>
>>
------------------------------
Message: 62
Date: Fri, 6 Jun 2008 11:39:27 -0600
From: "Greg Snow" <Greg.Snow at imail.org>
Subject: Re: [R] How to force two regression coefficients to be equal
but opposite in sign?
To: "Woolner, Keith" <kwoolner at indians.com>, "r-help at
r-project.org"
<r-help at r-project.org>
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC60F685895B at LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
One simple way is to do something like:
> fit <- lm(y ~ I(x1-x2) + x3, data=mydata)
The first coefficient (after the intercept) will be the slope for x1; the slope
for x2 will be the negative of that. This model is nested in the fuller model
with x1 and x2 fit separately, and you can therefore test for differences.
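A sketch of how the same differencing idea might carry over to the two environment factors in the question below, using the made-up data from that post. Building the matrix D of dummy differences is an extension assumed here, not something stated above; under that assumption each column of D gets one coefficient that enters with a plus sign when the environment is E1 and a minus sign when it is E2.

```r
env <- c("A", "B", "C", "D", "E")

# Made-up example data copied from the original question
df <- data.frame(
  X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
  X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
  E1 = factor(c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
              levels = env),
  E2 = factor(c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C"),
              levels = env)
)

# Dummy columns for each role, dropping the baseline level "A"
D1 <- model.matrix(~ E1, data = df)[, -1]
D2 <- model.matrix(~ E2, data = df)[, -1]

# +1 where the environment is E1, -1 where it is E2, 0 otherwise
D <- D1 - D2
colnames(D) <- env[-1]

fit_constrained <- lm(log(X2) ~ log(X1) + D, data = df)
fit_full        <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)

# The constrained model is nested in the full one, so the
# equal-but-opposite restriction can be tested with an F test
anova(fit_constrained, fit_full)
```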
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Woolner, Keith
> Sent: Friday, June 06, 2008 10:07 AM
> To: r-help at r-project.org
> Subject: [R] How to force two regression coefficients to be
> equal but opposite in sign?
>
> Is there a way to set up a regression in R that forces two
> coefficients
>
> to be equal but opposite in sign?
>
>
>
> I'm trying to setup a model where a subject appears in a pair of
>
> environments where a measurement X is made. There are a total of 5
>
> environments, one of which is a baseline. But each observation is for
>
> a subject in only two of them, and not all subjects will appear in
>
> each environment.
>
>
>
> Each of the environments has an effect on the variable X. I want to
>
> measure the relative effects of each environment E on X with a model.
>
>
>
> Xj = Xi * Ei / Ej
>
>
>
> Ei of the baseline model is set equal to 1.
>
>
>
> With a log transform, a linear-looking regression can be written as:
>
>
>
> log(Xj) = log(Xi) + log(Ei) - log(Ej)
>
>
>
> My data looks like:
>
>
>
> # E1 X1 E2 X2
>
> 1 A .20 B .25
>
>
>
> What I've tried in R:
>
>
>
> env <- c("A","B","C","D","E")
>
> # Note: data is made up just for this example
>
> df <- data.frame(
>   X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
>   X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
>   E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
>   E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
> )
>
>
>
> model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
>
>
>
> summary(model)
>
>
>
> Call:
>
> lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df)
>
>
>
> Residuals:
>
>       1       2       3       4       5       6       7       8
>  0.3240  0.2621 -0.5861 -1.0283  0.5861  0.4422  0.3831 -0.2608
>       9      10      11      12      13      14      15
> -0.1222  0.9002 -0.5802 -0.3200  0.6452 -0.9634  0.3182
>
>
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.54563    1.71558   0.318    0.763
> log(X1)      1.29745    0.57295   2.265    0.073 .
> E1B         -0.23571    0.95738  -0.246    0.815
> E1C         -0.57057    1.20490  -0.474    0.656
> E1D         -0.22988    0.98274  -0.234    0.824
> E1E         -1.17181    1.02918  -1.139    0.306
> E2B         -0.16775    0.87803  -0.191    0.856
> E2C          0.05952    1.12779   0.053    0.960
> E2D          0.43077    1.19485   0.361    0.733
> E2E          0.40633    0.98289   0.413    0.696
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
> Residual standard error: 1.004 on 5 degrees of freedom
>
> Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343
>
> F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721
>
>
>
> ----
>
>
>
> What I need to do is force the corresponding environment coefficients
>
> to be equal in absolute value, but opposite in sign. That is:
>
>
>
> E1B = -E2B
>
> E1C = -E2C
>
> E1D = -E2D
>
> E1E = -E2E
>
>
>
> In essence, E1 and E2 are the "same" variable, but can play two
>
> different roles in the model depending on whether it's the first part
>
> of the observation or the second part.
>
>
>
> I searched the archive, and the closest thing I found to my situation
>
> was:
>
>
>
> http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html
>
>
>
> But the response to that thread didn't seem to be applicable to my
>
> situation.
>
>
>
> Any pointers would be appreciated.
>
>
>
> Thanks,
>
> Keith
>
>
>
>
>
------------------------------
Message: 63
Date: Fri, 6 Jun 2008 13:43:27 -0400
From: "jim holtman" <jholtman at gmail.com>
Subject: Re: [R] Subsetting to unique values
To: "Emslie, Paul [Ctr]" <emsliep at atac.mil>
Cc: r-help at r-project.org
Message-ID:
<644e1f320806061043t5ab2fd01h586099c91e165ad4 at mail.gmail.com>
Content-Type: text/plain
The interesting thing about R is that there are several ways to "skin the
cat"; here is yet another solution:
> do.call(rbind, by(ddTable, ddTable$Id, function(z) z[1,,drop=FALSE]))
  Id name
1  1 Paul
2  2  Bob
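One more variant, as a sketch: duplicated() flags every repeat of an Id after its first appearance, so negating it keeps the first row per Id. The name first_rows is invented for the example.

```r
ddTable <- data.frame(Id = c(1, 1, 2, 2),
                      name = c("Paul", "Joe", "Bob", "Larry"))

# Keep only the rows where the Id has not been seen before
first_rows <- ddTable[!duplicated(ddTable$Id), ]
first_rows
```

Unlike the by() version, this keeps the original row names (1 and 3 here) and does not split the data frame.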
On Fri, Jun 6, 2008 at 9:35 AM, Emslie, Paul [Ctr] <emsliep at atac.mil> wrote:
> I want to take the first row of each unique ID value from a data frame.
> For instance
> > ddTable <- data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
>
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
------------------------------
Message: 64
Date: Fri, 6 Jun 2008 18:51:34 +0100 (BST)
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Subject: Re: [R] where to download BRugs?
To: Nanye Long <nanye.long at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<alpine.LFD.1.10.0806061847500.10799 at gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Fri, 6 Jun 2008, Nanye Long wrote:
> Hi all,
>
> Does anyone know where to download the "BRugs" package? I did not find
> it on r-project website. Thanks.
It is Windows-only, and you download it from 'CRAN (extras)' which is part
of the default repository set on Windows versions of R. So
install.packages("BRugs")
is all that is needed unless you changed something to stop it working.
(It is only available for R >= 2.6.0.)
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
------------------------------
Message: 65
Date: Fri, 6 Jun 2008 10:54:06 -0700
From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Subject: Re: [R] Problem with subset
To: Luca Mortarini <l.mortarini at isac.cnr.it>
Cc: r-help at r-project.org
Message-ID: <Pine.LNX.4.64.0806061048150.28293 at tajo.ucsd.edu>
Content-Type: text/plain; charset="iso-8859-1";
Format="flowed"
On Fri, 6 Jun 2008, Luca Mortarini wrote:
> Hi,
> I am new to R and i am looking for a way to extract a subset from a
> vector.
> I have a vector of number oscillating around zero (a decreasing
> autocorrelation function) and i would like to extract only the first
> positive part of the function (from zero lag to the lag where the function
> inverts its sign for the first time).
> I have tried
>
> subset(myvector,myvector>0)
>
> but this obviously extracts all the positive intervals, not only the first
> one.
> Is there a logical statement i can use in subset? I prefer not to use an
For vector subsets you probably want "[". Because from
help("[")
For ordinary vectors, the result is simply x[subset & !is.na(subset)].
But see
?rle
Something like
myvector[ 1 : rle( myvector >= 0 )$lengths[ 1 ] ]
should work.
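A small illustration on invented numbers. It assumes the vector starts with a non-negative value, as an autocorrelation function does at lag zero; if the first value were negative, the first run would be the negative one.

```r
# Made-up values that start positive, change sign, then turn positive again
myvector <- c(1.0, 0.6, 0.3, 0.1, -0.2, -0.1, 0.05, 0.2, -0.3)

# Length of the first run of identical values in (myvector >= 0)
first_run <- rle(myvector >= 0)$lengths[1]

# Elements from lag zero up to, but not including, the first sign change
first_positive <- myvector[seq_len(first_run)]
first_positive
```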
HTH,
Chuck
> if statement that would probably slow down the code.
> Thanks a lot,
> Luca
>
>
> *********************************************************
> dr. Luca Mortarini l.mortarini at isac.cnr.it
> Università del Piemonte Orientale
> Dipartimento di Scienze e Tecnologie Avanzate
>
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
------------------------------
Message: 66
Date: Fri, 06 Jun 2008 13:57:39 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck at gmail.com>
Cc: r-help at r-project.org
Message-ID: <48497A93.6050602 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue
[[elided Yahoo spam]]
well, here's a dput-ed version of the small data subset you can use for
testing. below that, an updated version of the function, with extra
explanatory comments, and producing an extra column showing exactly what
is matched to what.
to test, just run the function, with the dataset as sole argument.
Thanks again; i'd appreciate any input on this.
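For reference, a rough sketch of how the inner per-IPO loop might be replaced by one lookup per industry. It is an untested idea on toy data, not a drop-in replacement. The column names come from the function; the helper match_quarter() and the Matched.To column are hypothetical. It assumes Market.Cap.13f has no NAs and relies on findInterval() with left.open = TRUE to find the closest peer that is at least as big, falling back to the biggest peer.

```r
# Hypothetical helper: match every IPO row of ONE quarter to a peer row.
match_quarter <- function(ipos, peers) {
  # Sort once, so each industry's market caps are increasing
  peers <- peers[order(peers$HSICIG, peers$Market.Cap.13f), ]
  peers_by_ind <- split(peers, peers$HSICIG, drop = TRUE)
  ipos_by_ind  <- split(ipos, ipos$HSICIG, drop = TRUE)

  # Only industries that have both IPOs and eligible peers
  common <- intersect(names(ipos_by_ind), names(peers_by_ind))

  matched <- lapply(common, function(ind) {
    p <- peers_by_ind[[ind]]
    x <- ipos_by_ind[[ind]]
    # Number of peers strictly smaller than each IPO, plus one, is the
    # index of the first peer with cap >= the IPO's cap
    j <- findInterval(x$Market.Cap.13f, p$Market.Cap.13f,
                      left.open = TRUE) + 1L
    j <- pmin(j, nrow(p))  # all peers smaller: take the biggest one
    best <- p[j, , drop = FALSE]
    best$Quarters.Since.IPO.Issue <- x$Quarters.Since.IPO.Issue
    best$Matched.To <- x$PERMNO  # hypothetical column: which IPO was matched
    best
  })
  do.call(rbind, matched)
}

# Toy data, invented for the example
peers <- data.frame(PERMNO = 1:5,
                    HSICIG = c(1, 1, 1, 2, 2),
                    Market.Cap.13f = c(10, 20, 30, 5, 50),
                    Quarters.Since.IPO.Issue = NA_real_)
ipos <- data.frame(PERMNO = 101:103,
                   HSICIG = c(1, 1, 2),
                   Market.Cap.13f = c(15, 99, 50),
                   Quarters.Since.IPO.Issue = c(1, 2, 3))

res <- match_quarter(ipos, peers)
res
```

The helper would be called once per quarter, for example on the pieces of split(tfdata, tfdata$DATE) after separating issuers from eligible non-issuers, with the per-quarter results bound together once at the end.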
=========== begin dataset dput representation ===========
structure(list(PERMNO = c(10001L, 10001L, 10298L, 10298L, 10484L,
10484L, 10515L, 10515L, 10634L, 10634L, 11048L, 11048L, 11237L,
11294L, 11294L, 11338L, 11338L, 11404L, 11404L, 11587L, 11587L,
11591L, 11591L, 11737L, 11737L, 11791L, 11809L, 11809L, 11858L,
11858L, 11955L, 11955L, 12003L, 12003L, 12016L, 12016L, 12223L,
12223L, 12758L, 12758L, 13688L, 13688L, 16117L, 16117L, 17770L,
17770L, 21514L, 21514L, 21792L, 21792L, 21821L, 21821L, 22437L,
22437L, 22947L, 22947L, 23027L, 23027L, 23182L, 23182L, 23536L,
23536L, 23712L, 23712L, 24053L, 24053L, 24117L, 24117L, 24256L,
24256L, 24299L, 24299L, 24352L, 24352L, 24379L, 24379L, 24467L,
24467L, 24679L, 24679L, 24870L, 24870L, 25056L, 25056L, 25208L,
25208L, 25232L, 25232L, 25241L, 25590L, 25590L, 26463L, 26463L,
26470L, 26470L, 26614L, 26614L, 27385L, 27385L, 29196L, 29196L,
30411L, 30411L, 32943L, 32943L, 38893L, 38893L, 40708L, 40708L,
41005L, 41005L, 42817L, 42817L, 42833L, 42833L, 43668L, 43668L,
45947L, 45947L, 46017L, 46017L, 48274L, 48274L, 49971L, 49971L,
53786L, 53786L, 53859L, 53859L, 54199L, 54199L, 56371L, 56952L,
56952L, 57277L, 57277L, 57381L, 57381L, 58202L, 58202L, 59395L,
59395L, 59935L, 60169L, 60169L, 61188L, 61188L, 61444L, 61444L,
62690L, 62690L, 62842L, 62842L, 64290L, 64290L, 64418L, 64418L,
64450L, 64450L, 64477L, 64477L, 64557L, 64557L, 64646L, 64646L,
64902L, 64902L, 67774L, 67774L, 68910L, 68910L, 70471L, 70471L,
74406L, 74406L, 75091L, 75091L, 75304L, 75304L, 75743L, 75964L,
75964L, 76026L, 76026L, 76162L, 76170L, 76173L, 78530L, 78530L,
78682L, 78682L, 81569L, 81569L, 82502L, 82502L, 83337L, 83337L,
83919L, 83919L, 88242L, 88242L, 90852L, 90852L, 91353L, 91353L
), DATE = c(19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630,
19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900630,
19900331, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630,
19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331,
19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900630,
19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331,
19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630,
19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331,
19900630, 19900331, 19900331, 19900630, 19900331, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331,
19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331,
19900630, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630,
19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630,
19900331, 19900630, 19900331, 19900630, 19900630, 19900630, 19900630,
19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630,
19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630,
19900331, 19900630, 19900331, 19900630), Shares.Owned = c(50100,
50100, 250000, 293500, 3656629, 3827119, 4132439, 3566591, 2631193,
2500301, 775879, 816879, 38700, 1041600, 1070300, 533768, 558815,
61384492, 60466567, 194595, 196979, 359946, 314446, 106770, 107070,
20242, 1935098, 2099403, 1902125, 1766750, 41991, 41991, 34490,
36290, 589400, 596700, 1549395, 1759440, 854473, 762903, 156366785,
98780287, 2486389, 2635718, 122264, 122292, 25455916, 25458658,
71645490, 71855722, 30969596, 30409838, 2738576, 2814490, 20846605,
20930233, 1148299, 505415, 396388, 385714, 25239923, 24117950,
73465526, 73084616, 8096614, 7595742, 3937930, 3820215, 20884821,
19456342, 2127331, 2188276, 2334515, 2813347, 8267264, 8544084,
783277, 810742, 742048, 512956, 9659658, 9436873, 40107717, 41234384,
9111755, 9708782, 12815719, 13144148, 1146100, 8292392, 8271030,
282650, 281273, 4196126, 4273758, 2489363, 2734182, 1579681,
1369192, 51947585, 51941430, 54673, 52585, 317601, 314876, 62626258,
63341772, 8977553, 8940106, 4478872, 4315631, 1246339, 1227442,
68484747, 68041081, 22679902, 21775270, 927147, 936881, 2626449,
2245552, 14029366, 14304855, 2434123, 2184358, 77479654, 81754241,
333070, 282967, 241146, 256146, 11419, 819092, 798490, 1403179,
1326018, 238974451, 237684105, 1889699, 2317096, 4887641, 5972387,
3567239, 1024595, 993627, 5254732, 5459404, 413146, 432697, 5307595,
4813261, 7717872, 8689444, 2431341, 2372096, 909359, 868068,
2110670, 2055349, 23774859, 23573345, 4234466, 4143534, 1192314,
1255105, 3052000, 2605700, 5566270, 5972761, 1470173, 1448403,
28065345, 32961737, 1844441, 2247991, 651758, 655658, 65864806,
82392617, 1942906, 14800, 14657, 6600, 5534, 394064, 163000,
2499320, 1123624, 1227987, 198000, 241000, 3681688, 3409586,
2416988, 2407798, 55081, 48091, 480000, 785710, 1040147, 1171854,
1363994, 1555229, 199237, 192637), Shares.Outstanding.13f = c(1,
1, 7, 7, 8, 8, 8, 8, 6, 6, 8, 8, 4, 4, 4, 18, 19, 228, 228, 2,
2, 3, 3, 5, 5, 7, 9, 9, 6, 6, 2, 2, 3, 3, 7, 7, 14, 15, 3, 3,
429, 429, 17, 16, 2, 2, 43, 41, 127, 126, 86, 86, 15, 15, 51,
51, 7, 7, 3, 3, 67, 67, 211, 211, 35, 35, 14, 14, 49, 49, 12,
12, 22, 22, 31, 31, 4, 4, 4, 5, 34, 34, 64, 64, 56, 56, 27, 27,
47, 28, 28, 2, 2, 10, 10, 8, 8, 13, 13, 87, 87, 1, 1, 3, 3, 101,
101, 38, 36, 49, 56, 22, 22, 245, 247, 36, 35, 6, 6, 22, 22,
30, 30, 11, 11, 151, 151, 2, 2, 3, 3, 4, 4, 4, 10, 10, 468, 459,
10, 10, 16, 16, 27, 8, 8, 19, 19, 3, 3, 7, 7, 15, 15, 6, 6, 6,
6, 13, 13, 60, 60, 11, 11, 10, 10, 8, 8, 153, 152, 7, 7, 206,
206, 5, 5, 4, 4, 246, 299, 4, 0, 0, 13, 13, 7, 5, 10, 7, 7, 11,
11, 16, 16, 6, 6, 1, 1, 7, 7, 10, 10, 5, 5, 10, 10), Percent.Inst.Owned
= c(0.0501,
0.0501, 0.0357142857142857, 0.0419285714285714, 0.457078625,
0.478389875, 0.516554875, 0.445823875, 0.438532166666667,
0.416716833333333,
0.096984875, 0.102109875, 0.009675, 0.2604, 0.267575, 0.0296537777777778,
0.0294113157894737, 0.269230228070175, 0.26520424122807, 0.0972975,
0.0984895, 0.119982, 0.104815333333333, 0.021354, 0.021414,
0.00289171428571429,
0.215010888888889, 0.233267, 0.317020833333333, 0.294458333333333,
0.0209955, 0.0209955, 0.0114966666666667, 0.0120966666666667,
0.0842, 0.0852428571428571, 0.110671071428571, 0.117296, 0.284824333333333,
0.254301, 0.36449134032634, 0.230257079254079, 0.146258176470588,
0.164732375, 0.061132, 0.061146, 0.591998046511628, 0.62094287804878,
0.564137716535433, 0.570283507936508, 0.360111581395349, 0.353602767441860,
0.182571733333333, 0.187632666666667, 0.408756960784314, 0.410396725490196,
0.164042714285714, 0.0722021428571429, 0.132129333333333,
0.128571333333333,
0.376715268656716, 0.359969402985075, 0.348177848341232, 0.346372587677725,
0.231331828571429, 0.2170212, 0.281280714285714, 0.2728725,
0.426220836734694,
0.397068204081633, 0.177277583333333, 0.182356333333333, 0.106114318181818,
0.127879409090909, 0.266685935483871, 0.275615612903226, 0.19581925,
0.2026855, 0.185512, 0.1025912, 0.284107588235294, 0.277555088235294,
0.626683078125, 0.64428725, 0.162709910714286, 0.173371107142857,
0.474656259259259, 0.486820296296296, 0.0243851063829787,
0.296156857142857,
0.295393928571429, 0.141325, 0.1406365, 0.4196126, 0.4273758,
0.311170375, 0.34177275, 0.121513923076923, 0.105322461538462,
0.59709867816092, 0.597027931034483, 0.054673, 0.052585, 0.105867,
0.104958666666667, 0.62006196039604, 0.627146257425743, 0.236251394736842,
0.248336277777778, 0.0914055510204082, 0.0770648392857143,
0.0566517727272727,
0.0557928181818182, 0.279529579591837, 0.275469963562753,
0.629997277777778,
0.622150571428571, 0.1545245, 0.156146833333333, 0.119384045454545,
0.102070545454545, 0.467645533333333, 0.4768285, 0.221283909090909,
0.198578, 0.513110291390729, 0.541418814569536, 0.166535, 0.1414835,
0.080382, 0.085382, 0.00285475, 0.204773, 0.1996225, 0.1403179,
0.1326018, 0.510629168803419, 0.517830294117647, 0.1889699, 0.2317096,
0.3054775625, 0.3732741875, 0.132119962962963, 0.128074375, 0.124203375,
0.276564842105263, 0.287337052631579, 0.137715333333333, 0.144232333333333,
0.758227857142857, 0.687608714285714, 0.5145248, 0.579296266666667,
0.4052235, 0.395349333333333, 0.151559833333333, 0.144678,
0.162359230769231,
0.158103769230769, 0.39624765, 0.392889083333333, 0.384951454545455,
0.376684909090909, 0.1192314, 0.1255105, 0.3815, 0.3257125,
0.0363808496732026,
0.0392944802631579, 0.210024714285714, 0.206914714285714,
0.136239538834951,
0.160008432038835, 0.3688882, 0.4495982, 0.1629395, 0.1639145,
0.267743113821138, 0.275560591973244, 0.4857265, Inf, Inf,
0.000507692307692308,
0.000425692307692308, 0.0562948571428571, 0.0326, 0.249932,
0.160517714285714,
0.175426714285714, 0.018, 0.0219090909090909, 0.2301055, 0.213099125,
0.402831333333333, 0.401299666666667, 0.055081, 0.048091,
0.0685714285714286,
0.112244285714286, 0.1040147, 0.1171854, 0.2727988, 0.3110458,
0.0199237, 0.0192637), Latest.Issue.Date.ByPERMNO = c(19860108,
19860108, 19600101, 19600101, 19600101, 19600101, 19870728, 19870728,
19870501, 19870501, 19870805, 19870805, 19600101, 19600101, 19600101,
19600101, 19600101, 19730523, 19730523, 19600101, 19600101, 19870811,
19870811, 19870930, 19870930, 19600101, 19880729, 19880729, 19880225,
19880225, 19880602, 19880602, 19860610, 19860610, 19880802, 19880802,
19890629, 19890629, 19600101, 19600101, 19821109, 19821109, 19860619,
19860619, 19871117, 19871117, 19600101, 19600101, 19890308, 19890308,
19900208, 19900208, 19861120, 19861120, 19880803, 19880803, 19600101,
19600101, 19890216, 19890216, 19761202, 19761202, 19890919, 19890919,
19810623, 19810623, 19770615, 19770615, 19831004, 19831004, 19830616,
19830616, 19810519, 19810519, 19850311, 19850311, 19781130, 19781130,
19841016, 19900515, 19800904, 19800904, 19830825, 19830825, 19830601,
19830601, 19811110, 19811110, 19600101, 19890309, 19890309, 19850529,
19850529, 19881122, 19881122, 19840620, 19840620, 19740305, 19740305,
19860718, 19860718, 19600101, 19600101, 19860207, 19860207, 19891003,
19891003, 19870403, 19870403, 19600101, 19600101, 19790403, 19790403,
19850528, 19850528, 19830322, 19830322, 19761202, 19761202, 19841114,
19841114, 19800826, 19800826, 19880517, 19880517, 19860516, 19860516,
19891122, 19891122, 19600101, 19600101, 19600101, 19871119, 19871119,
19760624, 19760624, 19851206, 19851206, 19890615, 19890615, 19860805,
19860805, 19600101, 19890919, 19890919, 19860501, 19860501, 19600101,
19600101, 19890308, 19890308, 19900125, 19900125, 19890714, 19890714,
19880412, 19880412, 19890809, 19890809, 19870306, 19870306, 19751112,
19751112, 19870604, 19870604, 19810625, 19810625, 19600101, 19600101,
19860416, 19860416, 19891027, 19891027, 19890125, 19890125, 19860502,
19860502, 19600101, 19600101, 19900405, 19600101, 19600101, 19600101,
19600101, 19900412, 19900514, 19900518, 19890518, 19890518, 19600101,
19600101, 19900117, 19900117, 19891214, 19891214, 19600101, 19600101,
19600101, 19600101, 19851206, 19851206, 19851211, 19851211, 19600101,
19600101), Quarters.Since.19800101 = c(41L, 42L, 42L, 41L, 41L,
42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 41L, 41L, 42L, 41L,
42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 41L,
42L, 42L, 41L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L,
41L, 42L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 41L, 42L, 42L,
41L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L,
41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L,
42L, 41L, 42L, 42L, 41L, 41L, 41L, 42L, 42L, 41L, 41L, 42L, 41L,
42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 41L,
41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L,
41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 42L, 41L, 42L, 42L,
41L, 42L, 41L, 42L, 41L, 41L, 42L, 41L, 41L, 42L, 41L, 42L, 41L,
42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L,
42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 41L, 42L, 42L,
41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 42L,
41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L,
42L, 41L, 42L, 41L, 42L), Quarters.Since.Latest.Issue = c(17L,
18L, 122L, 121L, 121L, 122L, 11L, 12L, 12L, 13L, 11L, 12L, 122L,
122L, 121L, 121L, 122L, 68L, 69L, 121L, 122L, 12L, 11L, 11L,
10L, 122L, 8L, 7L, 10L, 9L, 8L, 9L, 17L, 16L, 8L, 7L, 4L, 5L,
121L, 122L, 30L, 31L, 16L, 17L, 10L, 11L, 122L, 121L, 6L, 5L,
2L, 1L, 15L, 14L, 7L, 8L, 122L, 121L, 5L, 6L, 55L, 54L, 3L, 4L,
36L, 37L, 53L, 52L, 26L, 27L, 28L, 29L, 37L, 36L, 21L, 22L, 46L,
47L, 22L, 1L, 39L, 40L, 27L, 28L, 28L, 29L, 35L, 34L, 121L, 5L,
6L, 21L, 20L, 6L, 7L, 24L, 25L, 66L, 65L, 15L, 16L, 121L, 122L,
18L, 17L, 3L, 2L, 13L, 12L, 121L, 122L, 44L, 45L, 20L, 21L, 29L,
30L, 54L, 55L, 22L, 23L, 40L, 39L, 8L, 9L, 16L, 17L, 3L, 2L,
121L, 122L, 122L, 10L, 11L, 57L, 56L, 19L, 18L, 5L, 4L, 15L,
16L, 121L, 3L, 4L, 16L, 17L, 121L, 122L, 6L, 5L, 1L, 2L, 3L,
4L, 8L, 9L, 3L, 4L, 13L, 14L, 59L, 58L, 12L, 13L, 36L, 37L, 122L,
121L, 17L, 16L, 2L, 3L, 6L, 5L, 16L, 17L, 121L, 122L, 1L, 121L,
122L, 121L, 122L, 1L, 1L, 1L, 4L, 5L, 121L, 122L, 1L, 2L, 3L,
2L, 121L, 122L, 121L, 122L, 18L, 19L, 18L, 19L, 121L, 122L),
ALTPRC = c(9.9375, 9.875, 0.45313, 0.67188, 7.875, 10, 18,
22, 14.75, 9.75, 0.375, 0.15625, 3.9375, 16, 14.25, 7, 7.125,
27.25, 23.375, 10.75, 13, 3.125, 3.125, 2.6875, 3.4375, 0.5,
8.75, 7, 16.875, 12.375, 2.40625, 3.96875, 4, 4.625, 4.5,
5.125, 26.25, 28.75, 4.5, 5.5, 21.75, 23.25, 15, 14.375,
16.625, 14, 50.5, 48.75, 31.875, 33.125, 41.5, 46, 21, 22.125,
30.75, 30.125, 10.375, 5.5, 11.5, 11, 29, 28.875, 27.25,
26.75, 22.375, 22.25, 33.375, 35, 21, 19.75, 29.875, 28.875,
22.125, 20.125, 21, 18.875, 24.625, 26.75, 21.75, 22, 22.125,
21.125, 24.75, 26.75, 42.75, 43.5, 13.375, 29.625, 0.07813,
25.125, 23.75, 18, 20, 17.5, 18.125, 18.875, 19, 28.875,
30, 23.875, 23.625, 15.5, 15.625, 17.5, 19.5, 34.75, 30.75,
2, 2.25, 18.625, 17.5, 21.375, 19.875, 45.25, 20.125, 37.25,
41.75, 32.25, 32.5, 23.125, 21.875, 35.25, 38.75, 27.875,
27.375, 35.875, 42.125, 24.25, 24.5, 25.125, 23.875, 2.0625,
16.75, 16.25, 34.625, 37.75, 40, 31.625, 19.375, 20, 30.875,
29.375, 0.125, 17.625, 17, 16.625, 17.75, 12.625, 13.25,
26, 19.75, 15.25, 18.625, 18.125, 18, 16.375, 15.625, 18.5,
19, 12.875, 14.375, 32.375, 33.375, 16.375, 16.375, 1.625,
2.8125, 13.875, 14.625, 4.625, 4.5, 18.5, 24.125, 6.375,
5.875, 10.625, 11.625, 6.625, 7.375, 14.75, 0.8125, 0.6875,
2.125, 2.375, 20.25, 7.625, 34, 15.25, 15, 2.09375, 2.375,
19.5, 18.125, 38.5, 30.75, 36, 35.75, 9.375, 11.25, 21.25,
18.625, 6, 5.25, 1.15625, 1.25), HSICIG = c(492, 492, 494,
494, 495, 495, 495, 495, 495, 495, 493, 493, 495, 495, 495,
495, 495, 493, 493, 492, 492, 495, 495, 495, 495, 495, 495,
495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495,
495, 493, 493, 494, 494, 492, 492, 492, 492, 493, 493, 492,
492, 493, 493, 493, 493, 495, 495, 492, 492, 493, 493, 493,
493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493,
493, 493, 493, 492, 492, 493, 493, 492, 492, 493, 493, 492,
492, 495, 492, 492, 494, 494, 492, 492, 492, 492, 493, 493,
492, 492, 494, 494, 492, 492, 492, 492, 492, 492, 492, 492,
493, 493, 493, 493, 492, 492, 493, 493, 493, 493, 492, 492,
492, 492, 495, 495, 494, 494, 494, 494, 495, 492, 492, 493,
493, 495, 495, 492, 492, 492, 492, 495, 492, 492, 492, 492,
492, 492, 492, 492, 494, 494, 492, 492, 492, 492, 492, 492,
495, 495, 493, 493, 492, 492, 495, 495, 492, 492, 492, 492,
495, 495, 495, 495, 495, 495, 492, 492, 495, 495, 495, 492,
492, 495, 495, 495, 492, 492, 494, 494, 492, 492, 495, 495,
492, 492, 493, 493, 494, 494, 495, 495, 495, 495), Market.Cap.13f =
c(9937500,
9875000, 3171910, 4703160, 6.3e+07, 8e+07, 1.44e+08, 1.76e+08,
88500000, 58500000, 3e+06, 1250000, 15750000, 6.4e+07, 5.7e+07,
1.26e+08, 135375000, 6.213e+09, 5329500000, 21500000, 2.6e+07,
9375000, 9375000, 13437500, 17187500, 3500000, 78750000,
6.3e+07, 101250000, 74250000, 4812500, 7937500, 1.2e+07,
13875000, 31500000, 35875000, 367500000, 431250000, 13500000,
16500000, 9330750000, 9974250000, 2.55e+08, 2.3e+08, 33250000,
2.8e+07, 2171500000, 1998750000, 4048125000, 4173750000,
3.569e+09, 3.956e+09, 3.15e+08, 331875000, 1568250000, 1536375000,
72625000, 38500000, 34500000, 3.3e+07, 1.943e+09, 1934625000,
5749750000, 5644250000, 783125000, 778750000, 467250000,
4.9e+08, 1.029e+09, 967750000, 358500000, 346500000, 486750000,
442750000, 6.51e+08, 585125000, 98500000, 1.07e+08, 8.7e+07,
1.1e+08, 752250000, 718250000, 1.584e+09, 1.712e+09, 2.394e+09,
2.436e+09, 361125000, 799875000, 3672110, 703500000, 6.65e+08,
3.6e+07, 4e+07, 1.75e+08, 181250000, 1.51e+08, 1.52e+08,
375375000, 3.9e+08, 2077125000, 2055375000, 15500000, 15625000,
52500000, 58500000, 3509750000, 3105750000, 7.6e+07, 8.1e+07,
912625000, 9.8e+08, 470250000, 437250000, 11086250000, 4970875000,
1.341e+09, 1461250000, 193500000, 1.95e+08, 508750000, 481250000,
1057500000, 1162500000, 306625000, 301125000, 5417125000,
6360875000, 48500000, 4.9e+07, 75375000, 71625000, 8250000,
6.7e+07, 6.5e+07, 346250000, 377500000, 1.872e+10, 14515875000,
193750000, 2e+08, 4.94e+08, 4.7e+08, 3375000, 1.41e+08, 1.36e+08,
315875000, 337250000, 37875000, 39750000, 1.82e+08, 138250000,
228750000, 279375000, 108750000, 1.08e+08, 98250000, 93750000,
240500000, 2.47e+08, 772500000, 862500000, 356125000, 367125000,
163750000, 163750000, 1.3e+07, 22500000, 2122875000, 2.223e+09,
32375000, 31500000, 3.811e+09, 4969750000, 31875000, 29375000,
42500000, 46500000, 1629750000, 2205125000, 5.9e+07, 0, 0,
27625000, 30875000, 141750000, 38125000, 3.4e+08, 106750000,
1.05e+08, 23031250, 26125000, 3.12e+08, 2.9e+08, 2.31e+08,
184500000, 3.6e+07, 35750000, 65625000, 78750000, 212500000,
186250000, 3e+07, 26250000, 11562500, 12500000), IPO.Flag = c(0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0), IPO.Issue.Date = c(NA,
NA, NA, NA, NA, NA, 19860724, 19860724, NA, NA, 19870127,
19870127, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19870811, 19870811,
19870930, 19870930, NA, 19871124, 19871124, 19880225, 19880225,
19880602, 19880602, NA, NA, 19880802, 19880802, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19710324, 19710324,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 19710617, 19710617, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
19831014, 19831014, 19861016, 19861016, NA, NA, 19860502,
19860502, NA, NA, 19890419, NA, NA, NA, NA, 19900412, 19900514,
19900518, NA, NA, NA, NA, NA, NA, 19830603, 19830603, NA,
NA, NA, NA, 19851206, 19851206, 19851211, 19851211, NA, NA
), Quarters.Since.IPO.Issue = c(NA, NA, NA, NA, NA, NA, 15L,
16L, NA, NA, 13L, 14L, NA, NA, NA, NA, NA, NA, NA, NA, NA,
12L, 11L, 11L, 10L, NA, 11L, 10L, 10L, 9L, 8L, 9L, NA, NA,
8L, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 77L, 78L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 77L, 76L,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, 27L, 26L, 14L, 15L, NA, NA, 16L, 17L, NA, NA, 5L, NA,
NA, NA, NA, 1L, 1L, 1L, NA, NA, NA, NA, NA, NA, 29L, 28L,
NA, NA, NA, NA, 18L, 19L, 18L, 19L, NA, NA)), .Names =
c("PERMNO",
"DATE", "Shares.Owned", "Shares.Outstanding.13f",
"Percent.Inst.Owned",
"Latest.Issue.Date.ByPERMNO", "Quarters.Since.19800101",
"Quarters.Since.Latest.Issue",
"ALTPRC", "HSICIG", "Market.Cap.13f",
"IPO.Flag", "IPO.Issue.Date",
"Quarters.Since.IPO.Issue"), row.names = c(79L, 85L, 9902L, 9908L,
15739L, 15758L, 16673L, 16675L, 20159L, 20160L, 32879L, 32889L,
38023L, 39404L, 39409L, 40405L, 40420L, 43114L, 43116L, 47939L,
47953L, 48091L, 48120L, 52828L, 52837L, 54612L, 55002L, 55048L,
56506L, 56508L, 59230L, 59247L, 60454L, 60461L, 60845L, 60852L,
66143L, 66147L, 69439L, 69454L, 72218L, 72232L, 81826L, 81840L,
87882L, 87883L, 105814L, 105832L, 106687L, 106709L, 106867L,
106875L, 110008L, 110081L, 113124L, 113125L, 113448L, 113460L,
114419L, 114431L, 116222L, 116234L, 117215L, 117310L, 119463L,
119477L, 119913L, 119927L, 120787L, 120799L, 121214L, 121215L,
121541L, 121548L, 121670L, 121680L, 122420L, 122421L, 123629L,
123679L, 124479L, 124485L, 125607L, 125608L, 126683L, 126716L,
126911L, 126954L, 126986L, 128941L, 128979L, 132991L, 133048L,
133090L, 133091L, 134227L, 134228L, 137449L, 137465L, 146656L,
146710L, 151717L, 151728L, 162724L, 162738L, 186344L, 186346L,
194239L, 194251L, 195124L, 195125L, 203411L, 203426L, 203486L,
203487L, 206821L, 206863L, 218733L, 218734L, 219083L, 219084L,
232389L, 232401L, 241221L, 241222L, 262518L, 262556L, 263151L,
263154L, 264783L, 264811L, 275743L, 278957L, 278958L, 281230L,
281242L, 281957L, 281962L, 286492L, 286504L, 294444L, 294445L,
297641L, 298974L, 298988L, 304628L, 304669L, 306326L, 306339L,
315987L, 316013L, 316939L, 316940L, 327003L, 327032L, 327976L,
327977L, 328372L, 328386L, 328621L, 328622L, 329277L, 329289L,
329983L, 329984L, 331735L, 331746L, 350849L, 350887L, 357747L,
357750L, 366913L, 366917L, 380680L, 380749L, 385635L, 385642L,
394280L, 394281L, 410203L, 417419L, 417420L, 418842L, 418851L,
423401L, 423687L, 423795L, 494497L, 494498L, 496519L, 496520L,
576735L, 576737L, 590042L, 590057L, 606077L, 606087L, 620736L,
620737L, 704834L, 704837L, 749540L, 749573L, 754161L, 754162L
), class = "data.frame")
======== end dataset dput representation ========
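A note for readers who have not used dput before: the block above is R source text that rebuilds the data frame when evaluated, so it can be pasted after an assignment or read back from a file with dget(). The sketch below shows the round trip on a tiny made-up data frame; the object name toy and its values are assumptions for illustration and are not part of the posted data.

```r
# Toy stand-in for the posted data: column names are borrowed from the post,
# the values are invented for illustration only.
toy <- data.frame(PERMNO = c(10001L, 10298L),
                  DATE = c(19900331, 19900630),
                  IPO.Flag = c(0, 1))

tmp <- tempfile(fileext = ".R")
dput(toy, file = tmp)      # write the text representation to a file
restored <- dget(tmp)      # parse and evaluate it back into an object

str(restored)              # inspect the column types after the round trip
```

The same idea applies to the posted data: assigning the structure(...) expression to a name gives a data frame that can be passed to the function as its sole argument.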
=========== begin function code ===========
fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
    quarters_since_issue=20) {
  # tfdata must not have NAs for market.cap
  # rbind for matrix is cheaper, so typecast the result to matrix
  result = matrix(nrow=0, ncol=(ncol(tfdata) + 1))
  # grab the colnames, which we will shove back to the result at the end
  # when we reconvert to data.frame
  colnames = names(tfdata)
  # the data are quarterly, all dates are quarter ends
  quarterends = sort(unique(tfdata$DATE))

  # basic code logic:
  # grab each quarter's data, in each quarter get the ipo subset, and the
  # eligible matching firm subset
  # for each ipo from the ipo subset, select a matching firm from the
  # eligible matching firm subset
  # the said selection is done based on industry group (HSICIG), and
  # market cap (Market.Cap.13f)
  # Industry group has to be the same, market cap has to be 'closest one
  # from above', or if that is not available, then 'closest one from below'.
  for (aquarter in quarterends) {
    tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
    tfdata_quarter_fitting_nonissuers = tfdata_quarter[
      (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
      (tfdata_quarter$IPO.Flag == 0), ]
    tfdata_quarter_ipoissuers = tfdata_quarter[
      tfdata_quarter$IPO.Flag == 1, ]
    # seq_len avoids looping over 1:0 when a quarter has no IPOs
    for (i in seq_len(nrow(tfdata_quarter_ipoissuers))) {
      arow = tfdata_quarter_ipoissuers[i,]
      industrypeers = tfdata_quarter_fitting_nonissuers[
        tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
      industrypeers = industrypeers[
        order(industrypeers$Market.Cap.13f), ]
      if ( nrow(industrypeers) > 0 ) {
        if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
            arow$Market.Cap.13f, ]) > 0 ) {
          bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
            arow$Market.Cap.13f, ][1,]
        }
        else {
          bestpeer = industrypeers[nrow(industrypeers),]
        }
        bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
        bestpeer$Peer.To.PERMNO = arow$PERMNO
        result = rbind(result, as.matrix(bestpeer))
      }
    }
    print (aquarter)
  }
  result = as.data.frame(result)
  names(result) = c(colnames, 'Peer.To.PERMNO')
  return(result)
}
============== end function code ==========
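The matching rule described in the comments (closest market cap from above, otherwise the largest peer below) can be isolated in a small helper. This is only a sketch: pick_peer_index is a hypothetical name that does not appear in the post, and it works on a plain numeric vector of peer market caps, not on the data frame. It should pick the same firm as the sort-then-subset approach, though the choice among exact ties may differ.

```r
# Hypothetical helper (not from the original post): given the market caps of
# the eligible industry peers and the IPO firm's market cap, return the index
# of the matching peer, or NA when there are no peers at all.
pick_peer_index <- function(peer_caps, ipo_cap) {
  if (length(peer_caps) == 0) return(NA_integer_)
  above <- which(peer_caps >= ipo_cap)
  if (length(above) > 0) {
    # smallest peer that is at least as large as the IPO firm
    above[which.min(peer_caps[above])]
  } else {
    # every peer is smaller: fall back to the largest one
    which.max(peer_caps)
  }
}

caps <- c(5e6, 2e7, 8e6, 1.5e7)   # made-up peer market caps
idx_above <- pick_peer_index(caps, 7e6)   # a larger peer exists
idx_below <- pick_peer_index(caps, 3e7)   # all peers are smaller
print(c(idx_above, idx_below))
```

Because the helper never sorts or copies a data frame, it may be cheaper per IPO than ordering the peer subset each time, but that is a guess that would need timing on the real data.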
on 06/06/2008 01:35 PM Gabor Grothendieck said the following:
> I think the posting guide may not be clear enough and have suggested that
> it be clarified. Hopefully this better communicates what is required and
> why in a shorter amount of space:
>
> https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html
>
>
> On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
>> i thought since the function code (which i provided in full) was pretty
>> short, it would be reasonably easy to just read the code and see what
>> it's doing.
>>
>> but ok, so... i am attaching a zip file, with a small sample of the
>> data set (tab delimited), and the function code, in a zip file (posting
>> guidelines claim that "some archive formats" are allowed, i assume zip
>> is one of them...
>>
>> would appreciate your comments! :)
>>
>> on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
>>> It's summarized in the last line to r-help. Note reproducible and
>>> minimal.
>>>
>>> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins at gmail.com>
>>> wrote:
>>>> i did! what did i miss?
>>>>
>>>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>>> Try reading the posting guide before posting.
>>>>>
>>>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn
>>>>> <dfolkins at gmail.com>
>>>>> wrote:
>>>>>> Anybody have any thoughts on this? Please? :)
>>>>>>
>>>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>>>> Hi everyone!
>>>>>>>
>>>>>>> I have a question about data processing efficiency.
>>>>>>>
>>>>>>> My data are as follows: I have a data set on quarterly institutional
>>>>>>> ownership of equities; some of them have had recent IPOs, some have
>>>>>>> not (I have a binary flag set). The total dataset size is 700k+ rows.
>>>>>>>
>>>>>>> My goal is this: For every quarter since issue for each IPO, I need to
>>>>>>> find a "matched" firm in the same industry, and close in market cap.
>>>>>>> So, e.g., for firm X, which had an IPO, i need to find a matched
>>>>>>> non-issuing firm in quarter 1 since IPO, then a (possibly different)
>>>>>>> non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing
>>>>>>> firm (there are about 8300 of these).
>>>>>>>
>>>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>>>>> highly inefficient and takes ages (well, many hours). What I am doing,
>>>>>>> in pseudocode, is this:
>>>>>>>
>>>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>>>> eligible non-issuing firms.
>>>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>>>> industry, sort them by size, and finally grab a matching firm closest
>>>>>>> in size (the exact procedure is to grab the closest bigger firm if one
>>>>>>> exists, and just the biggest available if all are smaller)
>>>>>>> 3. assign the matched firm-observation the same "quarters since issue"
>>>>>>> as the IPO being matched
>>>>>>> 4. rbind them all into the "matching" dataset.
>>>>>>>
>>>>>>> The function I currently have is pasted below, for your reference. Is
>>>>>>> there any way to make it produce the same result but much faster?
>>>>>>> Specifically, I am guessing eliminating some loops would be very good,
>>>>>>> but I don't see how, since I need to do some fancy footwork for each
>>>>>>> IPO in each quarter to find the matching firm. I'll be doing a few
>>>>>>> things similar to this, so it's somewhat important to up the
>>>>>>> efficiency of this. Maybe some of you R-fu masters can clue me in? :)
>>>>>>>
>>>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>>>
>>>>>>> ========== my function below ==========
>>>>>>>
>>>>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>>>>>     quarters_since_issue=40) {
>>>>>>>
>>>>>>>   # rbind for matrix is cheaper, so typecast the result to matrix
>>>>>>>   result = matrix(nrow=0, ncol=ncol(tfdata))
>>>>>>>
>>>>>>>   colnames = names(tfdata)
>>>>>>>
>>>>>>>   quarterends = sort(unique(tfdata$DATE))
>>>>>>>
>>>>>>>   for (aquarter in quarterends) {
>>>>>>>     tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>>>
>>>>>>>     tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>>>>       (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>>>>>       (tfdata_quarter$IPO.Flag == 0), ]
>>>>>>>     tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>>>>       tfdata_quarter$IPO.Flag == 1, ]
>>>>>>>
>>>>>>>     for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>>>>       arow = tfdata_quarter_ipoissuers[i,]
>>>>>>>       industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>>>>         tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>>>>       industrypeers = industrypeers[
>>>>>>>         order(industrypeers$Market.Cap.13f), ]
>>>>>>>       if ( nrow(industrypeers) > 0 ) {
>>>>>>>         if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>>>>             arow$Market.Cap.13f, ]) > 0 ) {
>>>>>>>           bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
>>>>>>>             arow$Market.Cap.13f, ][1,]
>>>>>>>         }
>>>>>>>         else {
>>>>>>>           bestpeer = industrypeers[nrow(industrypeers),]
>>>>>>>         }
>>>>>>>         bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
>>>>>>>
>>>>>>>         #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>>>>         #  bestpeer$PERMNO] = 1
>>>>>>>         result = rbind(result, as.matrix(bestpeer))
>>>>>>>       }
>>>>>>>     }
>>>>>>>     #result = rbind(result, tfdata_quarter)
>>>>>>>     print (aquarter)
>>>>>>>   }
>>>>>>>
>>>>>>>   result = as.data.frame(result)
>>>>>>>   names(result) = colnames
>>>>>>>   return(result)
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> ========= end of my function ============
>>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>
------------------------------
Message: 67
Date: Fri, 6 Jun 2008 14:01:01 -0400
From: "Thompson, David (MNR)" <David.John.Thompson at ontario.ca>
Subject: Re: [R] ggplot questions
To: "ONKELINX, Thierry" <Thierry.ONKELINX at inbo.be>, "hadley wickham"
<h.wickham at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<ECF21B71808ECF4F8918C57EDBEE121D028C2532 at CTSPITDCEMMVX11.cihs.ad.gov.on.ca>
Content-Type: text/plain; charset="us-ascii"
Thanx Thierry,
Suggestion #1 had no effect.
I have been playing with variants on #2 along the way.
DaveT.
>-----Original Message-----
>From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be]
>Sent: June 6, 2008 04:02 AM
>To: Thompson, David (MNR); hadley wickham
>Cc: r-help at r-project.org
>Subject: RE: [R] ggplot questions
>
>David,
>
>1. Try scale_x_continuous(lim = c(0, 360)) +
>   scale_y_continuous(lim = c(0, 16))
>2. You could set the colour of the gridlines equal to the background
>colour with ggopt
>
>HTH,
>
>Thierry
>
------------------------------
Message: 68
Date: Fri, 06 Jun 2008 19:03:44 +0100
From: Patrick Burns <pburns at pburns.seanet.com>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID: <48497C00.7000206 at pburns.seanet.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
That is going to be situation dependent, but if you
have a reasonable upper bound, then that will be
much easier and not far from optimal.
If you pick the possibly too small route, then increasing
the size in largish chunks is much better than adding
a row at a time.
Pat
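The advice above (preallocate to an upper bound, grow in large chunks if the bound turns out too small, trim at the end) could be sketched roughly as follows. The sizes, the three-column layout, and the names n_upper and n_used are made-up assumptions for illustration; the loop body is a stand-in for the real peer matching.

```r
# Sketch with invented sizes: fill a preallocated matrix, then trim it.
n_upper <- 100L                       # assumed upper bound on result rows
result <- matrix(NA_real_, nrow = n_upper, ncol = 3)
n_used <- 0L

for (i in 1:250) {
  if (i %% 2 == 0) next               # stand-in for "no match found"
  n_used <- n_used + 1L
  if (n_used > nrow(result)) {
    # bound was too small: grow by a whole chunk, not one row at a time
    result <- rbind(result,
                    matrix(NA_real_, nrow = n_upper, ncol = ncol(result)))
  }
  result[n_used, ] <- c(i, i^2, i / 2)
}

# strip the unused rows left over from preallocation
result <- result[seq_len(n_used), , drop = FALSE]
print(dim(result))
```

Here the bound is deliberately too small so the chunked growth path is exercised; with a bound that is large enough, the rbind branch never runs.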
Daniel Folkinshteyn wrote:
> thanks for the tip! i'll try that and see how big of a difference that
> makes... if i am not sure what exactly the size will be, am i better
> off making it larger, and then later stripping off the blank rows, or
> making it smaller, and appending the missing rows?
>
> on 06/06/2008 11:44 AM Patrick Burns said the following:
>> One thing that is likely to speed the code significantly
>> is if you create 'result' to be its final size and then
>> subscript into it. Something like:
>>
>> result[i, ] <- bestpeer
>>
>> (though I'm not sure if 'i' is the proper index).
>>
>> Patrick Burns
>> patrick at burns-stat.com
>> +44 (0)20 8525 0696
>> http://www.burns-stat.com
>> (home of S Poetry and "A Guide for the Unwilling S User")
>>
>> Daniel Folkinshteyn wrote:
>>> Anybody have any thoughts on this? Please? :)
>>>
>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>> Hi everyone!
>>>>
>>>> I have a question about data processing efficiency.
>>>>
>>>> My data are as follows: I have a data set on quarterly
>>>> institutional ownership of equities; some of them have had recent
>>>> IPOs, some have not (I have a binary flag set). The total dataset
>>>> size is 700k+ rows.
>>>>
>>>> My goal is this: For every quarter since issue for each IPO, I need
>>>> to find a "matched" firm in the same industry, and close in market
>>>> cap. So, e.g., for firm X, which had an IPO, i need to find a
>>>> matched non-issuing firm in quarter 1 since IPO, then a (possibly
>>>> different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>>>> each issuing firm (there are about 8300 of these).
>>>>
>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>> highly inefficient and takes ages (well, many hours). What I am
>>>> doing, in pseudocode, is this:
>>>>
>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>> eligible non-issuing firms.
>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>> industry, sort them by size, and finally grab a matching firm
>>>> closest in size (the exact procedure is to grab the closest bigger
>>>> firm if one exists, and just the biggest available if all are smaller)
>>>> 3. assign the matched firm-observation the same "quarters since
>>>> issue" as the IPO being matched
>>>> 4. rbind them all into the "matching" dataset.
>>>>
>>>> The function I currently have is pasted below, for your reference.
>>>> Is there any way to make it produce the same result but much
>>>> faster? Specifically, I am guessing eliminating some loops would be
>>>> very good, but I don't see how, since I need to do some fancy
>>>> footwork for each IPO in each quarter to find the matching firm.
>>>> I'll be doing a few things similar to this, so it's somewhat
>>>> important to up the efficiency of this. Maybe some of you R-fu
>>>> masters can clue me in? :)
>>>>
>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>
>>>> ========== my function below ==========
>>>>
>>>> fcn_create_nonissuing_match_by_quarterssinceissue =
>>>>     function(tfdata, quarters_since_issue=40) {
>>>>
>>>>   # rbind for matrix is cheaper, so typecast the result to matrix
>>>>   result = matrix(nrow=0, ncol=ncol(tfdata))
>>>>
>>>>   colnames = names(tfdata)
>>>>
>>>>   quarterends = sort(unique(tfdata$DATE))
>>>>
>>>>   for (aquarter in quarterends) {
>>>>     tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>
>>>>     tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>       (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue)
>>>>       & (tfdata_quarter$IPO.Flag == 0), ]
>>>>     tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>       tfdata_quarter$IPO.Flag == 1, ]
>>>>
>>>>     for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>       arow = tfdata_quarter_ipoissuers[i,]
>>>>       industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>         tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>       industrypeers = industrypeers[
>>>>         order(industrypeers$Market.Cap.13f), ]
>>>>       if ( nrow(industrypeers) > 0 ) {
>>>>         if ( nrow(industrypeers[industrypeers$Market.Cap.13f
>>>>                >= arow$Market.Cap.13f, ]) > 0 ) {
>>>>           bestpeer = industrypeers[
>>>>             industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>>>>         }
>>>>         else {
>>>>           bestpeer = industrypeers[nrow(industrypeers),]
>>>>         }
>>>>         bestpeer$Quarters.Since.IPO.Issue =
>>>>           arow$Quarters.Since.IPO.Issue
>>>>
>>>>         #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
>>>>         result = rbind(result, as.matrix(bestpeer))
>>>>       }
>>>>     }
>>>>     #result = rbind(result, tfdata_quarter)
>>>>     print (aquarter)
>>>>   }
>>>>
>>>>   result = as.data.frame(result)
>>>>   names(result) = colnames
>>>>   return(result)
>>>>
>>>> }
>>>>
>>>> ========= end of my function ============
>>>>
------------------------------
Message: 69
Date: Fri, 6 Jun 2008 13:06:22 -0500
From: "hadley wickham" <h.wickham at gmail.com>
Subject: Re: [R] ggplot questions
To: "Thompson, David (MNR)" <David.John.Thompson at ontario.ca>
Cc: r-help at r-project.org
Message-ID:
<f8e6ff050806061106o1ab6583je2c195b42d704689 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> Does the difference have something to do with ggplot() using ranges
> derived from the data?
> When I modify my original 'test' dataframe with two extra rows as
> defined below, I get expected results in both versions.
Order shouldn't matter - and if it's making a difference, that's a
bug. But I'm still not completely sure what you're expecting.
> This highlights my next question (warned you ;-) ), I have been
> unsuccessful in trying to define fixed plotting ranges to generate a
> 'template' graphic that I may reuse with successive 'overstory plot'
> data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but, this seems to
> not have any effect on the final plot layout.
Could you please produce a small reproducible example that
demonstrates this? It may well be a bug.
Hadley
--
http://had.co.nz/
------------------------------
Message: 70
Date: Fri, 06 Jun 2008 14:09:40 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Patrick Burns <pburns at pburns.seanet.com>
Cc: r-help at r-project.org
Message-ID: <48497D64.4090905 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Cool, I do have an upper bound, so I'll try it and how much of a
[[elided Yahoo spam]]
on 06/06/2008 02:03 PM Patrick Burns said the following:
> That is going to be situation dependent, but if you
> have a reasonable upper bound, then that will be
> much easier and not far from optimal.
>
> If you pick the possibly too small route, then increasing
> the size in largish junks is much better than adding
> a row at a time.
>
> Pat
>
> Daniel Folkinshteyn wrote:
>> thanks for the tip! i'll try that and see how big of a difference that
>> makes... if i am not sure what exactly the size will be, am i better
>> off making it larger, and then later stripping off the blank rows, or
>> making it smaller, and appending the missing rows?
>>
>> on 06/06/2008 11:44 AM Patrick Burns said the following:
>>> One thing that is likely to speed the code significantly
>>> is if you create 'result' to be its final size and then
>>> subscript into it. Something like:
>>>
>>> result[i, ] <- bestpeer
>>>
>>> (though I'm not sure if 'i' is the proper index).
>>>
>>> Patrick Burns
>>> patrick at burns-stat.com
>>> +44 (0)20 8525 0696
>>> http://www.burns-stat.com
>>> (home of S Poetry and "A Guide for the Unwilling S User")
>>>
>>> Daniel Folkinshteyn wrote:
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>> Hi everyone!
>>>>>
>>>>> I have a question about data processing efficiency.
>>>>>
>>>>> My data are as follows: I have a data set on quarterly
>>>>> institutional ownership of equities; some of them have had recent
>>>>> IPOs, some have not (I have a binary flag set). The total dataset
>>>>> size is 700k+ rows.
>>>>>
>>>>> My goal is this: For every quarter since issue for each IPO, I need
>>>>> to find a "matched" firm in the same industry, and close in market
>>>>> cap. So, e.g., for firm X, which had an IPO, i need to find a
>>>>> matched non-issuing firm in quarter 1 since IPO, then a (possibly
>>>>> different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>>>>> each issuing firm (there are about 8300 of these).
>>>>>
>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>>> highly inefficient and takes ages (well, many hours). What I am
>>>>> doing, in pseudocode, is this:
>>>>>
>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>> eligible non-issuing firms.
>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>> industry, sort them by size, and finally grab a matching firm
>>>>> closest in size (the exact procedure is to grab the closest bigger
>>>>> firm if one exists, and just the biggest available if all are smaller)
>>>>> 3. assign the matched firm-observation the same "quarters since
>>>>> issue" as the IPO being matched
>>>>> 4. rbind them all into the "matching" dataset.
>>>>>
>>>>> The function I currently have is pasted below, for your reference.
>>>>> Is there any way to make it produce the same result but much
>>>>> faster? Specifically, I am guessing eliminating some loops would be
>>>>> very good, but I don't see how, since I need to do some fancy
>>>>> footwork for each IPO in each quarter to find the matching firm.
>>>>> I'll be doing a few things similar to this, so it's somewhat
>>>>> important to up the efficiency of this. Maybe some of you R-fu
>>>>> masters can clue me in? :)
>>>>>
>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>
>>>>> ========== my function below ==========
>>>>>
>>>>> fcn_create_nonissuing_match_by_quarterssinceissue =
>>>>>     function(tfdata, quarters_since_issue=40) {
>>>>>
>>>>>   # rbind for matrix is cheaper, so typecast the result to matrix
>>>>>   result = matrix(nrow=0, ncol=ncol(tfdata))
>>>>>
>>>>>   colnames = names(tfdata)
>>>>>
>>>>>   quarterends = sort(unique(tfdata$DATE))
>>>>>
>>>>>   for (aquarter in quarterends) {
>>>>>     tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>
>>>>>     tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>>       (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue)
>>>>>       & (tfdata_quarter$IPO.Flag == 0), ]
>>>>>     tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>>       tfdata_quarter$IPO.Flag == 1, ]
>>>>>
>>>>>     for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>>       arow = tfdata_quarter_ipoissuers[i,]
>>>>>       industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>>         tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>>       industrypeers = industrypeers[
>>>>>         order(industrypeers$Market.Cap.13f), ]
>>>>>       if ( nrow(industrypeers) > 0 ) {
>>>>>         if ( nrow(industrypeers[industrypeers$Market.Cap.13f
>>>>>                >= arow$Market.Cap.13f, ]) > 0 ) {
>>>>>           bestpeer = industrypeers[
>>>>>             industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>>>>>         }
>>>>>         else {
>>>>>           bestpeer = industrypeers[nrow(industrypeers),]
>>>>>         }
>>>>>         bestpeer$Quarters.Since.IPO.Issue =
>>>>>           arow$Quarters.Since.IPO.Issue
>>>>>
>>>>>         #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO == bestpeer$PERMNO] = 1
>>>>>         result = rbind(result, as.matrix(bestpeer))
>>>>>       }
>>>>>     }
>>>>>     #result = rbind(result, tfdata_quarter)
>>>>>     print (aquarter)
>>>>>   }
>>>>>
>>>>>   result = as.data.frame(result)
>>>>>   names(result) = colnames
>>>>>   return(result)
>>>>>
>>>>> }
>>>>>
>>>>> ========= end of my function ============
>>>>>
------------------------------
Message: 71
Date: Fri, 6 Jun 2008 19:14:59 +0100 (BST)
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Subject: Re: [R] boxplot changes fontsize of labels
To: Sebastian Merz <sebastian.merz at web.de>
Cc: r-help at r-project.org
Message-ID:
<alpine.LFD.1.10.0806061905080.10799 at gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Please read the help for par(mfrow)! AFAICS this is nothing to do with
boxplot().
In a layout with exactly two rows and columns the base value
of "cex" is reduced by a factor of 0.83: if there are three
or more of either rows or columns, the reduction factor is
0.66.
See also the 'consider the alternatives' in that entry.
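The behaviour described in that help entry can be checked directly. The following is a minimal sketch, assuming a null PDF device is acceptable for the demonstration; the variable names `cex_1x2`, `cex_2x2`, and `cex_1x4` are only illustrative. It also shows one possible way to make mtext() match the axis labels, relying on the documented fact that mtext() treats its 'cex' argument as absolute rather than scaled by par("cex").

```r
# Open a null PDF device so the sketch runs without creating a file.
pdf(file = NULL)

par(mfrow = c(1, 2)); cex_1x2 <- par("cex")   # no reduction
par(mfrow = c(2, 2)); cex_2x2 <- par("cex")   # reduced to 0.83
par(mfrow = c(1, 4)); cex_1x4 <- par("cex")   # reduced to 0.66

# In the 1 x 4 layout the axis annotation is scaled by par("cex"), while
# mtext() uses an absolute size. Passing par("cex") explicitly makes the
# margin text the same size as the axis labels.
x <- rnorm(30)
y <- rnorm(30)
boxplot(x, y, names = c("horray", "hurra"))
mtext("Jubel", side = 1, line = 2, cex = par("cex"))

dev.off()
```

An alternative is to call par(cex = 1) after par(mfrow = ...), which restores full-size annotation for everything drawn afterwards.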
On Fri, 6 Jun 2008, Sebastian Merz wrote:
> Hi all!
>
> So far I learned some R but finalizing my plots so they look
> publishable seems not to be possible.
>
> I set up some boxplots. Everything works well but when I put more than
> two of them in one plot the labels of the axes appear smaller than the
> normal font size.
>
>> x <- rnorm(30)
>> y <- rnorm(30)
>> par(mfrow=c(1,4))
>> boxplot(x,y, names=c("horray", "hurra"))
>> mtext("Jubel", side=1, line=2)
>
> In case I take one or two boxplots this does not happen:
>> par(mfrow=c(1,2))
>> boxplot(x,y, names=c("horray", "hurra"))
>> mtext("Jubel", side=1, line=2)
>
> The cex.axis seems not to be changed, as setting it to 1.0 doesn't
> change the behaviour. If cex.axis=1.3 in the first example the font
> size used by boxplot and by mtext is about the same. But as I use a
> function to draw quite some of these plots this "hack" is not a proper
> solution.
>
> I couldn't find anything about this behaviour in the documentation or
> on the internet.
>
> Can anybody explain? All hints are appreciated.
>
> Thanks,
> S. Merz
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
------------------------------
Message: 72
Date: Fri, 6 Jun 2008 12:28:15 -0600
From: "Greg Snow" <Greg.Snow at imail.org>
Subject: Re: [R] Improving data processing efficiency
To: "Patrick Burns" <pburns at pburns.seanet.com>, "Daniel
Folkinshteyn"
<dfolkins at gmail.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC60F6858991 at LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick Burns
> Sent: Friday, June 06, 2008 12:04 PM
> To: Daniel Folkinshteyn
> Cc: r-help at r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> That is going to be situation dependent, but if you have a
> reasonable upper bound, then that will be much easier and not
> far from optimal.
>
> If you pick the possibly too small route, then increasing the
> size in largish junks is much better than adding a row at a time.
Pat,
I am unfamiliar with the use of the word "junk" as a unit of measure
for data objects. I figure there are a few different possibilities:
1. You are using the term intentionally meaning that you suggest he increases
the size in terms of old cars and broken pianos rather than used up pens and
broken pencils.
2. This was a Freudian slip based on your opinion of some datasets you have
seen.
3. Somewhere between your mind and the final product "jumps/chunks"
became "junks" (possibly a microsoft "correction", or just
typing too fast combined with number 2).
4. "junks" is an official measure of data/object size that I need to
learn more about (the history of the term possibly being related to 2 and 3
above).
Please let it be #4, I would love to be able to tell some clients that I have
received a junk of data from them.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111
------------------------------
Message: 73
Date: Fri, 6 Jun 2008 14:32:47 -0400
From: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Greg Snow" <Greg.Snow at imail.org>
Cc: "r-help at r-project.org" <r-help at r-project.org>, Patrick
Burns
<pburns at pburns.seanet.com>
Message-ID:
<971536df0806061132h1d5dfebeyfca3961152f76ba5 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick Burns
>> Sent: Friday, June 06, 2008 12:04 PM
>> To: Daniel Folkinshteyn
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Improving data processing efficiency
>>
>> That is going to be situation dependent, but if you have a
>> reasonable upper bound, then that will be much easier and not
>> far from optimal.
>>
>> If you pick the possibly too small route, then increasing the
>> size in largish junks is much better than adding a row at a time.
>
> Pat,
>
> I am unfamiliar with the use of the word "junk" as a unit of
> measure for data objects. I figure there are a few different possibilities:
>
> 1. You are using the term intentionally meaning that you suggest he
> increases the size in terms of old cars and broken pianos rather than used up
> pens and broken pencils.
>
> 2. This was a Freudian slip based on your opinion of some datasets you have
> seen.
>
> 3. Somewhere between your mind and the final product
> "jumps/chunks" became "junks" (possibly a microsoft
> "correction", or just typing too fast combined with number 2).
>
> 4. "junks" is an official measure of data/object size that I need
> to learn more about (the history of the term possibly being related to 2 and 3
> above).
>
5. Chinese sailing vessel.
http://en.wikipedia.org/wiki/Junk_(ship)
------------------------------
Message: 74
Date: Fri, 6 Jun 2008 19:38:48 +0200
From: "Bertrand Pub Michel" <michel.bertrand.pub at gmail.com>
Subject: [R] Random Forest
To: r-help at r-project.org
Message-ID:
<2eab3a700806061038o187b384aj342e14547f20ebc7 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hello
Is there a package for multivariate random forests, namely for
multivariate response data? It seems to be impossible with the
"randomForest" function and I did not find any information about this
in the help pages ...
Thank you for your help
Bertrand
------------------------------
Message: 75
Date: Fri, 06 Jun 2008 17:43:27 +0200
From: Marco Chiapello <marco.chiapello at unito.it>
Subject: [R] mean
To: r-help at r-project.org
Message-ID: <1212767007.6257.6.camel at Biochimica2.bioveg.unito.it>
Content-Type: text/plain
Hi,
I have a simple question. If I have a table and I want to have the mean
[[elided Yahoo spam]]
E.g.:
c1 c2 c3 mean
1 12 13 14 ??
2 15 24 10 ??
...
Thanks,
Marco
------------------------------
Message: 76
Date: Fri, 6 Jun 2008 07:51:21 -0700 (PDT)
From: madhura <madhura.girish at gmail.com>
Subject: Re: [R] Java to R interface
To: r-help at r-project.org
Message-ID:
<3f0c26b2-a978-484b-ae75-2df2476f2ada at m45g2000hsb.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
The path to R/bin is in the Windows PATH variable. Yet I get this
error.
On Jun 6, 10:37 am, "Dumblauskas, Jerry" <jerry.dumblaus... at credit-suisse.com> wrote:
> Try and make sure that R is in your windows Path variable
>
> I got your message when I first did this, but when I did the above it
> then worked...
------------------------------
Message: 77
Date: Fri, 6 Jun 2008 08:35:51 -0700 (PDT)
From: Evans_CSHL <evans at cshl.edu>
Subject: [R] R (D)COM Server not working on windows domain account
To: r-help at r-project.org
Message-ID: <17695171.post at talk.nabble.com>
Content-Type: text/plain; charset=us-ascii
I have installed R (D)COM on a (windows) machine that is part of a windows
domain. If I run the test file in a local (log into this machine)
administrative account it works fine. If I run the test file on a domain
account with administrative rights it will not connect to the server, even
if I change the account type from roaming to local.
Anyone have any ideas?
Thanks,
Gregg
--
View this message in context:
http://www.nabble.com/R-%28D%29COM-Server-not-working-on-windows-domain-account-tp17695171p17695171.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 78
Date: Fri, 6 Jun 2008 19:40:55 +0200
From: "Bertrand Pub Michel" <michel.bertrand.pub at gmail.com>
Subject: [R] Random Forest and for multivariate response data
To: r-help at r-project.org
Message-ID:
<2eab3a700806061040l7febaae5ub7af3be0d622bf82 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hello
Is there a package for multivariate random forests, namely for
multivariate response data? It seems to be impossible with the
"randomForest" function and I did not find any information about this
in the help pages ...
Thank you for your help
Bertrand
------------------------------
Message: 79
Date: Fri, 6 Jun 2008 19:40:55 +0200
From: "Bertrand Pub Michel" <michel.bertrand.pub at gmail.com>
Subject: [R] Random Forest
To: r-help at r-project.org
Message-ID:
<2eab3a700806061040g51003677l9bfaa1b7a4dfa2b2 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hello
Is there a package for multivariate random forests, namely for
multivariate response data? It seems to be impossible with the
"randomForest" function and I did not find any information about this
in the help pages ...
Thank you for your help
Bertrand
------------------------------
Message: 80
Date: Fri, 6 Jun 2008 14:13:40 -0400
From: "steven wilson" <swpt07 at gmail.com>
Subject: [R] R + Linux
To: r-help at r-project.org
Message-ID:
<25944ea00806061113p30b8edcdo49978d401b05465f at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Dear all;
I'm planning to install Linux on my computer to run R (I'm bored of
W..XP). However, I haven't used Linux before and I would appreciate,
if possible, suggestions/comments about what could be the best option
to install, say Fedora, Ubuntu or OpenSuse, which to my impression are the
most popular ones (at least on the R-help lists). The computer is a PC
desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
used only to run R.
Thanks
Steven
------------------------------
Message: 81
Date: Fri, 6 Jun 2008 12:50:23 -0600
From: "Greg Snow" <Greg.Snow at imail.org>
Subject: Re: [R] Improving data processing efficiency
To: "Gabor Grothendieck" <ggrothendieck at gmail.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>, Patrick
Burns
<pburns at pburns.seanet.com>
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC60F68589AA at LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Friday, June 06, 2008 12:33 PM
> To: Greg Snow
> Cc: Patrick Burns; Daniel Folkinshteyn; r-help at r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org>
wrote:
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org
> >> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick
Burns
> >> Sent: Friday, June 06, 2008 12:04 PM
> >> To: Daniel Folkinshteyn
> >> Cc: r-help at r-project.org
> >> Subject: Re: [R] Improving data processing efficiency
> >>
> >> That is going to be situation dependent, but if you have a
> reasonable
> >> upper bound, then that will be much easier and not far
> from optimal.
> >>
> >> If you pick the possibly too small route, then increasing
> the size in
> >> largish junks is much better than adding a row at a time.
> >
> > Pat,
> >
> > I am unfamiliar with the use of the word "junk" as a unit
> of measure for data objects. I figure there are a few
> different possibilities:
> >
> > 1. You are using the term intentionally meaning that you
> suggest he increases the size in terms of old cars and broken
> pianos rather than used up pens and broken pencils.
> >
> > 2. This was a Freudian slip based on your opinion of some
> datasets you have seen.
> >
> > 3. Somewhere between your mind and the final product
> "jumps/chunks" became "junks" (possibly a microsoft
> "correction", or just typing too fast combined with number 2).
> >
> > 4. "junks" is an official measure of data/object size that
> I need to learn more about (the history of the term possibly
> being related to 2 and 3 above).
> >
>
> 5. Chinese sailing vessel.
> http://en.wikipedia.org/wiki/Junk_(ship)
>
Thanks for expanding my vocabulary (hmm, how am I going to use that word in
context today?).
So, if 5 is the case, then Pat's original statement can be reworded as:
"If you pick the possibly too small route, then increasing the size in
largish Chinese sailing vessels is much better than adding a row boat at a
time."
While that is probably true, I am not sure what that would mean in terms of the
original data processing question.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111
------------------------------
Message: 82
Date: Fri, 6 Jun 2008 13:56:39 -0500
From: ctu at bigred.unl.edu
Subject: Re: [R] mean
To: r-help at r-project.org
Message-ID: <20080606135639.r6hbgm31a8w8s4ko at wm-imp-2.unl.edu>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes";
format="flowed"
> TABLE<-matrix(data=c(12,13,14,15,24,10),byrow=T,nrow=2,ncol=3)
> TABLE
     [,1] [,2] [,3]
[1,]   12   13   14
[2,]   15   24   10
> apply(TABLE,1,mean)
[1] 13.00000 16.33333
Chunhao
Quoting Marco Chiapello <marco.chiapello at unito.it>:
> Hi,
> I have a simple question. If I have a table and I want to have the mean
[[elided Yahoo spam]]
> E.g.:
> c1 c2 c3 mean
> 1 12 13 14 ??
> 2 15 24 10 ??
> ...
>
> Thanks,
> Marco
>
------------------------------
Message: 83
Date: Fri, 06 Jun 2008 19:58:05 +0100
From: Patrick Burns <pburns at pburns.seanet.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck at gmail.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>, Greg
Snow
<Greg.Snow at imail.org>
Message-ID: <484988BD.4090205 at pburns.seanet.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
My guess is that number 2 is closest to the mark.
Typing too fast is unfortunately not one of my
habitual attributes.
Gabor Grothendieck wrote:
> On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick Burns
>>> Sent: Friday, June 06, 2008 12:04 PM
>>> To: Daniel Folkinshteyn
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Improving data processing efficiency
>>>
>>> That is going to be situation dependent, but if you have a
>>> reasonable upper bound, then that will be much easier and not
>>> far from optimal.
>>>
>>> If you pick the possibly too small route, then increasing the
>>> size in largish junks is much better than adding a row at a time.
>>>
>> Pat,
>>
>> I am unfamiliar with the use of the word "junk" as a unit of
>> measure for data objects. I figure there are a few different possibilities:
>>
>> 1. You are using the term intentionally meaning that you suggest he
>> increases the size in terms of old cars and broken pianos rather than used up
>> pens and broken pencils.
>>
>> 2. This was a Freudian slip based on your opinion of some datasets you
>> have seen.
>>
>> 3. Somewhere between your mind and the final product
>> "jumps/chunks" became "junks" (possibly a microsoft
>> "correction", or just typing too fast combined with number 2).
>>
>> 4. "junks" is an official measure of data/object size that I
>> need to learn more about (the history of the term possibly being related to 2
>> and 3 above).
>>
>>
>
> 5. Chinese sailing vessel.
> http://en.wikipedia.org/wiki/Junk_(ship)
>
------------------------------
Message: 84
Date: Fri, 6 Jun 2008 17:01:01 -0200
From: "Alberto Monteiro" <albmont at centroin.com.br>
Subject: [R] Plot matrix as many lines
To: r-help at r-project.org
Message-ID: <20080606185826.M61869 at centroin.com.br>
Content-Type: text/plain; charset=iso-8859-1
Suppose that I have a matrix like:
m <- rbind(c(1,2,3,4), c(2,3,2,1))
Is there any way to efficiently plot the _lines_ as if
I was doing:
plot(m[1,], type="l")
points(m[2,], type="l", col="red")
(of course, in the "real world" there are many more than
just 2 lines and 4 columns...)
Alberto Monteiro
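One base-graphics option that may fit this request is matplot(), which draws one line per column of its input; transposing the matrix therefore gives one line per row. This is a minimal sketch using the example matrix above, with a null PDF device assumed only so that it runs without producing a file.

```r
m <- rbind(c(1, 2, 3, 4), c(2, 3, 2, 1))

# matplot() plots the columns of a matrix against their index,
# so t(m) turns each row of m into its own line.
pdf(file = NULL)   # null device; drop this and dev.off() for interactive use
matplot(t(m), type = "l", lty = 1, col = c("black", "red"),
        xlab = "Index", ylab = "Value")
dev.off()
```

With many rows, the `col` argument can be left out so that matplot() cycles through its default colours.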
------------------------------
Message: 85
Date: Fri, 06 Jun 2008 15:00:57 -0400
From: Chuck Cleland <ccleland at optonline.net>
Subject: Re: [R] mean
To: Marco Chiapello <marco.chiapello at unito.it>
Cc: r-help at r-project.org
Message-ID: <48498969.8090908 at optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 6/6/2008 11:43 AM, Marco Chiapello wrote:
> Hi,
> I have a simple question. If I have a table and I want to have the mean
[[elided Yahoo spam]]
> E.g.:
> c1 c2 c3 mean
> 1 12 13 14 ??
> 2 15 24 10 ??
> ...
>
> Thanks,
> Marco
VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0
rowMeans(VADeaths)
 50-54  55-59  60-64  65-69  70-74
11.050 16.925 25.875 40.400 60.350
You could have found rowMeans() with the following:
RSiteSearch("row means", restrict="functions")
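Applied to the table in the question, the same idea might look like the sketch below. The data frame `tab` is a hypothetical reconstruction of the two rows shown in the question, and the column names are taken from its header.

```r
# Hypothetical data frame holding the three columns from the question.
tab <- data.frame(c1 = c(12, 15), c2 = c(13, 24), c3 = c(14, 10))

# Compute the row means from the original columns only, then store them
# as a new column.
tab$mean <- rowMeans(tab[, c("c1", "c2", "c3")])

# apply() over rows gives the same values; rowMeans() is typically faster
# on large inputs.
via_apply <- apply(tab[, c("c1", "c2", "c3")], 1, mean)
```

Selecting the columns by name keeps the new `mean` column out of the calculation if the line is run a second time.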
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
------------------------------
Message: 86
Date: Fri, 6 Jun 2008 20:06:41 +0100
From: tolga.i.uzuner at jpmorgan.com
Subject: [R] col.names ?
To: r-help at r-project.org
Message-ID:
<OFA4DE8424.7D3A4136-ON80257460.0068DA43-80257460.0068FBB0 at jpmchase.com>
Content-Type: text/plain
Dear R Users,
A bit of an elementary question, but somehow, I haven't been able to
figure it out. I'd like to change the column names of a data frame, so I
am looking for something like col.names (as in row.names). Could someone
please show me how to change the column names of a data frame ?
Thanks,
Tolga
Generally, this communication is for informational purposes only
and it is not intended as an offer or solicitation for the purchase
or sale of any financial instrument or as an official confirmation
of any transaction. In the event you are receiving the offering
materials attached below related to your interest in hedge funds or
private equity, this communication may be intended as an offer or
solicitation for the purchase or sale of such fund(s). All market
prices, data and other information are not warranted as to
completeness or accuracy and are subject to change without notice.
Any comments or statements made herein do not necessarily reflect
those of JPMorgan Chase & Co., its subsidiaries and affiliates.
This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.
Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to UK legal entities.
[[alternative HTML version deleted]]
------------------------------
Message: 87
Date: Fri, 6 Jun 2008 15:11:13 -0400
From: "Thompson, David (MNR)" <David.John.Thompson at ontario.ca>
Subject: Re: [R] ggplot questions
To: "hadley wickham" <h.wickham at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<ECF21B71808ECF4F8918C57EDBEE121D028C2556 at CTSPITDCEMMVX11.cihs.ad.gov.on.ca>
Content-Type: text/plain; charset="us-ascii"
OK,
The original ggplot() construct (below) generates different outputs on the
following two dataframes (test1, test2), which I have attached. The
output that I expect is that shown in test2.png. My expectation is
that I have set the plotting limits with 'scale_x_continuous(lim = c(0,
360)) + scale_y_continuous(lim = c(0, 16))' so both data sets should
produce the same output except for the 'o' at plot center and the 'N' at
the top. The only difference between the two dataframes is the inclusion of
the first two rows in test2, with the rplt column changed to character:
> test2[1:2,]
  oplt rplt  az dist
1    0    o   0    0
2    0    N 360   16
Ahhh, wait a second! In composing this message I may have found the
problem.
It appears that including the 'scale_x_continuous()' component twice in
my original version was causing (?) the erratic behaviour. And I have
confirmed that the ordering of the layer, scale* and coord* components
does not affect the output. However, I'm still getting more x-breaks
than requested with radial lines corresponding to 45, 135, 225, 315
degrees (NE, SE, SW, NW). Still open to suggestions on that.
# new version working with both dataframes
ggplot() + coord_polar() +
  layer(data = test1, mapping = aes(x = az, y = dist, label = rplt),
        geom = "text") +
  scale_x_continuous(lim = c(0, 360), breaks = c(90, 180, 270, 360),
                     labels = c('E', 'S', 'W', 'N')) +
  scale_y_continuous(lim = c(0, 16), breaks = c(0, 4, 8, 12, 16),
                     labels = c('centre', '4m', '8m', '12m', '16m'))
######
######
######
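For readers on a current ggplot2 release, a rough equivalent of the working
version is sketched below. This is an assumption-laden sketch, not the code
from this thread: it uses geom_text() in place of layer(geom = "text"),
spells the argument as limits, and uses a small made-up data frame pts holding
the first four rows of test1. As far as I know, current releases replace an
earlier scale when a second scale for the same aesthetic is added, so limits,
breaks and labels are kept in a single scale_x_continuous() call.

```r
library(ggplot2)  # assumes a current ggplot2; the 2008 API in this message differs

# Small stand-in for test1 (first four rows of the data posted in this message)
pts <- data.frame(rplt = 1:4,
                  az   = c(57, 94, 96, 152),
                  dist = c(4.09, 2.8, 7.08, 7.09))

p <- ggplot(pts, aes(x = az, y = dist, label = rplt)) +
  geom_text() +
  coord_polar() +
  # one x scale carrying limits, breaks and labels together
  scale_x_continuous(limits = c(0, 360), breaks = c(90, 180, 270, 360),
                     labels = c("E", "S", "W", "N")) +
  # one y scale, likewise
  scale_y_continuous(limits = c(0, 16), breaks = c(0, 4, 8, 12, 16),
                     labels = c("centre", "4m", "8m", "12m", "16m"))
print(p)
```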
# original version NOT WORKING with test1
ggplot() + coord_polar() + scale_x_continuous(lim = c(0, 360)) +
  scale_y_continuous(lim = c(0, 16)) +
  layer(data = test, mapping = aes(x = az, y = dist, label = rplt),
        geom = "text") +
  scale_x_continuous(breaks = c(90, 180, 270, 360),
                     labels = c('90', '180', '270', '360'))
# data generating test1.png
test1 <- structure(list(oplt = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    rplt = 1:10,
    az = c(57L, 94L, 96L, 152L, 182L, 185L, 227L, 264L, 332L, 354L),
    dist = c(4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68, 7.9)),
    .Names = c("oplt", "rplt", "az", "dist"),
    row.names = c(NA, 10L), class = "data.frame")
# data generating test2.png
test2 <- structure(list(oplt = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
    rplt = c("o", "N", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    az = c(0, 360, 57, 94, 96, 152, 182, 185, 227, 264, 332, 354),
    dist = c(0, 16, 4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68,
    7.9)),
    .Names = c("oplt", "rplt", "az", "dist"),
    row.names = c(NA, 12L), class = "data.frame")
Many, many thanks for your patience and perseverance on this one Hadley,
DaveT.
>-----Original Message-----
>From: hadley wickham [mailto:h.wickham at gmail.com]
>Sent: June 6, 2008 02:06 PM
>To: Thompson, David (MNR)
>Cc: r-help at r-project.org
>Subject: Re: [R] ggplot questions
>
>> Does the difference have something to do with ggplot() using ranges
>> derived from the data?
>> When I modify my original 'test' dataframe with two extra rows as
>> defined below, I get expected results in both versions.
>
>Order shouldn't matter - and if it's making a difference, that's a
>bug. But I'm still not completely sure what you're expecting.
>
>> This highlights my next question (warned you ;-) ), I have been
>> unsuccessful in trying to define fixed plotting ranges to generate a
>> 'template' graphic that I may reuse with successive 'overstory plot'
>> data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but, this seems to
>> not have any effect on the final plot layout.
>
>Could you please produce a small reproducible example that
>demonstrates this? It may well be a bug.
>
>Hadley
>
>--
>http://had.co.nz/
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1.png
Type: image/png
Size: 9710 bytes
Desc: test1.png
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20080606/02dbba07/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test2.png
Type: image/png
Size: 9306 bytes
Desc: test2.png
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20080606/02dbba07/attachment-0003.png>
------------------------------
Message: 88
Date: Fri, 6 Jun 2008 14:14:49 -0500
From: "Douglas Bates" <bates at stat.wisc.edu>
Subject: Re: [R] mean
To: ctu at bigred.unl.edu
Cc: r-help at r-project.org
Message-ID:
<40e66e0b0806061214veef7b96t8b07bae9c9a4d044 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
See also
?rowMeans
On Fri, Jun 6, 2008 at 1:56 PM, <ctu at bigred.unl.edu> wrote:
>> TABLE<-matrix(data=c(12,13,14,15,24,10),byrow=T,nrow=2,ncol=3)
>> TABLE
>
>      [,1] [,2] [,3]
> [1,]   12   13   14
> [2,]   15   24   10
>>
>> apply(TABLE,1,mean)
>
> [1] 13.00000 16.33333
>
> Chunhao
>
>
> Quoting Marco Chiapello <marco.chiapello at unito.it>:
>
>> Hi,
>> I have a simple question. If I have a table and I want to have the mean
>> [[elided Yahoo spam]]
>> Es:
>> c1 c2 c3 mean
>> 1 12 13 14 ??
>> 2 15 24 10 ??
>> ...
>>
>> Thanks,
>> Marco
>>
------------------------------
Message: 89
Date: Fri, 6 Jun 2008 13:14:20 -0600
From: "Greg Snow" <Greg.Snow at imail.org>
Subject: [R] New vocabulary on a Friday afternoon. Was: Improving data
processing efficiency
To: "Patrick Burns" <pburns at pburns.seanet.com>, "Gabor Grothendieck"
<ggrothendieck at gmail.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC60F68589C6 at LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
I still like the number 4 option, so I think we need to come up with a formal
definition for a "junk" of data. I read somewhere that Tukey coined
the word "bit" as it applies to computers; we can share the
credit/blame for "junks" of data.
My proposal for a statistical/data definition of the word junk:
Junk (noun):
A quantity of data just large enough to get the client excited about the
"great" dataset they provided, but not large enough to support any useful
conclusions.
Example sentence: We just received another junk of data from the boss. Who gets
to give him the bad news that it still does not prove his pet theory?
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111
> -----Original Message-----
> From: Patrick Burns [mailto:pburns at pburns.seanet.com]
> Sent: Friday, June 06, 2008 12:58 PM
> To: Gabor Grothendieck
> Cc: Greg Snow; r-help at r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> My guess is that number 2 is closest to the mark.
> Typing too fast is unfortunately not one of my habitual attributes.
>
> Gabor Grothendieck wrote:
> > On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
> >
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org
> >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick Burns
> >>> Sent: Friday, June 06, 2008 12:04 PM
> >>> To: Daniel Folkinshteyn
> >>> Cc: r-help at r-project.org
> >>> Subject: Re: [R] Improving data processing efficiency
> >>>
> >>> That is going to be situation dependent, but if you have a
> >>> reasonable upper bound, then that will be much easier and not far
> >>> from optimal.
> >>>
> >>> If you pick the possibly too small route, then increasing the size
> >>> in largish junks is much better than adding a row at a time.
> >>>
> >> Pat,
> >>
> >> I am unfamiliar with the use of the word "junk" as a unit of
> >> measure for data objects. I figure there are a few different
> >> possibilities:
> >>
> >> 1. You are using the term intentionally meaning that you suggest he
> >> increases the size in terms of old cars and broken pianos rather than
> >> used up pens and broken pencils.
> >>
> >> 2. This was a Freudian slip based on your opinion of some datasets
> >> you have seen.
> >>
> >> 3. Somewhere between your mind and the final product "jumps/chunks"
> >> became "junks" (possibly a microsoft "correction", or just typing too
> >> fast combined with number 2).
> >>
> >> 4. "junks" is an official measure of data/object size that I need to
> >> learn more about (the history of the term possibly being related to
> >> 2 and 3 above).
> >>
> >>
> >
> > 5. Chinese sailing vessel.
> > http://en.wikipedia.org/wiki/Junk_(ship)
> >
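Vocabulary aside, the advice quoted above (extend storage a large piece at a
time rather than one element at a time) can be sketched roughly as follows.
The function name append_chunked and the chunk_size value are made up for
illustration; the thread does not prescribe either.

```r
# Grow a numeric buffer in chunks instead of one element at a time.
# append_chunked and chunk_size are illustrative names, not from the thread.
append_chunked <- function(values, chunk_size = 1000L) {
  buf <- numeric(chunk_size)   # preallocated storage
  n <- 0L                      # number of slots actually used
  for (v in values) {
    if (n == length(buf)) {
      # buffer full: extend by a whole chunk, not by one slot
      buf <- c(buf, numeric(chunk_size))
    }
    n <- n + 1L
    buf[n] <- v
  }
  buf[seq_len(n)]              # drop the unused tail
}

x <- append_chunked(1:2500, chunk_size = 1000L)
length(x)
```

If a reasonable upper bound on the final size is known, allocating that size
once and trimming at the end avoids the reallocation step altogether.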
------------------------------
Message: 90
Date: Fri, 6 Jun 2008 14:18:52 -0500
From: "Douglas Bates" <bates at stat.wisc.edu>
Subject: Re: [R] R + Linux
To: "steven wilson" <swpt07 at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<40e66e0b0806061218r71700a56nedb6cc610150fb49 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Jun 6, 2008 at 1:13 PM, steven wilson <swpt07 at gmail.com> wrote:
> Dear all;
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
> most popular ones (at least on the R-help lists). The computer is a PC
> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
> used only to run R.
Ah yes, we haven't had a good flame war for a long time. Let's start
discussing the relative merits of various Linux distributions. That
should heat things up a bit.
I can only speak about Ubuntu. I have used it exclusively for several
years now and find it to be superb. In my opinion it is easy to
install and maintain and has very good support for R (take a bow,
Dirk).
------------------------------
Message: 91
Date: Fri, 06 Jun 2008 15:25:15 -0400
From: "john.polo" <jpolo at mail.usf.edu>
Subject: [R] editing a data.frame
To: r-help at r-project.org
Message-ID: <48498F1B.5020506 at mail.usf.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
dear R users,
the data frame (read in from a csv) looks like this:
      TreeTag Census    Stage  DBH
1     CW-W740   2001 juvenile  5.8
2     CW-W739   2001 juvenile  4.3
3     CW-W738   2001 juvenile  4.7
4     CW-W737   2001 juvenile  5.4
5     CW-W736   2001 juvenile  7.4
6     CW-W735   2001 juvenile  5.4
...
1501 1.00E-20   2001    adult 32.5
i would like to change values under the TreeTag column. as the last
value shows, some of the tags have decimals followed by 2 decimal
places. i just want whole numbers, i.e. not 1.00E-20, but 1E-20. i have
a rough understanding of regexp and grepped all the positions that have
the inappropriate tags. i tried sub() a couple of different ways, like
yr1bp$TreeTag[1501]<-sub("1.00", "1", yr1bp$TreeTag[1501])
and after turning yr1bp$TreeTag[1501] into <NA>,
yr1bp$TreeTag[1501]<-sub("", "1E-20", yr1bp$TreeTag[1501])
and
sub("", "1E-20", yr1bp$TreeTag[1501])
but it's not working. i guess it has something to do with the data.frame
characteristics i'm not aware of or don't understand. would i somehow
have to tear apart the columns, edit them, and then put it back
together? not that i know how to do that, but i'm wondering out loud.
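one possible explanation, offered as an assumption since the structure of
yr1bp is not shown: read.csv() in R versions before 4.0 turned text columns
into factors by default, and assigning a string that is not an existing level
to a factor element gives <NA>. a minimal sketch of a workaround on a toy
stand-in for yr1bp (the three rows below are made up from the listing above):

```r
# Toy stand-in for yr1bp; the factor column mimics what read.csv() produced
# by default in R versions before 4.0 (an assumption about the original data).
yr1bp <- data.frame(TreeTag = factor(c("CW-W740", "CW-W739", "1.00E-20")),
                    Census  = 2001,
                    Stage   = c("juvenile", "juvenile", "adult"),
                    DBH     = c(5.8, 4.3, 32.5))

tags <- as.character(yr1bp$TreeTag)   # work on plain strings
tags <- sub("^1\\.00E", "1E", tags)   # escape "." so it matches a literal dot
yr1bp$TreeTag <- factor(tags)         # rebuild the factor with the new levels

yr1bp$TreeTag
```

so the column does not have to be torn out of the data frame; it only has to
be edited as character and then assigned back.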
john
------------------------------
Message: 92
Date: Fri, 6 Jun 2008 16:31:11 -0300
From: "Henrique Dallazuanna" <wwwhsd at gmail.com>
Subject: Re: [R] Plot matrix as many lines
To: "Alberto Monteiro" <albmont at centroin.com.br>
Cc: r-help at r-project.org
Message-ID:
<da79af330806061231o5be8199aq63d1bd983b359da7 at mail.gmail.com>
Content-Type: text/plain
Try this:
matplot(t(m), type='l', lty = 'solid', col='black')
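A slightly fuller sketch of the same call, using the matrix from the question:
matplot() draws one line per column, which is why the matrix is transposed so
that each row of m becomes a line. The axis labels and the per-row colours are
illustrative choices, not part of the original suggestion.

```r
m <- rbind(c(1, 2, 3, 4), c(2, 3, 2, 1))

# matplot() plots the columns of its argument, so transpose to get
# one line per row of m; colours 1, 2, ... distinguish the rows
matplot(t(m), type = "l", lty = 1, col = seq_len(nrow(m)),
        xlab = "column index", ylab = "value")
```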
On 6/6/08, Alberto Monteiro <albmont at centroin.com.br> wrote:
>
> Suppose that I have a matrix like:
>
> m <- rbind(c(1,2,3,4), c(2,3,2,1))
>
> Is there any way to efficiently plot the _lines_ as if
> I was doing:
>
> plot(m[1,], type="l")
> points(m[2,], type="l", col="red")
>
> (of course, in the "real world" there much more than
> just 2 lines and 4 columns...)
>
> Alberto Monteiro
>
--
Henrique Dallazuanna
Curitiba-Parana-Brasil
25° 25' 40" S 49° 16' 22" O
[[alternative HTML version deleted]]
------------------------------
Message: 93
Date: Fri, 6 Jun 2008 15:32:29 -0400
From: John Nolan <jpnolan at american.edu>
Subject: [R] calling a C function with a struct
To: r-help at r-project.org
Message-ID:
<OFADEBCB20.A4FB3A39-ON85257460.006B3BBC-85257460.006B7096 at american.edu>
Content-Type: text/plain
I am trying to call a precompiled C function that uses a struct as one of
its arguments.
I could write a wrapper function in C, but I was hoping there is some way to
pack fields into an array of type raw that could be passed directly to the
function.
Here is some more detail. The C struct is simple, but has mixed types:
struct STRUCT1 {
long type;
long nx;
double *x;
double a;
double b;
};
typedef struct STRUCT1 STRUCT1_TYPE;
The C function header is
void func1( long method, STRUCT1_TYPE my_struct, double *output);
I would like to have an R list mimicking the C struct,
and then use .C to call func1 with this information, e.g.
my.struct <- list(type=3,nx=5,x=1:5,a=2.5,b=8.3)
my.func1( 3, convert2raw( my.struct ) )
where R function convert2raw would return a vector of type raw with
the fields of my.struct packed into memory just like STRUCT1, and then
I could call func1 with that vector of raws.
Can I write a convert2raw( ) function and then use
my.func1 <- function( method, buf ) {
a <- .C("func1", as.integer(method), as.raw(buf), output=double(1))
return(a$output)
}
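To make the packing idea concrete, here is a rough sketch of what a helper of
that kind could do for the scalar fields only. The name pack_scalars and the
long_size argument are hypothetical, and the byte layout (4-byte longs, no
padding) is an assumption that would have to be checked against the compiler
and platform. The x field is left out on purpose: it is a pointer, and R has
no way to produce a meaningful C address to pack. Note also that .C passes
every argument to the C routine as a pointer, so a function taking the struct
by value would, as far as I can tell, still need a small C wrapper.

```r
# Hypothetical helper: pack the scalar fields of the list into a raw vector.
# Layout assumptions (4-byte long, 8-byte double, no padding) are illustrative.
pack_scalars <- function(s, long_size = 4L) {
  c(writeBin(as.integer(s$type), raw(), size = long_size),
    writeBin(as.integer(s$nx),   raw(), size = long_size),
    writeBin(as.double(s$a),     raw(), size = 8L),
    writeBin(as.double(s$b),     raw(), size = 8L))
}

my.struct <- list(type = 3, nx = 5, x = 1:5, a = 2.5, b = 8.3)
buf <- pack_scalars(my.struct)
length(buf)                                  # 4 + 4 + 8 + 8 bytes

# Read two fields back to check the round trip
readBin(buf[1:4],  "integer", size = 4L)
readBin(buf[9:16], "double",  size = 8L)
```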
John
...........................................................................
John P. Nolan
Math/Stat Department
227 Gray Hall
American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050
jpnolan at american.edu
202.885.3140 voice
202.885.3155 fax
http://academic2.american.edu/~jpnolan
...........................................................................
[[alternative HTML version deleted]]
------------------------------
Message: 94
Date: Fri, 6 Jun 2008 16:37:25 -0300
From: "Henrique Dallazuanna" <wwwhsd at gmail.com>
Subject: Re: [R] col.names ?
To: "tolga.i.uzuner at jpmorgan.com" <tolga.i.uzuner at jpmorgan.com>
Cc: r-help at r-project.org
Message-ID:
<da79af330806061237j5578eabfq75da2146ef1b09ca at mail.gmail.com>
Content-Type: text/plain
See ?names
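A short sketch of what that help page covers: names() both reads and replaces
the column names of a data frame, and colnames() does the same. The data frame
df and the new names below are made up for illustration.

```r
df <- data.frame(A = c(1, 2), B = c(3, 4))

colnames(df) <- c("newA", "newB")          # replace all names at once
names(df)[names(df) == "newB"] <- "total"  # rename one column by its current name
names(df)
```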
On 6/6/08, tolga.i.uzuner at jpmorgan.com <tolga.i.uzuner at jpmorgan.com> wrote:
>
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to change the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga
>
--
Henrique Dallazuanna
Curitiba-Parana-Brasil
25° 25' 40" S 49° 16' 22" O
[[alternative HTML version deleted]]
------------------------------
Message: 95
Date: Fri, 06 Jun 2008 15:41:08 -0400
From: Chuck Cleland <ccleland at optonline.net>
Subject: Re: [R] Plot matrix as many lines
To: Alberto Monteiro <albmont at centroin.com.br>
Cc: r-help at r-project.org
Message-ID: <484992D4.3040903 at optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 6/6/2008 3:01 PM, Alberto Monteiro wrote:
> Suppose that I have a matrix like:
>
> m <- rbind(c(1,2,3,4), c(2,3,2,1))
>
> Is there any way to efficiently plot the _lines_ as if
> I was doing:
>
> plot(m[1,], type="l")
> points(m[2,], type="l", col="red")
>
> (of course, in the "real world" there much more than
> just 2 lines and 4 columns...)
m <- rbind(c(1,2,3,4), c(2,3,2,1))
matplot(t(m), type="l")
?matplot
> Alberto Monteiro
>
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
------------------------------
Message: 96
Date: Fri, 06 Jun 2008 15:46:47 -0400
From: Chuck Cleland <ccleland at optonline.net>
Subject: Re: [R] col.names ?
To: tolga.i.uzuner at jpmorgan.com
Cc: r-help at r-project.org
Message-ID: <48499427.2060108 at optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 6/6/2008 3:06 PM, tolga.i.uzuner at jpmorgan.com wrote:
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to change the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga
my.iris <- iris
names(my.iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"
names(my.iris) <- c("Length.Sepal", "Width.Sepal", "Length.Petal",
                    "Width.Petal", "Species")
names(my.iris)
[1] "Length.Sepal" "Width.Sepal"  "Length.Petal" "Width.Petal"  "Species"
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
------------------------------
Message: 97
Date: Fri, 6 Jun 2008 15:52:19 -0400
From: William Pepe <williampepe at hotmail.com>
Subject: Re: [R] col.names ?
To: <tolga.i.uzuner at jpmorgan.com>, <r-help at r-project.org>
Message-ID: <BAY101-W486E51E4FA67FD4847E4D3B4B70 at phx.gbl>
Content-Type: text/plain
As a very simple example:
TolgaData<-data.frame(A=c(1,2),B=c(3,4))
names(TolgaData )<- c( "newA", "newB" )
> TolgaData
Column names should now be newA and newB
Best,
Bill
> To: r-help at r-project.org
> From: tolga.i.uzuner at jpmorgan.com
> Date: Fri, 6 Jun 2008 20:06:41 +0100
> Subject: [R] col.names ?
>
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to change the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga
[[alternative HTML version deleted]]
------------------------------
Message: 98
Date: Fri, 6 Jun 2008 20:54:05 +0100
From: tolga.i.uzuner at jpmorgan.com
Subject: Re: [R] col.names ?
To: William Pepe <williampepe at hotmail.com>
Cc: r-help at r-project.org, tolga.i.uzuner at jpmorgan.com
Message-ID:
<OF1B3AFA12.6E9DEF33-ON80257460.006D4E42-80257460.006D52AB at jpmchase.com>
Content-Type: text/plain
Many thanks to everyone who replied,
Tolga
William Pepe <williampepe at hotmail.com>
06/06/2008 20:52
To: <tolga.i.uzuner at jpmorgan.com>, <r-help at r-project.org>
cc:
Subject: RE: [R] col.names ?
As a very simple example:
TolgaData<-data.frame(A=c(1,2),B=c(3,4))
names(TolgaData )<- c( "newA", "newB" )
> TolgaData
Column names should now be newA and newB
Best,
Bill
> To: r-help at r-project.org
> From: tolga.i.uzuner at jpmorgan.com
> Date: Fri, 6 Jun 2008 20:06:41 +0100
> Subject: [R] col.names ?
>
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to change the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga
>
> Generally, this communication is for informational purposes only
> and it is not intended as an offer or solicitation for the purchase
> or sale of any financial instrument or as an official confirmation
> of any transaction. In the event you are receiving the offering
> materials attached below related to your interest in hedge funds or
> private equity, this communication may be intended as an offer or
> solicitation for the purchase or sale of such fund(s). All market
> prices, data and other information are not warranted as to
> completeness or accuracy and are subject to change without notice.
> Any comments or statements made herein do not necessarily reflect
> those of JPMorgan Chase & Co., its subsidiaries and affiliates.
>
> This transmission may contain information that is privileged,
> confidential, legally privileged, and/or exempt from disclosure
> under applicable law. If you are not the intended recipient, you
> are hereby notified that any disclosure, copying, distribution, or
> use of the information contained herein (including any reliance
> thereon) is STRICTLY PROHIBITED. Although this transmission and any
> attachments are believed to be free of any virus or other defect
> that might affect any computer system into which it is received and
> opened, it is the responsibility of the recipient to ensure that it
> is virus free and no responsibility is accepted by JPMorgan Chase &
> Co., its subsidiaries and affiliates, as applicable, for any loss
> or damage arising in any way from its use. If you received this
> transmission in error, please immediately contact the sender and
> destroy the material in its entirety, whether in electronic or hard
> copy format. Thank you.
> Please refer to http://www.jpmorgan.com/pages/disclosures for
> disclosures relating to UK legal entities.
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.
------------------------------
Message: 99
Date: Fri, 6 Jun 2008 15:57:37 -0400
From: "Jorge Ivan Velez" <jorgeivanvelez at gmail.com>
Subject: Re: [R] Subsetting to unique values
To: "Emslie, Paul [Ctr]" <emsliep at atac.mil>
Cc: r-help at r-project.org
Message-ID:
<317737de0806061257n197e2f1aq871cac79945737ca at mail.gmail.com>
Content-Type: text/plain
Dear Paul,
Try also:
ddTable <- data.frame(Id=c(1,1,2,2), name=c("Paul","Joe","Bob","Larry"))
ddTable[!duplicated(ddTable$Id), ]
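A small sketch of how the first row per Id can be selected, using the example data from the question. The variable names first_rows, same_rows and by_position are only illustrative. Note that the unique Id values themselves, used directly as row indices, select rows by position rather than by first occurrence.

```r
ddTable <- data.frame(Id = c(1, 1, 2, 2),
                      name = c("Paul", "Joe", "Bob", "Larry"))

# Keep the first row seen for each Id (logical mask).
first_rows <- ddTable[!duplicated(ddTable$Id), ]

# Equivalent positional form: index of the first occurrence of each Id.
same_rows <- ddTable[match(unique(ddTable$Id), ddTable$Id), ]

# By contrast, c(1, 2) used directly as row numbers picks rows 1 and 2.
by_position <- ddTable[unique(ddTable$Id), ]

print(first_rows)
```

The first two forms agree on this data; the third happens to return the first two rows only because the Id values are 1 and 2.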
HTH,
Jorge
On Fri, Jun 6, 2008 at 9:35 AM, Emslie, Paul [Ctr] <emsliep at atac.mil>
wrote:
> I want to take the first row of each unique ID value from a data frame.
> For instance
> > ddTable <- data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
>
------------------------------
Message: 100
Date: Fri, 06 Jun 2008 16:04:24 -0400
From: Duncan Murdoch <murdoch at stats.uwo.ca>
Subject: Re: [R] calling a C function with a struct
To: John Nolan <jpnolan at american.edu>
Cc: r-help at r-project.org
Message-ID: <48499848.8010100 at stats.uwo.ca>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
John Nolan wrote:
> I am trying to call a precompiled C function that uses a struct as one of
> its arguments.
> I could write a wrapper function in C, but I was hoping there is some way
> to pack fields into an array of type raw that could be passed directly to
> the function.
>
> Here is some more detail. The C struct is simple, but has mixed types:
>
> struct STRUCT1 {
> long type;
> long nx;
> double *x;
> double a;
> double b;
> };
> typedef struct STRUCT1 STRUCT1_TYPE;
>
> The C function header is
>
> void func1( long method, STRUCT1 my_struct, double *output);
>
> I would like to have an R list mimicking the C struct,
> and then use .C to call func1 with this information, e.g.
>
> my.struct <- list(type=3,nx=5,x=1:5,a=2.5,b=8.3)
> my.func1( 3, convert2raw( my.struct ), )
>
It might be possible, but it would be quite tricky, and I'd guess the
"double *x" would be just about impossible. R has no way to see C-level
pointers.
Just write the wrapper in C, it will be easier than writing one in R.
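To illustrate the point, here is a rough sketch of packing only the scalar fields with writeBin(). It assumes a platform where C "long" is 4 bytes and ignores any struct padding; pack_scalars is a made-up helper name. The "double *x" field is left out because R has no address it could write there.

```r
my.struct <- list(type = 3L, nx = 5L, x = as.double(1:5), a = 2.5, b = 8.3)

# Serialise the scalar fields to raw bytes in native byte order.
# Assumption: 4-byte integers stand in for C 'long'; no padding is added.
pack_scalars <- function(s) {
  c(writeBin(s$type, raw()),   # 4 bytes
    writeBin(s$nx,   raw()),   # 4 bytes
    writeBin(s$a,    raw()),   # 8 bytes
    writeBin(s$b,    raw()))   # 8 bytes
}

buf <- pack_scalars(my.struct)
length(buf)
```

Even where this layout happens to match the compiler's, the pointer field still has to be filled in on the C side, which is why a small C wrapper is the simpler route.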
Duncan Murdoch
> where R function convert2raw would return a vector of type raw with
> the fields of my.struct packed into memory just like STRUCT1, and then
> I could call func1 with that vector of raws.
>
> Can I write a convert2raw( ) function and then use
> my.func1 <- function( method, buf ) {
> a <- .C("func1", as.integer(method), as.raw(buf), output=double(1))
> return(a$output)
> }
>
>
> John
>
> ...........................................................................
>
> John P. Nolan
> Math/Stat Department
> 227 Gray Hall
> American University
> 4400 Massachusetts Avenue, NW
> Washington, DC 20016-8050
>
> jpnolan at american.edu
> 202.885.3140 voice
> 202.885.3155 fax
> http://academic2.american.edu/~jpnolan
>
> ...........................................................................
------------------------------
Message: 101
Date: Fri, 06 Jun 2008 16:18:23 -0400
From: "Kevin E. Thorpe" <kevin.thorpe at utoronto.ca>
Subject: Re: [R] R + Linux
To: Douglas Bates <bates at stat.wisc.edu>
Cc: r-help at r-project.org, steven wilson <swpt07 at gmail.com>
Message-ID: <48499B8F.10808 at utoronto.ca>
Content-Type: text/plain; charset=ISO-8859-1
Any of the three distros mentioned are sure to be fine.
Personally, I find the sysadmin tool in opensuse to be
fantastic for a novice.
It comes down to preference. Try some live versions of the distros to
see what you like best.
Douglas Bates wrote:
> On Fri, Jun 6, 2008 at 1:13 PM, steven wilson <swpt07 at gmail.com> wrote:
>> Dear all;
>
>> I'm planning to install Linux on my computer to run R (I'm bored of
>> W..XP). However, I haven't used Linux before and I would appreciate,
>> if possible, suggestions/comments about what could be the best option
>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>> most popular ones (at least on the R-help lists). The computer is a PC
>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>> used only to run R.
>
> Ah yes, we haven't had a good flame war for a long time. Let's start
> discussing the relative merits of various Linux distributions. That
> should heat things up a bit.
>
> I can only speak about Ubuntu. I have used it exclusively for several
> years now and find it to be superb. In my opinion it is easy to
> install and maintain and has very good support for R (take a bow,
> Dirk).
>
--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.6057
------------------------------
Message: 102
Date: Fri, 06 Jun 2008 16:36:34 -0400
From: Markus Jäntti <mjantti at abo.fi>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07 at gmail.com>
Cc: r-help at r-project.org
Message-ID: <1212784594.7023.61.camel at hades>
Content-Type: text/plain
I have Debian, Ubuntu, RedHat and CentOS systems, and primarily run R
on the Debian and RedHat machines. I have encountered few problems
running R on RedHat/CentOS, but I do think the Debian/Ubuntu package
management system, combined with the kind provision of packages, makes
life a lot simpler. (Yes, many thanks to Dirk!)
Also, the ease of installing and maintaining, along with the highly
useful user forums of Ubuntu, would lead me to recommend that particular
distribution.
Regards,
Markus
On Fri, 2008-06-06 at 14:13 -0400, steven wilson wrote:
> Dear all;
>
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
> most popular ones (at least on the R-help lists). The computer is a PC
> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
> used only to run R.
>
> Thanks
> Steven
>
--
Markus Jantti
Abo Akademi University
mjantti at abo.fi
http://www.iki.fi/~mjantti
------------------------------
Message: 103
Date: Fri, 06 Jun 2008 16:37:28 -0400
From: Michael Friendly <friendly at yorku.ca>
Subject: [R] color scale mapped to B/W
To: R-Help <r-help at stat.math.ethz.ch>
Message-ID: <4849A008.2000801 at yorku.ca>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
In an R graphic, I'm using
cond.col <- c("green", "yellow", "red")
to represent a quantitative variable, where green means 'OK', yellow
represents 'warning' and red represents 'danger'. Using these particular
color names, in B/W, red is darkest and yellow is lightest. I'd like to
find color designations to replace yellow and green so that when printed
in B/W, the yellowish color appears darker than the greenish one.
Is there some tool/code I can use to find these? i.e., something to
display a grid of color swatches with color codes/names I can look at in
color and B/W to decide?
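One possible approach, sketched under the assumption that B/W output follows roughly the Rec. 601 luma weights (0.299, 0.587, 0.114); real printers and copiers may differ. The helper names gray_level and show_swatches are made up for illustration; col2rgb(), gray(), colors() and barplot() are base R.

```r
# Approximate gray level (0 = black, 1 = white) of named or hex colors,
# assuming Rec. 601 luma weights; actual devices may map colors differently.
gray_level <- function(cols) {
  rgb <- col2rgb(cols)                      # 3 x n matrix, values 0..255
  unname(drop(c(0.299, 0.587, 0.114) %*% rgb)) / 255
}

# Draw each color above its approximate gray equivalent.
show_swatches <- function(cols) {
  g <- gray_level(cols)
  op <- par(mfrow = c(2, 1), mar = c(3, 1, 2, 1))
  on.exit(par(op))
  barplot(rep(1, length(cols)), col = cols, names.arg = cols,
          axes = FALSE, main = "color")
  barplot(rep(1, length(cols)), col = gray(g), names.arg = round(g, 2),
          axes = FALSE, main = "approximate B/W")
}

cond.col <- c("green", "yellow", "red")
g <- gray_level(cond.col)

# To browse candidates, e.g. all built-in names containing "green":
# show_swatches(grep("green", colors(), value = TRUE))
if (interactive()) show_swatches(cond.col)
```

Candidate replacements for yellow and green can then be compared by their gray_level() values before checking them on paper.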
thanks,
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA
------------------------------
Message: 104
Date: Fri, 6 Jun 2008 13:45:11 -0700 (PDT)
From: Yasir Kaheil <kaheil at gmail.com>
Subject: Re: [R] Random Forest
To: r-help at r-project.org
Message-ID: <17700817.post at talk.nabble.com>
Content-Type: text/plain; charset=us-ascii
hi there:
please refer to:
http://www.math.usu.edu/~adele/forests/cc_home.htm
and
http://www.math.usu.edu/~minnotte/S5600S07/R17.txt
thanks
BertrandM wrote:
>
> Hello
>
> Is there a package for multivariate random forests, namely for
> multivariate response data? It seems to be impossible with the
> "randomForest" function and I did not find any information about this
> in the help pages ...
> Thank you for your help
>
> Bertrand
>
-----
Yasir H. Kaheil
Catchment Research Facility
The University of Western Ontario
--
View this message in context:
http://www.nabble.com/Random-Forest-tp17698842p17700817.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 105
Date: Fri, 06 Jun 2008 16:49:24 -0400
From: Roland Rau <roland.rproject at gmail.com>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07 at gmail.com>
Cc: r-help at r-project.org
Message-ID: <4849A2D4.5090206 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear all,
a related follow up -- with the hope for some feedback from the specialists.
Is the following general advice justified:
========================================================
If one has not more than 4GB RAM and one wants to run primarily R on
one's Linux machine, it is a good idea to install the 32bit version of
the operating system.
The reasons are:
The machine has 4GB RAM which implies that the 32bit version can
(theoretically) use the whole available memory address space. The
advantage of addressing more memory using 64bit is in this instance of a
4GB computer lost. Furthermore, 64bit often runs slower than 32bit (see
Section 8 of R Admin Manual) due to the larger pointer size.
========================================================
Thanks,
Roland
steven wilson wrote:
> Dear all;
>
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
> most popular ones (at least on the R-help lists). The computer is a PC
> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
> used only to run R.
>
> Thanks
> Steven
>
------------------------------
Message: 106
Date: Fri, 6 Jun 2008 15:52:40 -0500
From: Dirk Eddelbuettel <edd at debian.org>
Subject: Re: [R] R + Linux
To: "Kevin E. Thorpe" <kevin.thorpe at utoronto.ca>
Cc: r-help at r-project.org, Douglas Bates <bates at stat.wisc.edu>,
steven wilson <swpt07 at gmail.com>
Message-ID: <18505.41880.598299.374536 at ron.nulle.part>
Content-Type: text/plain; charset=us-ascii
On 6 June 2008 at 16:18, Kevin E. Thorpe wrote:
| Any of the three distros mentioned are sure to be fine.
| Personally, I find the sysadmin tool in opensuse to be
| fantastic for a novice.
|
| It comes down to preference. Try some live versions of the distros to
| see what you like best.
While that is certainly true, there is a difference that doesn't get
mentioned as much.
On Debian + Ubuntu you also get numerous add-ons and extensions that the other
distros may not have, such as
- 60+ packages from CRAN already in the distro
- the ESS emacs add-on
- out-of-the box RPy (R/Python) support
- Ggobi and rggobi for visualization
- Rkward as a friendly GUI
- bindings from R to Shogun for data-mining
- littler for scripting
- out-of-the box Rmpi / Open MPI support
and a few things I am probably forgetting.
As you say, preferences. The ease of installation with Ubuntu (and also with
the more recent Debian installers) coupled with the richer set of packages
tilts this at least in my book. But that's just my $0.02.
Dirk
--
Three out of two people have difficulties with fractions.
------------------------------
Message: 107
Date: Fri, 6 Jun 2008 22:09:10 +0100 (BST)
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Subject: Re: [R] R + Linux
To: Roland Rau <roland.rproject at gmail.com>
Cc: r-help at r-project.org, steven wilson <swpt07 at gmail.com>
Message-ID:
<alpine.LFD.1.10.0806062159020.18018 at gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
This is not sound advice. For 1GB yes, perhaps 2GB. Beyond that the
extra freedom in the address space of a 64-bit system pays off.
The user address space of a 32-bit Linux system is (in the examples I have
seen) 3 to 3.5GB. See ?"Memory-limits" for why that is restrictive.
There are some anomalies, depending on the CPU. On Intel Core 2 Duos
manipulating 64-bit pointers seems to be as efficient as 32-bit ones and
on some platforms (e.g. Mac OS 10.5.3) 64-bit is actually faster than
32-bit R. So very similar CPUs can give quite different performance
differences with 32- vs 64-bit R.
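To see which kind of build a given R session is, a quick check can look like the following sketch. `.Machine$sizeof.pointer` is base R; the variable names are illustrative only.

```r
# Pointer size in bytes: 4 on a 32-bit build of R, 8 on a 64-bit build.
pointer_bits <- 8 * .Machine$sizeof.pointer

# Upper bound on a 32-bit address space, in GB (2^32 bytes = 4 GB);
# the part usable by a single user process is smaller than this.
max_32bit_gb <- 2^32 / 1024^3

cat("This R build uses", pointer_bits, "bit pointers\n")
```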
On Fri, 6 Jun 2008, Roland Rau wrote:
> Dear all,
>
> a related follow up -- with the hope for some feedback from the specialists.
> Is the following general advice justified:
> ========================================================
> If one has not more than 4GB RAM and one wants to run primarily R on one's
> Linux machine, it is a good idea to install the 32bit version of the
> operating system.
> The reasons are:
> The machine has 4GB RAM which implies that the 32bit version can
> (theoretically) use the whole available memory address space. The advantage
> of addressing more memory using 64bit is in this instance of a 4GB computer
> lost. Furthermore, 64bit often runs slower than 32bit (see Section 8 of R
> Admin Manual) due to the larger pointer size.
> ========================================================
>
> Thanks,
> Roland
>
>
> steven wilson wrote:
>> Dear all;
>>
>> I'm planning to install Linux on my computer to run R (I'm bored of
>> W..XP). However, I haven't used Linux before and I would appreciate,
>> if possible, suggestions/comments about what could be the best option
>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>> most popular ones (at least on the R-help lists). The computer is a PC
>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>> used only to run R.
>>
>> Thanks
>> Steven
>>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
------------------------------
Message: 108
Date: Fri, 06 Jun 2008 17:11:00 -0400
From: Esmail Bonakdarian <esmail.js at gmail.com>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07 at gmail.com>
Cc: r-help at r-project.org
Message-ID: <4849A7E4.5040006 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
steven wilson wrote:
>
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install,
Hi,
I have used Linux since the early 1990s starting with the original
slackware distribution, followed by various versions of Red Hat,
Gentoo (compiled from source), Fedora and now Ubuntu.
Ubuntu is my choice for having the least troublesome install and
maintenance. It has a very nice package manager, and if your goal
is to *use* a Linux system rather than tinker with it, you could
do much worse than Ubuntu.
I installed R via the package manager a month ago or so, very easy
and trouble free.
Hope that helps,
Esmail
------------------------------
Message: 109
Date: Fri, 06 Jun 2008 17:22:01 -0400
From: Abhijit Dasgupta <adasgupt at mail.jci.tju.edu>
Subject: Re: [R] R + Linux
To: Markus Jäntti <mjantti at abo.fi>
Cc: r-help at r-project.org, steven wilson <swpt07 at gmail.com>
Message-ID: <4849AA79.6040507 at mail.jci.tju.edu>
Content-Type: text/plain
I've had R on an Ubuntu system for about 18 months now, and getting R
up and running was a breeze. (I didn't realize it earlier, but Dirk
certainly gets my vote of thanks for his efforts in making this process
as easy as it is.) Especially in terms of dependencies and the like, the
Ubuntu packaging system has made things easy. I've also had
the experience of installing R on a RedHat Enterprise system on a new
server at university, and the dependency issues were much more
problematic (although I wasn't allowed to use yum because of the way our
IT people had set it up), especially at the compiler level. Just my
limited experience in this area. In any case, I'm not going back to
Windows unless forced; I've been quite happy with my experience in
the Linux world.
Abhijit
Markus Jäntti wrote:
> I have both Debian, Ubuntu, RedHat and CentOS systems, and primary run R
> on the Debian and RedHat machines. I have encountered few problems
> running R on RedHat/CentOS, but I do think the Debian/Ubuntu package
> management system, combined with the kind provision of packages, makes
> life a lot simpler. (Yes, many thanks to Dirk!).
>
> Also, the ease of installing and maintaining among with the highly
> useful user forums of Ubuntu would lead me to recommend that particular
> distribution.
>
> Regards,
>
> Markus
>
> On Fri, 2008-06-06 at 14:13 -0400, steven wilson wrote:
>
>> Dear all;
>>
>> I'm planning to install Linux on my computer to run R (I'm bored of
>> W..XP). However, I haven't used Linux before and I would appreciate,
>> if possible, suggestions/comments about what could be the best option
>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>> most popular ones (at least on the R-help lists). The computer is a PC
>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>> used only to run R.
>>
>> Thanks
>> Steven
>>
------------------------------
Message: 110
Date: Fri, 06 Jun 2008 17:31:53 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] editing a data.frame
To: "john.polo" <jpolo at mail.usf.edu>
Cc: r-help at r-project.org
Message-ID: <4849ACC9.7050304 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
works for me:
> sub('1.00', '1', '1.00E-20')
[1] "1E-20"
remember, according to ?sub, it's sub(pattern, repl, string)
try doing it step by step. first, see what yr1bp$TreeTag[1501] is.
then, if it's the right data item, see what the output of
sub("1.00", "1", yr1bp$TreeTag[1501]) is.
that'll let you figure out where the problem lies.
finally, if all your target strings are of the form 1.00E-20, you could
sub the whole thing with a more general regexp:
sub("([0-9])(\\.[0-9]{2})(.*)", "\\1\\3", yourvector)
(it matches a digit, followed by a dot and two digits, followed by
"anything else", and takes out the "dot and two digits" bit in the
replacement, in the whole vector. note the doubled backslash before the
dot, which R needs inside a string.)
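If TreeTag was read in as a factor (an assumption on my part; read.csv did this by default for text columns at the time), assigning a string that is not already a level yields NA, which would explain the <NA>. A minimal sketch with a made-up two-row data frame: convert to character first, then substitute.

```r
# Toy stand-in for the real data; stringsAsFactors = TRUE mimics the
# assumed factor column.
yr1bp <- data.frame(TreeTag = c("CW-W740", "1.00E-20"),
                    stringsAsFactors = TRUE)

# Work on character data so new values are not restricted to old levels.
tags <- as.character(yr1bp$TreeTag)

# Drop ".dd" between a leading digit and the exponent; the dot is escaped
# so it matches a literal "." only. Other tags are left unchanged.
tags <- sub("^([0-9])\\.[0-9]{2}(E.*)$", "\\1\\2", tags)

yr1bp$TreeTag <- tags
print(yr1bp)
```

This edits the whole column in one step, so there is no need to take the data frame apart.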
on 06/06/2008 03:25 PM john.polo said the following:
> dear R users,
>
> the data frame (read in from a csv) looks like this:
> TreeTag Census Stage DBH
> 1 CW-W740 2001 juvenile 5.8
> 2 CW-W739 2001 juvenile 4.3
> 3 CW-W738 2001 juvenile 4.7
> 4 CW-W737 2001 juvenile 5.4
> 5 CW-W736 2001 juvenile 7.4
> 6 CW-W735 2001 juvenile 5.4
> ...
> 1501 1.00E-20 2001 adult 32.5
>
> i would like to change values under the TreeTag column. as the last
> value shows, some of the tags have decimals followed by 2 decimal
> places. i just want whole numbers, i.e. not 1.00E-20, but 1E-20. i have
> a rough understanding of regexp and grepped all the positions that have
> the inappropriate tags. i tried sub() a couple of different ways, like
> yr1bp$TreeTag[1501]<-sub("1.00", "1", yr1bp$TreeTag[1501])
> and after turning yr1bp$TreeTag[1501] into <NA>,
> yr1bp$TreeTag[1501]<-sub("", "1E-20", yr1pb$TreeTag[1501])
> and
> sub("", "1E-20", yr1bp$TreeTag[1501])
> but it's not working. i guess it has something to do with the data.frame
> characteristics i'm not aware of or don't understand. would i somehow
> have to tear apart the columns, edit them, and then put it back
> together? not that i know how to do that, but i'm wondering out loud.
>
> john
>
------------------------------
Message: 111
Date: Fri, 06 Jun 2008 17:34:07 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] R + Linux
To: adasgupt at mail.jci.tju.edu
Cc: Markus Jäntti <mjantti at abo.fi>, r-help at r-project.org,
steven wilson <swpt07 at gmail.com>
Message-ID: <4849AD4F.8040409 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
another vote for ubuntu here - works for me, and quite trouble-free. add
the r-project repositories, and you're sure to always have the latest,
too. (if you don't care for the latest R, you can of course also just
get R from the distro's repos as well)
on 06/06/2008 05:22 PM Abhijit Dasgupta said the following:
> I've had R on an Ubuntu system for about 18 months now, and getting R
> up and running was a breeze. (I didn't realize it earlier, but Dirk
> certainly gets my vote of thanks for his efforts in making this process
> as easy as it is). Specially in terms of dependencies and the like, the
> Ubuntu packaging system has made things specially easy. I've also had
> the experience of installing R on a RedHat Enterprise System on a new
> server at university, and the dependencies issues was much more
> problematic (albeit, I wasn't allowed to use yum because of the way our
> IT people had set it up), specially at the compiler level. Just my
> limited experience in this area. In any case, I'm not going back to
> Windows now if not forced; I've been quite happy with my experience in
> the Linux world.
>
> Abhijit
>
> Markus Jäntti wrote:
>> I have both Debian, Ubuntu, RedHat and CentOS systems, and primary run R
>> on the Debian and RedHat machines. I have encountered few problems
>> running R on RedHat/CentOS, but I do think the Debian/Ubuntu package
>> management system, combined with the kind provision of packages, makes
>> life a lot simpler. (Yes, many thanks to Dirk!).
>>
>> Also, the ease of installing and maintaining among with the highly
>> useful user forums of Ubuntu would lead me to recommend that particular
>> distribution.
>>
>> Regards,
>>
>> Markus
>>
>> On Fri, 2008-06-06 at 14:13 -0400, steven wilson wrote:
>>
>>> Dear all;
>>>
>>> I'm planning to install Linux on my computer to run R (I'm bored of
>>> W..XP). However, I haven't used Linux before and I would appreciate,
>>> if possible, suggestions/comments about what could be the best option
>>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>>> most popular ones (at least on the R-help lists). The computer is a PC
>>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>>> used only to run R.
>>>
>>> Thanks
>>> Steven
>>>
------------------------------
Message: 112
Date: Fri, 6 Jun 2008 17:55:17 -0400
From: Jonathan Baron <baron at psych.upenn.edu>
Subject: Re: [R] R + Linux
To: Daniel Folkinshteyn <dfolkins at gmail.com>
Cc: Markus Jäntti <mjantti at abo.fi>, r-help at r-project.org,
steven wilson <swpt07 at gmail.com>
Message-ID: <20080606215517.GA4326 at psych.upenn.edu>
Content-Type: text/plain; charset=us-ascii
R works just fine on Fedora 9.
------------------------------
Message: 113
Date: Fri, 06 Jun 2008 18:10:25 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Patrick Burns <pburns at pburns.seanet.com>
Cc: r-help at r-project.org
Message-ID: <4849B5D1.4070202 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hmm... ok... so i ran the code twice - once with a preallocated result,
assigning rows to it, and once with a nrow=0 result, rbinding rows to
it, for the first 20 quarters. There was no speedup. In fact, running
with a preallocated result matrix was slower than rbinding to the matrix:
for preallocated matrix:
Time difference of 1.577779 mins
for rbinding:
Time difference of 1.498628 mins
(the time difference only counts from the start of the loop til the end,
so the time to allocate the empty matrix was /not/ included in the time
count).
So, it appears that rbinding a matrix is not the bottleneck. (That it
was actually faster than assigning rows could have been a random anomaly
(e.g. some other process eating a bit of cpu during the run?), or not;
at any rate, it doesn't make an /appreciable/ difference.)
Any other suggestions? :)
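For reference, the two variants being compared look roughly like this in isolation. This is a toy sketch, not the real matching code: make_row is a hypothetical stand-in for producing one matched row, and n and k are arbitrary sizes. With cheap rows the growth cost of rbind shows up; when each row is expensive to compute, as above, it can be dwarfed by the per-row work.

```r
n <- 2000   # number of result rows (arbitrary)
k <- 10     # number of columns (arbitrary)

# Hypothetical stand-in for the work that produces one matched row.
make_row <- function(i) as.numeric(i) * seq_len(k)

# Variant 1: start with zero rows and rbind one row at a time.
grow_by_rbind <- function() {
  result <- matrix(nrow = 0, ncol = k)
  for (i in seq_len(n)) result <- rbind(result, make_row(i))
  result
}

# Variant 2: allocate the full matrix once and assign into it.
fill_preallocated <- function() {
  result <- matrix(NA_real_, nrow = n, ncol = k)
  for (i in seq_len(n)) result[i, ] <- make_row(i)
  result
}

t_rbind <- system.time(a <- grow_by_rbind())
t_prealloc <- system.time(b <- fill_preallocated())
print(rbind(rbind = t_rbind, prealloc = t_prealloc)[, "elapsed"])
```

Timing the subsetting steps inside the loop separately (for example with Rprof) would show where the time actually goes.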
on 06/06/2008 02:03 PM Patrick Burns said the following:
> That is going to be situation dependent, but if you
> have a reasonable upper bound, then that will be
> much easier and not far from optimal.
>
> If you pick the possibly too small route, then increasing
> the size in largish chunks is much better than adding
> a row at a time.
>
> Pat
>
> Daniel Folkinshteyn wrote:
>> thanks for the tip! i'll try that and see how big of a difference that
>> makes... if i am not sure what exactly the size will be, am i better
>> off making it larger, and then later stripping off the blank rows, or
>> making it smaller, and appending the missing rows?
>>
>> on 06/06/2008 11:44 AM Patrick Burns said the following:
>>> One thing that is likely to speed the code significantly
>>> is if you create 'result' to be its final size and then
>>> subscript into it. Something like:
>>>
>>> result[i, ] <- bestpeer
>>>
>>> (though I'm not sure if 'i' is the proper index).
>>>
>>> Patrick Burns
>>> patrick at burns-stat.com
>>> +44 (0)20 8525 0696
>>> http://www.burns-stat.com
>>> (home of S Poetry and "A Guide for the Unwilling S User")
>>>
>>> Daniel Folkinshteyn wrote:
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>> Hi everyone!
>>>>>
>>>>> I have a question about data processing efficiency.
>>>>>
>>>>> My data are as follows: I have a data set on quarterly
>>>>> institutional ownership of equities; some of them have had recent
>>>>> IPOs, some have not (I have a binary flag set). The total dataset
>>>>> size is 700k+ rows.
>>>>>
>>>>> My goal is this: For every quarter since issue for each IPO, I need
>>>>> to find a "matched" firm in the same industry, and close in market
>>>>> cap. So, e.g., for firm X, which had an IPO, i need to find a
>>>>> matched non-issuing firm in quarter 1 since IPO, then a (possibly
>>>>> different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>>>>> each issuing firm (there are about 8300 of these).
>>>>>
>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>>> highly inefficient and takes ages (well, many hours). What I am
>>>>> doing, in pseudocode, is this:
>>>>>
>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>> eligible non-issuing firms.
>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>> industry, sort them by size, and finally grab a matching firm
>>>>> closest in size (the exact procedure is to grab the closest bigger
>>>>> firm if one exists, and just the biggest available if all are smaller)
>>>>> 3. assign the matched firm-observation the same "quarters since
>>>>> issue" as the IPO being matched
>>>>> 4. rbind them all into the "matching" dataset.
>>>>>
>>>>> The function I currently have is pasted below, for your reference.
>>>>> Is there any way to make it produce the same result but much
>>>>> faster? Specifically, I am guessing eliminating some loops would be
>>>>> very good, but I don't see how, since I need to do some fancy
>>>>> footwork for each IPO in each quarter to find the matching firm.
>>>>> I'll be doing a few things similar to this, so it's somewhat
>>>>> important to up the efficiency of this. Maybe some of you R-fu
>>>>> masters can clue me in? :)
>>>>>
>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>
>>>>> ========== my function below ==========>>>>>
>>>>> fcn_create_nonissuing_match_by_quarterssinceissue
>>>>> function(tfdata, quarters_since_issue=40) {
>>>>>
>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for
matrix
>>>>> is cheaper, so typecast the result to matrix
>>>>>
>>>>> colnames = names(tfdata)
>>>>>
>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>
>>>>> for (aquarter in quarterends) {
>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>
>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue >
quarters_since_issue)
>>>>> & (tfdata_quarter$IPO.Flag == 0), ]
>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>> tfdata_quarter$IPO.Flag == 1, ]
>>>>>
>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>> industrypeers =
tfdata_quarter_fitting_nonissuers[
>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>> industrypeers = industrypeers[
>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>> if (
>>>>> nrow(industrypeers[industrypeers$Market.Cap.13f
>>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>> bestpeer >>>>>
industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f,
>>>>> ][1,]
>>>>> }
>>>>> else {
>>>>> bestpeer =
industrypeers[nrow(industrypeers),]
>>>>> }
>>>>> bestpeer$Quarters.Since.IPO.Issue
>>>>> arow$Quarters.Since.IPO.Issue
>>>>>
>>>>>
#tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO
=>>>>> bestpeer$PERMNO] = 1
>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>> }
>>>>> }
>>>>> #result = rbind(result, tfdata_quarter)
>>>>> print (aquarter)
>>>>> }
>>>>>
>>>>> result = as.data.frame(result)
>>>>> names(result) = colnames
>>>>> return(result)
>>>>>
>>>>> }
>>>>>
>>>>> ========= end of my function
============>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
------------------------------
Message: 114
Date: Fri, 06 Jun 2008 18:15:23 -0400
From: Esmail Bonakdarian <esmail.js at gmail.com>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07 at gmail.com>
Cc: r-help at r-project.org
Message-ID: <4849B6FB.2020902 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
FWIW, those who are curious about Linux but are not willing
or ready to abandon the Windows platform can now very easily
try out Ubuntu without having to repartition their hard drive.
Wubi is a project that installs Ubuntu under Windows so that it
can be uninstalled easily and requires no messing around with
hard drive partitions.
From the Wubi web site:
"Wubi is an officially supported Ubuntu installer for Windows users that can
bring you to the Linux world with a single click. Wubi allows you to install
and uninstall Ubuntu as any other Windows application, in a simple and safe
way. Are you curious about Linux and Ubuntu? Trying them out has never been
easier!"
For more information see:
http://wubi-installer.org/
Esmail
------------------------------
Message: 115
Date: Fri, 6 Jun 2008 15:19:55 -0700
From: Horace Tso <Horace.Tso at pgn.com>
Subject: [R] FW: R + Linux
To: "r-help at r-project.org" <r-help at r-project.org>
Message-ID:
<D49782AAF0ACCD4B836AA8D7D40BF417C30FEDE68F at APEXMAIL.corp.dom>
Content-Type: text/plain; charset="us-ascii"
I'll add my $0.02 as I've just gone through a (painful) transition to
Linux. In my case Ubuntu didn't quite work, for reasons I'm still not sure of
(must be a hardware + driver issue). I eventually put on openSUSE 10.3 and
installed R from an rpm package on the command line. Getting R in was not
simple. I got errors complaining about not finding BLAS and a couple of other
things I have forgotten. And I couldn't install packages with
install.packages() at the prompt; I had to download them as tar.gz files and
then install them.
However, once the preliminaries are out of the way it seems to work just fine.
I use rkward because Tinn-R is not available on Linux, and that works just
fine so far.
H
(Sorry folks.....it's Friday afternoon....)
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of steven wilson
Sent: Friday, June 06, 2008 11:14 AM
To: r-help at r-project.org
Subject: [R] R + Linux
Dear all;
I'm planning to install Linux on my computer to run R (I'm bored of
W..XP). However, I haven't used Linux before and I would appreciate,
if possible, suggestions/comments about what could be the best option
install, say Fedora, Ubuntu or OpenSuse which to my impression are the
most popular ones (at least on the R-help lists). The computer is a PC
desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
used only to run R.
Thanks
Steven
------------------------------
Message: 116
Date: Fri, 6 Jun 2008 15:48:10 -0700
From: Don MacQueen <macq at llnl.gov>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins at gmail.com>, r-help at r-project.org
Message-ID: <p0623090ec46f6b55f10e@[128.115.92.33]>
Content-Type: text/plain; charset="us-ascii" ;
format="flowed"
In a case like this, if you can possibly work with matrices instead
of data frames, you might get significant speedup.
(More accurately, I have had situations where I obtained speed up by
working with matrices instead of dataframes.)
Even if you have to code character columns as numeric, it can be worth it.
Data frames have overhead that matrices do not. (Here's where
profiling might have given a clue) Granted, there has been recent
work in reducing the overhead associated with dataframes, but I think
it's worth a try. Carrying along extra columns and doing row
subsetting, rbinding, etc, means a lot more things happening in
memory.
So, for example, if all of your matching is based just on a few
columns, extract those columns, convert them to a matrix, do all the
matching, and then based on some sort of row index retrieve all of
the associated columns.
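
A minimal sketch of the idea in the last paragraph, on a made-up toy data
frame. The column names PERMNO, HSICIG and Market.Cap.13f come from the
function in the original post; the values and the Extra column are invented
for illustration. Only the columns needed for matching are copied into a
numeric matrix, the search runs on the matrix, and the resulting row index
retrieves the full row from the data frame.

```r
# Toy data: values and the Extra column are assumptions for illustration
df <- data.frame(PERMNO         = 101:106,
                 HSICIG         = c(1, 1, 2, 2, 1, 2),
                 Market.Cap.13f = c(5, 9, 3, 8, 7, 1),
                 Extra          = letters[1:6])

# Copy only the matching columns into a plain numeric matrix
keys <- as.matrix(df[, c("HSICIG", "Market.Cap.13f")])

# Do the search on the matrix: industry 1, market cap at least 6
hit  <- which(keys[, "HSICIG"] == 1 & keys[, "Market.Cap.13f"] >= 6)

# Smallest qualifying cap, expressed as a row index into the original data
best <- hit[which.min(keys[hit, "Market.Cap.13f"])]

# Use the row index once, at the end, to fetch all associated columns
df[best, ]
```

The data frame is indexed only once per match here; all comparisons happen on
the matrix.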
-Don
At 2:09 PM -0400 6/5/08, Daniel Folkinshteyn wrote:
>Hi everyone!
>
>I have a question about data processing efficiency.
>
>My data are as follows: I have a data set on quarterly institutional
>ownership of equities; some of them have had recent IPOs, some have
>not (I have a binary flag set). The total dataset size is 700k+ rows.
>
>My goal is this: For every quarter since issue for each IPO, I need
>to find a "matched" firm in the same industry, and close in market
>cap. So, e.g., for firm X, which had an IPO, i need to find a
>matched non-issuing firm in quarter 1 since IPO, then a (possibly
>different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>each issuing firm (there are about 8300 of these).
>
>Thus it seems to me that I need to be doing a lot of data selection
>and subsetting, and looping (yikes!), but the result appears to be
>highly inefficient and takes ages (well, many hours). What I am
>doing, in pseudocode, is this:
>
>1. for each quarter of data, getting out all the IPOs and all the
>eligible non-issuing firms.
>2. for each IPO in a quarter, grab all the non-issuers in the same
>industry, sort them by size, and finally grab a matching firm
>closest in size (the exact procedure is to grab the closest bigger
>firm if one exists, and just the biggest available if all are
>smaller)
>3. assign the matched firm-observation the same "quarters since
>issue" as the IPO being matched
>4. rbind them all into the "matching" dataset.
>
>The function I currently have is pasted below, for your reference.
>Is there any way to make it produce the same result but much faster?
>Specifically, I am guessing eliminating some loops would be very
>good, but I don't see how, since I need to do some fancy footwork
>for each IPO in each quarter to find the matching firm. I'll be
>doing a few things similar to this, so it's somewhat important to up
>the efficiency of this. Maybe some of you R-fu masters can clue me
>in? :)
>
>I would appreciate any help, tips, tricks, tweaks, you name it! :)
>
>========== my function below ===========
>
>fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>quarters_since_issue=40) {
>
> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>cheaper, so typecast the result to matrix
>
> colnames = names(tfdata)
>
> quarterends = sort(unique(tfdata$DATE))
>
> for (aquarter in quarterends) {
> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>
> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>(tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue)
>& (tfdata_quarter$IPO.Flag == 0), ]
> tfdata_quarter_ipoissuers = tfdata_quarter[
>tfdata_quarter$IPO.Flag == 1, ]
>
> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
> arow = tfdata_quarter_ipoissuers[i,]
> industrypeers = tfdata_quarter_fitting_nonissuers[
>tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
> industrypeers = industrypeers[
>order(industrypeers$Market.Cap.13f), ]
> if ( nrow(industrypeers) > 0 ) {
> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>arow$Market.Cap.13f, ]) > 0 ) {
> bestpeer =
>industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
> }
> else {
> bestpeer = industrypeers[nrow(industrypeers),]
> }
> bestpeer$Quarters.Since.IPO.Issue =
>arow$Quarters.Since.IPO.Issue
>
>#tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>#bestpeer$PERMNO] = 1
> result = rbind(result, as.matrix(bestpeer))
> }
> }
> #result = rbind(result, tfdata_quarter)
> print (aquarter)
> }
>
> result = as.data.frame(result)
> names(result) = colnames
> return(result)
>
>}
>
>========= end of my function =============
>
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
------------------------------
Message: 117
Date: Fri, 6 Jun 2008 17:55:04 -0500
From: "hadley wickham" <h.wickham at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins at gmail.com>
Cc: r-help at r-project.org, Patrick Burns <pburns at pburns.seanet.com>
Message-ID:
<f8e6ff050806061555k4d8b5947vc73e5bc50c419cff at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Jun 6, 2008 at 5:10 PM, Daniel Folkinshteyn <dfolkins at gmail.com> wrote:
> Hmm... ok... so i ran the code twice - once with a preallocated result,
> assigning rows to it, and once with a nrow=0 result, rbinding rows to it,
> for the first 20 quarters. There was no speedup. In fact, running with a
> preallocated result matrix was slower than rbinding to the matrix:
>
> for preallocated matrix:
> Time difference of 1.577779 mins
>
> for rbinding:
> Time difference of 1.498628 mins
>
> (the time difference only counts from the start of the loop til the end, so
> the time to allocate the empty matrix was /not/ included in the time count).
>
> So, it appears that rbinding a matrix is not the bottleneck. (That it was
> actually faster than assigning rows could have been a random anomaly (e.g.
> some other process eating a bit of cpu during the run?), or not - at any
> rate, it doesn't make an /appreciable/ difference.
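
The comparison described in the quoted text can be reproduced in miniature.
This is only a sketch on made-up data: n, k and make_row() are hypothetical
stand-ins, and the timings will vary by machine and say nothing definite about
the real 700k-row job.

```r
n <- 2000   # number of rows to accumulate (made-up size)
k <- 5      # number of columns (made-up size)

# Stand-in for "one matched row"; not from the original function
make_row <- function(i) rep(as.numeric(i), k)

# Variant 1: start with zero rows and rbind one row at a time
grow <- function() {
  res <- matrix(nrow = 0, ncol = k)
  for (i in seq_len(n)) res <- rbind(res, make_row(i))
  res
}

# Variant 2: allocate the full matrix first and assign rows in place
prealloc <- function() {
  res <- matrix(NA_real_, nrow = n, ncol = k)
  for (i in seq_len(n)) res[i, ] <- make_row(i)
  res
}

t_grow <- system.time(a <- grow())
t_pre  <- system.time(b <- prealloc())

# Elapsed seconds for each variant
print(c(grow = t_grow[["elapsed"]], prealloc = t_pre[["elapsed"]]))
```

If both variants take a similar share of the total run time, the bottleneck is
more likely to be elsewhere, for example in the per-row subsetting.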
Why not try profiling? The profr package provides an alternative
display that I find more helpful than the default tools:
install.packages("profr")
library(profr)
p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
plot(p)
That should at least help you see where the slow bits are.
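
For comparison, the default tools in base R are Rprof() and summaryRprof().
A sketch, assuming a made-up workload: slow_fun() is a hypothetical stand-in
that indexes a data frame row by row, not the function from this thread.

```r
# Hypothetical workload: repeated single-cell indexing of a data frame
slow_fun <- function(n = 500) {
  df  <- data.frame(a = seq_len(n), b = rnorm(n))
  out <- 0
  for (i in seq_len(n)) out <- out + df[i, "b"]
  out
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)   # start sampling the call stack
for (r in 1:100) slow_fun()
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                  # functions ranked by time spent in them
```

With a workload like this, the data frame indexing methods would be expected
near the top of the by.self table.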
Hadley
--
http://had.co.nz/
------------------------------
Message: 118
Date: Fri, 06 Jun 2008 18:59:02 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Don MacQueen <macq at llnl.gov>
Cc: r-help at r-project.org
Message-ID: <4849C136.3080608 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
thanks for the suggestions! I'll play with this over the weekend and see
what comes out. :)
on 06/06/2008 06:48 PM Don MacQueen said the following:
> In a case like this, if you can possibly work with matrices instead of
> data frames, you might get significant speedup.
> (More accurately, I have had situations where I obtained speed up by
> working with matrices instead of dataframes.)
> Even if you have to code character columns as numeric, it can be worth it.
>
> Data frames have overhead that matrices do not. (Here's where profiling
> might have given a clue) Granted, there has been recent work in reducing
> the overhead associated with dataframes, but I think it's worth a try.
> Carrying along extra columns and doing row subsetting, rbinding, etc,
> means a lot more things happening in memory.
>
> So, for example, if all of your matching is based just on a few columns,
> extract those columns, convert them to a matrix, do all the matching,
> and then based on some sort of row index retrieve all of the associated
> columns.
>
> -Don
------------------------------
Message: 119
Date: Fri, 6 Jun 2008 18:02:36 -0500
From: "hadley wickham" <h.wickham at gmail.com>
Subject: Re: [R] color scale mapped to B/W
To: "Michael Friendly" <friendly at yorku.ca>
Cc: R-Help <r-help at stat.math.ethz.ch>
Message-ID:
<f8e6ff050806061602k76f3b54ched7e6f732dfc09eb at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Jun 6, 2008 at 3:37 PM, Michael Friendly <friendly at yorku.ca> wrote:
> In an R graphic, I'm using
>
> cond.col <- c("green", "yellow", "red")
> to represent a quantitative variable, where green means 'OK', yellow
> represents 'warning' and red represents 'danger'. Using these particular
> color names, in B/W, red is darkest and yellow is lightest. I'd like to
> find color designations to replace yellow and green so that when printed
> in B/W, the yellowish color appears darker than the greenish one.
An alternative approach would be to convert the colours into Luv,
adjust luminance appropriately and then convert back:
cond.col <- c("green", "yellow", "red")
col <- col2rgb(cond.col)
col.Luv <- convertColor(t(col), "sRGB", "Luv")
rownames(col.Luv) <- cond.col
col.Luv[, "L"] <- c(8000, 6000, 8000)
t(convertColor(col.Luv, "Luv", "sRGB"))
However, that doesn't actually seem to work - the back-transformed
colours are the same as the original.
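
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: col2rgb() returns values on a 0-255 scale, whereas convertColor()
by default expects sRGB values in [0, 1], and Luv lightness runs from 0 to
100. With out-of-range inputs the back-transformed values are clipped to
[0, 1], which would map each colour back to its fully saturated original. A
sketch with rescaled inputs; the target lightness values 80, 60 and 40 are
made up.

```r
cond.col <- c("green", "yellow", "red")

# Rescale 0-255 to [0, 1], one colour per row
rgb01 <- t(col2rgb(cond.col)) / 255

col.Luv <- convertColor(rgb01, from = "sRGB", to = "Luv")

# Column 1 is L; assumed targets: green lightest, red darkest
col.Luv[, 1] <- c(80, 60, 40)

# Back to sRGB; out-of-gamut values are clipped to [0, 1] by default
back <- convertColor(col.Luv, from = "Luv", to = "sRGB")

new.col <- rgb(back[, 1], back[, 2], back[, 3])
new.col
```

Because of the clipping, the resulting colours may not have exactly the
requested lightness, so it is worth checking them in grayscale afterwards.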
Hadley
--
http://had.co.nz/
------------------------------
Message: 120
Date: Fri, 06 Jun 2008 19:10:51 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham at gmail.com>
Cc: r-help at r-project.org, Patrick Burns <pburns at pburns.seanet.com>
Message-ID: <4849C3FB.9090600 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
on 06/06/2008 06:55 PM hadley wickham said the following:
> Why not try profiling? The profr package provides an alternative
> display that I find more helpful than the default tools:
>
> install.packages("profr")
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
[[elided Yahoo spam]]
------------------------------
Message: 121
Date: Sat, 7 Jun 2008 01:23:21 +0200 (CEST)
From: Achim Zeileis <Achim.Zeileis at wu-wien.ac.at>
Subject: Re: [R] color scale mapped to B/W
To: Michael Friendly <friendly at yorku.ca>
Cc: R-Help <r-help at stat.math.ethz.ch>
Message-ID:
<Pine.LNX.4.64.0806070110130.3742 at paninaro.stat-math.wu-wien.ac.at>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Fri, 6 Jun 2008, Michael Friendly wrote:
> In an R graphic, I'm using
>
> cond.col <- c("green", "yellow", "red")
> to represent a quantitative variable, where green means 'OK', yellow
> represents 'warning' and red represents 'danger'. Using these particular
> color names, in B/W, red is darkest and yellow is lightest. I'd like to
> find color designations to replace yellow and green so that when printed
> in B/W, the yellowish color appears darker than the greenish one.
>
> Is there some tool/code I can use to find these? i.e., something to display
> a grid of color swatches with color codes/names I can look at in color and
> B/W to decide?
You could look at colors in HCL (i.e., polar LUV). For example, you could
choose a dark red HCL = (0, 90, 40) and a light green (120, 70, 90) and a
yellow somewhere in between.
To emulate what happens when you print that out, you just set the chroma
to zero.
There are some functions helpful for that in "vcd", you could do
## load package
library("vcd")
## select colors from dark red to light green
c1 <- heat_hcl(3, h = c(0, 120), c = c(90, 70), l = c(40, 90), power=1.7)
c2 <- heat_hcl(3, h = c(0, 120), c = 0, l = c(40, 90), power=1.7)
## visualize in color and grayscale emulation
plot(-1, -1, xlim = c(0, 1), ylim = c(0, 2), axes = FALSE)
rect(0:2/3, 0, 1:3/3, 1, col = c1, border = "transparent")
rect(0:2/3, 1, 1:3/3, 2, col = c2, border = "transparent")
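
The same check can be sketched with base R alone, since hcl() in grDevices
takes hue, chroma and luminance directly. The hue, chroma and luminance values
here are illustrative assumptions loosely based on the numbers above, not a
recommended palette.

```r
h <- c(120, 60, 0)   # hues running from green to red (assumed values)
l <- c(90, 65, 40)   # luminance: green lightest, red darkest

cols  <- hcl(h = h, c = c(70, 80, 90), l = l)   # colour version
grays <- hcl(h = h, c = 0, l = l)               # chroma 0 emulates B/W output

# Gray level (0-255) of each swatch, in the order green, yellow, red
gray_level <- col2rgb(grays)[1, ]
print(data.frame(colour = cols, gray = grays, level = gray_level))
```

Because the grayscale version depends only on the luminance argument, the
printed ordering can be set directly by choosing l.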
The ideas underlying this color choice are described in this report we've
written together with Kurt and Paul:
http://epub.wu-wien.ac.at/dyn/openURL?id=oai:epub.wu-wien.ac.at:epub-wu-01_c87
hth,
Z
> thanks,
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology
> Dept.
> York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
> Toronto, ONT M3J 1P3 CANADA
>
------------------------------
Message: 122
Date: Fri, 6 Jun 2008 17:27:09 -0600
From: "Greg Snow" <Greg.Snow at imail.org>
Subject: Re: [R] color scale mapped to B/W
To: "Michael Friendly" <friendly at yorku.ca>,
"R-Help"
<r-help at stat.math.ethz.ch>
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC60F6BE1A51 at LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
Try this (you need tcl 8.5 and the TeachingDemos package):
library(TeachingDemos)
tmpplot <- function(col1='red', col2='yellow', col3='green'){
  plot(1:10,1:10, type='n')
  rect(1,1,4,4, col=col1)
  rect(1,4,4,7, col=col2)
  rect(1,7,4,10, col=col3)
  rect(6,1,9,4, col=col2grey(col1))
  rect(6,4,9,7, col=col2grey(col2))
  rect(6,7,9,10, col=col2grey(col3))
  text(5, c(2,5,8), c(col1, col2, col3))
}
cols <- colors()[ -c( 152:253, 260:361) ]
tkexamp( tmpplot(), list(col1=list('combobox', values=cols, init='red'),
  col2=list('combobox',values=cols, init='yellow'),
  col3=list('combobox',values=cols, init='green') ) )
Hope it helps,
________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On
Behalf Of Michael Friendly [friendly at yorku.ca]
Sent: Friday, June 06, 2008 2:37 PM
To: R-Help
Subject: [R] color scale mapped to B/W
In an R graphic, I'm using
cond.col <- c("green", "yellow", "red")
to represent a quantitative variable, where green means 'OK', yellow
represents 'warning' and red represents 'danger'. Using these particular
color names, in B/W, red is darkest and yellow is lightest. I'd like to
find color designations to replace yellow and green so that when printed
in B/W, the yellowish color appears darker than the greenish one.
Is there some tool/code I can use to find these? i.e., something to
display a grid of color swatches with color codes/names I can look at in
color and B/W to decide?
thanks,
--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA
------------------------------
Message: 123
Date: Fri, 06 Jun 2008 19:35:13 -0400
From: Daniel Folkinshteyn <dfolkins at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham at gmail.com>
Cc: r-help at r-project.org, Patrick Burns <pburns at pburns.seanet.com>
Message-ID: <4849C9B1.4060308 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> install.packages("profr")
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
>
> Hadley
>
so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are
the biggest timesuckers...
i suppose i'll try using matrices and see how that stacks up (since all
my cols are numeric, should be a problem-free approach).
but i'm really wondering if there isn't some neat vectorized approach i
could use to avoid at least one of the nested loops...
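
One way the inner loop over IPOs might be vectorized is sketched here for a
single quarter. The helper match_peers() and the toy vectors are hypothetical;
the sketch assumes numeric industry codes and market caps without missing
values, and the left.open argument of findInterval() needs R 3.3.0 or later.
It loops over industries rather than over individual IPOs and applies the rule
from the original post: the closest bigger peer if one exists, otherwise the
biggest available.

```r
# Hypothetical helper: returns, for each IPO, the index of its matched peer
# (an index into the peer vectors), or NA when the industry has no peers
match_peers <- function(ipo_ind, ipo_cap, peer_ind, peer_cap) {
  out <- rep(NA_integer_, length(ipo_cap))
  for (ind in unique(ipo_ind)) {           # one pass per industry
    p <- which(peer_ind == ind)
    if (length(p) == 0) next               # no eligible peer: leave NA
    p <- p[order(peer_cap[p])]             # peers sorted by market cap
    i <- which(ipo_ind == ind)
    # Number of peers strictly smaller than each IPO, plus one, is the
    # position of the first peer whose cap is >= the IPO's cap
    k <- findInterval(ipo_cap[i], peer_cap[p], left.open = TRUE) + 1
    k <- pmin(k, length(p))                # all peers smaller: take biggest
    out[i] <- p[k]
  }
  out
}

# Toy data for one quarter (invented values)
peer_ind <- c(1, 1, 1, 2)
peer_cap <- c(10, 30, 20, 5)
ipo_ind  <- c(1, 1, 1, 2, 3)
ipo_cap  <- c(15, 30, 99, 1, 7)

idx <- match_peers(ipo_ind, ipo_cap, peer_ind, peer_cap)
idx
```

The returned indices can then be used once per quarter to pull the matched
rows, instead of subsetting and sorting the peers separately for every IPO.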
------------------------------
Message: 124
Date: Fri, 6 Jun 2008 17:41:15 -0600
From: "Greg Snow" <Greg.Snow at imail.org>
Subject: Re: [R] color scale mapped to B/W
To: "Michael Friendly" <friendly at yorku.ca>,
"R-Help"
<r-help at stat.math.ethz.ch>
Message-ID:
<B37C0A15B8FB3C468B5BC7EBC7DA14CC60F6BE1A52 at LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
You may also want to look at the "show.colors" function in the
"DAAG" package to get candidate colors.
------------------------------
Message: 125
Date: Fri, 06 Jun 2008 19:45:43 -0400
From: Esmail Bonakdarian <esmail.js at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham at gmail.com>
Cc: r-help at r-project.org
Message-ID: <4849CC27.6020506 at gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
hadley wickham wrote:
>
Hi,
I tried this suggestion as I am curious about bottlenecks in my own
R code ...
> Why not try profiling? The profr package provides an alternative
> display that I find more helpful than the default tools:
>
> install.packages("profr")
> install.packages("profr")
Warning message:
package ‘profr’ is not available
>
any ideas?
Thanks,
Esmail
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
>
> Hadley
>
------------------------------
Message: 126
Date: Fri, 6 Jun 2008 16:46:35 -0700
From: Horace Tso <Horace.Tso at pgn.com>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins at gmail.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Message-ID:
<D49782AAF0ACCD4B836AA8D7D40BF417C30FEDE6C1 at APEXMAIL.corp.dom>
Content-Type: text/plain; charset="us-ascii"
Daniel, allow me to step off the party line here for a moment. In a problem
like this it can be better to code your function in C and then call it from R.
You can get a large performance improvement that way. (From what I see, the
process of recoding in C should be quite straightforward.)
H.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Daniel Folkinshteyn
Sent: Friday, June 06, 2008 4:35 PM
To: hadley wickham
Cc: r-help at r-project.org; Patrick Burns
Subject: Re: [R] Improving data processing efficiency
> install.packages("profr")
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
>
> Hadley
>
so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are
the biggest timesuckers...
i suppose i'll try using matrices and see how that stacks up (since all
my cols are numeric, should be a problem-free approach).
but i'm really wondering if there isn't some neat vectorized approach i
could use to avoid at least one of the nested loops...
------------------------------
Message: 127
Date: Fri, 06 Jun 2008 20:09:41 -0400
From: Esmail Bonakdarian <esmail.js at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham at gmail.com>
Cc: r-help at r-project.org
Message-ID: <4849D1C5.7090109 at gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Esmail Bonakdarian wrote:
> hadley wickham wrote:
>>
>
> Hi,
>
> I tried this suggestion as I am curious about bottlenecks in my own
> R code ...
>
>> Why not try profiling? The profr package provides an alternative
>> display that I find more helpful than the default tools:
>>
>> install.packages("profr")
>
> > install.packages("profr")
> Warning message:
> package ‘profr’ is not available
I selected a different mirror in place of the Iowa one and it
worked. Odd, I just assumed all the same packages are available
on all mirrors.
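
Mirrors are synchronized copies and can lag behind, so a package may be
missing from one of them for a while. A sketch of pinning the mirror for a
session instead of picking it interactively; the URL is an assumption for
illustration and does not come from this thread.

```r
# Assumed mirror URL, chosen for illustration only
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which repository install.packages() will consult
getOption("repos")

# Needs network access, so it is left commented out here:
# install.packages("profr")
# A mirror can also be given for a single call:
# install.packages("profr", repos = "https://cloud.r-project.org")
```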
------------------------------
Message: 128
Date: Sat, 7 Jun 2008 08:12:05 +0800
From: ronggui <ronggui.huang at gmail.com>
Subject: [R] Problem of installing Matrix
To: R-Help <r-help at stat.math.ethz.ch>
Message-ID:
<38b9f0350806061712l4b9bc55eoa9c5e4ca7c343e19 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
[wincent at PC-BSD]export MAKE=gmake
[wincent at PC-BSD]sudo R
.....
> install.packages("Matrix")
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://bibs.snu.ac.kr/R/src/contrib/Matrix_0.999375-9.tar.gz'
Content type 'application/x-gzip' length 1483674 bytes (1.4 Mb)
opened URL
=================================================
downloaded 1.4 Mb
/usr/local/lib/R/library
* Installing *source* package 'Matrix' ...
** libs
** arch -
"Makefile", line 8: Need an operator
"Makefile", line 13: Need an operator
"Makefile", line 16: Need an operator
"Makefile", line 27: Need an operator
"Makefile", line 29: Need an operator
"Makefile", line 31: Need an operator
make: fatal errors encountered -- cannot continue
ERROR: compilation failed for package 'Matrix'
** Removing '/usr/local/lib/R/library/Matrix'
The downloaded packages are in
/tmp/Rtmpq3enyj/downloaded_packages
Updating HTML index of packages in '.Library'
Warning message:
In install.packages("Matrix") :
installation of package 'Matrix' had non-zero exit status
>
> sessionInfo()
R version 2.6.1 (2007-11-26)
i386-portbld-freebsd7.0
locale:
zh_CN.eucCN/zh_CN.eucCN/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] rcompgen_0.1-17 tcltk_2.6.1 tools_2.6.1
> R.version
_
platform i386-portbld-freebsd7.0
arch i386
os freebsd7.0
system i386, freebsd7.0
status
major 2
minor 6.1
year 2007
month 11
day 26
svn rev 43537
language R
version.string R version 2.6.1 (2007-11-26)
--
HUANG Ronggui, Wincent http://ronggui.huang.googlepages.com/
Bachelor of Social Work, Fudan University, China
Master of sociology, Fudan University, China
Ph.D. Candidate, CityU of HK.
------------------------------
Message: 129
Date: Fri, 6 Jun 2008 17:23:32 -0700
From: "Charles C. Berry" <cberry at tajo.ucsd.edu>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins at gmail.com>
Cc: r-help at r-project.org, Patrick Burns <pburns at pburns.seanet.com>
Message-ID: <Pine.LNX.4.64.0806061647020.29398 at tajo.ucsd.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Fri, 6 Jun 2008, Daniel Folkinshteyn wrote:
>> install.packages("profr")
>> library(profr)
>> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
>> plot(p)
>>
>> That should at least help you see where the slow bits are.
>>
>> Hadley
>>
> so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are the
> biggest timesuckers...
>
> i suppose i'll try using matrices and see how that stacks up (since all my
> cols are numeric, should be a problem-free approach).
>
> but i'm really wondering if there isn't some neat vectorized approach i could
> use to avoid at least one of the nested loops...
>
As for a vectorized solution, I'll bet you could do ALL the lookups of
non-issuers for all issuers with a single call to findInterval() (modulo
some cleanup afterwards), but the trickery needed to do that would make
your code a bit opaque.
And in the end I doubt it would beat mapply() (read on...) by enough to
make it worthwhile.
---
What you are doing is conditional on industry group and quarter.
So using
indus.quarter <- with(tfdat,
paste(as.character(DATE), as.character(HSICIG), sep="."))
and then calls like this:
split( <various> , indus.quarter[ relevant.subset ] )
you can create:
a list of all issuer market caps according to quarter and group,
a list of all non-issuer caps (that satisfy your 'since quarter'
restriction) according to quarter and group,
a list of all non issuer indexes (i.e. row numbers) that satisfy
that restriction according to quarter and group
Then you write a function that takes the elements of each list for a given
quarter-industry group, looks up the matching non-issuers for each issuer,
and returns their indexes.
findInterval() will allow you to do this lookup for all issuers in one
industry group in a given quarter simultaneously and greatly speed this
process (but you will need to deal with the possible non-uniqueness of the
non-issuer caps - perhaps by adding a tiny jitter() to the values).
Then you feed the function and the lists to mapply().
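A minimal sketch of the split() / findInterval() / mapply() idea described above, on made-up data. The data frame toy, the columns issuer and cap, and the matching rule (the largest non-issuer cap at or below the issuer's cap) are assumptions for illustration only; the poster's real function and data are not shown in this thread.

```r
# Toy stand-in for tfdat; every value here is invented.
toy <- data.frame(
  DATE   = rep(c("2007Q1", "2007Q2"), each = 6),
  HSICIG = rep(c(10, 20), times = 6),
  issuer = rep(c(TRUE, FALSE, FALSE), times = 4),
  cap    = c(50, 10, 40, 30, 60, 20, 5, 70, 8, 70, 9, 90)
)

# One key per quarter / industry-group combination.
key <- with(toy, paste(as.character(DATE), as.character(HSICIG), sep = "."))
idx <- seq_len(nrow(toy))

# Lists keyed by quarter and group (assumes every group holds both
# issuers and non-issuers, so the three lists line up).
iss.caps <- split(toy$cap[toy$issuer],  key[toy$issuer])
non.caps <- split(toy$cap[!toy$issuer], key[!toy$issuer])
non.idx  <- split(idx[!toy$issuer],     key[!toy$issuer])

# For one group: look up, for all issuers at once, the non-issuer
# with the largest cap that does not exceed the issuer's cap.
match_one_group <- function(icap, ncap, nidx) {
  ord <- order(ncap)
  pos <- findInterval(icap, ncap[ord])  # 0 means no such non-issuer
  pos[pos == 0] <- NA
  nidx[ord][pos]                        # row numbers in the original data
}

matches <- mapply(match_one_group, iss.caps, non.caps, non.idx,
                  SIMPLIFY = FALSE)
```

Each element of matches holds row numbers of toy, one per issuer in that group, with NA where no non-issuer qualifies.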
The result is a list of indexes on the original data.frame. You can
unsplit() this if you like, then use those indexes to build your final
"result" data.frame.
HTH,
Chuck
p.s. and if this all seems like too much work, you should at least avoid
needlessly creating data.frames. Specifically
reorder things so that
industrypeers = <etc>
is only done ONCE for each industry group by quarter combination and
change stuff like
nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0
to
any( industrypeers$Market.Cap.13f >= arow$Market.Cap.13f )
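A small sketch of why the two forms agree, using invented numbers. The id column and all values are hypothetical; only the names industrypeers, arow and Market.Cap.13f come from the thread.

```r
# Invented stand-ins for the poster's objects.
industrypeers <- data.frame(id = 1:3, Market.Cap.13f = c(120, 340, 560))
arow          <- data.frame(id = 9L,  Market.Cap.13f = 300)

# Original form: builds a whole new data.frame just to count its rows.
slow <- nrow(industrypeers[industrypeers$Market.Cap.13f >=
                             arow$Market.Cap.13f, ]) > 0

# Suggested form: works on the logical vector directly.
fast <- any(industrypeers$Market.Cap.13f >= arow$Market.Cap.13f)

identical(slow, fast)
```

The any() version skips the call to '[.data.frame', which is where the profile showed the time going.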
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
------------------------------
Message: 130
Date: Fri, 6 Jun 2008 21:32:24 -0500
From: "hadley wickham" <h.wickham at gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Esmail Bonakdarian" <esmail.js at gmail.com>
Cc: r-help at r-project.org
Message-ID:
<f8e6ff050806061932i2180298sbec7a9d41abbd3d1 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
>> > install.packages("profr")
>> Warning message:
>> package 'profr' is not available
>
> I selected a different mirror in place of the Iowa one and it
> worked. Odd, I just assumed all the same packages are available
> on all mirrors.
The Iowa mirror is rather out of date as the guy who was looking after
it passed away.
Hadley
--
http://had.co.nz/
------------------------------
Message: 131
Date: Fri, 6 Jun 2008 21:34:39 -0500
From: "hadley wickham" <h.wickham at gmail.com>
Subject: Re: [R] color scale mapped to B/W
To: "Achim Zeileis" <Achim.Zeileis at wu-wien.ac.at>
Cc: Michael Friendly <friendly at yorku.ca>, R-Help
<r-help at stat.math.ethz.ch>
Message-ID:
<f8e6ff050806061934u3702b69bmd275d71c848db864 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Jun 6, 2008 at 6:23 PM, Achim Zeileis
<Achim.Zeileis at wu-wien.ac.at> wrote:
> On Fri, 6 Jun 2008, Michael Friendly wrote:
>
>> In an R graphic, I'm using
>>
>> cond.col <- c("green", "yellow", "red")
>> to represent a quantitative variable, where green means 'OK', yellow
>> represents 'warning'
>> and red represents 'danger'. Using these particular color names, in B/W,
>> red is darkest
>> and yellow is lightest. I'd like to find color designations to replace
>> yellow and green so
>> that when printed in B/W, the yellowish color appears darker than the
>> greenish one.
>>
>> Is there some tool/code I can use to find these? i.e., something to
>> display a grid
>> of color swatches with color codes/names I can look at in color and B/W to
>> decide?
>
> You could look at colors in HCL (i.e., polar LUV). For example, you could
> choose a dark red HCL = (0, 90, 40) and a light green (120, 70, 90) and a
> yellow somewhere in between.
How did you get to those numbers? I seem to remember there being
some way to convert rgb to hcl, but I can't find it.
Hadley
--
http://had.co.nz/
------------------------------
Message: 132
Date: Fri, 06 Jun 2008 23:22:30 -0400
From: "john.polo" <jpolo at mail.usf.edu>
Subject: Re: [R] editing a data.frame
To: Daniel Folkinshteyn <dfolkins at gmail.com>
Cc: r-help at r-project.org
Message-ID: <4849FEF6.1070803 at mail.usf.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Daniel Folkinshteyn wrote:
> works for me:
> > sub('1.00', '1', '1.00E-20')
> [1] "1E-20"
when i input what you wrote, i get the same result. but that doesn't
change the value for TreeTag at row 1501, it's just floating around in
space. if i try it for yr1bp$TreeTag[1501], which is 1.00E-20 i get this:
> yr1bp$TreeTag[1501]<-sub("1.00", "1", yr1bp$TreeTag[1501])
Warning message:
In `[<-.factor`(`*tmp*`, 1501, value = "1E-20") :
invalid factor level, NAs generated
and then 1501 turns into:
1501 <NA> 2001 adult 32.5
which is less useful than the way it was originally input. thanks for
the suggestion.
john
> finally, if all your target strings are of the form 1.00E-20, you
> could sub the whole thing with a more general regexp:
>
> sub("([0-9])(\\.[0-9]{2})(.*)", "\\1\\3", yourvector)
> (it matches a digit, followed by a dot and two digits, followed by
> "anything else", and takes out the "dot and two digits" bit in the
> replacement, in the whole vector.)
thanks for that suggestion. it could come in handy.
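A minimal sketch of applying that regexp to a whole column. The warning quoted earlier in this message mentions `[<-.factor`, which suggests TreeTag was read in as a factor; converting it to character first is one way around the "invalid factor level" problem. The three tags below are made up, and yr is a hypothetical stand-in for yr1bp.

```r
# Hypothetical stand-in for yr1bp, with TreeTag stored as a factor.
yr <- data.frame(TreeTag = factor(c("CW-W740", "1.00E-20", "2.00E-15")))

# A factor only accepts its existing levels, so work on character data.
yr$TreeTag <- as.character(yr$TreeTag)

# Drop the "dot and two digits" part wherever it follows a digit.
# The dot is escaped as \\. so it matches a literal dot only.
yr$TreeTag <- sub("([0-9])(\\.[0-9]{2})(.*)", "\\1\\3", yr$TreeTag)

yr$TreeTag
```

Tags without a decimal part, such as CW-W740, pass through unchanged.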
> on 06/06/2008 03:25 PM john.polo said the following:
>> dear R users,
>>
>> the data frame (read in from a csv) looks like this:
>> TreeTag Census Stage DBH
>> 1 CW-W740 2001 juvenile 5.8
>> 2 CW-W739 2001 juvenile 4.3
>> 3 CW-W738 2001 juvenile 4.7
>> 4 CW-W737 2001 juvenile 5.4
>> 5 CW-W736 2001 juvenile 7.4
>> 6 CW-W735 2001 juvenile 5.4
>> ...
>> 1501 1.00E-20 2001 adult 32.5
>>
>> i would like to change values under the TreeTag column. as the last
>> value shows, some of the tags have decimals followed by 2 decimal
>> places. i just want whole numbers, i.e. not 1.00E-20, but 1E-20. i
>> have a rough understanding of regexp and grepped all the positions
>> that have the inappropriate tags. i tried sub() a couple of different
>> ways, like
>> yr1bp$TreeTag[1501]<-sub("1.00", "1", yr1bp$TreeTag[1501])
>> and after turning yr1bp$TreeTag[1501] into <NA>,
>> yr1bp$TreeTag[1501]<-sub("", "1E-20", yr1pb$TreeTag[1501])
>> and
>> sub("", "1E-20", yr1bp$TreeTag[1501])
>> but it's not working. i guess it has something to do with the
>> data.frame characteristics i'm not aware of or don't understand.
>> would i somehow have to tear apart the columns, edit them, and then
>> put it back together? not that i know how to do that, but i'm
>> wondering out loud.
>>
>> john
>>
------------------------------
Message: 133
Date: Fri, 6 Jun 2008 21:56:24 -0700 (PDT)
Subject: [R] error message with dat
To: r-help at r-project.org
Message-ID: <317894.77066.qm at web46105.mail.sp1.yahoo.com>
Content-Type: text/plain
Hello everyone,
I have two problems which I am unable to solve:
1. I am trying to add the row labels (g1-g2000) to the very left of a data
table. The data table is 2000 rows by 62 columns. I have used the following
code.
read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt",header=T,row.names=1)
rowname(dat) <- paste("g", c(1:nrow(dat)), sep="")
file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt")
The error message I get is: error in nrow(dat): object "dat" not found
2. I am also trying to populate a scatter plot with data from two columns
which are 2000 values long. I have tried the following code:
read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt",header=T,row.names=1)
file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1..txt")
plot(50,1500,type='p',xlab='normal',ylab='tumor',main='Tumor sample vs.Normal Sample-2000genes')
plot(50,1500,type='p',xlab='normal1',ylab='normal2',main='Two normal samples--first 20 genes',pch=15,col='blue')
plot(dat(,1), dat(,2))
I get the following error message: error in plot(dat(,1), dat(,2)): could
not find function "dat"
I am not sure how I am supposed to use dat, that is, where and how to
define it so that it refers to the table.
Any help would be appreciated.
Paul
[[alternative HTML version deleted]]
------------------------------
Message: 134
Date: Sat, 7 Jun 2008 06:00:25 +0100 (BST)
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Subject: Re: [R] Problem of installing Matrix
To: ronggui <ronggui.huang at gmail.com>
Cc: R-Help <r-help at stat.math.ethz.ch>
Message-ID:
<alpine.LFD.1.10.0806070557360.28358 at gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
From the DESCRIPTION for Matrix:
SystemRequirements: GNU make
Presumably you have a BSD make on your FreeBSD system. This has come up
before, and FreeBSD users have succeeded with GNU make.
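One possible way to point the running R session at GNU make, sketched under the assumption that GNU make is installed and callable as gmake. A variable exported in the shell may not survive sudo, depending on how sudo is configured, so setting it inside R is one way to be sure the session sees it. Whether this is enough to build Matrix on a given system is not guaranteed.

```r
# Assumption: GNU make is on the PATH under the name "gmake".
Sys.setenv(MAKE = "gmake")

# Confirm what child processes of this R session will inherit.
make_cmd <- Sys.getenv("MAKE")
make_cmd

# Then retry the source install (left commented out in this sketch):
# install.packages("Matrix")
```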
On Sat, 7 Jun 2008, ronggui wrote:
> [wincent at PC-BSD]export MAKE=gmake
> [wincent at PC-BSD]sudo R
> .....
>> install.packages("Matrix")
> --- Please select a CRAN mirror for use in this session ---
> Loading Tcl/Tk interface ... done
> trying URL 'http://bibs.snu.ac.kr/R/src/contrib/Matrix_0.999375-9.tar.gz'
> Content type 'application/x-gzip' length 1483674 bytes (1.4 Mb)
> opened URL
> =================================================
> downloaded 1.4 Mb
>
> /usr/local/lib/R/library
> * Installing *source* package 'Matrix' ...
> ** libs
> ** arch -
> "Makefile", line 8: Need an operator
> "Makefile", line 13: Need an operator
> "Makefile", line 16: Need an operator
> "Makefile", line 27: Need an operator
> "Makefile", line 29: Need an operator
> "Makefile", line 31: Need an operator
> make: fatal errors encountered -- cannot continue
> ERROR: compilation failed for package 'Matrix'
> ** Removing '/usr/local/lib/R/library/Matrix'
>
> The downloaded packages are in
> /tmp/Rtmpq3enyj/downloaded_packages
> Updating HTML index of packages in '.Library'
> Warning message:
> In install.packages("Matrix") :
> installation of package 'Matrix' had non-zero exit status
>>
>
>> sessionInfo()
> R version 2.6.1 (2007-11-26)
> i386-portbld-freebsd7.0
>
> locale:
> zh_CN.eucCN/zh_CN.eucCN/C/C/C/C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17 tcltk_2.6.1 tools_2.6.1
>
>> R.version
> _
> platform i386-portbld-freebsd7.0
> arch i386
> os freebsd7.0
> system i386, freebsd7.0
> status
> major 2
> minor 6.1
> year 2007
> month 11
> day 26
> svn rev 43537
> language R
> version.string R version 2.6.1 (2007-11-26)
>
> --
> HUANG Ronggui, Wincent http://ronggui.huang.googlepages.com/
> Bachelor of Social Work, Fudan University, China
> Master of sociology, Fudan University, China
> Ph.D. Candidate, CityU of HK.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
------------------------------
Message: 135
Date: Sat, 7 Jun 2008 01:39:51 -0400 (EDT)
From: Rebecca Sela <rsela at stern.nyu.edu>
Subject: [R] Predicting a single observatio using LME
To: r-help <r-help at r-project.org>
Message-ID:
<32407705.667441212817191240.JavaMail.root at calliope.stern.nyu.edu>
Content-Type: text/plain; charset=utf-8
When I use a model fit with LME, I get an error if I try to use
"predict" with a dataset consisting of a single line.
For example, using this data:
> simpledata
Y t D ID
1 -1.464740870 1 0 1
2 1.222911373 2 0 1
3 -0.605996798 3 0 1
4 0.155692707 4 0 1
5 3.849619772 1 0 2
6 4.289213902 2 0 2
7 2.369407737 3 0 2
8 2.249052533 4 0 2
9 0.920044316 1 0 3
10 2.003262622 2 0 3
11 0.003833438 3 0 3
12 1.578300927 4 0 3
13 -0.842322442 1 1 4
14 -0.657256158 2 1 4
15 1.504491575 3 1 4
16 2.896007045 4 1 4
17 0.990505440 1 1 5
18 2.722942793 2 1 5
19 4.395861278 3 1 5
20 4.849296475 4 1 5
21 3.049616421 1 1 6
22 2.874405962 2 1 6
23 4.359511097 3 1 6
24 6.165419699 4 1 6
This happened:
> testLME <- lme(Y~t+D,data=simpledata,random=~1|ID)
> predict(testLME, simpledata[1,])
Error in val[revOrder, level + 1] : incorrect number of dimensions
This has occurred with other datasets as well. Is this a bug in the code, or am
I doing something wrong?
(Also, is there a way to parse a formula of a type given to "random"?
For example, given ~1+t|ID, I'd like to be able to extract all the variable
names to the left of | and to the right of |, the way one can with a normal
formula.)
Thanks in advance!
Rebecca
------------------------------
Message: 136
Date: Sat, 7 Jun 2008 03:25:55 -0300
From: Reid Tingley <r_tingley at hotmail.com>
Subject: [R] expected risk from coxph (survival)
To: <r-help at r-project.org>
Message-ID: <BAY122-W2997A32B5EA5B347FFA87586B60 at phx.gbl>
Content-Type: text/plain
Hello,
When I try to obtain the expected risk for a new dataset using coxph in the
survival package I get an error. Using the example from ?coxph:
> test1 <- list(time= c(4, 3,1,1,2,2,3),
+ status=c(1,NA,1,0,1,1,0),
+ x= c(0, 2,1,1,1,0,0),
+ sex= c(0, 0,0,0,1,1,1))
> cox<-coxph( Surv(time, status) ~ x + strata(sex), test1) #stratified model
>
> new<-list(time= c(5, 1,1,2,2,4,3),
+ status=c(1,NA,1,0,0,1,1),
+ x= c(0, 2,1,1,1,0,0),
+ sex= c(0, 0,0,0,1,1,1))
>
> predict(cox,new,type="expected")
Error in predict.coxph(cox, new, type = "expected") : Method not yet finished
I assume that this is something that has simply not yet been incorporated into
the survival package. Does anyone know of a way to calculate the expected risk
for a new data set? Is this even possible? I would appreciate any help that you
could give me.
Cheers,
Reid
[[alternative HTML version deleted]]
------------------------------
Message: 137
Date: Fri, 6 Jun 2008 14:30:56 -0700 (PDT)
From: RobertsLRRI <raymond.roberts at ncf.edu>
Subject: [R] txt file, 14000+ rows, only last 8000 appear
To: r-help at r-project.org
Message-ID: <17701519.post at talk.nabble.com>
Content-Type: text/plain; charset=us-ascii
when I load my data file in txt format into the R workstation I lose about
6000 rows, this is a problem. Is there a limit to the display capabilities
for the workstation? is all the information there and I just can't see the
first couple thousand rows?
--
View this message in context:
http://www.nabble.com/txt-file%2C-14000%2B-rows%2C-only-last-8000-appear-tp17701519p17701519.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 138
Date: Fri, 6 Jun 2008 16:14:35 -0700 (PDT)
Subject: [R] functions for high dimensional integral
To: r-help at r-project.org
Message-ID: <17702978.post at talk.nabble.com>
Content-Type: text/plain; charset=us-ascii
I need to compute a high dimensional integral. Currently I'm using the
function adapt in R package adapt. But this method is kind of slow to me.
I'm wondering if there are other solutions. Thanks.
Zhongwen
--
View this message in context:
http://www.nabble.com/functions-for-high-dimensional-integral-tp17702978p17702978.html
Sent from the R help mailing list archive at Nabble.com.
------------------------------
Message: 139
Date: Sat, 7 Jun 2008 05:58:26 +0200
From: "Mathieu Prevot" <mathieu.prevot at ens.fr>
Subject: [R] compilation failed on MacOSX.5 / icc 10.1 / ifort 10.1 /
R 2.7.0
To: r-help at R-project.org
Message-ID:
<3e473cc60806062058h36500ae3g670c88ecaf687c5b at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Hi,
I got the following problem when I typed make. The error is not verbose
enough for me to find the problem. Please cc me, I'm not
subscribed.
Thanks,
Mathieu
---------------------------
make[4]: `vfonts.so' is up to date.
building system startup profile
building package 'base'
all.R is unchanged
../../../library/base/R/base is unchanged
dyld: lazy symbol binding failed: Symbol not found: _Rf_ScalarString
Referenced from: /Users/mathieuprevot/PRE/R-2.7.0/lib/libR.dylib
Expected in: dynamic lookup
dyld: Symbol not found: _Rf_ScalarString
Referenced from: /Users/mathieuprevot/PRE/R-2.7.0/lib/libR.dylib
Expected in: dynamic lookup
/bin/sh: line 1: 26329 Done cat ./makebasedb.R
26330 Trace/BPT trap | R_DEFAULT_PACKAGES=NULL LC_ALL=C
../../../bin/R --vanilla --slave > /dev/null
make[3]: *** [all] Error 133
make[2]: *** [R] Error 1
make[1]: *** [R] Error 1
make: *** [R] Error 1
------------------------------
Message: 140
Date: Sat, 7 Jun 2008 08:15:48 +0000 (UTC)
From: Dieter Menne <dieter.menne at menne-biomed.de>
Subject: Re: [R] expected risk from coxph (survival)
To: r-help at stat.math.ethz.ch
Message-ID: <loom.20080607T081341-321 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Reid Tingley <r_tingley <at> hotmail.com> writes:
> When I try to obtain the expected risk for a new dataset using coxph in the
> survival package I get an error.
> Using the example from ?coxph:
# Example rewritten by DM; please do not use HTML mail
library(survival)
test1 <- list(time= c(4, 3,1,1,2,2,3),
status=c(1,NA,1,0,1,1,0),
x= c(0, 2,1,1,1,0,0),
sex= c(0,0,0,0,1,1,1))
cox<-coxph( Surv(time, status) ~ x + strata(sex), test1) #stratified model
new<-list(time= c(5, 1,1,2,2,4,3),
status=c(1,NA,1,0,0,1,1),
x= c(0, 2,1,1,1,0,0),
sex= c(0,0,0,0,1,1,1))
predict(cox,new,type="expected")
# }
# else if (type == "expected") {
# if (missing(newdata))
# pred <- y[, ncol(y)] - object$residuals
# else stop("Method not yet finished")
Looks like this is "by design"; see the code above.
You might try to use cph and predict.Design from Frank Harrell's Design
package instead.
Dieter
------------------------------
Message: 141
Date: Sat, 7 Jun 2008 09:18:43 +0100
From: "Paul Smith" <phhs80 at gmail.com>
Subject: Re: [R] txt file, 14000+ rows, only last 8000 appear
To: r-help at r-project.org
Message-ID:
<6ade6f6c0806070118nfbd437eo8aa214af16b52de3 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Fri, Jun 6, 2008 at 10:30 PM, RobertsLRRI <raymond.roberts at ncf.edu> wrote:
>
> when I load my data file in txt format into the R workstation I lose about
> 6000 rows, this is a problem. Is there a limit to the display capabilities
> for the workstation? is all the information there and I just can't see the
> first couple thousand rows?
Does
nrow(your_data.frame)
return the correct number of rows? If so, R read all lines.
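A quick sketch of that check on a made-up data frame of the same size. The contents of your_data.frame here are invented; with real data it would come from read.table(). If the count is right, the missing rows have probably just scrolled out of the console display.

```r
# Invented stand-in with 14000 rows.
your_data.frame <- data.frame(id    = seq_len(14000),
                              value = seq_len(14000) / 10)

# Count the rows instead of relying on what the console shows.
n_rows <- nrow(your_data.frame)
n_rows

# Look at both ends without printing everything.
head(your_data.frame, 3)
tail(your_data.frame, 3)
```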
Paul
------------------------------
Message: 142
Date: Sat, 7 Jun 2008 10:21:58 +0200 (CEST)
From: Achim Zeileis <Achim.Zeileis at wu-wien.ac.at>
Subject: Re: [R] color scale mapped to B/W
To: hadley wickham <h.wickham at gmail.com>
Cc: Michael Friendly <friendly at yorku.ca>, R-Help
<r-help at stat.math.ethz.ch>
Message-ID:
<Pine.LNX.4.64.0806071011450.10440 at paninaro.stat-math.wu-wien.ac.at>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Fri, 6 Jun 2008, hadley wickham wrote:
> On Fri, Jun 6, 2008 at 6:23 PM, Achim Zeileis
> <Achim.Zeileis at wu-wien.ac.at> wrote:
>> On Fri, 6 Jun 2008, Michael Friendly wrote:
>>
>>> In an R graphic, I'm using
>>>
>>> cond.col <- c("green", "yellow", "red")
>>> to represent a quantitative variable, where green means 'OK', yellow
>>> represents 'warning'
>>> and red represents 'danger'. Using these particular color names, in B/W,
>>> red is darkest
>>> and yellow is lightest. I'd like to find color designations to replace
>>> yellow and green so
>>> that when printed in B/W, the yellowish color appears darker than the
>>> greenish one.
>>>
>>> Is there some tool/code I can use to find these? i.e., something to
>>> display a grid
>>> of color swatches with color codes/names I can look at in color and B/W to
>>> decide?
>>
>> You could look at colors in HCL (i.e., polar LUV). For example, you could
>> choose a dark red HCL = (0, 90, 40) and a light green (120, 70, 90) and a
>> yellow somewhere in between.
>
> How did you get to those numbers?
From scratch:
- hues 0 and 120 because Michael wanted red and green
- luminances 40 and 90 for sequential colors from dark to light
- chromas 90 and 70 for two reasons: only small differences in
chroma seemed necessary, and 90 and 70 are close to the maximal
values given the other two coordinates for each color.
> I seem to remember there being
> someway to convert rgb to hcl, but I can't find it.
I always use "colorspace" for that. See
example("polarLUV", package = "colorspace")
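A minimal sketch of the same idea using hcl() from base R's grDevices rather than colorspace. The red and green coordinates are the ones given above; the in-between yellow (60, 80, 65) is an assumption picked for the example, and the gray values come from a simple weighted sum of the RGB channels, which is only a rough stand-in for what a B/W printer does.

```r
# Red and green as suggested above; the middle color is a guess.
cond.col <- c(red    = hcl(h = 0,   c = 90, l = 40),
              yellow = hcl(h = 60,  c = 80, l = 65),  # assumed values
              green  = hcl(h = 120, c = 70, l = 90))

# Approximate gray level of each color (0 = black, 255 = white).
rgbv <- col2rgb(cond.col)                      # 3 x 3: rows red/green/blue
luma <- colSums(rgbv * c(0.299, 0.587, 0.114))
luma

# To inspect the swatches in color (not run here):
# barplot(rep(1, 3), col = cond.col, names.arg = names(cond.col))
```

Because luminance is one of the three HCL coordinates, ordering the colors by l is what keeps them ordered after conversion to gray.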
Best,
Z
> Hadley
>
>
> --
> http://had.co.nz/
>
>
------------------------------
Message: 143
Date: Sat, 7 Jun 2008 08:29:27 +0000 (UTC)
From: Dieter Menne <dieter.menne at menne-biomed.de>
Subject: Re: [R] Predicting a single observatio using LME
To: r-help at stat.math.ethz.ch
Message-ID: <loom.20080607T082734-878 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
Rebecca Sela <rsela <at> stern.nyu.edu> writes:
>
> When I use a model fit with LME, I get an error if I try to use "predict" with
> a dataset consisting of a single line.
>
> For example, using this data:
> > simpledata
> Y t D ID
> 23 4.359511097 3 1 6
> 24 6.165419699 4 1 6
>
> This happened:
> > testLME <- lme(Y~t+D,data=simpledata,random=~1|ID)
> > predict(testLME, simpledata[1,])
> Error in val[revOrder, level + 1] : incorrect number of dimensions
>
> This has occurred with other datasets as well. Is this a bug in the code, or
> am I doing something wrong?
No, this looks like a bug caused by dimension dropping when the new data has
only one row. Probably nobody used it with a single observation before. As a
workaround, cheat a little and predict on two rows:
predict(testLME, simpledata[c(1,2),])
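A self-contained sketch of the workaround, rebuilding simpledata from the values posted in the original question. Keeping only the first element of the result is an addition for the example, on the assumption that just the prediction for row 1 is wanted; newer versions of nlme may no longer need the extra row.

```r
library(nlme)

# Data as posted in the original question (24 rows, 6 subjects).
simpledata <- data.frame(
  Y = c(-1.464740870, 1.222911373, -0.605996798, 0.155692707,
        3.849619772, 4.289213902, 2.369407737, 2.249052533,
        0.920044316, 2.003262622, 0.003833438, 1.578300927,
        -0.842322442, -0.657256158, 1.504491575, 2.896007045,
        0.990505440, 2.722942793, 4.395861278, 4.849296475,
        3.049616421, 2.874405962, 4.359511097, 6.165419699),
  t  = rep(1:4, times = 6),
  D  = rep(c(0, 1), each = 12),
  ID = rep(1:6, each = 4)
)

testLME <- lme(Y ~ t + D, data = simpledata, random = ~ 1 | ID)

# Predict on two rows, then keep only the one that is wanted.
pred_row1 <- predict(testLME, simpledata[c(1, 2), ])[1]
pred_row1
```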
Dieter
------------------------------
Message: 144
Date: Sat, 7 Jun 2008 08:36:05 +0000 (UTC)
From: Dieter Menne <dieter.menne at menne-biomed.de>
Subject: Re: [R] lsmeans
To: r-help at stat.math.ethz.ch
Message-ID: <loom.20080607T083356-644 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii
John Fox <jfox <at> mcmaster.ca> writes:
> I intend at some point to extend the effects package to linear and
> generalized linear mixed-effects models, probably using lmer() rather
> than lme(), but as you discovered, it doesn't handle these models now.
>
> It wouldn't be hard, however, to do the computations yourself, using
> the coefficient vector for the fixed effects and a suitably constructed
> model-matrix to compute the effects; you could also get standard errors
> by using the covariance matrix for the fixed effects.
>
Douglas Bates
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2007q2/000222.html):
>> My big problem with lsmeans is that I have never been able to understand
>> how they should be calculated and, more importantly, why one should want
>> to calculate them. In other words, what do lsmeans represent and why
>> should I be interested in these particular values?
Truly Confused, torn apart by the Masters
Dieter
------------------------------
Message: 145
Date: Sat, 7 Jun 2008 10:56:13 +0100 (BST)
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
Subject: Re: [R] functions for high dimensional integral
Cc: r-help at r-project.org
Message-ID:
<alpine.LFD.1.10.0806071049170.10617 at gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Fri, 6 Jun 2008, ZT2008 wrote:
> I need to compute a high dimensional integral. Currently I'm using the
> function adapt in R package adapt. But this method is kind of slow to me.
> I'm wondering if there are other solutions. Thanks.
What does 'high' mean? Numerical quadrature will be slow in more than a
handful of dimensions.
What to recommend depends on what you know about the function -- Evans &
Swartz (2000) 'Approximating Integrals via Monte Carlo and Deterministic
Methods' is a good reference on integration for statisticians.
But accurate evaluation of an integral in more than 2 or 3 dimensions is
potentially a very computer-intensive task -- people spend days of CPU
time using e.g. MCMC to do just that.
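As a rough illustration of the Monte Carlo route, here is a minimal sketch of plain Monte Carlo integration over the unit hypercube. The integrand, the dimension d = 10 and the sample size are made up for the example; this is simple uniform sampling, not MCMC, and not a recommendation for any particular problem.

```r
set.seed(42)

d <- 10      # number of dimensions (assumed for the example)
n <- 1e5     # number of random points

# Made-up integrand: exp(-sum(x^2)), evaluated row by row.
f <- function(x) exp(-rowSums(x^2))

# Uniform points in [0, 1]^d, one point per row.
x    <- matrix(runif(n * d), ncol = d)
vals <- f(x)

estimate <- mean(vals)            # volume of the cube is 1
std.err  <- sd(vals) / sqrt(n)    # Monte Carlo standard error

# This integrand factorizes, so a reference value is available
# from a one-dimensional integral raised to the power d.
exact <- integrate(function(u) exp(-u^2), 0, 1)$value^d

c(estimate = estimate, std.err = std.err, exact = exact)
```

The standard error shrinks like 1/sqrt(n) regardless of d, which is why sampling methods are the usual fallback once quadrature grids become too large.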
> Zhongwen
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
------------------------------
_______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
End of R-help Digest, Vol 64, Issue 7