thr3ads.net - similar to: "Reading name-value data"

2011 Oct 23

2

Summary stats in table

Suppose I have data like this:

    A <- sample(letters[1:3], 1000, replace=TRUE)
    B <- sample(LETTERS[1:2], 1000, replace=TRUE)
    x <- rnorm(1000)

I can get a table of means via tapply(x, list(A, B), mean), and I can add the marginal means to this using cbind/rbind:

    main <- tapply(x, list(A,B), mean)
    Amargin <- tapply(x, list(A), mean)
    Bmargin <- tapply(x, list(B), mean)
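The excerpt stops before the cbind/rbind step, so the following is only a sketch of how the margins might be attached. The `set.seed()` call, the `All` label, and the `tab` name are assumptions made for illustration, not part of the original post.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
A <- sample(letters[1:3], 1000, replace = TRUE)
B <- sample(LETTERS[1:2], 1000, replace = TRUE)
x <- rnorm(1000)

main    <- tapply(x, list(A, B), mean)  # 3 x 2 table of cell means
Amargin <- tapply(x, list(A), mean)     # one mean per level of A
Bmargin <- tapply(x, list(B), mean)     # one mean per level of B

# Attach the A margins as an extra column, then the B margins
# (plus the grand mean in the corner) as an extra row.
tab <- rbind(
  cbind(main, All = as.vector(Amargin)),
  All = c(as.vector(Bmargin), mean(x))
)
tab
```

The result is a 4 x 3 matrix whose last row and last column hold the marginal means.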

Using grep() to subset lines of text

2008 Nov 29

2

Using grep() to subset lines of text

I have two vectors, a and b; b holds the lines of a text file. I want to find the lines of b that begin with an element of a. I have the following code, but it only returns a match for the first value in a, and I want matches for both. Any ideas, please?

    a = c(2,3)
    b = NULL
    b[1] = "aaa 2 aaa"
    b[2] = "2 aaa"
    b[3] = "3 aaa"
    b[4] = "aaa 3 aaa"
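The poster's own grep() call is not shown in the excerpt, so this is only one possible approach: fold every element of a into a single anchored alternation, so that one pattern covers all the values. The names `pattern` and `hits` are illustrative.

```r
a <- c(2, 3)
b <- c("aaa 2 aaa", "2 aaa", "3 aaa", "aaa 3 aaa")

# "^" anchors the match at the start of the line; "\\b" requires a word
# boundary after the value, so "2" would not match a line starting with "23".
pattern <- paste0("^(", paste(a, collapse = "|"), ")\\b")

hits <- grep(pattern, b)  # indices of the matching lines
b[hits]                   # the matching lines themselves
```

This reports which lines start with some element of a, not which element matched each line. If that mapping is needed, a loop over a with one pattern per element would be the natural variation.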

plyr: version 1.5

2011 Apr 11

0

plyr: version 1.5

# plyr

plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to:

* fit the same model to each patient subset of a data frame
* quickly calculate summary statistics for each group
* perform group-wise

Preparing data for display

2008 Nov 10

1

Preparing data for display

I have a dataset of about 10^6 rows, each consisting of a timestamp, several factors, a string, some integers, and some floats. I'd like to graph this data in various ways, including straightforward ones (how many events per week over the past year for each of 4 values of some factor), some less straightforward. I've managed to do this by brute force, but I'd like to learn how to do

cumsum vs. sum

2009 Feb 17

2

cumsum vs. sum

I recently traced a bug of mine to the fact that cumsum(s)[length(s)] is not always exactly equal to sum(s). For example,

    x<-1/(12:14)
    sum(x) - cumsum(x)[3] => 2.8e-17

Floating-point addition is of course not exact, and in particular is not associative, so there are various possible reasons for this. Perhaps sum uses clever summing tricks to get more accurate results? In some
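Whatever the cause, code that compares the two results can avoid relying on exact equality. The sketch below uses `all.equal()`, which compares within a numerical tolerance. The variable names are illustrative, and the exact difference printed will depend on platform and R version.

```r
x <- 1 / (12:14)

total   <- sum(x)
running <- cumsum(x)[length(x)]

# The exact difference may be zero or a tiny nonzero value,
# depending on how each function accumulates its sum.
diff_exact <- total - running

# A tolerance-based comparison is the safer test for equality.
same_within_tol <- isTRUE(all.equal(total, running))
same_within_tol
```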

Object equality for S4 objects

2009 Jul 29

3

Object equality for S4 objects

To test two environments for object equality (Lisp EQ), I can use 'identical':

    > e1 <- environment(local(function()x))
    > e2 <- environment(local(function()x))
    > identical(e1,e2) # compares object identity
    [1] FALSE
    > identical(as.list(e1),as.list(e2)) # compares values as name->value mapping
    [1] TRUE # (is there a

Errors in recursive default argument references

2009 Mar 09

3

Errors in recursive default argument references

Tested in: R version 2.8.1 (2008-12-22) / Windows

Recursive default argument references normally give nice clear errors. In the first set of examples, you get the error:

    Error in ... : promise already under evaluation: recursive default argument reference or earlier problems?

    (function(a = a) a ) ()
    (function(a = a) c(a) ) ()
    (function(a = a) a[1] ) ()
    (function(a = a)

Class for time of day?

2009 May 20

2

Class for time of day?

What is the recommended class for time of day (independent of calendar date)? And what is the recommended way to get the time of day from a POSIXct object? (Not a string representation, but a computable representation.) I have looked in the man page for DateTimeClasses, in the Time Series Analysis Task View and in Spector's Data Manipulation book but haven't found these. Clearly I can

Variable/function namespaces WAS: Bug in subsetting data frame (PR#13515)

2009 Feb 10

1

Variable/function namespaces WAS: Bug in subsetting data frame (PR#13515)

On Tue, Feb 10, 2009 at 10:11 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> Stavros Macrakis wrote:
>> On Tue, Feb 10, 2009 at 8:31 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>> The evaluator recognizes the context of usage and will get the
>>> function for a function call....
>> Can you point me to chapter and verse in the language

transposing a data frame from horizontal to vertical (stacking)

2010 Jun 29

2

transposing a data frame from horizontal to vertical (stacking)

Hello, everyone! I have a very simple task - I have a data frame (see MyData below) and I need to stack the data (see result below). I wrote the syntax below - it's very basic and it does what I need. But I am sure what I am trying to do is a very typical task and there must be a much shorter/more elegant way of doing it. Any advice? Thank you very much!

Vectorized switch

2009 Dec 18

2

Vectorized switch

What is the 'idiomatic' way of writing a vectorized switch statement? That is, I would like to write, e.g., vswitch( c('a','x','b','a'), a= 1:4, b=11:14, 100 ) => c(1, 100, 13, 4 ) equivalent to ifelse( c('a','x','b','a') ==
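The excerpt does not show how the thread resolved this, so the function below is a hypothetical sketch of the `vswitch` interface described in the question, not an established idiom. It assumes that named arguments are the alternatives and that a single unnamed argument, if present, is the default.

```r
# Hypothetical vswitch(): for each key, take the element at the same
# position from the alternative named by that key, or else the default.
vswitch <- function(keys, ...) {
  alts <- list(...)
  nms <- names(alts)
  if (is.null(nms)) nms <- rep("", length(alts))
  n <- length(keys)

  # The first unnamed argument, if any, supplies the default value.
  default <- alts[nms == ""]
  out <- if (length(default)) rep(default[[1]], length.out = n) else rep(NA, n)

  for (nm in nms[nms != ""]) {
    sel <- which(keys == nm)  # which() drops NA keys
    out[sel] <- rep(alts[[nm]], length.out = n)[sel]
  }
  out
}

vswitch(c('a', 'x', 'b', 'a'), a = 1:4, b = 11:14, 100)
```

Each alternative is recycled to the length of the keys, which mirrors how ifelse() treats its arguments.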

Speed difference between df$a[1] and df[1,"a"]

2011 Oct 19

2

Speed difference between df$a[1] and df[1,"a"]

I was surprised to find that df$a[1] is an order of magnitude faster than df[1,"a"]:

    > df <- data.frame(a=1:10)
    > system.time(replicate(100000, df$a[3]))
       user  system elapsed
       0.36    0.00    0.36
    > system.time(replicate(100000, df[3,"a"]))
       user  system elapsed
       4.09    0.00    4.09

A priori, I'd have thought that combining the row and column

The assign(paste(...,i),...) idiom

2009 Apr 20

2

The assign(paste(...,i),...) idiom

Judging from the traffic on this mailing list, a lot of R beginners are trying to write things like assign( paste( "myvar", i), ...) where they really should probably be writing myvar[i] <- ... Do we have any idea where this bizarre habit comes from? -s
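To make the contrast concrete, here is a small sketch of the two styles. The squared values are arbitrary illustration data, and `paste0()` is used so that the generated names contain no space.

```r
# The idiom in question: one separately named variable per index.
for (i in 1:3) assign(paste0("myvar", i), i^2)
get("myvar2")  # reading a value back needs get() and another paste0()

# The suggested alternative: a single vector indexed by i.
myvar <- numeric(3)
for (i in 1:3) myvar[i] <- i^2
myvar[2]
```

With the vector form, the whole collection can be passed to functions such as sum() or sapply() directly, which is awkward with separately named variables.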

R Books listing on R-Project

2009 May 27

1

R Books listing on R-Project

I was wondering what the criteria were for including books on the Books Related to R page <http://www.r-project.org/doc/bib/R-books.html>. (There is no maintainer listed on this page.) In particular, I was wondering why the following two books are not listed: * Andrew Gelman, Jennifer Hill, *Data Analysis Using Regression and Multilevel/Hierarchical Models*. (CRAN package 'arm') *

R and Scheme

2008 Dec 08

4

R and Scheme

I've read in many places that R semantics are based on Scheme semantics. As a long-time Lisp user and implementor, I've tried to make this more precise, and this is what I've found so far. I've excluded trivial things that aren't basic semantic issues: support for arbitrary-precision integers; subscripting; general style; etc. I would appreciate corrections or additions from

Definition of = vs. <-

2009 Apr 01

2

Definition of = vs. <-

NOTA BENE: This email is about `=`, the assignment operator (e.g. {a=1} which is equivalent to { `=`(a,1) } ), not `=` the named-argument syntax (e.g. f(a=1), which is equivalent to eval(structure(quote(f(1)),names=c('','a'))). As far as I can tell from the documentation, assignment with = is precisely equivalent to assignment with <-. Yet they call different primitives: >

General binary search?

2011 Apr 04

2

General binary search?

Is there a generic binary search routine in a standard library which a) works for character vectors b) runs in O(log(N)) time? I'm aware of findInterval(x,vec), but it is restricted to numeric vectors. I'm also aware of various hashing solutions (e.g. new.env(hash=TRUE) and fastmatch), but I need the greatest-lower-bound match in my application. findInterval is also slow for
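For reference, a greatest-lower-bound binary search can be written in plain R as below. This is only a sketch: `glb_search` is a hypothetical name, not a function from a standard library, and an interpreted loop like this will not be fast in practice. It assumes `vec` is sorted ascending under the same collation that `<=` uses for strings in the current locale.

```r
# Hypothetical helper: index of the last element of sorted `vec`
# that is <= x, or 0 if x sorts before every element.
glb_search <- function(x, vec) {
  lo <- 0L            # everything at or below lo is known to be <= x
  hi <- length(vec)   # everything above hi is known to be > x
  while (lo < hi) {
    mid <- (lo + hi + 1L) %/% 2L  # upper midpoint, so the range always shrinks
    if (vec[mid] <= x) lo <- mid else hi <- mid - 1L
  }
  lo
}

vec <- c("apple", "banana", "cherry")
glb_search("blueberry", vec)  # position of "banana"
```

Each iteration halves the remaining range, so the number of comparisons grows as O(log(N)).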

Handling of factors

2009 Jan 21

1

Handling of factors

I'm rather confused by the semantics of factors. When applied to factors, some functions (whose results are elements of the original factor argument) return results of class factor, some return integer vectors, some return character vectors, some give errors. I understand some but not all of this. Consider: Preserve factors: `[`, `[[`, sort, unique, subset, head, tapply, rep, rev, by,
