thr3ads.net - similar to: "applying to dataframe rows"

Displaying 20 results from an estimated 10000 matches similar to: "applying to dataframe rows"

creating horizontal dataframes with column names

2008 Sep 17

Greetings -- in order to write back to SQL databases, one needs to create a dataframe with values. I can get column names of an existing table with sqlColumns. Say I have a vector of values (if they're all the same type), or a list (if different). How do I create a dataframe with column names given by my sqlColumns? To make it concrete, how do we make a dataframe A B C 1 2 3
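
One possible way to build the one-row dataframe asked about above is to name a list of values and convert it. This is a minimal sketch: `col.names` is a hypothetical stand-in for whatever sqlColumns returns for the target table, and the values are the A B C / 1 2 3 example from the question.

```r
# Hypothetical column names; in practice these would come from sqlColumns
col.names <- c("A", "B", "C")

# A list allows mixed types; as.list(v) works for a plain vector v
values <- list(1, 2, 3)

# Name the list elements, then convert: one column per element, one row
df <- as.data.frame(setNames(values, col.names))
print(df)
```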

name scoping within dataframe index

2009 Jan 26

Every time I have to prefix a dataframe column inside the indexing brackets with the dataframe name, e.g. df[df$colname==value,] -- I am wondering, why isn't there an R scoping rule that search starts with the dataframe names, as if we'd said with(df, df[colname==value,]) -- wouldn't that be a reasonable default to prepend to the name search path? Cheers, Alexy

growing dataframes with rbind

2009 Feb 24

I'm growing a large dataframe by composing new rows and then doing

row <- compute.new.row.somehow(...)
d <- rbind(d,row)

Is this a fast/preferred way? Cheers, Alexy
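
Calling rbind inside a loop copies the accumulated dataframe on every iteration, so a commonly suggested alternative is to collect the rows in a preallocated list and bind once at the end. The sketch below assumes each new row is itself a one-row dataframe; `compute.new.row` is a hypothetical stand-in for the poster's function.

```r
# Hypothetical stand-in for compute.new.row.somehow(...)
compute.new.row <- function(i) data.frame(id = i, value = i^2)

n <- 100
rows <- vector("list", n)            # preallocate the container

for (i in seq_len(n)) {
  rows[[i]] <- compute.new.row(i)    # store each row, no binding yet
}

d <- do.call(rbind, rows)            # a single bind at the end
```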

printing levels as tuples

2007 Nov 23

I'm running rle() on a long vector, and get a result which looks like > uc Run Length Encoding lengths: int [1:16753] 1 1 1 1 1 1 1 1 1 1 ... values : int [1:16753] 29462748 22596107 18322820 14323315 12684505 9909036 7296916 6857692 5884755 5883697 ... I can print uc$names or uc$levels separately. Is there any way to print them together as tuples, looking like (29462748, 1)

splitting time vector into days

2008 Sep 09

Greetings -- I have a dataframe a with one element a vector, time, of POSIXct values. What's a good way to split the data frame into periods of a$time, e.g. days, and apply a function, e.g. mean, to some other column of the dataframe, e.g. a$value? Cheers, Alexy
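
A minimal sketch of one approach to the question above: derive a day label from each timestamp and pass it to tapply as the grouping index. The sample data and the UTC time zone are assumptions made for illustration.

```r
# Illustrative data; the real dataframe would already hold POSIXct times
a <- data.frame(
  time  = as.POSIXct(c("2008-09-08 10:00:00",
                       "2008-09-08 18:00:00",
                       "2008-09-09 09:00:00"), tz = "UTC"),
  value = c(1, 3, 10)
)

# One label per row identifying its calendar day
day <- format(a$time, "%Y-%m-%d", tz = "UTC")

# Apply mean to value within each day
daily.mean <- tapply(a$value, day, mean)
print(daily.mean)
```

split(a, day) would give the per-day dataframes themselves if more than one column is needed.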

exporting a split list

2007 Nov 27

Using wk <- with(d, split(word, kind)), I get the following class table: wk$`1` [1] "a" "bra" ... # (*) wk$`10` "ca" "dabra" ... Now I need to export it in the following format: class num_members examples 1 23 a bra ... 10 4 ca dabra For each class C such as `1`, I need to print the

namespaces

2008 Oct 02

I'd like to control my namespace thoroughly, separated by task. Is there a way, in R session, to introduce namespaces for tasks dynamically and switch them as needed? Or, is there a combination of load/save workspace steps which can facilitate this? Cheers, Alexy

uniq -c

2007 Nov 21

Is there an R analog of the Unix command uniq -c: http://en.wikipedia.org/wiki/Uniq Given an array x, uniq -c replaces each contiguous subsequence of identical numbers with a tuple (count, number). E.g. $ cat > usample 10 10 9 8 8 7 7 7 6 3 1 1 1 0 $ uniq -c usample 2 10 1 9 2 8 3 7 1 6 1 3 3 1 1 0 Cheers, Alexy
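
Base R's rle() computes run lengths of consecutive equal values, which is close to what uniq -c does. The sketch below reuses the sample from the question; the column names `count` and `value` are chosen here for illustration.

```r
# The sample from the question, as a numeric vector
x <- c(10, 10, 9, 8, 8, 7, 7, 7, 6, 3, 1, 1, 1, 0)

r <- rle(x)   # r$lengths holds run lengths, r$values the repeated values

# Pair each count with its value, as uniq -c does
counts <- data.frame(count = r$lengths, value = r$values)
print(counts)
```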

Functional pattern-matching in R

2008 Oct 29

I found there's a very good functional set of operations in R, such as apply family, Hadley Wickham's lovely plyr, etc. There's even a Reduce (a.k.a. fold). Now I wonder how can we do pattern-matching? E.g., now I split dimensions like this:

m <- dim(V)[1] # R
n <- dim(V)[2] # still R

While even Matlab allows for

[m,n] = size(V) % MATLAB!

Ideally I'd be able to

rbind a heterogeneous row

2011 Mar 23

I have a dataframe with many rows like this: > df X1 X2 X3 X4 X5 X6 X7 week d sim1 FALSE TRUE TRUE TRUE TRUE TRUE TRUE 1 0.3064985 sim1 is the rowname, X1..X7,week,d are the column names. X1..X7 are factors, booleans in this case. I need to add another row, represented by the following list: list(rep(T,7),5,0.0) -- i.e, TRUE in all boolean columns,

shrink a dataframe for plotting

2007 Nov 21

I get tables with millions of rows. For plotting to a screen-size jpg, obviously just about 1000 points are enough. Instead of feeding plot() the original millions of rows, I'd rather shrink the original dataframe, using some kind of the following interpolation: -- split dataframe into chunks of N rows each, e.g. 1000 rows each -- compute average for each column -- issue one new row
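
The chunk-and-average steps described above can be sketched with integer division to label the chunks and aggregate() to average each column per chunk. The tiny dataframe and the chunk size of 5 are assumptions for illustration; the question mentions chunks of about 1000 rows.

```r
# Illustrative data; a real table would have millions of rows
d <- data.frame(x = 1:10, y = (1:10) * 2)

chunk.size <- 5   # assumed; e.g. 1000 for real data

# Chunk label per row: 0 for rows 1..5, 1 for rows 6..10, and so on
chunk <- (seq_len(nrow(d)) - 1) %/% chunk.size

# One output row per chunk, holding the mean of every column
small <- aggregate(d, by = list(chunk = chunk), FUN = mean)
print(small)
```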

R/OCaml?

2008 Oct 09

Did anyone try to write R extensions in OCaml? What would it entail to enable it? Cheers, Alexy

dealing with NAs in time series

2008 Sep 05

Certain time series I have had outliers, which I removed by assigning NA to their positions. Now acf() refuses to go to work. What's the right way to remove outliers from ts objects, and what are the standard ways to interpolate NAs in them? Cheers, Alexy
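
One simple option for the interpolation part of the question is linear interpolation with approx() from the stats package, fitted on the non-missing points only. This is a sketch on a plain numeric vector with made-up values; whether linear interpolation is appropriate depends on the series.

```r
# Illustrative series with outliers already replaced by NA
x <- c(1, NA, 3, 4, NA, 6)

ok <- !is.na(x)   # positions that still hold data

# Interpolate linearly at every position, using only the known points
filled <- approx(x = which(ok), y = x[ok], xout = seq_along(x))$y
print(filled)
```

With approx()'s default settings, NAs at the very start or end of the series are not extrapolated and stay NA.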

partitioning vectors of intervals

2008 Sep 28

I have two pairs of time intervals: coarse- and fine-grained. They're components of their respective dataframes, looking like, coarse: endtime starttime 1 t1_end t1_start 2 t2_end t2_start ... fine: is the same, except that its intervals presumably fall into the coarse's enclosing ones. The problem is to partition

factors to integers preserving value in a dataframe

2009 Feb 27

I want to produce a dataframe with integer columns for elements of string pairs:

pairs <- c("10 21","23 45")
pairs.split <- lapply(pairs,function(x)strsplit(x," "))
pdf <- as.data.frame(pairs.split)
names(pdf) <- c("p","q")

-- at this point things look good, except the columns are factors, as I didn't change the default
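
The usual pitfall here is that as.integer() on a factor returns the internal level codes rather than the printed values. A minimal sketch of the difference, using a made-up factor:

```r
# A factor whose labels look like numbers
f <- factor(c("10", "23", "10"))

codes  <- as.integer(f)                  # internal level codes: 1 2 1
values <- as.integer(as.character(f))    # the printed values: 10 23 10

print(codes)
print(values)
```

Going through as.character() first is what preserves the value.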

R as a programming language

2007 Nov 07

Greetings -- coming from Python/Ruby perspective, I'm wondering about certain features of R as a programming language. Say I have a huge table t of the form run ord unit words new 1 1 6939 1013 641 1 2 275 1001 518 1 3 3314 1008 488 1 4 14154 1018 463 1 5 2982 1006 421 Alternatively, it

[:]

2007 Nov 24

What are idioms for taking a head or a tail of a vector, either up to an index, or from an index to the end? Also -- is it necessary to use length(v) to refer to the last element? E.g., Python has

v[:3] # indices 0,1,2
v[3:] # indices 3,4,...
v[-1] # the last element of v
v[:-1] # all but last

Cheers, Alexy
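
In base R, head() and tail() cover these four cases without spelling out length(v); a negative count means "all but that many". A short sketch with a made-up vector:

```r
v <- c(10, 20, 30, 40, 50)

first.three  <- head(v, 3)    # like Python v[:3]
after.three  <- tail(v, -3)   # like Python v[3:], drops the first three
last.one     <- tail(v, 1)    # like Python v[-1]
all.but.last <- head(v, -1)   # like Python v[:-1]
```

Note that v[-1] in R means something different from Python: it drops the first element.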

1.095e+09 for integers

2009 Feb 23

I've had a very long file written out by R with write.table, with fields of time values, converted from POSIXlt as.numeric. Among 2.5 million values, very few had 6 trailing zeroes, and those were output in scientific notation as in the subject. Is this the default behavior for long integers, and how can it be turned off (with all digits for any integer field in write.table)? This

figure margins too large for a barplot in png, pdf ok

2008 May 07

I used to have a script with a barplot command in it, preceded by a png: png(graph.file,height=H,width=W) barplot(t,names.arg=breaks[2:(length(t)+1)],tck=gridlines) -- worked before R 2.6.2. When I tried it in R 2.6.2, which I've had for a while but didn't run with that script, it complained that the margins were too large, and I've googled the messages from our list where neither

accessing and preserving list names in lapply

2009 Feb 27

Sometimes I'm iterating over a list where names are keys into another data structure, e.g. a related list. Then I can't use lapply as it does [[]] and loses the name. Then I do something like this: do.one <- function(ldf) { # list-dataframe item key <- names(ldf) meat <- ldf[[1]] mydf <- some.df[[key]] # related data structure r.df <-
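
A common workaround for the situation described above is to iterate over the names rather than the elements, so the key is available inside the function and can index both structures. The lists `scores` and `offsets` below are hypothetical stand-ins for the poster's list and related data structure.

```r
# Hypothetical list and a related structure keyed by the same names
scores  <- list(a = 1:3, b = 4:6)
offsets <- list(a = 10, b = 100)

# Iterate over names; sapply with simplify = FALSE returns a list
# named after the keys it was given
result <- sapply(names(scores), function(key) {
  scores[[key]] + offsets[[key]]   # key indexes both structures
}, simplify = FALSE)

print(result)
```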
