thr3ads.net - similar to: "Combining many files into one"

Displaying 20 results from an estimated 60000 matches similar to: "Combining many files into one"

Combining many csv files into one and adding a column with an id of each csv file read

2008 Apr 08

Dear R experts, I have been looking into the help pages and old questions from the R-Help site, but the options offered there don't seem to work in my case. First of all, I am working on Windows XP, using R version 2.6.2. I am attaching two csv files as an example of how the data I am trying to put together is delivered to us. On the first row of every csv file is the name of the
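A minimal sketch of one common approach, assuming each file reads cleanly with read.csv() and that the file name is an acceptable id: read every file into a list, tag each piece, then rbind once. The helper `read_with_id()`, the column name `file_id` and the demo files are hypothetical; the attached files are not part of this excerpt.

```r
# Hypothetical helper: read each CSV, tag its rows with the file name, bind once.
read_with_id <- function(files) {
  pieces <- lapply(seq_along(files), function(i) {
    d <- read.csv(files[i], stringsAsFactors = FALSE)
    # Use the file name (no directory, no extension) as the id column.
    d$file_id <- sub("\\.csv$", "", basename(files[i]))
    d
  })
  # A single rbind over the whole list instead of growing a data frame.
  do.call(rbind, pieces)
}

# Usage with two throwaway files in a temporary directory (invented data).
dir <- tempfile("csvs")
dir.create(dir)
write.csv(data.frame(a = 1:2, b = c("x", "y")), file.path(dir, "site1.csv"), row.names = FALSE)
write.csv(data.frame(a = 3L, b = "z"), file.path(dir, "site2.csv"), row.names = FALSE)
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
combined <- read_with_id(files)
print(combined)
```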

combining large list of data.frames

2012 Apr 19

It's normal for me to create a list of data.frames and then use do.call('rbind', list(...)) to create a single data.frame. However, I've noticed that as the size of the list grows large, it is perhaps better to do this in chunks. As an example, here's a list of 20,000 similar data.frames:
# create list of data.frames
dat <- vector("list", 20000)
for(i in
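The chunking idea can be sketched as follows, assuming every data frame in the list has the same columns. `rbind_chunked()` and its `chunk_size` default are illustrative choices, not code from the thread, and whether chunking is actually faster depends on the R version and the data, so timing both versions with system.time() on real data is worthwhile.

```r
# Hypothetical helper: bind a long list of same-shaped data frames in chunks.
rbind_chunked <- function(dfs, chunk_size = 1000) {
  # Group list positions into consecutive runs of at most chunk_size.
  groups <- split(seq_along(dfs), ceiling(seq_along(dfs) / chunk_size))
  # Bind within each chunk, then bind the much shorter list of chunk results.
  partial <- lapply(groups, function(idx) do.call(rbind, dfs[idx]))
  out <- do.call(rbind, unname(partial))
  rownames(out) <- NULL
  out
}

# Smaller than the 20,000 in the post to keep the demo quick; data invented.
dat <- vector("list", 2000)
for (i in seq_along(dat)) {
  dat[[i]] <- data.frame(id = i, value = i * 2)
}
big <- rbind_chunked(dat, chunk_size = 500)
```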

R combining many vectors of predictable name into one data frame

2012 Jul 13

G'day R (power) users, I have many vectors, called ib1, ib2, ib3, ..., ib100, and I would like them in one data frame (df) such that:
> df
ib1 ib2 ib3 ib4 ..... ib100
x   x   x   x   ..... x
x   x   x   x   ..... x
x   x   x   x   ..... x
I have attempted:
hold.list <- list(objects(pattern="ib"))
df <- data.frame(hold.list)
but that
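objects(pattern = "ib") returns the object names as a character vector, so the attempt above builds a one-column data frame of names rather than of values; it also sorts alphabetically (ib1, ib10, ib100, ib11, ...). A sketch that builds the names in numeric order and fetches the values with mget(), using invented demo vectors of length 3:

```r
# Invented demo vectors ib1 ... ib100, three values each.
for (i in 1:100) assign(paste0("ib", i), rnorm(3))

# Build the names explicitly so columns come out as ib1, ib2, ..., ib100.
nm <- paste0("ib", 1:100)
# mget() returns a named list of the objects themselves; each becomes a column.
df <- as.data.frame(mget(nm))
```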

Creating multiple graphs based on one variable

2009 May 26

Dear List, I would like to create several graphs of similar data. I have x and y values for several different individuals (in this case fish). I would like to plot the x and y values for each fish separately. I can do it using a for loop, but I think I should be using "apply". Please let me know what I am doing wrong, or if there is a "better" way to do this. What I have
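Two hedged sketches on made-up data, since the poster's code is cut off: lattice's conditioning formula draws one panel per fish, and the base-graphics variant replaces the for loop with split() plus lapply(). The data frame `fishdat` and its columns are assumptions.

```r
library(lattice)

# Invented data: x/y readings for three fish.
set.seed(1)
fishdat <- data.frame(fish = rep(c("A", "B", "C"), each = 10),
                      x = rep(1:10, 3),
                      y = rnorm(30))

# One panel per fish via the conditioning operator `|`.
p <- xyplot(y ~ x | fish, data = fishdat, type = "b")
print(p)

# Base-graphics alternative: one separate plot per fish.
invisible(lapply(split(fishdat, fishdat$fish), function(d) {
  plot(d$x, d$y, main = paste("Fish", d$fish[1]), xlab = "x", ylab = "y")
}))
```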

Reading and coalescing many datafiles.

2005 Apr 14

Greetings. I've got some analysis problems I'm trying to solve, the raw data for which are accumulated in a bunch of time-and-date-based files:
/some/path/2005-01-02-00-00-02
etc. The best 'read all these files' method I've seen in the r-help archives comes down to
for (df in my_list_of_filenames) {
  dat <- rbind(dat, my_read_function(df))
}
which,
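Growing `dat` with rbind() inside the loop copies every row read so far on each iteration, so the total work rises roughly quadratically with the number of files. A sketch of the usual alternative, assuming each file yields a data frame with the same columns: fill a preallocated list and bind once at the end. `read_all()` and `fake_reader()` are hypothetical; the latter only stands in for the poster's my_read_function() so the example runs without files.

```r
# Hypothetical helper: read each file into its own list slot, then bind once.
read_all <- function(filenames, reader) {
  pieces <- vector("list", length(filenames))
  for (k in seq_along(filenames)) {
    pieces[[k]] <- reader(filenames[k])
  }
  do.call(rbind, pieces)
}

# Stand-in reader (assumption): returns a one-row data frame per "file".
fake_reader <- function(name) data.frame(source = name, value = nchar(name))
dat <- read_all(c("2005-01-02-00-00-02", "2005-01-02-00-05-02"), fake_reader)
```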

aggregate slow with many rows - alternative?

2005 Oct 13

Hi, I use the code below to aggregate / count my test data. It works fine, but the problem is with my real data (33'000 rows) where the function is really slow (nothing happened in half an hour). Does anybody know of other functions that I could use? Thanks, Hans-Peter -------------- dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699 ),
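Only the Datum column of the test data survives in this excerpt, so this sketch assumes the goal is a row count per Datum. table() tabulates in one pass instead of calling a function once per group as aggregate() does; whether that was the bottleneck in the original code cannot be checked from the truncated post. The column name `n` is arbitrary.

```r
# The Datum values from the post; any other columns are not shown there.
dat <- data.frame(Datum = c(32586, 32587, 32587, 32625, 32656,
                            32656, 32656, 32672, 32672, 32699))

# Count rows per Datum in a single tabulation.
counts <- as.data.frame(table(Datum = dat$Datum), responseName = "n")
# table() turns Datum into a factor; convert the labels back to numbers.
counts$Datum <- as.numeric(as.character(counts$Datum))
```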

xyplot, overlay two variables on one plot with group factors

2010 Feb 14

All, I want to overlay two variables on the same plot following their appropriate grouping. I have attempted to use subscripting in the panel function with panel.xyplot, but I can't get the grouping to follow into the panel. Here is an example:
dat <- data.frame(y = log(1:10), y2 = 10:19, x = 1:10, grp = as.factor(1))
dat2 <- data.frame(y = log(10:19), y2 = 20:29, x = 1:10, grp = as.factor(c(2)))
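One way to get both variables and both groups into a single panel, offered as a sketch rather than the thread's answer: stack the two data frames, reshape to long form by hand, and let groups= separate each variable/group pair, so no custom panel function or subscripts are needed. The column names `value` and `var` are made up for the example.

```r
library(lattice)

dat  <- data.frame(y = log(1:10),  y2 = 10:19, x = 1:10, grp = as.factor(1))
dat2 <- data.frame(y = log(10:19), y2 = 20:29, x = 1:10, grp = as.factor(c(2)))
both <- rbind(dat, dat2)

# Long form: one row per (x, variable, group) combination.
long <- data.frame(x     = rep(both$x, 2),
                   value = c(both$y, both$y2),
                   var   = rep(c("y", "y2"), each = nrow(both)),
                   grp   = rep(both$grp, 2))

# Each variable/group pair becomes its own line with its own legend entry.
p <- xyplot(value ~ x, data = long, groups = interaction(var, grp),
            type = "l", auto.key = list(lines = TRUE, points = FALSE))
print(p)
```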

how to read a group of files into one dataset?

2011 Aug 25

For example: I have files with the names "ma01.dat", "ma02.dat", "ma03.dat", "ma04.dat", and I want to read the data in these files into one data.frame.
flnm<-paste("obs",101:114,"_err.dat",sep="")
newdata<-read.table(flnm,skip=2)
data<-(flnm,skip=2)
but the data only contains data from flnm[1]. I also tried as below:
for
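read.table() expects a single file name, so it cannot read the whole vector `flnm` in one call (the post reports that only flnm[1] was read). A minimal sketch that maps it over the names with lapply() and binds the results; the demo files and their contents are invented, with two header lines so that skip = 2 from the post applies.

```r
# Throwaway demo files, each with two header lines followed by data.
dir <- tempfile("obs")
dir.create(dir)
flnm <- file.path(dir, paste("obs", 101:103, "_err.dat", sep = ""))
for (f in flnm) writeLines(c("header line 1", "header line 2", "1 2", "3 4"), f)

# Read every file (skip = 2 is passed through to read.table), then stack them.
pieces  <- lapply(flnm, read.table, skip = 2)
newdata <- do.call(rbind, pieces)
```

If the names follow a pattern such as ma01.dat, list.files(dir, pattern = "^ma[0-9]+\\.dat$", full.names = TRUE) can build the vector instead of paste().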

help with functions

2009 Sep 04

Hi all, I have got 2 functions (see below) which are simplifications of what I need to do. These functions are precisely the same, except for the last line. My question is, why doesn't function testA work in the same way as function testB? Both functions produce two objects, "a" and "b", that must be merged with rbind. The difference is that in testA, I specify the name

Fwd: Re: Combining many dataframes from listings of objects?

2002 May 15

> I want to combine (rbind) many dataframes into a single data frame, but "automatically"
> specifying the names of the dataframes as listing of object names.
> E.g., combine these 18 df objects into one big df using something conceptually like this:
> rbind(objects(pattern="*.df"))
Brian Ripley suggested that something along the lines of:
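Brian Ripley's actual suggestion is cut off in this excerpt, so the following is an independent sketch. Two details matter: ls() (objects() is an alias for it) treats `pattern` as a regular expression rather than a shell wildcard, and it returns names, not the data frames. glob2rx() converts the wildcard and mget() fetches the objects. The objects a.df and b.df are placeholders.

```r
# Placeholder objects standing in for the 18 *.df data frames.
a.df <- data.frame(v = 1:2)
b.df <- data.frame(v = 3L)

# glob2rx("*.df") gives the regular expression "^.*\\.df$".
df_names <- ls(pattern = glob2rx("*.df"))
# mget() returns the objects; unname() keeps rbind from prefixing row names.
combined <- do.call(rbind, unname(mget(df_names)))
```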

how to use do.call("rbind", get(list(mlist)))

2005 Jan 21

I have around 200 data frames I want to rbind in a vectorized way. The object names are:
m302 m303 ... m500
So I tried:
mlist <- paste("m",302:500,sep="")
dat <- do.call("rbind", get(list(mlist)))
and I get "Error in get(x, envir, mode, inherits) : invalid first argument". I know "rbind" is valid because
dat <- rbind(m302, m303,
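get() retrieves one object from one character string, and list(mlist) is a list, hence the "invalid first argument" error. A hedged sketch of one fix: look each name up with lapply() and hand the resulting list of data frames to do.call(). The three small data frames are invented stand-ins for m302 to m500.

```r
# Invented stand-ins for m302 ... m500.
m302 <- data.frame(k = 302L)
m303 <- data.frame(k = 303L)
m304 <- data.frame(k = 304L)

mlist <- paste("m", 302:304, sep = "")
# Fetch each object by name; envir = environment() pins the lookup to the
# environment where the objects were created (matters inside a function).
dat <- do.call("rbind", lapply(mlist, get, envir = environment()))
```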

combining same-day lab measurements with 'apply'

2008 Oct 15

Another request for help implementing the 'apply' functions to avoid a loop structure... I am working with a data set that includes lab measurements taken at different dates for the subjects, with some subjects having more results than others. I would like to average lab results for each subject that were taken on the same day. I can do this using a for loop, but would like to know how
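A sketch with invented data, assuming each row holds one subject, one measurement date and one numeric result: aggregate() averages the results for every subject/date combination without an explicit loop. The formula interface is used here; the data frame form aggregate(labs["result"], by = list(subject = labs$subject, date = labs$date), FUN = mean) produces the same groups.

```r
# Invented lab data: subject 1 has two results on 2008-01-05.
labs <- data.frame(subject = c(1, 1, 1, 2, 2),
                   date    = as.Date(c("2008-01-05", "2008-01-05", "2008-02-01",
                                       "2008-01-07", "2008-01-07")),
                   result  = c(10, 14, 9, 5, 7))

# Mean result for each subject/date pair.
daily <- aggregate(result ~ subject + date, data = labs, FUN = mean)
```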

dramatic speed difference in lapply

2010 Feb 26

So I have a function that does lapply's for me based on dimension. Currently only works for length(pivotColumns)=2 because I haven't fixed the rbinds. I have two versions. One runs WAYYY faster than the other. And I'm not sure why. Fast Version: fedb.ddplyWrapper2Fast <- function(data, pivotColumns, listNameFunctions, ...){ lapplyFunctionRecurse <- function(cdata, level=1,

within group sequential subtraction

2011 Mar 10

Hi Everyone, I would like to do sequential subtractions within a group so that I know the time between separate observations for a group of individuals. My data: data <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2", "IND3", "IND4", "IND5", "IND6", "IND6"), date_obs =
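The date_obs values are cut off in the post, so the dates below are invented. The sketch assumes rows are already sorted by date within each group; ave() then applies diff() group by group and returns a vector aligned with the original rows, with NA for each group's first observation. The column name `days_since_prev` is hypothetical.

```r
# Groups from the post; dates invented (offsets in days from 2011-01-01).
data <- data.frame(group = c("IND1", "IND1", "IND2", "IND2", "IND2",
                             "IND3", "IND4", "IND5", "IND6", "IND6"),
                   date_obs = as.Date("2011-01-01") +
                              c(0, 5, 0, 3, 10, 0, 0, 0, 2, 9))

# Days since the previous observation of the same individual.
data$days_since_prev <- ave(as.numeric(data$date_obs), data$group,
                            FUN = function(d) c(NA, diff(d)))
```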

overriding "summary.default" or "summary.data.frame". How?

2012 Mar 20

I suppose everybody who makes a package for the first time thinks "I can change anything!" and then runs into this same question. Has anybody written out information on how a package can override functions in R base in the R 2.14 (mandatory NAMESPACE era)? Suppose I want to alphabetize variables in a summary.data.frame, or return the standard deviation with the mean in summary output.

converting stata's by syntax to R

2005 Aug 01

I am struggling with migrating some Stata code to R. I have a data frame containing, sometimes, repeat observations (rows) of the same family. I want to keep only one observation per family, selecting that observation according to some other variable. An example data frame is:
# construct example data
fam <- c(1,2,3,3,4,4,4)
wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
keep <- c(1,1,1,0,1,0,0)
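The selection rule has to be inferred from the example, so treat it as an assumption: `keep` is consistent with "one row per family, the one with the largest wt, ties broken by original row order". Because order() is stable, sorting by fam and decreasing wt and then dropping duplicated families reproduces that rule. The names `chosen` and `keep2` are hypothetical.

```r
fam  <- c(1, 2, 3, 3, 4, 4, 4)
wt   <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)
keep <- c(1, 1, 1, 0, 1, 0, 0)
d <- data.frame(fam, wt, keep)

# Sort by family, then by decreasing weight (ties keep their original order).
o <- order(d$fam, -d$wt)
# The first row of each family in that order is the one to keep.
chosen <- o[!duplicated(d$fam[o])]
d$keep2 <- as.numeric(seq_len(nrow(d)) %in% chosen)
```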

Data formatting for matplot

2009 Sep 28

Dear List, I am wanting to produce a multiple line plot, and know I can do it with matplot but can't get my data in the format I need. I have a dataframe with three columns: individual ID, x, and y. I have tried split() but it gives me a list of matrices, which is closer but not quite what I need. For example:
id<-rep(seq(1,5,1),length.out=100)
x<-rnorm(100,5,1)
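matplot() draws one line per column, so the task is to turn the long vectors into matrices with one column per individual. A sketch, assuming every id has the same number of observations (true for the rep() call in the post, which gives 20 each) and inventing y because its definition is cut off: sort by id and x, split by id, and let sapply() assemble the columns.

```r
set.seed(42)
id <- rep(seq(1, 5, 1), length.out = 100)
x  <- rnorm(100, 5, 1)
y  <- rnorm(100, 5, 1)   # invented; the post's y is not shown

# Order by id, then x, so each line is drawn left to right.
o  <- order(id, x)
# Equal-length groups make sapply() return a 20 x 5 matrix, one column per id.
xm <- sapply(split(x[o], id[o]), identity)
ym <- sapply(split(y[o], id[o]), identity)
matplot(xm, ym, type = "l", lty = 1, col = 1:5, xlab = "x", ylab = "y")
```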

weighted.mean and tapply (again)

2005 May 25

I read answers to questions including the words "tapply" and "weighted.mean", but I didn't understand either the problem (data) or the solution provided. Here is my question ...
> dat[1:10,]
  GROUP VALUE FREQUENCY
1     2     2        78
2     2     3        40
3     2     4        16
4     2     5         3
5     2     6         1
6     2     8         1
7
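tapply() hands its function a single vector per group, so weighted.mean() never sees the weights that way. A sketch that splits the row indices instead, using the six rows shown in the post plus an invented GROUP 3 so there is more than one group:

```r
# Rows 1-6 from the post (GROUP 2) plus a made-up GROUP 3.
dat <- data.frame(GROUP     = c(2, 2, 2, 2, 2, 2, 3, 3),
                  VALUE     = c(2, 3, 4, 5, 6, 8, 1, 2),
                  FREQUENCY = c(78, 40, 16, 3, 1, 1, 5, 5))

# Split row indices by group so each call sees both VALUE and FREQUENCY.
wm <- sapply(split(seq_len(nrow(dat)), dat$GROUP), function(i)
  weighted.mean(dat$VALUE[i], dat$FREQUENCY[i]))
```

For GROUP 2 this is 369 / 139, about 2.65.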

An example of very slow computation

2011 Aug 17

This message is about a curious difference in timing between two ways of computing the same function. One uses expm, so is expected to be a bit slower, but "a bit" turned out to be a factor of >1000. The code is below. We would be grateful if anyone can point out any egregious bad practice in our code, or enlighten us on why one approach is so much slower than the other. The problem

Help with ddply to eliminate a for..loop

2010 Aug 26

I created a small example to show something that I do a lot of: "scale" data by month and return a data.frame with the output. "id" represents repeated observations over "time", and I want to scale the "slope" variable. The "out" variable shows the output I want. My for..loop does the job but is probably very slow versus other methods. ddply
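Without the poster's data or the "out" column, this sketch assumes "scale" means what scale() does by default: centre on the month mean and divide by the month standard deviation. ave() applies the function within each month and returns values in the original row order, so no loop or reassembly is needed. The data are invented.

```r
# Invented data: four ids observed in three months.
set.seed(7)
df <- data.frame(id    = rep(1:4, times = 3),
                 time  = rep(c("2010-06", "2010-07", "2010-08"), each = 4),
                 slope = rnorm(12))

# Standardise slope within each month; results stay aligned with the rows.
df$out <- ave(df$slope, df$time, FUN = function(v) as.numeric(scale(v)))
```

With plyr, the analogous call is usually written ddply(df, .(time), transform, out = as.numeric(scale(slope))), keeping in mind that ddply() returns the rows ordered by group.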
