similar to: Combining many files into one

Displaying 20 results from an estimated 60000 matches similar to: "Combining many files into one"

2008 Apr 08
1
Combining many csv files into one and adding a column with an id of each csv file read
Dear R experts, I have been looking into the help-pages and old questions from the R-Help site, but the options offered there don't seem to work in my case. First of all, I am working on Windows XP, using R version 2.6.2. I am attaching two csv files as an example of how the data I am trying to put together is delivered to us. On the first row of every csv file is the name of the
2012 Apr 19
1
combining large list of data.frames
It's normal for me to create a list of data.frames and then use do.call('rbind', list(...)) to create a single data.frame. However, I've noticed as the size of the list grows large, it is perhaps better to do this in chunks. As an example here's a list of 20,000 similar data.frames. # create list of data.frames dat <- vector("list", 20000) for(i in
2012 Jul 13
1
R combining many vectors of predictable name into one data frame
G'day R (power) users, I have many vectors, called: ib1 ib2 ib3 ... ib100 and I would like them in one data frame (df) such that: > df ib1 ib2 ib3 ib4 ..... ib100 x x x x x x x x x x x x x x x I have attempted: hold.list <- list(objects(pattern="ib")) df <- data.frame(hold.list) but that
2009 May 26
4
Creating multiple graphs based on one variable
Dear List, I would like to create several graphs of similar data. I have x and y values for several different individuals (in this case fish). I would like to plot the x and y values for each fish separately. I can do it using a for loop, but I think I should be using "apply". Please let me know what I am doing wrong, or if there is a "better" way to do this. What I have
2005 Apr 14
2
Reading and coalescing many datafiles.
Greetings. I've got some analysis problems I'm trying to solve, the raw data for which are accumulated in a bunch of time-and-date-based files. /some/path/2005-01-02-00-00-02 etc. The best 'read all these files' method I've seen in the r-help archives comes down to for (df in my_list_of_filenames ) { dat <- rbind(dat,my_read_function(df)) } which,
2005 Oct 13
3
aggregate slow with many rows - alternative?
Hi, I use the code below to aggregate / cnt my test data. It works fine, but the problem is with my real data (33'000 rows) where the function is really slow (nothing happened in half an hour). Does anybody know of other functions that I could use? Thanks, Hans-Peter -------------- dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656, 32656, 32656, 32672, 32672, 32699 ),
2010 Feb 14
2
xyplot, overlay two variables on one plot with group factors
All, I want to overlay two variables on the same plot following their appropriate grouping. I have attempted to use subscripting in panel with panel.xyplot, but I can't get the grouping to follow into the panel... here is an example... dat<-data.frame( y= log(1:10), y2=10:19, x=1:10, grp = as.factor(1) ) dat2<-data.frame( y= log(10:19), y2= 20:29, x=1:10, grp = as.factor(c(2)) )
2011 Aug 25
2
how to read a group of files into one dataset?
For example: I have files with the names "ma01.dat", "ma02.dat", "ma03.dat", "ma04.dat". I want to read the data in these files into one data.frame. flnm<-paste("obs",101:114,"_err.dat",sep="") newdata<-read.table(flnm,skip=2) data<-(flnm,skip=2) but the data only contains data from flnm[1]. I also tried as below: for
2009 Sep 04
2
help with functions
Hi all, I have got 2 functions (see below) which are simplifications of what I need to do. These functions are precisely the same, except for the last line. My question is, why doesn't function testA work in the same way as function testB? Both functions produce two objects, "a" and "b", that must be merged with rbind. The difference is that in testA, I specify the name
2002 May 15
1
Fwd: Re: Combining many dataframes from listings of objects?
> I want to combine (rbind) many dataframes into a single data frame, but "automatically" > specifying the names of the dataframes as listing of object names. > E.g., combine these 18 df objects into one big df using something conceptually like this : > rbind(objects(pattern="*.df")) Brian Ripley suggested that something along the lines of:
2005 Jan 21
6
how to use do.call("rbind", get(list(mlist)))
I have around 200 data frames I want to rbind in a vectorized way. The object names are: m302 m303 ... m500 So I tried: mlist <- paste("m",302:500,sep="") dat <- do.call("rbind", get(list(mlist))) and I get "Error in get(x, envir, mode, inherits) : invalid first argument" I know "rbind" is valid because dat <- rbind(m302, m303,
2008 Oct 15
1
combining same-day lab measurements with 'apply'
Another request for help implementing the 'apply' functions to avoid a loop structure... I am working with a data set that includes lab measurements taken at different dates for the subjects, with some subjects having more results than others. I would like to average lab results for each subject that were taken on the same day. I can do this using a for loop, but would like to know how
2010 Feb 26
2
dramatic speed difference in lapply
So I have a function that does lapply's for me based on dimension. Currently only works for length(pivotColumns)=2 because I haven't fixed the rbinds. I have two versions. One runs WAYYY faster than the other. And I'm not sure why. Fast Version: fedb.ddplyWrapper2Fast <- function(data, pivotColumns, listNameFunctions, ...){ lapplyFunctionRecurse <- function(cdata, level=1,
2011 Mar 10
2
within group sequential subtraction
Hi Everyone, I would like to do sequential subtractions within a group so that I know the time between separate observations for a group of individuals. My data: data <- structure(list(group = c("IND1", "IND1", "IND2", "IND2", "IND2", "IND3", "IND4", "IND5", "IND6", "IND6"), date_obs =
2012 Mar 20
1
overriding "summary.default" or "summary.data.frame". How?
I suppose everybody who makes a package for the first time thinks "I can change anything!" and then runs into this same question. Has anybody written out information on how a package can override functions in R base in the R 2.14 (mandatory NAMESPACE era)? Suppose I want to alphabetize variables in a summary.data.frame, or return the standard deviation with the mean in summary output.
2005 Aug 01
6
converting Stata's by syntax to R
I am struggling with migrating some Stata code to R. I have a data frame containing, sometimes, repeat observations (rows) of the same family. I want to keep only one observation per family, selecting that observation according to some other variable. An example data frame is: # construct example data fam <- c(1,2,3,3,4,4,4) wt <- c(1,1,0.6,0.4,0.4,0.4,0.2) keep <- c(1,1,1,0,1,0,0)
2009 Sep 28
2
Data formatting for matplot
Dear List, I am wanting to produce a multiple line plot, and know I can do it with matplot but can't get my data in the format I need. I have a dataframe with three columns; individuals ID, x, and y. I have tried split() but it gives me a list of matrices, which is closer but not quite what I need. For example: id<-rep(seq(1,5,1),length.out=100) x<-rnorm(100,5,1)
2005 May 25
2
weighted.mean and tapply (again)
I read answers to questions including the words "tapply" and "weighted.mean", but I didn't understand either the problem (data) or the solution provided. Here is my question ... > dat[1:10,] GROUP VALUE FREQUENCY 1 2 2 78 2 2 3 40 3 2 4 16 4 2 5 3 5 2 6 1 6 2 8 1 7
2011 Aug 17
2
An example of very slow computation
This message is about a curious difference in timing between two ways of computing the same function. One uses expm, so is expected to be a bit slower, but "a bit" turned out to be a factor of >1000. The code is below. We would be grateful if anyone can point out any egregious bad practice in our code, or enlighten us on why one approach is so much slower than the other. The problem
2010 Aug 26
3
Help with ddply to eliminate a for..loop
I created a small example to show something that I do a lot of. "scale" data by month and return a data.frame with the output. "id" represents repeated observations over "time" and I want to scale the "slope" variable. The "out" variable shows the output I want. My for..loop does the job but is probably very slow versus other methods. ddply