Displaying 20 results from an estimated 60000 matches similar to: "Combining many files into one"
2008 Apr 08
1
Combining many csv files into one and adding a column with an id of each csv file read
Dear R experts,
I have been looking into the help-pages and old
questions from the R-Help site, but the options
offered there don't seem to work in my case.
First of all, I am working on Windows XP, using R
version 2.6.2.
I am attaching two csv files as an example of how
the data I am trying to put together is delivered to
us. On the first row of every csv file is the name of
the
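One common way to approach the question above is to read each file with lapply(), tag its rows with an identifier, and bind the pieces once at the end. The sketch below is only an illustration: the snippet is cut off before it says what the first row of each file holds, so it assumes ordinary CSV files with a header row and uses the file name as the id. The temporary directory, the file names and the helper read_with_id() are all hypothetical.

```r
# Hypothetical setup: two small CSV files in a temporary directory
csv_dir <- tempfile("csv_demo")
dir.create(csv_dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(csv_dir, "site_A.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(csv_dir, "site_B.csv"), row.names = FALSE)

# Collect the full paths of all csv files in the directory
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and add a column holding the file name without extension
# (assumption: the file name is a usable id)
read_with_id <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  d$file_id <- sub("\\.csv$", "", basename(path))
  d
}

# Read every file, then bind all pieces in a single rbind call
combined <- do.call(rbind, lapply(files, read_with_id))
```

Binding once at the end avoids growing the data frame inside a loop.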
2012 Apr 19
1
combining large list of data.frames
It's normal for me to create a list of data.frames and then use
do.call('rbind', list(...)) to create a single data.frame. However,
I've noticed as the size of the list grows large, it is perhaps better
to do this in chunks. As an example here's a list of 20,000 similar
data.frames.
# create list of data.frames
dat <- vector("list", 20000)
for(i in
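The chunking idea mentioned in this post can be sketched as follows. This is not the poster's code, which is truncated above; the list contents, the chunk size of 50 and all variable names are made up for illustration. Whether chunking actually helps depends on the R version and the data, so it is worth timing both variants.

```r
# Hypothetical input: a list of 200 small, similar data frames
dfs <- lapply(1:200, function(i) data.frame(id = i, value = i * 2))

# Assign each list element to a chunk of (at most) 50 elements
chunk_size <- 50
chunk_index <- ceiling(seq_along(dfs) / chunk_size)

# First pass: rbind within each chunk
partial <- lapply(split(dfs, chunk_index),
                  function(chunk) do.call(rbind, chunk))

# Second pass: rbind the per-chunk results
combined <- do.call(rbind, partial)

# The chunk names leak into the row names, so reset them
rownames(combined) <- NULL
```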
2012 Jul 13
1
R combining many vectors of predictable name into one data frame
G'day R (power) users,
I have many vectors, called:
ib1
ib2
ib3
...
ib100
and I would like them in one data frame (df) such that:
> df
ib1 ib2 ib3 ib4 ..... ib100
x x x x x
x x x x x
x x x x x
I have attempted:
hold.list <- list(objects(pattern="ib"))
df <- data.frame(hold.list)
but that
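A likely reason the attempt above does not give the wanted result is that objects() returns the names of the vectors as character strings, not the vectors themselves. A minimal sketch of one alternative, using mget() to fetch the objects by name, is given below. It assumes all vectors have the same length; the three short vectors are stand-ins for ib1 to ib100.

```r
# Stand-ins for the poster's vectors (values are made up)
ib1 <- c(1, 2, 3)
ib2 <- c(4, 5, 6)
ib3 <- c(7, 8, 9)

# Build the names explicitly so that unrelated objects matching "ib"
# are not picked up by accident
vec_names <- paste0("ib", 1:3)

# mget() returns a named list of the objects; look them up in the
# current environment
vecs <- mget(vec_names, envir = environment())

# A named list of equal-length vectors converts directly to a data frame
df <- as.data.frame(vecs)
```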
2009 May 26
4
Creating multiple graphs based on one variable
Dear List,
I would like to create several graphs of similar data. I have x and y values for several different individuals (in this case fish). I would like to plot the x and y values for each fish separately. I can do it using a for loop, but I think I should be using "apply". Please let me know what I am doing wrong, or if there is a "better" way to do this. What I have
2005 Apr 14
2
Reading and coalescing many datafiles.
Greetings.
I've got some analysis problems I'm trying to solve, the raw data for which
are accumulated in a bunch of time-and-date-based files.
/some/path/2005-01-02-00-00-02
etc.
The best 'read all these files' method I've seen in the r-help archives comes
down to
for (df in my_list_of_filenames )
{
dat <- rbind(dat,my_read_function(df))
}
which,
2005 Oct 13
3
aggregate slow with many rows - alternative?
Hi,
I use the code below to aggregate / cnt my test data. It works fine,
but the problem is with my real data (33'000 rows) where the function
is really slow (nothing happened in half an hour).
Does anybody know of other functions that I could use?
Thanks,
Hans-Peter
--------------
dat <- data.frame( Datum = c( 32586, 32587, 32587, 32625, 32656,
32656, 32656, 32672, 32672, 32699 ),
2010 Feb 14
2
xyplot, overlay two variables on one plot with group factors
All
I want to overlay two variables on the same plot following their appropriate
grouping. I have attempted to use subscripting in panel with panel.xyplot,
but I can't get the grouping to follow into the panel...here is an
example...
dat<-data.frame(
y= log(1:10),
y2=10:19,
x=1:10,
grp = as.factor(1)
)
dat2<-data.frame(
y= log(10:19),
y2= 20:29,
x=1:10,
grp = as.factor(c(2))
)
2011 Aug 25
2
how to read a group of files into one dataset?
for example : I have files with the name
"ma01.dat","ma02.dat","ma03.dat","ma04.dat",I want to read the data in
these files into one data.frame
flnm<-paste("obs",101:114,"_err.dat",sep="")
newdata<-read.table(flnm,skip=2)
data<-(flnm,skip=2)
but the data only contains data from the flnm[1]
I also tried as below :
for
2009 Sep 04
2
help with functions
Hi all,
I have got 2 functions (see below) which are simplifications of what I need
to do. These functions are precisely the same, except for the last line.
My question is, why doesn't function testA work in the same way as function
testB.
Both functions produce two objects, "a" and "b" that must be merged with rbind.
The difference is that in testA, I specify the name
2002 May 15
1
Fwd: Re: Combining many dataframes from listings of objects?
> I want to combine (rbind) many dataframes into a single data frame, but "automatically"
> specifying the names of the dataframes as listing of object names.
> E.g., combine these 18 df objects into one big df using something conceptually like this :
> rbind(objects(pattern="*.df"))
Brian Ripley suggested that something along the lines of:
2005 Jan 21
6
how to use do.call("rbind", get(list(mlist)))
I have around 200 data frames I want to rbind in a vectorized way.
The object names are:
m302
m303
...
m500
So I tried:
mlist <- paste("m",302:500,sep="")
dat <- do.call("rbind", get(list(mlist)))
and I get "Error in get(x, envir, mode, inherits) : invalid first argument"
I know "rbind" is valid because
dat <- rbind(m302, m303,
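The error quoted above is consistent with get() expecting a single object name rather than a list of names. One way around it, sketched here with three made-up data frames in place of m302 to m500, is to call get() once per name via lapply() and hand the resulting list to do.call().

```r
# Made-up stand-ins for the poster's data frames
m302 <- data.frame(id = 302, x = 1:2)
m303 <- data.frame(id = 303, x = 3:4)
m304 <- data.frame(id = 304, x = 5:6)

# Object names, built the same way as in the post
mlist <- paste("m", 302:304, sep = "")

# Look each name up in the current environment, one at a time
env <- environment()
frames <- lapply(mlist, get, envir = env)

# rbind all data frames in one call
dat <- do.call("rbind", frames)
```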
2008 Oct 15
1
combining same-day lab measurements with 'apply'
Another request for help implementing the 'apply' functions to avoid a
loop structure...
I am working with a data set that includes lab measurements taken at
different dates for the subjects, with some subjects having more
results than others. I would like to average lab results for each
subject that were taken on the same day. I can do this using a for
loop, but would like to know how
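Since the poster's data are not shown, the following is only a sketch of averaging same-day results per subject without an explicit loop, here with aggregate(). The column names subject, date and result, and all values, are hypothetical.

```r
# Hypothetical lab data: subject s1 has two results on 2008-01-05
labs <- data.frame(
  subject = c("s1", "s1", "s1", "s2", "s2"),
  date    = as.Date(c("2008-01-05", "2008-01-05", "2008-02-10",
                      "2008-01-05", "2008-01-05")),
  result  = c(4.0, 6.0, 5.5, 7.0, 9.0)
)

# One row per subject/date combination, holding the mean result
daily <- aggregate(result ~ subject + date, data = labs, FUN = mean)
```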
2010 Feb 26
2
dramatic speed difference in lapply
So I have a function that does lapply's for me based on dimension. Currently
only works for length(pivotColumns)=2 because I haven't fixed the rbinds. I
have two versions. One runs WAYYY faster than the other. And I'm not sure
why.
Fast Version:
fedb.ddplyWrapper2Fast <- function(data, pivotColumns, listNameFunctions,
...){
lapplyFunctionRecurse <- function(cdata, level=1,
2011 Mar 10
2
within group sequential subtraction
Hi Everyone,
I would like to do sequential subtractions within a group so that I know the
time between separate observations for a group of individuals.
My data:
data <- structure(list(group = c("IND1", "IND1", "IND2",
"IND2", "IND2", "IND3", "IND4", "IND5",
"IND6", "IND6"), date_obs =
2012 Mar 20
1
overriding "summary.default" or "summary.data.frame". How?
I suppose everybody who makes a package for the first time thinks "I
can change anything!" and then runs into this same question. Has
anybody written out information on how a package can override
functions in R base in the R 2.14 (mandatory NAMESPACE era)?
Suppose I want to alphabetize variables in a summary.data.frame, or
return the standard deviation with the mean in summary output.
2005 Aug 01
6
converting stata's by syntax to R
I am struggling with migrating some stata code to R. I have a data
frame containing, sometimes, repeat observations (rows) of the same
family. I want to keep only one observation per family, selecting
that observation according to some other variable. An example data
frame is:
# construct example data
fam <- c(1,2,3,3,4,4,4)
wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
keep <- c(1,1,1,0,1,0,0)
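The selection rule is not visible in the truncated post. Judging only from the keep vector, it looks like the row with the largest wt is kept in each family, with the first row winning ties; that reading is an assumption. Under it, one sketch is to sort by family and descending weight and then drop duplicated families.

```r
# Example data from the post
fam <- c(1, 2, 3, 3, 4, 4, 4)
wt  <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)
d <- data.frame(fam, wt)

# Sort by family, then by weight descending; ties keep their original order
d_sorted <- d[order(d$fam, -d$wt), ]

# Keep the first row of each family (assumed rule: largest wt wins)
one_per_fam <- d_sorted[!duplicated(d_sorted$fam), ]
```

The kept rows are the ones where the post's keep vector equals 1.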
2009 Sep 28
2
Data formatting for matplot
Dear List,
I am wanting to produce a multiple line plot, and know I can do it with matplot but can't get my data in the format I need. I have a dataframe with three columns; individuals ID, x, and y. I have tried split() but it gives me a list of matrices, which is closer but not quite what I need. For example:
id<-rep(seq(1,5,1),length.out=100)
x<-rnorm(100,5,1)
2005 May 25
2
weighted.mean and tapply (again)
I read answers to questions including the words "tapply" and
"weighted.mean", but I didn't understand either the problem (data) or the
solution provided.
Here is my question ...
> dat[1:10,]
GROUP VALUE FREQUENCY
1 2 2 78
2 2 3 40
3 2 4 16
4 2 5 3
5 2 6 1
6 2 8 1
7
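tapply() passes only one vector to its function, which is why it does not combine directly with weighted.mean(). A sketch of one alternative is to split the data frame by group and compute the weighted mean within each piece. The first three rows below are taken from the listing above; the rows for group 3 are made up so that there is more than one group.

```r
# Rows 1 to 3 are from the post; the group 3 rows are hypothetical
dat <- data.frame(
  GROUP     = c(2, 2, 2, 3, 3),
  VALUE     = c(2, 3, 4, 1, 5),
  FREQUENCY = c(78, 40, 16, 10, 30)
)

# Split into one data frame per group, then weight VALUE by FREQUENCY
wm <- sapply(split(dat, dat$GROUP),
             function(g) weighted.mean(g$VALUE, g$FREQUENCY))
```

The result is a numeric vector named by group.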
2011 Aug 17
2
An example of very slow computation
This message is about a curious difference in timing between two ways of computing the
same function. One uses expm, so is expected to be a bit slower, but "a bit" turned out to
be a factor of >1000. The code is below. We would be grateful if anyone can point out any
egregious bad practice in our code, or enlighten us on why one approach is so much slower
than the other. The problem
2010 Aug 26
3
Help with ddply to eliminate a for..loop
I created a small example to show something that I do a lot of. "scale"
data by month and return a data.frame with the output. "id" represents
repeated observations over "time" and I want to scale the "slope"
variable. The "out" variable shows the output I want. My for..loop
does the job but is probably very slow versus other methods. ddply