similar to: Efficient Merging of two huge sorted data frames?---Use merge()?

Displaying 20 results from an estimated 12000 matches similar to: "Efficient Merging of two huge sorted data frames?---Use merge()?"

2012 Jan 16
1
GAM without intercept reports a huge deviance
Hi all, I constructed a GAM model with a linear term and two smooth terms, all of them statistically significant but the intercept was not significant. The adjusted r2 of this model is 0.572 and the deviance 65.3. I decided to run the model again without intercept, so I used in R the following instruction: regression= gam(dependent~ +linear_independent +s(smooth_independent_1)
2010 Sep 12
2
Efficient ways of merging data frames
Hi all, I am just wondering if there is a more efficient way of merging two large datasets based on the values of multiple columns, some of which are not numerical. The default merge function for data frames is very inefficient, and the merge function in data.table seems to be faster, but it does not seem to allow keys that are not numerical in nature. Any other suggestions? Thanks a lot!
2006 May 10
2
Concatenating data frame
Hello, I have searched through the R-help archive and found that the easiest way to concatenate data records in a dataframe is to use rbind(). I know we can do that using rbind, but it is slow when we call rbind thousands of times on a growing list, each time adding one or two records to the ever-growing existing data, because in existingRecords<-rbind(existingRecords,
2017 Sep 28
0
Efficient Package for Huge datasets in R
Dear Sir/Madam, I have a large data set of 10,17,289 observations of 10,830 variables. I need to use PCA to reduce the dimension of the dataset. I have already tried the irlba, prcomp and nsprcomp packages in R but could not make them work for huge data sets, i.e. pc <- prcomp_irlba(sparseYY[1:5000,], n=50, retx = TRUE, center = TRUE, scale. = FALSE) I am able to get only a few PCs for 5000 rows, so can you
2006 Jul 27
6
Any interest in "merge" and "by" implementations specifically for sorted data?
Hi Developers, I am looking for another new project to help me get more up to speed on R and to learn something outside of R internals. One recent R issue I have run into is finding a fast implementation of the equivalent of the following SAS code: /* MDPC is an integer sort key made from two integer columns */ MDPC = (MD * 100000) + PCO; /* sort the dataset by the key */ PROC SORT;
2008 Feb 20
1
insert() function
Hello, I am trying to insert a certain number of points into a certain position of a vector with this code: x <- seq(1:10909) x1 <- c(13112-10909) spect1 <- rnorm(13112) interpol <- approx(x,spect1,xout=c(seq(from=1, by=((10909 - 1)/(x1 - 1)), length.out=x1))) pos <- round(interpol$x,0) intensities <- interpol$y spect2 <- insert(spect1,ats=pos,values=intensities)
2009 Sep 10
2
Merge data frames but prefer values in one
Hello everyone, My problem is better explained with an example: > x=data.frame(a=1:4,b=1:4,c=rnorm(4)) > x a b c 1 1 1 -0.8821089 2 2 2 -0.7082583 3 3 3 -0.5948835 4 4 4 -1.8571443 > y=data.frame(a=c(1,3),b=3,c=rnorm(2)) > y a b c 1 1 3 -0.273155973 2 3 3 0.009517862 Now I want to merge x and y by columns a and b, hence creating a data.frame with all
2011 Oct 03
1
Efficient way to do a merge in R
Dear all, I am new to R and I have been faced with the following problem, which slows me down a lot. I am short of ideas to circumvent it, so any help would be highly appreciated: I have 2 dataframes x and y. x is very big (70 million observations), whereas y is smaller (300000 observations). All the observations of y are present in x. But y has one additional variable that I would like to
2010 Sep 29
1
cor() alternative for huge data set
Hi, I have a data set of around 43000 probes (rows) and have to calculate a correlation matrix. When I run the cor function in R, it throws an error message about a RAM shortage, which was to be expected for such a huge number of rows. I cannot see a logical way to cut down this huge number of entities; is there an alternative to Pearson correlation, or a calculation with other dist() methods (Euclidean), that
2018 Nov 29
1
Best way of merging mbox files
aside from cat? On Thu, Nov 29, 2018 at 03:07:58PM -0800, Joseph Tam wrote: > On Thu, 29 Nov 2018, Marc Roos wrote: > > >When concatenating mbox files like described here > >https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/. You will end > >up with an 'unsorted' mbox file. Is this going to be a problem > >especially when they are large >2GB's
2018 Nov 29
0
Best way of merging mbox files
On Thu, 29 Nov 2018, Marc Roos wrote: > When concatenating mbox files like described here > https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/. You will end > up with an 'unsorted' mbox file. Is this going to be a problem > especially when they are large >2GB's and new emails will be written to > it? I don't think it will be a problem, but you might have
2013 Feb 07
1
Merging data in arrays
Dear All, Here is a hypothetical sample (sorry for the clumsy code): A1 <- matrix(1:5, nrow=5, ncol=1) A2 <- matrix(6:10, nrow=5, ncol=1) A3 <- matrix(11:15, nrow=5, ncol=1) A4 <- matrix(16:20, nrow=5, ncol=1) A5 <- matrix(21:25, nrow=5, ncol=1) A6 <- matrix(26:30, nrow=5, ncol=1) B1 <- matrix(c(A1, A2, A3), nrow=5, ncol=3) B2 <- matrix(c(A2, A3, A4), nrow=5, ncol=3) B3
2004 Jan 22
1
Calculation of normalised red and green intensities
Dear Sir/Madam, I could successfully normalise my microarray data using the marrayNorm package. However, I have not been able to get normalised red and green channel intensities through the R package. Is there a possibility to write a formula to calculate back the red and green channel intensities after normalisation of the data? Do I need to incorporate this formula in my R script? I am a biologist
2006 Jul 25
2
convert decimals to fractions - sorted
Dear all, Based on my question a few months ago https://stat.ethz.ch/pipermail/r-help/2006-January/086952.html and solved with https://stat.ethz.ch/pipermail/r-help/2006-January/086955.html https://stat.ethz.ch/pipermail/r-help/2006-January/086956.html and from https://stat.ethz.ch/pipermail/r-help/2006-January/086958.html frac.fun <- function(x, den){ dec <- seq(0, den) / den nams
2004 Oct 04
2
Help with Affymetrix data
I have CEL files from Affymetrix Mouse Array 430_2 and am trying to get the individual PM intensities (11 per gene) for each sample. I would like to write out this into a tab delimited text file. Where am I stalling? This is what I've done: Change dir(to where CEL files are saved) Data <- ReadAffy() eset <- rma(Data) write.exprs(eset, file="mydata.txt") With this I am
2003 Aug 15
1
Merging and sorting multiple data.frame
Dear R help, I'm pretty new to R and would be grateful for help. I have 11 data.frames, each with 3 columns of data. Each has the same row.names, however these are not sorted. Please tell me the best way to sort these (by row.names) and secondly the best way to extract data columns from these to form a merged table. Thanks a million Aedin
2009 Sep 28
6
What is the most efficient way to split a table into 2 groups?
I have the following: @lot = Lot.find(params[:id]) part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id]) I guess I should mention that Lot :belongs_to => :part I was looking at the log following the execution of these two statements and I saw something like this: Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) Part
2003 Mar 27
1
Deprecated component in plot.histogram (PR#2696)
Hi everyone, As "intensities" is deprecated as a synonym for "density" in the output of hist it would be a good idea to replace its occurrence with "density" in plot.histogram, where we currently have y <- if (freq) x$counts else x$intensities Note that, to add to the confusion, x$density is also referred to in plot.histogram. Cheers, Jonathan.
2008 Jan 21
1
sorting in 'merge'
Hello everyone, I've been advised to use merge to extract information from two data.frames with a number of common columns, but I cannot get a grasp on how it sorts the result. With sort=FALSE, I would expect it to give the result back sorted exactly as the input was but it seems it is not always the case, especially when there are repeats in the input. For example: > a =
2006 Sep 26
2
Sort problem with merge (again)
# R version 2.3.1 (2006-06-01) Debian Linux "testing" # Is the following behaviour a bug, feature or just a lack of # understanding on my part? I see that this was discussed here # last March with no apparent resolution. d <- as.factor(c("1970-04-04","1970-08-11","1970-10-18")) x <- c(9,10,11) ch <- data.frame(Date=d,X=x) d <-