thr3ads.net - similar to: "Efficient Merging of two huge sorted data frames?---Use merge()?"

Displaying 20 results from an estimated 12000 matches similar to: "Efficient Merging of two huge sorted data frames?---Use merge()?"

GAM without intercept reports a huge deviance

2012 Jan 16

Hi all, I constructed a GAM model with a linear term and two smooth terms, all of them statistically significant but the intercept was not significant. The adjusted r2 of this model is 0.572 and the deviance 65.3. I decided to run the model again without intercept, so I used in R the following instruction: regression= gam(dependent~ +linear_independent +s(smooth_independent_1)

Efficient ways of merging data frames

2010 Sep 12

Hi all, I am just wondering if there is a more efficient way of merging two large datasets based on the values of multiple columns, some of which are not numerical. The default merge function in dataframe is very inefficient and the merge function in data.table seems to be faster, but it does not seem to allow keys that are not numerical in nature. Any other suggestion? Thanks a lot!

Concatenating data frame

2006 May 10

Hello, I have searched through the R-help archive and found that the easiest way to concatenate data records in a dataframe is to use rbind(). I know we can do that using rbind, but it is slow when we are doing rbind thousands of times to a growing list, each time adding one or two records to the ever-growing existing data because in existingRecords<-rbind(existingRecords,
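The slowdown described in this post usually comes from copying the whole accumulated data frame on every rbind() call. A minimal sketch of a common workaround, assuming the records can be held in a list until the end; `make_record` is a hypothetical stand-in for whatever produces each new record, not something from the original post:

```r
# Hypothetical stand-in for the code that produces one new record per step.
make_record <- function(i) data.frame(id = i, value = i^2)

n <- 1000
pieces <- vector("list", n)        # preallocate one slot per record
for (i in seq_len(n)) {
  pieces[[i]] <- make_record(i)    # store the piece; earlier rows are not copied
}

# Bind everything once at the end instead of once per iteration.
existingRecords <- do.call(rbind, pieces)
```

The design choice here is to defer the expensive row binding to a single call, so the total copying grows roughly linearly with the number of records rather than quadratically.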

Efficient Package for Huge datasets in R

2017 Sep 28

Dear Sir/Madam, I have a large data set of 10,17,289 observations of 10,830 variables. I need to use PCA to reduce the dimension of dataset. I have already tried irlba, prcomp and nsprcomp packages in R but couldn't do for huge data sets. i.e pc <- prcomp_irlba(sparseYY[1:5000,], n=50, retx = TRUE, center = TRUE, scale. = FALSE) able to get only few PCs for 5000 rows only so can you

Any interest in "merge" and "by" implementations specifically for sorted data?

2006 Jul 27

Hi Developers, I am looking for another new project to help me get more up to speed on R and to learn something outside of R internals. One recent R issue I have run into is finding a fast implementation of the equivalent to the following SAS code: /* MDPC is an integer sort key made from two integer columns */ MDPC = (MD * 100000) + PCO; /* sort the dataset by the key */ PROC SORT;

insert() function

2008 Feb 20

Hello, I am trying to insert a certain number of points into a certain position of a vector with this code: x <- seq(1:10909) x1 <- c(13112-10909) spect1 <- rnorm(13112) interpol <- approx(x,spect1,xout=c(seq(from=1, by=((10909 - 1)/(x1 - 1)), length.out=x1))) pos <- round(interpol$x,0) intensities <- interpol$y spect2 <- insert(spect1,ats=pos,values=intensities)

Merge data frames but prefer values in one

2009 Sep 10

Hello everyone, My problem is better explained with an example: > x=data.frame(a=1:4,b=1:4,c=rnorm(4)) > x a b c 1 1 1 -0.8821089 2 2 2 -0.7082583 3 3 3 -0.5948835 4 4 4 -1.8571443 > y=data.frame(a=c(1,3),b=3,c=rnorm(2)) > y a b c 1 1 3 -0.273155973 2 3 3 0.009517862 Now I want to merge x and y by columns a and b, hence creating a data.frame with all

Efficient way to do a merge in R

2011 Oct 03

Dear all, I am new in R and I have been faced with the following problem, that slows me down a lot. I am short of ideas to circumvent it. So, any help would be highly appreciated: I have 2 dataframes x and y. x is very big (70 million observations), whereas y is smaller (300000 observations). All the observations of y are present in x. But y has one additional variable that I would like to
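The post is cut off before it states the goal, so the following is only a sketch under the assumption that the extra variable in y should be attached to the matching rows of x. The column names `id` and `extra` and the toy data are hypothetical, and the key in y is assumed to be unique:

```r
# Hypothetical toy data standing in for the large x and the smaller y.
x <- data.frame(id = c(5L, 2L, 9L, 2L, 7L), v = c(0.1, 0.2, 0.3, 0.4, 0.5))
y <- data.frame(id = c(2L, 9L), extra = c("b", "c"), stringsAsFactors = FALSE)

# match() returns, for each row of x, the position of its id in y (NA if absent).
idx <- match(x$id, y$id)

# Index y's extra column by those positions; rows of x without a match get NA.
x$extra <- y$extra[idx]
```

Unlike merge(), this keeps x in its original row order and does not build an intermediate joined copy, which is why it is often suggested for a one-column lookup on a large table.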

cor() alternative for huge data set

2010 Sep 29

Hi, I have a data set of around 43000 probes (rows), and have to calculate a correlation matrix. When I run the cor function in R, it throws an error message about a RAM shortage, which was to be expected for such a huge number of rows. I cannot find a logical way to cut down this huge number of entities. Is there an alternative to pearson correlation or with other dist() methods calculation(euclidean) that

Best way of merging mbox files

2018 Nov 29

aside from cat? On Thu, Nov 29, 2018 at 03:07:58PM -0800, Joseph Tam wrote: > On Thu, 29 Nov 2018, Marc Roos wrote: > > >When concatenating mbox files like described here > >https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/. You will end > >up with an 'unsorted' mbox file. Is this going to be a problem > >especially when they are large >2GB's

Best way of merging mbox files

2018 Nov 29

On Thu, 29 Nov 2018, Marc Roos wrote: > When concatenating mbox files like described here > https://xaizek.github.io/2013-03-30/merge-mbox-mailboxes/. You will end > up with an 'unsorted' mbox file. Is this going to be a problem > especially when they are large >2GB's and new emails will be written to > it? I don't think it will be a problem, but you might have

Merging data in arrays

2013 Feb 07

Dear All, Here is a hypothetical sample (sorry for the clumsy code): A1 <- matrix(1:5, nrow=5, ncol=1) A2 <- matrix(6:10, nrow=5, ncol=1) A3 <- matrix(11:15, nrow=5, ncol=1) A4 <- matrix(16:20, nrow=5, ncol=1) A5 <- matrix(21:25, nrow=5, ncol=1) A6 <- matrix(26:30, nrow=5, ncol=1) B1 <- matrix(c(A1, A2, A3), nrow=5, ncol=3) B2 <- matrix(c(A2, A3, A4), nrow=5, ncol=3) B3

Calculation of normalised red and green intensities

2004 Jan 22

Dear Sir/Madam, I could successfully normalise my microarray data using the marrayNorm package. However, I have not been able to get normalised red and green channel intensities through the R package. Is there a possibility to write a formula to calculate back the red and green channel intensities after normalisation of the data? Do I need to incorporate this formula in my R script? I am biologist

convert decimals to fractions - sorted

2006 Jul 25

Dear all, Based on my question a few months ago https://stat.ethz.ch/pipermail/r-help/2006-January/086952.html and solved with https://stat.ethz.ch/pipermail/r-help/2006-January/086955.html https://stat.ethz.ch/pipermail/r-help/2006-January/086956.html and from https://stat.ethz.ch/pipermail/r-help/2006-January/086958.html frac.fun <- function(x, den){ dec <- seq(0, den) / den nams

Help with Affymetrix data

2004 Oct 04

I have CEL files from Affymetrix Mouse Array 430_2 and am trying to get the individual PM intensities (11 per gene) for each sample. I would like to write out this into a tab delimited text file. Where am I stalling? This is what I've done: Change dir(to where CEL files are saved) Data <- ReadAffy() eset <- rma(Data) write.exprs(eset, file="mydata.txt") With this I am

Merging and sorting multiple data.frame

2003 Aug 15

Dear R help, I'm pretty new to R and would be grateful for help. I have 11 data.frames, each with 3 columns of data. Each has the same row.names, however these are not sorted. Please tell me the best way to sort these (by row.names) and secondly the best way to extract data columns from these to form a merged table. Thanks a million Aedin
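One way to approach this question is to index every data frame by the same sorted vector of row names, then collect the wanted column from each. A minimal sketch, assuming all frames share exactly the same row names; the three small frames and the column name `a` are hypothetical stand-ins for the 11 frames in the post:

```r
# Hypothetical data: three frames sharing row names in different orders.
df1 <- data.frame(a = c(3, 1, 2),       row.names = c("g3", "g1", "g2"))
df2 <- data.frame(a = c(20, 30, 10),    row.names = c("g2", "g3", "g1"))
df3 <- data.frame(a = c(100, 200, 300), row.names = c("g1", "g2", "g3"))
frames <- list(s1 = df1, s2 = df2, s3 = df3)

# Common row order: the sorted row names of the first frame.
ids <- sort(rownames(frames[[1]]))

# Pull column "a" from each frame in that order and combine into one table.
merged <- data.frame(lapply(frames, function(d) d[ids, "a"]), row.names = ids)
```

Indexing by row name both sorts and aligns the frames in one step, so no explicit merge() call is needed when the row names are known to be identical across frames.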

What is the most efficient way to split a table into 2 groups?

2009 Sep 28

I have the following: @lot = Lot.find(params[:id]) part_nums = Part.all(:conditions => ["id <> ?", @lot.part.id]) I guess I should mention that Lot :belongs_to => :part I was looking at the log following the execution of these two statements and I saw something like this: Lot Load (0.4ms) SELECT * FROM "lots" WHERE ("lots"."id" = 13) Part

Deprecated component in plot.histogram (PR#2696)

2003 Mar 27

Hi everyone, As "intensities" is deprecated as a synonym for "density" in the output of hist, it would be a good idea to replace its occurrence with "density" in plot.histogram, where we currently have y <- if (freq) x$counts else x$intensities Note that, to add to the confusion, x$density is also referred to in plot.histogram. Cheers, Jonathan.

sorting in 'merge'

2008 Jan 21

Hello everyone, I've been advised to use merge to extract information from two data.frames with a number of common columns, but I cannot get a grasp on how it sorts the result. With sort=FALSE, I would expect it to give the result back sorted exactly as the input was but it seems it is not always the case, especially when there are repeats in the input. For example: > a =

Sort problem with merge (again)

2006 Sep 26

# R version 2.3.1 (2006-06-01) Debian Linux "testing" # Is the following behaviour a bug, feature or just a lack of # understanding on my part? I see that this was discussed here # last March with no apparent resolution. d <- as.factor(c("1970-04-04","1970-08-11","1970-10-18")) x <- c(9,10,11) ch <- data.frame(Date=d,X=x) d <-
