similar to: Organizing Large Datasets

Displaying 20 results from an estimated 10000 matches similar to: "Organizing Large Datasets"

2010 Feb 20
1
What is your system for WorkFlow and Source Code Organizing in R ?
Hello dear R users, Recently there have been several fascinating threads on the website stackoverflow <http://stackoverflow.com> regarding the subject of R workflow best practices: - What best practices do you use for programming in R?<http://stackoverflow.com/questions/2258092/what-best-practices-do-you-use-for-programming-in-r> - Workflow for statistical analysis and report
2012 Feb 10
1
Need to aggregate large dataset by week...
Hi all, I have a large dataset with ~8600 observations that I want to compress to weekly means. There are 9 variables (columns), and I have already added a "week" column with 51 weeks. I have been looking at the functions: aggregate, tapply, apply, etc. and I am just not savvy enough with R to figure this out on my own, though I'm sure it's fairly easy. I also have the Dates
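A minimal sketch of the aggregate() approach on simulated data; the real frame has nine measurement columns plus the added "week" column, and all names here are hypothetical:

    dat <- data.frame(week = rep(1:51, length.out = 8600),
                      x1 = rnorm(8600), x2 = rnorm(8600))
    weekly <- aggregate(. ~ week, data = dat, FUN = mean)  # mean of every column, by week
    head(weekly)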
2001 Mar 20
2
Help with large datasets
I am a new user of R (1.2.2 on Alpha), but have been using S and then Splus heavily for about 10 years now. My problem is this: the data I analyze comprise large sets. Typically I am analyzing 5000 observations on 90 variables over several hundred subjects, sometimes 500,000 subjects with 200 variables. Unfortunately, although my Alpha has 1.5 GB of RAM, R as it is configured seems to be set
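For context, a quick back-of-the-envelope check (mine, not from the post) of why such a set strains 1.5 GB of RAM: R stores numeric data in memory as doubles, at 8 bytes per value.

    # 500,000 subjects x 200 variables, double precision
    500000 * 200 * 8 / 1024^3   # ~0.75 GB for a single copy, before any duplication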
2008 Sep 15
1
Help... Organizing multiple spreadsheets data into a huge R data structure!
Hello R users, I am relatively new to R, and I hope some of you can offer me suggestions on how to organize my data in R using some of the more advanced data-structuring techniques. Here's my scenario: I have a data set of 50 participants (each with conditions and demographic data); each participant performed 2x16 trials, and for each trial there was specific information about the
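One common shape for such data is a single "long" data frame with one row per participant x trial, with demographics merged on; a hedged sketch on simulated data, all names hypothetical:

    trials <- expand.grid(participant = 1:50, block = 1:2, trial = 1:16)
    trials$rt <- rnorm(nrow(trials), mean = 500, sd = 50)        # per-trial measure
    demog <- data.frame(participant = 1:50, age = sample(18:65, 50, replace = TRUE))
    full <- merge(trials, demog, by = "participant")             # attach demographics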
2015 Jun 24
0
Organizing a Pre Astricon road trip
Hi All, I am cross-posting this to the Asterisk Users, Biz, and Dev lists at the suggestion of David Duffet, so sorry if you see it multiple times. As this year Astricon is in Orlando, and most of us are tech geeks in one form or another, we are trying to organize a road trip to see NASA's Kennedy Space Center. I have been in touch with their group sales office and was told that there is a
2006 Sep 19
5
Recommendations for organizing hosts into groups?
I have a few different groupings of hosts, and I am wondering what is the best way to organize the node configuration for them. I have a few machines that are VMware hosts, a bunch of VMware guests, and the main admin server which runs a bunch of stuff like puppetmasterd. I've got a bunch of classes/*.pp which define configuration for sudo, yum, java, etc. Right now I just have
2015 Jun 25
0
Organizing a Pre Astricon road trip (Eric Klein)
Sorry, apparently I forgot that we are looking at the Monday before Devcon for this trip.
2000 Sep 28
2
organizing work; dump function
Dear R-users! I am using R 1.0.0 and Windows NT 4.0. In the past I have used several different working directories for different projects, and during many of these projects I have written some functions for particular purposes. Now I thought it would be nice to have all these "personal" functions collected in one place, and to make them available in R no matter which
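A minimal sketch of the simplest variant (short of the personal package suggested later in the thread): keep the helpers in one file and source it at session startup from ~/.Rprofile; the file path is hypothetical.

    # in ~/.Rprofile, run at the start of every R session
    if (file.exists("~/R/my_functions.R")) source("~/R/my_functions.R")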
2002 Apr 29
3
Organizing the help files in a package
Dear all! I am using R 1.4.1 on Windows 98. I have been trying to organize a package and have already been able to document some of the functions in .Rd (R documentation) files. From these .Rd files I generated plain text files as well as HTML files. I have also given the 00Index file in each of the directories: html/ help/ data/ man/ Problem: I don't get the help using the command
2000 Sep 29
1
SUMMARY and follow up question: organizing work; dump function
Thanks very much to Friedrich Leisch, Brian Ripley and Thomas Lumley for their useful answers. Basically, they all suggested creating a personal package as the easiest and best way to hold one's own functions. This indeed is quite easy and very useful. In addition, Friedrich Leisch pointed out that he's planning to add an "append" argument to the dump() function. One follow up
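For reference, a small sketch of dump(), which writes object definitions as parseable R source; the append argument shown here exists in later R versions, so this illustrates the behaviour being discussed rather than the R 1.x state:

    f <- function(x) x^2
    g <- function(x) x + 1
    dump("f", file = "myfuns.R")                 # write f as R source
    dump("g", file = "myfuns.R", append = TRUE)  # append g (modern R)
    source("myfuns.R")                           # reload both definitions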
2011 Jun 07
1
Packages for R-CRAN (organizing aspects)
Hello, I have some ideas for packages that I want to provide on CRAN. One package already works, but I get some warnings when compiling. Also, I don't know whether the libraries in use work only on Unix/Linux. So I have some general questions: - If I'm not sure whether the code would also work on Windows (needing certain libraries or tools), would it be better to
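A hedged sketch of one way to guard Unix-only functionality at run time rather than failing obscurely on Windows; the external tool here is hypothetical:

    run_tool <- function() {
      if (.Platform$OS.type != "unix")
        stop("this function requires a Unix-like OS")
      system("sometool --version")   # hypothetical Unix-only tool
    }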
2005 Apr 24
1
large dataset import, aggregation and reshape
Dear useRs, We have a comma-delimited data set with 12 million rows and 5 columns (in fact many more, but we need only 4 of them): id, factor 'a' (5 levels), factor 'b' (15 levels), date-stamp, numeric measurement. We run R on SuSE Linux 9.1 with 2 GB RAM (and a 3.5 GB swap file). On average we have 30 obs. per id. We want to aggregate (e.g. sum of the measurements under
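A minimal sketch of the import and aggregation steps, assuming the column layout described (names hypothetical); declaring colClasses spares read.csv from guessing types across 12 million rows:

    cls <- c("integer", "factor", "factor", "character", "numeric")
    dat <- read.csv("big.csv", colClasses = cls,
                    col.names = c("id", "a", "b", "date", "y"))
    sums <- aggregate(y ~ id + a + b, data = dat, FUN = sum)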
2008 Jan 24
3
How should I organize data to compare differences in matched pairs?
I'm just learning how to use R right now, so I'm not sure what the most efficient way to organize these data is. I had subjects perform the same task twice with slight changes between the rounds. I want to analyze differences between the rounds. All of the subjects also answered a questionnaire. Putting all of one subject's information on one row seems sloppy. I was thinking about
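A hedged sketch of the paired comparison itself on simulated data (the questionnaire would be a separate merge() by subject id); a long layout with one row per subject per round also works and is less sloppy than one wide row per subject:

    before <- rnorm(20, mean = 10)                # round 1 scores
    after  <- before + rnorm(20, mean = 0.5)      # round 2 scores, same subjects
    t.test(before, after, paired = TRUE)          # test of the paired differences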
2006 May 18
1
Patch 5091
Hi, I submitted a patch (for the first time) a couple of days ago (http://dev.rubyonrails.org/ticket/5091 - the description is included below). I've noticed some posts to rails-core notifying the world of patches; since there's been no activity on said patch I thought I'd do the same. Please let me know if this breaks protocol. Cheers, Ian White --- patch summary
2006 Jul 29
0
SOAP for large datasets
I've been playing around with a SOAP interface to an application that can return large datasets (up to 50 MB or so). There are also some nested structures, for which I've used ActionWebService::Struct with 2-3 nested members of other ActionWebService::Struct members. In addition to chewing up a ton of memory, CPU utilization isn't that great either. My development
2013 Nov 30
1
bnlearn and very large datasets (> 1 million observations)
Hi, does anyone have experience with very large datasets and the Bayesian network package bnlearn? In my experience R doesn't react well to very large datasets. Is there a way to divide the dataset into pieces and incrementally learn the network with the pieces? This would also be helpful in case R crashes, because I could save the network after learning each piece. Thank you.
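A hedged sketch of the chunk-and-learn idea with bnlearn's hc(); learning one structure per chunk and saving as you go is my assumption here, not a documented incremental mode of bnlearn:

    library(bnlearn)
    big_df <- data.frame(A = rnorm(1e5), B = rnorm(1e5), C = rnorm(1e5))
    chunks <- split(big_df, rep(1:10, length.out = nrow(big_df)))
    nets <- lapply(chunks, hc)            # one learned structure per chunk
    saveRDS(nets, "nets_so_far.rds")      # guards against a crash mid-run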
2010 Jul 16
1
Question about KLdiv and large datasets
Hi all, when running KL on a small data set, everything is fine:

    require("flexmix")
    n <- 20
    a <- rnorm(n)
    b <- rnorm(n)
    mydata <- cbind(a,b)
    KLdiv(mydata)

However, when this dataset increases:

    require("flexmix")
    n <- 10000000
    a <- rnorm(n)
    b <- rnorm(n)
    mydata <- cbind(a,b)
    KLdiv(mydata)

KL seems to be not defined. Can somebody explain what is going
2011 Jul 20
0
Competing risk regression with CRR slow on large datasets?
Hi, I posted this question on stats.stackexchange.com 3 days ago, but the answer didn't really address my question concerning the speed of competing risk regression. I hope you don't mind me asking it in this forum: I'm doing a registry-based study with almost 200,000 observations and I want to perform a competing risk analysis. My problem is that crr() in the cmprsk package is
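For reference, a minimal runnable sketch of the cmprsk::crr() call shape on simulated data; sizes, covariates, and status coding below are hypothetical:

    library(cmprsk)
    n <- 1000
    ftime   <- rexp(n)                          # follow-up times
    fstatus <- sample(0:2, n, replace = TRUE)   # 0 censored, 1 event, 2 competing
    cov1    <- matrix(rnorm(n * 3), ncol = 3)   # covariate matrix
    fit <- crr(ftime, fstatus, cov1)            # Fine-Gray regression
    summary(fit)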
2013 Jan 19
2
importing large datasets in R
Hi Everyone, I am a little new to R, and the first problem I am facing is the dilemma of whether R is suitable for files of about 2 GB and slightly more than 2 million rows. When I try importing the data using read.table, it seems to take forever and I have to cancel the command. Are there any special techniques or methods which I can use, or some tricks of the game that I should keep in mind, in
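A hedged sketch of the usual read.table() tuning for files this size (column classes and separator are hypothetical); supplying colClasses, nrows, and comment.char avoids the type-guessing and reallocation that make the default call slow:

    dat <- read.table("big_file.txt", header = TRUE, sep = "\t",
                      colClasses = c("integer", "numeric", "character"),
                      nrows = 2200000,     # a mild overestimate is fine
                      comment.char = "")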
2009 Feb 26
0
glm with large datasets
Hi all, I have to run a logit regression over a large dataset and I am not sure about the best option to do it. The dataset is about 200000x2000 and R runs out of memory when creating it. After going over the help archives and the mailing lists, I think there are two main options, though I am not sure which one will be better. Of course, any alternative will be welcome as well. Actually, I
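One hedged possibility (my suggestion, not the poster's eventual choice) is biglm::bigglm(), which fits a GLM in chunks so the full model matrix never sits in memory at once; the data and formula below are hypothetical:

    library(biglm)
    dat <- data.frame(y = rbinom(1e4, 1, 0.5), x1 = rnorm(1e4), x2 = rnorm(1e4))
    fit <- bigglm(y ~ x1 + x2, data = dat,
                  family = binomial(), chunksize = 5000)
    summary(fit)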