thr3ads.net - similar to: "Where can I find information on how to subsample a time series?"

Displaying 20 results from an estimated 700 matches similar to: "Where can I find information on how to subsample a time series?"

beginner's guide to C++ programming with R packages?

2009 Jun 26

Hello, again. I'm interested to learn how programmers develop & test C/C++ code with R packages in Linux. I've been reading R source and the manual on Writing R Extensions but there are just a couple of details I can't understand. I wish I could watch over a developer's shoulder to see how people actually do this. I've tested a bit. I am able to take package.tar.gz

Memory management issues

2009 Jul 05

Hi everybody, I have been interfacing some C++ library code into an R package but ran into optimization issues specific to memory management that require some insight into the GC. One of the C++ libraries returns simple vectors of integers, doubles and complex which are allocated and managed from the library itself. I cannot know the length of the array beforehand, so I cannot pre-allocate that

exercise in frustration: applying a function to subsamples

2010 Jul 12

From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply would allow me to have a really short R script to analyze all my data (I have reduced it to a couple hundred thousand records with about half a dozen records. I get the same result from ddply and split/lapply: >

Query about using timestamps returned by SQL as 'factor' for split

2010 Jul 08

I have a simple query as follows: "SELECT m_id,sale_date,YEAR(sale_date),WEEK(sale_date),return_type,DATEDIFF(return_date,sale_date) AS elapsed_time FROM risk_input" I can get, and view, all the data that that query returns. The question is, sale_date is a timestamp, and I need to call split to group this data by m_id and the week in which the sale occurred. Obviously, I would

any suggestions to deal with 'Argument list too long' for a R CMD check?

2008 Dec 09

Since gcc was using upwards of 2 GB of RAM to compile my package, I just split all the functions into individual files. I guess I'm too clever for myself, because now I get hit with the "Argument list too long" error. Is there a way to deal with this aside from writing my own configure script (which could possibly feed the gcc commands one by one)? -Whit RHEL 5 [whit at

repeat resampling with different subsample sizes

2013 Jan 18

Hi, I'm trying to write a code (see below) to randomly resample measurements of one variable (say here the variable "counts" in the data frame "dat") with different resampled subsample sizes. The code works fine for a single resampled subsample size (in the code below = 10). I then tried to generalize this by writing a function with a loop, where in each loop the function

analyze subsample of dataframe

2008 Sep 16

Hi there, I'm dealing with a pretty big dataset (~22,000 entries) with numerous entries for every day over a period of several years. I have a column "judy" (for Julian Day) with 0 beginning on Jan. 1st of every new year (I want to compare tendencies between years). However, in order to control for a leap year (2004), I simply need to subtract 1 from every judy value for the year

Subsample points for mclust

2009 Jul 21

Hi all! I have an ordered vector of values. The distribution of these values can be modeled by a sum of Gaussians. So I'm using the package 'mclust' to get the Gaussians' parameters for this 1D distribution. It works very well, but for input sizes above 100.000 values it starts taking really forever. Unfortunately my dataset has around 4.6M values... My question: is it

Size of subsample in ecodist mantel()

2012 Jun 28

What is the size of the bootstrapped subsample in ecodist mantel()? Thanks

Big Data reading subsample csv

2012 Aug 16

Hello, I'm most grateful for your time to read this. I have an uber-size 30 GB file of 6 million records and 3000 columns (mostly categorical data) in CSV format. I want to bootstrap subsamples for multinomial regression, but it's proving difficult: even with 64 GB of RAM in my machine and twice that in swap, the process becomes super slow and halts. I'm thinking about generating

Selecting a subsample so that it follows a distribution.

2011 Mar 02

Hi All, I want to select rows at random from a large data.frame while achieving a particular distribution defined by a given subset of this data.frame. How can I do this? More details and what I've done so far are given below. I have gene expression data and gene sets of interest. In order to look at enrichment of differential expression I'm doing a simple permutation approach: Selecting

Random selection from a subsample

2010 Dec 19

Dear Mailing List I have a data set (data4) consisting of a number of factors and a response variable. I wish to randomly sample from a combination of two of those factors (GIS_station and Distance_code2) and return a new dataframe containing the original data structure (i.e. all the columns) but only containing the randomly selected rows. The number of rows in each combination of GIS_station

Sample of a subsample

2017 Sep 25

For personal aesthetic reasons, I changed the name "data" to "dat". Your code, with a slight modification:

    set.seed(1357) ## for reproducibility
    dat <- data.frame(var1=seq(1:40), var2=seq(40,1))
    dat$sampleNo <- 0
    idx <- sample(seq(1,nrow(dat)), size=10, replace=F)
    dat[idx,"sampleNo"] <- 1

    ## yielding
    > dat
      var1 var2 sampleNo
    1    1   40

Sample of a subsample

2017 Sep 25

Hello everybody! I have the following problem: I'd like to select a sample from a subsample in a dataset. Actually, I don't want to select it, but to create a new variable sampleNo that indicates to which sample (one or two) a case belongs. Let's suppose I have a dataset containing 40 cases: data <- data.frame(var1=seq(1:40), var2=seq(40,1)) The first sample (n=10) I drew like

Tiered volume performance degrades badly after a volume stop/start or system restart.

2018 Jan 31

Tested it in two different environments lately with exactly the same results. Was trying to get better read performance from local mounts with hundreds of thousands of maildir email files by using SSD, hoping that .gluster file stat read will improve which does migrate to hot tier. After seeing what you described for 24 hours and confirming all move around on the tiers is done - killed it. Here are my

Tiered volume performance degrades badly after a volume stop/start or system restart.

2018 Jan 30

I am fighting this issue: Bug 1540376 - Tiered volume performance degrades badly after a volume stop/start or system restart. https://bugzilla.redhat.com/show_bug.cgi?id=1540376 Does anyone have any ideas on what might be causing this, and what a fix or work-around might be? Thanks! ~ Jeff Byers ~ Tiered volume performance degrades badly after a volume stop/start or system restart. The

Sample of a subsample

2017 Sep 25

Hi David, I was about to post a reply when Bert responded. His answer is good and his comment to use the name 'dat' rather than 'data' is instructive. I am providing my suggestion as well because I think it may address what was causing you some confusion (mainly to use "which", but also the missing !) idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ),

Tiered volume performance degrades badly after a volume stop/start or system restart.

2018 Feb 01

This problem appears to be related to the sqlite3 DB files that are used for the tiering file access counters, stored on each hot and cold tier brick in .glusterfs/<volname>.db. When the tier is first created, these DB files do not exist, they are created, and everything works fine. On a stop/start or service restart, the .db files are already present, albeit empty since I don't have

Date, date, POSIX question

2006 Nov 05

I have been working with R extensively for several months. I switched from SAS and Matlab to R. My question is: can anyone explain the benefits and detractions of the 'Date' package versus the 'date' package and versus 'POSIX' dates? I have noticed several other packages use one or the other. Rmetrics seems to standardize on POSIX. I can only see differences in

how to subsample all possible combinations of n species taken 1:n at a time?

2009 Apr 06

Hello I apologise for the length of this entry but please bear with me. In short: I need a way of subsampling communities from all possible communities of n taxa taken 1:n at a time without having to calculate all possible combinations (because this gives me a memory error - using combn() or expand.grid() at least). Does anyone know of a function? Or can you help me edit the combn or
