similar to: Where can I find information on how to subsample a time series?

Displaying 20 results from an estimated 700 matches similar to: "Where can I find information on how to subsample a time series?"

2009 Jun 26
3
beginner's guide to C++ programming with R packages?
Hello, again. I'm interested to learn how programmers develop & test C/C++ code with R packages in Linux. I've been reading R source and the manual on Writing R Extensions but there are just a couple of details I can't understand. I wish I could watch over a developer's shoulder to see how people actually do this. I've tested a bit. I am able to take package.tar.gz
2009 Jul 05
3
Memory management issues
Hi everybody, I have been interfacing some C++ library code into an R package but ran into optimization issues specific to memory management that require some insight into the GC. One of the C++ libraries returns simple vectors of integers, doubles and complex which are allocated and managed from the library itself. I cannot know the length of the array beforehand, so I cannot pre-allocate that
2010 Jul 12
2
exercise in frustration: applying a function to subsamples
From the documentation I have found, it seems that one of the functions from package plyr, or a combination of functions like split and lapply would allow me to have a really short R script to analyze all my data (I have reduced it to a couple hundred thousand records with about half a dozen records. I get the same result from ddply and split/lapply: >
2010 Jul 08
1
Query about using timestamps returned by SQL as 'factor' for split
I have a simple query as follows: "SELECT m_id,sale_date,YEAR(sale_date),WEEK(sale_date),return_type,DATEDIFF(return_date,sale_date) AS elapsed_time FROM risk_input" I can get, and view, all the data that that query returns. The question is, sale_date is a timestamp, and I need to call split to group this data by m_id and the week in which the sale occurred. Obviously, I would
2008 Dec 09
1
any suggestions to deal with 'Argument list too long' for a R CMD check?
Since gcc was using upwards of 2 GB of RAM to compile my package, I just split all the functions into individual files. I guess I'm too clever for myself, because now I get hit with the "Argument list too long" error. Is there a way to deal with this aside from writing my own configure script (which could possibly feed the gcc commands one by one)? -Whit RHEL 5 [whit at
2013 Jan 18
0
repeat resampling with different subsample sizes
Hi, I'm trying to write a code (see below) to randomly resample measurements of one variable (say here the variable "counts" in the data frame "dat") with different resampled subsample sizes. The code works fine for a single resampled subsample size (in the code below = 10). I then tried to generalize this by writing a function with a loop, where in each loop the function
2008 Sep 16
1
analyze subsample of dataframe
Hi there, I'm dealing with a pretty big dataset (~22,000 entries) with numerous entries for every day over a period of several years. I have a column "judy" (for Julian Day) with 0 beginning on Jan. 1st of every new year (I want to compare tendencies between years). However, in order to control for a leap year (2004), I simply need to subtract 1 from every judy value for the year
2009 Jul 21
1
Subsample points for mclust
Hi all! I have an ordered vector of values. The distribution of these values can be modeled by a sum of Gaussians. So I'm using the package 'mclust' to get the Gaussians' parameters for this 1D distribution. It works very well, but for input sizes above 100.000 values it starts taking really forever. Unfortunately my dataset has around 4.6M values... My question: is it
2012 Jun 28
2
Size of subsample in ecodist mantel()
What is the size of the bootstrapped subsample in ecodist mantel()? Thanks
2012 Aug 16
1
Big Data reading subsample csv
Hello, I'm most grateful for your time to read this. I have an uber-size 30GB file of 6 million records and 3000 (mostly categorical data) columns in csv format. I want to bootstrap subsamples for multinomial regression, but it's proving difficult: even with 64GB RAM in my machine and twice that in swap file, the process becomes super slow and halts. I'm thinking about generating
2011 Mar 02
0
Selecting a subsample so that it follows a distribution.
Hi All, I want to select rows at random from a large data.frame while achieving a particular distribution defined by a given subset of this data.frame. How can I do this? More details and what I've done so far are given below. I have gene expression data and gene sets of interest. In order to look at enrichment of differential expression I'm doing a simple permutation approach: Selecting
2010 Dec 19
1
Random selection from a subsample
Dear Mailing List I have a data set (data4) consisting of a number of factors and a response variable. I wish to randomly sample from a combination of two of those factors (GIS_station and Distance_code2) and return a new dataframe containing the original data structure (i.e. all the columns) but only containing the randomly selected rows. The number of rows in each combination of GIS_station
2017 Sep 25
0
Sample of a subsample
For personal aesthetic reasons, I changed the name "data" to "dat". Your code, with a slight modification: set.seed (1357) ## for reproducibility dat <- data.frame(var1=seq(1:40), var2=seq(40,1)) dat$sampleNo <- 0 idx <- sample(seq(1,nrow(dat)), size=10, replace=F) dat[idx,"sampleNo"] <-1 ## yielding > dat var1 var2 sampleNo 1 1 40
2017 Sep 25
2
Sample of a subsample
Hello everybody! I have the following problem: I'd like to select a sample from a subsample in a dataset. Actually, I don't want to select it, but to create a new variable sampleNo that indicates to which sample (one or two) a case belongs. Let's suppose I have a dataset containing 40 cases: data <- data.frame(var1=seq(1:40), var2=seq(40,1)) The first sample (n=10) I drew like
2018 Jan 31
1
Tiered volume performance degrades badly after a volume stop/start or system restart.
Tested it in two different environments lately with exactly the same results. Was trying to get better read performance from local mounts with hundreds of thousands of maildir email files by using SSD, hoping that .gluster file stat read will improve, which does migrate to the hot tier. After seeing what you described for 24 hours and confirming all move around on the tiers is done - killed it. Here are my
2018 Jan 30
2
Tiered volume performance degrades badly after a volume stop/start or system restart.
I am fighting this issue: Bug 1540376 - Tiered volume performance degrades badly after a volume stop/start or system restart. https://bugzilla.redhat.com/show_bug.cgi?id=1540376 Does anyone have any ideas on what might be causing this, and what a fix or work-around might be? Thanks! ~ Jeff Byers ~ Tiered volume performance degrades badly after a volume stop/start or system restart. The
2017 Sep 25
1
Sample of a subsample
Hi David, I was about to post a reply when Bert responded. His answer is good and his comment to use the name 'dat' rather than 'data' is instructive. I am providing my suggestion as well because I think it may address what was causing you some confusion (mainly to use "which", but also the missing !) idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ),
2018 Feb 01
0
Tiered volume performance degrades badly after a volume stop/start or system restart.
This problem appears to be related to the sqlite3 DB files that are used for the tiering file access counters, stored on each hot and cold tier brick in .glusterfs/<volname>.db. When the tier is first created, these DB files do not exist, they are created, and everything works fine. On a stop/start or service restart, the .db files are already present, albeit empty since I don't have
2006 Nov 05
2
Date, date, POSIX question
I have been working with R extensively for several months. I switched from SAS and Matlab to R. My question is: can anyone explain the benefits and detractions of the 'Date' package versus the 'date' package and versus 'POSIX' dates? I have noticed several other packages use one or the other. Rmetrics seems to standardize on POSIX. I can only see differences in
2009 Apr 06
3
how to subsample all possible combinations of n species taken 1:n at a time?
Hello I apologise for the length of this entry but please bear with me. In short: I need a way of subsampling communities from all possible communities of n taxa taken 1:n at a time without having to calculate all possible combinations (because this gives me a memory error - using combn() or expand.grid() at least). Does anyone know of a function? Or can you help me edit the combn or