thr3ads.net - similar to: "Subsetting data systematically"

Displaying 20 results from an estimated 4000 matches similar to: "Subsetting data systematically"

identify selected substances across individuals

2007 Jan 21

An embedded and charset-unspecified text was scrubbed... Name: inte tillgänglig (not available) Url: https://stat.ethz.ch/pipermail/r-help/attachments/20070121/436ed377/attachment.pl

Na/NaN error in subsampling script

2003 Feb 12

R-help readers, I'm having a problem with an R script (see below), which regularly generates the error message, Error in start:(start + (sample.length - 1)) : NA/NaN argument, for which I am unsure of the cause. In essence, the script (below) generates the start and end points for random subsamples from along a vector (in reality a transect (of a given length,

Query about extracting subsets from a table

2007 Jan 23

Hi I am trying to process tabular data as follows: Data in the input file is of the form genome1 genome2 tree-dist log10escore Genome1 and genome2 are alphabetic. Tree-dist and log10escore are numeric. I wish to extract only those rows from this table where the log10escore is less than -3. data <-read.table(filename); data$log10escore = data$log10escore[ data$log10escore < -3]; I

Systematically biased count data regression model

2007 Aug 09

Dear all, I am attempting to explain patterns of arthropod family richness (count data) using a regression model. It seems to be able to do a pretty good job as an explanatory model (i.e. demonstrating relationships between dependent and independent variables), but it has systematic problems as a predictive model: It is biased high at low observed values of family richness and biased low at

Openblas?

2020 Jul 15

Hello, I thought that I should try openblas when building a CRAN package containing lots of old (twentieth century) C-code with frequent calls to blas and lapack routines. I have the following options on my Ubuntu 20.04 machine: Selection Path Priority Status ------------------------------------------------------------ * 0

Order variables automatically

2013 Jan 01

Hi, I have a dataset with 6 categorical variables. I have used the following code to make the variables u1-u6 ordered factors and this works well. cat1 cat2 cat3 cat4 cat5 cat6 0 1 1 0 0 1 1 1 0 0 0 0 ....... .... ############ data<-read.table("example.txt") data <- as.data.frame(lapply(data, ordered)) ############ Now,

how to rearrange a dataframe

2010 Feb 23

Hi all, I'd appreciate if anyone can help me with this... I have a data frame that looks like this: 1 + name1 1 2 3 2 + name2 5 9 10 2 - name3 56 74 93 1 - name4 65 75 98 I need to rearrange this in a way so that the rows with "1" in the first column, and "-" in the second column; then columns 4 and 6 should switch places. That is, column 6 would be now column 4 and

unbalanced anova with subsampling (Type III SS)

2011 May 21

Hello R-users, I am trying to obtain Type III SS for an ANOVA with subsampling. My design is slightly unbalanced with either 3 or 4 subsamples per replicate. The basic aov model would be: fit <- aov(y~x+Error(subsample)) But this gives Type I SS and not Type III. But, using the drop() option: drop1(fit, test="F") I get an error message: "Error in

Computing row differences in new columns

2011 Mar 21

Hi, I have the following columns with dates and results, sorted by subject and date. I'd like to compute the differences in dates and results for each patient, based on the previous row. Obviously the last entry for each subject should be an NA. Which would be the best way to accomplish that? I guess questions like that have already been answered a thousand times, so I apologize for

how to subsample all possible combinations of n species taken 1:n at a time?

2009 Apr 06

Hello I apologise for the length of this entry but please bear with me. In short: I need a way of subsampling communities from all possible communities of n taxa taken 1:n at a time without having to calculate all possible combinations (because this gives me a memory error - using combn() or expand.grid() at least). Does anyone know of a function? Or can you help me edit the combn or

Randomly split a sample in two equal subsamples

2010 Oct 31

Dear all, I would like to randomly split a sample in two equally large subsamples. The sample data is stored as a matrix with each row representing an individual and each column representing some variable (e.g., name, age, sex, etc.); the first row contains the names of the variables; the first column contains the individual number (1:n, for n individuals); the number of individuals is even (so,

subsampling

2005 Jan 14

Hi, I would like to subsample the array c(1:200) at random into ten subsamples v1,v2,...,v10. I tried to go progressively like this: > x<-c(1:200) > v1<-sample(x,20) > y<-x[-v1] > v2<-sample(y,20) and then I want to do: >x<-y[-v2] Error: subscript out of bounds.

Help with interpolation

2013 Jan 17

hi guys I need to interpolate values for the zero coupon yield curve. Following data is given date days rate 1996 01

Plot 3 lines in one graph

2012 Nov 05

I'm new with R. I want to plot 3 lines in one graph. This is my data: print(x) V1 V2 V3 V4 1 -4800 25195.73 7415.219 7264.28 2 -2800 15195.73 5415.219 7264.28 I tried using matplot, but I cannot get exactly what I want. This is what I get, and this is my code: matplot(x[,1],x[,-1],type='b', xlab = "epsilon_h", ylab = "Value2", xlim=

Big Data reading subsample csv

2012 Aug 16

Hello, I'm most grateful for your time to read this. I have an uber-size 30GB file of 6 million records and 3000 (mostly categorical data) columns in csv format. I want to bootstrap subsamples for multinomial regression, but it's proving difficult: even with 64GB of RAM in my machine and twice that in swap file, the process becomes super slow and halts. I'm thinking about generating

Standard error of standard deviation: bootstrap or theoretical results?

2003 Aug 06

Dear R users, This is more a statistical question than an R question. I'd appreciate it if you can give me some suggestions. I have a sample of a time series (sample size 500, fat tail in density). I am trying to calculate the standard error of the standard deviation of a sub-block sample (sample size 250). I take 100 such sub-block samples, randomly. For these 100 subsamples, I

pseudo code

2007 Oct 09

Hey there! I got a pseudo code and don't know how to apply it to R, maybe someone can help me: Input: A dataset X, kmax: maximum number of clusters, num_subsamples: number of subsamples. Output: S(i; k) - a distribution of similarities between partitions into k clusters of a reference clustering and clustering of subsamples; i = 1 to num_subsamples Requires: T = cluster(X): A hierarchical

fwdmsa package: Error in search.normal(X[samp, ], verbose = FALSE) : At least one item has no variance

2012 Mar 21

I'm using the fwdmsa package to identify deviant cases in a Mokken scale analysis. I've run into a problem, separate from the one I posted previously. The problem comes with items that are "easy" by IRT standards. A good scale should include a range of difficulties; yet when I include "easy" items in a forward search I continuously run into the problem that these items

Trouble retrieving the second largest value from each row of a data.frame

2010 Jul 24

I have a data frame with a couple million lines and want to retrieve the largest and second largest values in each row, along with the label of the column these values are in. For example row 1 strongest=-11072 secondstrongest=-11707 strongestantenna=value120 secondstrongantenna=value60 Below is the code I am using and a truncated data.frame. Retrieving the largest value was easy, but I have

deleting certain observations in a data frame

2008 Feb 14

Hi, I'm wondering what the fastest way is to delete certain data points (observations) in a data frame. I have a vector of the indices/row.names I would like to delete. I have tried replacing list by list, but it always complains about different lengths, "replacing list of length a with length b" and so on. Another way to think of it is that it's a generalization of na.rm I
