similar to: Speeding up lots of calls to GLM

2007 Jan 21
2
efficient code. how to reduce running time?
Hi, I am new to R, and even though I've got my code to run and do what it needs to, it is taking forever and I can't use it like this. I was wondering if you could help me find ways to make the code run faster. Here is my code. The data set is a bunch of 0s and 1s in a data.frame. What I am doing is this: I pick a column and make up a new column Y with values associated with that
2008 Jan 21
2
reordering huge data file
Dear R-experts, My problem is how to handle a 10GB data file containing genotype data. The file is in a particular format (Illumina final report) and needs to be altered and merged with phenotype data for further analysis. PERL seems to be a frequently used solution for this type of work, however I am inclined to think it should be doable with R. How do I open a text-file, line by line,
2011 Apr 18
2
Working with massive matrices in R
Hello, I'm (eventually) attempting a singular value decomposition of a 3200 x 527829 matrix in R version 2.10.1. The script is as follows: ###---------Begin Script here-------### library(Matrix) snps <- 527829 ## Number of SNPs N <- 3200 ## Sample size y <- rnorm(N, 100,1) ## simulated phenotype system.time( ## read in matrix
2011 Jul 14
2
R package: pbatR
Dear All, Does anybody have experience with R package pbatR (http://cran.r-project.org/web/packages/pbatR/index.html)? I am trying to use it to analyze the family-based case-control data, but the package totally doesn't work on my computer. I contacted the authors of the package, but I haven't heard anything from them. Following the package manual, I tried the simple example as below:
2009 Sep 22
2
glm analysis repeated for 900 variables
Dear R users, Could you help me with the following problem? I want to repeat a glm analysis with 2 independent variables for all 900 variables (snps) in my data set. So, I want to check whether snp1 has a different effect on my outcome variable in patients and controls (phenotype). And repeat that for snp2 to snp900. Is there an easy way to get a summary of the data, e.g. a list of P values of all
2011 Jul 26
1
Message for R-help mailing list
Dear r-helpers, I would be very grateful if you could post the message below on the r-help discussion board. Thank you very much! Best Wishes, Pawel Hello R community, I am generating lots of results using the fisher.test function, testing many 2x2 tables of SNPs for association with a particular phenotype. A typical output of the fisher.test function would be (for example): data: data1
2011 Aug 25
2
within-groups variance and between-groups variance
Hello, I have been looking for functions for calculating the within-groups variance and between-groups variance, for the case where you have several numerical variables describing samples from a number of groups. I didn't find such functions in R, so wrote my own versions myself (see below). I can calculate the within- and between-groups variance for the Sepal.length variable (iris[1]) in
2006 Apr 06
4
Reshaping genetic data from long to wide
Bottom Line Up Front: How does one reshape genetic data from long to wide? I currently have a lot of data. About 180 individuals (some probands/patients, some parents, rare siblings) and SNP data from 6000 loci on each. The standard formats seem to be something along the lines of Famid, pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2, locus2Allele1, locus2Allele2, etc In other
2012 Mar 14
3
Needing a better solution to a lookup problem.
I have a solution (actually a few) to this problem, but none are computationally efficient enough to be useful. I'm hoping someone can enlighten me to a better solution. I have a data frame of chromosome/position pairs (along with other data for the location). For each pair I need to determine if it is within a given data frame of ranges. I need to keep only the pairs that are within any of
2012 Oct 23
1
factor or character
Hi, The program below works very well. (snps = c('rs621782_G', 'rs8087639_G', 'rs8094221_T', 'rs7227515_A', 'rs537202_C')) Selec = todos[ , colnames(todos) %in% snps] head(Selec) But, I have a data set with 1.000 columns and I need to extract 70 to use (like snps in command above). These 70 snps are in a file. So I create a file to extract them with
2013 Nov 08
1
SNPRelate: Plink conversion
Hi, Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK to gds conversion. It seems as though the SNPs are imported as "samples" and in turn, the samples are recognised as SNPs: >snpsgdsSummary("chr2L") Some values of snp.position are invalid (should be > 0)! Some values of
2007 Feb 05
3
RSNPper SNPinfo and making it handle a vector
If I run an analysis which generates statistical tests on many SNPs I would naturally want to get more details on the most significant SNPs. Directly from within R one can get the information by loading RSNPper (from Bioconductor) and simply issuing a command SNPinfo(2073285). Unfortunately, the command cannot handle a vector and therefore only wants to do one at a time. I tried the lapply and
2014 Jul 21
1
Multiple versions of data in a package
Dear R-devel, I am writing for help on how I should include parallel sets of data in my package. Brief summary: I am new to using data within packages. I want a user to be able to specify one of two alternative versions of within-package datasets to use, and I want to load just that one. I have a solution that works, but it doesn't seem as simple as it should be from a user's
2004 Feb 19
1
piece wise application of functions
Dear all, After struggling for some time with *apply() and eva() without success, I decided to ask for help. I have 3 lists labeled with, each contains 3 different interpolation functions with identical names: > names(missgp0) [1] "spl.1mb" "spl.2mb" "spl.5mb" > > names(missgp1) [1] "spl.1mb" "spl.2mb" "spl.5mb" > >
2011 Jun 21
4
Re: Getting SNPS from PLINK to R
I am using plink on a large SNP dataset with a .map and .ped file. I want to get some sort of file, say a list of all the SNPs that plink is saying that I have. Any ideas on how to do this? -- Thanks, Jim.
2013 Oct 03
1
prcomp - surprising structure
Hello, I did a pca with over 200000 snps for 340 observations (ids). If I plot the eigenvectors (called rotation in prcomp) 2, 3 and 4 (e.g. plot (rotation[,2])) I see a strange "column" in my data (see attachment). I suggest it is an artefact (but of what?). Suggestion: I used prcomp this way: prcomp (mat), where mat is a matrix with the column means already subtracted followed by a
2005 Apr 13
1
logistic regression weights problem
Hi All, I have a problem with weighted logistic regression. I have a number of SNPs and a case/control scenario, but not all genotypes are as "guaranteed" as others, so I am using weights to downsample the importance of individuals whose genotype has been heavily "inferred". My data is quite big, but with a dummy example: > status <- c(1,1,1,0,0) > SNPs <-
2020 Oct 29
1
R: sim1000G
Hi, I am using the sim1000G R package to simulate data for a case/control study. I cannot figure out how to manipulate this code to be able to generate 10% or 50% causal SNPs in R. This is the whole code provided as an example on GitHub: library(sim1000G) vcf_file = "region-chr4-357-ANK2.vcf.gz" #nvariants = 442, ss=1000 vcf = readVCF( vcf_file, maxNumberOfVariants = 442 ,min_maf =
2011 Feb 03
1
bug in codetools/R CMD check?
Hi Mr Tierney, I have noticed an error message from R 1.12.x's CMD check for a while (apparently Prof. Ripley completely rewrote CMD check in R 1.12+) e.g.: http://bioconductor.org/checkResults/2.7/bioc-LATEST/snpMatrix/lamb2-checksrc.html ---------------- * checking R code for possible problems ... NOTE Warning: non-unique value when setting 'row.names': 'new' Error in
2011 Jul 27
1
SNP Tables
Hello, I have indicators for the presence or absence of SNPs in columns and the category (case/control column). I would like to extract ONLY the tables and the indices (SNPs) that give me 2 x 3 tables. Some give 2 x 2 tables when one of the alleles is missing. The data look like the matrix snpmat below: so the first snp should give me the following table: (aa=0, Aa=1 and AA=2) aa