similar to: Generating missingness on SNP data

2011 Dec 13
0
snpStats imputed SNP probabilities
Hi, Does anybody know how to obtain the imputed SNP genotype probabilities from the snpStats package? I am interested in using an imputation method implemented in R to be further used in a simulation study context. I have found the snpStats package that seems to contain suitable functions to do so. As far as I could find out from the package vignette examples and its help, it gives the
2007 Oct 30
0
Plotting question: how to plot SNP location data?
Hello, I would like to plot specific SNPs with their exact locations on a chromosome. Based on my genotyping results I would like to separate these SNPs in three different categories: 1, 2 and 3 and use different colours to represent these categories. The script below generates the sample data. I can plot these with the image function using the following: val <- 1:3 samp <- sample(val,
2012 Jun 14
1
Can someone recommend a package for SNP cluster analysis of Fluidigm microarrays?
I know that there are quite a few packages out there for cluster analysis. The problem I am facing is finding a package that will not incorporate all my samples into clusters, but only the samples that fit a threshold for genotyping (which I have not set yet and may need help choosing). It should be able to "no call" samples outside the clusters. It also needs to
2008 Dec 24
0
command Polygenic gives error message concerning dimensions of data
Dear Sir/Madam, For a few days now I have been trying to use the command "polygenic" from the GenABEL package. However, I keep running into an error message: "Error in polygenic(Testo, kin = kinship, data = data1) : dimension of outcome and kinship.matrix do not match". My data consist of 1240 individuals with 74 markers. The data mainly consist of small families (2 or more brothers,
2006 Jun 05
3
Fastest way to do HWE.exact test on 100K SNP data?
Hi everyone, I'm using the function 'HWE.exact' of 'genetics' package to compute p-values of the HWE test. My data set consists of ~600 subjects (cases and controls) typed at ~ 10K SNP markers; the test is applied separately to cases and controls. The genotypes are stored in a list of 'genotype' objects, all.geno, and p-values are calculated inside the loop over all
2005 Mar 04
0
Is aggregate() what I need here?
I'm pretty new to R, and I've been given a script by a user who wants some help with it. I know enough about the way R works to know that this is a very inefficient way to do what the user wants (the LSB_JOBINDEX stuff is added by me so that this can work on many hundreds of input data files as LSF jobs - it's the nested loops I'm really interested in):
2010 Feb 12
1
"drop if missing" command?
This will probably seem very simple to experienced R programmers: I am doing a SNP association analysis and am at the model-fitting stage. I am using the stats package's "drop1" with the following code: ##geno is the dataset ## the dependent variable (casectrln) is dichotomous and coded 0,1 ## rs743572_2 is one of the SNPs (which is coded 0,1,2 for the 3 genotypes)
2006 Apr 06
4
Reshaping genetic data from long to wide
Bottom Line Up Front: How does one reshape genetic data from long to wide? I currently have a lot of data. About 180 individuals (some probands/patients, some parents, rare siblings) and SNP data from 6000 loci on each. The standard formats seem to be something along the lines of Famid, pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2, locus2Allele1, locus2Allele2, etc In other
2005 Apr 13
1
logistic regression weights problem
Hi All, I have a problem with weighted logistic regression. I have a number of SNPs and a case/control scenario, but not all genotypes are as "guaranteed" as others, so I am using weights to downsample the importance of individuals whose genotype has been heavily "inferred". My data is quite big, but with a dummy example: > status <- c(1,1,1,0,0) > SNPs <-
2011 Jul 27
1
SNP Tables
Hello, I have indicators for the presence or absence of SNPs in columns, plus a category (case/control) column. I would like to extract ONLY the tables and the indices (SNPs) that give me 2 x 3 tables. Some give 2 x 2 tables when one of the alleles is missing. The data look like the matrix snpmat below: so the first SNP should give me the following table: (aa=0, Aa=1 and AA=2) aa
2012 Aug 24
0
A question about GRAMMAR calculations in the FAM_MDR algorithm
Dear R developers: I am a PhD candidate in the School of Public Health at Peking University, and my major is genetic epidemiology. I am learning the FAM-MDR algorithm, which is used to detect gene-gene and gene-environment interactions in pedigree data. The code was written by Tom Cattaert of the University of Liege. The algorithms and the sample datasets are available at
2005 Apr 05
2
cat bailing out in a for loop
Dear All, I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42 SNPs. I am using the function HWE.exact from the package "genetics". In order not to do a lot of coding "by hand", I have a for loop that goes through each column (each column is one SNP) and gives me the p-value from HWE.exact. Unfortunately some SNPs have reached fixation and HWE.exact requires a
2013 Oct 03
1
prcomp - surprising structure
Hello, I did a PCA with over 200000 SNPs for 340 observations (ids). If I plot eigenvectors 2, 3 and 4 (called rotation in prcomp), e.g. plot(rotation[,2]), I see a strange "column" in my data (see attachment). I suspect it is an artefact (but of what?). Note: I used prcomp this way: prcomp(mat), where mat is a matrix with the column means already subtracted followed by a
2010 May 28
0
how to use GenABEL genetic information??
Does anyone use the R library GenABEL? I am using it to calculate SNP interactions. I have a list of 100 SNPs, and I need to look at the interaction between each pair of SNPs in the list. My question is how to perform this in GenABEL. I want to use the "lm" function, but don't know how to use the SNP information. For example: result <- (lm(y~SNP1+SNP2+SNP1*SNP2)) the problem here
2012 Feb 24
1
Missing Data in Stepwise selection of Logistic regression
Hi all, I am running stepwise logistic regression and I have: 1- Multiple covariates included in each model (no missing data) 2- Genotype data (SNPs), about 500,000. I partitioned the data into multiple files (there are missing data). I run the step by including all the covariates and one SNP in each model, but I got this message: number of rows in use has changed: remove missing values? In
2013 Nov 08
1
SNPRelate: Plink conversion
Hi, Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK to GDS conversion. It seems as though the SNPs are imported as "samples" and, in turn, the samples are recognised as SNPs: > snpgdsSummary("chr2L") Some values of snp.position are invalid (should be > 0)! Some values of
2007 May 25
1
Read in 250K snp chips
I'm having trouble getting summaries out of the 250K SNP chips in R. I'm using the oligo package and when I attempt to create the necessary SnpQSet object (to get genotype calls and intensities) using snprma, I encounter memory issues. Does anyone have an alternative package or workaround for these large SNP chips?
2007 Jan 21
2
efficient code. how to reduce running time?
Hi, I am new to R, and even though I've got my code to run and do what it needs to, it is taking forever and I can't use it like this. I was wondering if you could help me find ways to make the code run faster. Here is my code. The data set is a bunch of 0s and 1s in a data.frame. What I am doing is this. I pick a column and make up a new column Y with values associated with that
2011 Dec 09
1
minor allele frequency comparison
Hi all, We are using two methods to identify SNPs. One is based on resequencing the genome and aligning the reads to the sequenced genome to identify SNPs (data available for 44 individuals). The other is based on a SNP array with selected loci (30000 loci, 870 individuals). I want to compare the resequencing-based minor allele frequency with the array-based minor allele frequency.
2010 Feb 28
1
Combining 2 columns into 1 column many times in a very large dataset
*Combining 2 columns into 1 column many times in a very large dataset* The clumsy solutions I am working on are not going to be very fast if I can get them to work and the true dataset is ~1500 X 45000 so they need to be efficient. I've searched the R help files and the archives for this list and have some possible workable solutions for 2) and 3) but not my question 1). However, I include