Displaying 20 results from an estimated 10000 matches similar to: "Generating missingness on SNP data"
2011 Dec 13
0
snpStats imputed SNP probabilities
Hi,
Does anybody know how to obtain the imputed SNP genotype probabilities from the snpStats package?
I am interested in using an imputation method implemented in R for further use in a simulation study.
I have found the snpStats package, which seems to contain suitable functions for this.
As far as I could find out from the package vignette examples and its help, it gives the
2007 Oct 30
0
Plotting question: how to plot SNP location data?
Hello,
I would like to plot specific SNPs with their exact locations on a
chromosome. Based on my genotyping results I would like to separate
these SNPs into three different categories (1, 2 and 3) and use different
colours to represent these categories. The script below generates the
sample data. I can plot these with the image function using the
following:
val <- 1:3
samp <- sample(val,
2012 Jun 14
1
Can someone recommend a package for SNP cluster analysis of Fluidigm microarrays?
I know that there are quite a few packages out there for cluster
analysis. The problem that I am facing is finding a package that will not
incorporate all my samples into clusters but just the samples that fit a
threshold (that I have not set yet and may need help finding the right
level) for genotyping. It should be able to "no call" samples outside the
clusters. It also needs to
2008 Dec 24
0
command Polygenic gives error message concerning dimensions of data
Dear Sir/Madam,
For a few days now I have been trying to use the command "polygenic" from the GenABEL
package. However, I keep bumping up against an error message: "Error in
polygenic(Testo, kin = kinship, data = data1) : dimension of outcome and
kinship.matrix do not match".
My data set consists of 1240 individuals with 74 markers. It mainly consists of
small families (2 or more brothers,
2006 Jun 05
3
Fastest way to do HWE.exact test on 100K SNP data?
Hi everyone,
I'm using the function 'HWE.exact' of 'genetics' package to compute p-values of
the HWE test. My data set consists of ~600 subjects (cases and controls) typed
at ~ 10K SNP markers; the test is applied separately to cases and controls. The
genotypes are stored in a list of 'genotype' objects, all.geno, and p-values are
calculated inside the loop over all
2005 Mar 04
0
Is aggregate() what I need here?
I'm pretty new to R, and I've been given a script by a user who wants
some help with it. I know enough about the way R works to know that
this is a very inefficient way to do what the user wants (the
LSB_JOBINDEX stuff is added by me so that this can work on many
hundreds of input data files as LSF jobs - it's the nested loops I'm
really interested in):
2010 Feb 12
1
"drop if missing" command?
This will probably seem very simple to experienced R programmers:
I am doing a snp association analysis and am at the model-fitting stage. I
am using the stats package's "drop1" with the following code:
##geno is the dataset
## the dependent variable (casectrln) is dichotomous and coded 0,1
## rs743572_2 is one of the snps (which is coded 0,1,2 for the 3 genotypes)
2006 Apr 06
4
Reshaping genetic data from long to wide
Bottom Line Up Front: How does one reshape genetic data from long to wide?
I currently have a lot of data. About 180 individuals (some
probands/patients, some parents, rare siblings) and SNP data from 6000 loci
on each. The standard formats seem to be something along the lines of Famid,
pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2,
locus2Allele1, locus2Allele2, etc
In other
2005 Apr 13
1
logistic regression weights problem
Hi All,
I have a problem with weighted logistic regression. I have a number of
SNPs and a case/control scenario, but not all genotypes are as
"guaranteed" as others, so I am using weights to downsample the
importance of individuals whose genotype has been heavily "inferred".
My data is quite big, but with a dummy example:
> status <- c(1,1,1,0,0)
> SNPs <-
2011 Jul 27
1
SNP Tables
Hello,
I have indicators for the presence or absence of SNPs in columns, plus a
category (case/control) column. I would like to extract ONLY the tables and
the indices (SNPs) that give me 2 x 3 tables. Some give 2 x 2 tables when
one of the alleles is missing. The data look like the matrix snpmat below,
so the first SNP should give me the following table: (aa=0, Aa=1 and AA=2)
aa
2012 Aug 24
0
A question about GRAMMAR calculations in the FAM_MDR algorithm
Dear R developers:
I am a PhD candidate in the School of Public Health of Peking
University and my major is genetic epidemiology. I am learning the FAM-MDR
algorithm, which is used to detect gene-gene and gene-environment
interactions in pedigree data. The code was written by Tom
Cattaert of the University of Liege. The algorithms and the sample datasets
are available at
2005 Apr 05
2
cat bailing out in a for loop
Dear All,
I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42
SNPs. I am using the function HWE.exact from the package "genetics".
In order not to do a lot of coding "by hand", I have a for loop that
goes through each column (each column is one SNP) and gives me the
p.value for HWE.exact. Unfortunately some SNPs have reached fixation and
HWE.exact requires a
2013 Oct 03
1
prcomp - surprising structure
Hello,
I did a PCA with over 200000 SNPs for 340 observations (ids). If I plot the
eigenvectors (called rotation in prcomp) 2, 3 and 4 (e.g.
plot(rotation[,2])) I see a strange "column" in my data (see attachment). I
suspect it is an artefact (but of what?).
Suggestion:
I used prcomp this way: prcomp(mat), where mat is a matrix with the column
means already subtracted followed by a
2010 May 28
0
how to use GenABEL genetic information??
Does anyone use the R library GenABEL? I am using it to calculate SNP
interactions.
I have a list of 100 SNPs, and I need to look at the interaction between each
pair of SNPs in the list. My question is how to perform this in GenABEL. I
want to use the "lm" function, but don't know how to use the SNP
information.
for example:
result <- (lm(y~SNP1+SNP2+SNP1*SNP2))
the problem here
2012 Feb 24
1
Missing Data in Stepwise selection of Logistic regression
Hi all,
I am running stepwise logistic regression and I have:
1- Multiple covariates included in each model (no missing data)
2- Genotype data (SNPs), about 500,000.
I partitioned the data into multiple files (there are missing data).
I run the step by including all the covariates and one SNP in each model,
but I got this message:
number of rows in use has changed: remove missing values?
In
2013 Nov 08
1
SNPRelate: Plink conversion
Hi,
Following my earlier posts about having problems performing a PCA, I have
worked out what the problem is. The problem lies within the PLINK to gds
conversion.
It seems as though the SNPs are imported as "samples" and in turn, the
samples are recognised as SNPs:
> snpgdsSummary("chr2L")
Some values of snp.position are invalid (should be > 0)!
Some values of
2007 May 25
1
Read in 250K snp chips
I'm having trouble getting summaries out of the 250K snp chips in R. I'm
using the oligo package and when I attempt to create the necessary SnpQSet
object (to get genotype calls and intensities) using snprma, I encounter
memory issues.
Anyone have an alternative package or workaround for these large snp chips?
2007 Jan 21
2
efficient code. how to reduce running time?
Hi,
I am new to R,
and even though I've gotten my code to run and do what it needs to,
it is taking forever and I can't use it like this.
I was wondering if you could help me find ways to fix the code to run
faster.
Here is my code.
the data set is a bunch of 0s and 1s in a data.frame.
What I am doing is this.
I pick a column and make up a new column Y with values associated with that
2011 Dec 09
1
minor allele frequency comparison
Hi all,
We are using two methods to identify SNPs. One is based on resequencing
the genome and aligning the reads to the sequenced genome to identify SNPs
(data available for 44 individuals). Another is based on SNP array with
selected loci (30000 loci, 870 individuals). I want to compare the results
from the resequencing based minor allele frequency and Array based minor
allele frequency.
2010 Feb 28
1
Combining 2 columns into 1 column many times in a very large dataset
The clumsy solutions I am working on are not going to be very fast, even if I can
get them to work, and the true dataset is ~1500 x 45000, so they need to be
efficient. I've searched the R help files and the archives for this list and
have some possible workable solutions for 2) and 3) but not my question 1).
However, I include