Displaying 20 results from an estimated 1000 matches similar to: "Getting SNPS from PLINK to R"
2011 Jun 21
4
Re: Getting SNPS from PLINK to R
I am using PLINK on a large SNP dataset with a .map and a .ped file.
I want to get some sort of file, say a list of all the SNPs that PLINK
says I have. Any ideas on how to do this?
--
Thanks,
Jim.
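A minimal sketch of one way to get that SNP list directly in R, assuming the standard four-column PLINK .map layout (chromosome, SNP identifier, genetic distance, base-pair position). The file contents and SNP names below are made up for illustration; point `map_path` at the real .map file instead.

```r
# Write a tiny stand-in .map file (hypothetical SNP names) so the sketch runs on its own.
map_path <- tempfile(fileext = ".map")
writeLines(c("1 rs0001 0 1000",
             "1 rs0002 0 2000",
             "2 rs0003 0 1500"), map_path)

# A PLINK .map file is whitespace-separated and has no header row.
map <- read.table(map_path, header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("chr", "snp", "cm", "bp"))

snp_ids <- map$snp    # character vector with one entry per SNP
length(snp_ids)       # number of SNPs listed in the .map file

# Save the list as a plain text file, one SNP per line.
out_path <- tempfile(fileext = ".txt")
writeLines(snp_ids, out_path)
```

The SNP order in the .map file matches the order of the genotype columns in the .ped file, so `snp_ids` can also be used to label genotypes read in later.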
2013 Nov 08
1
SNPRelate: Plink conversion
Hi,
Following my earlier posts about having problems performing a PCA, I have
worked out what the problem is. The problem lies within the PLINK to gds
conversion.
It seems as though the SNPs are imported as "samples" and in turn, the
samples are recognised as SNPs:
> snpgdsSummary("chr2L")
Some values of snp.position are invalid (should be > 0)!
Some values of
2006 Jun 05
3
Fastest way to do HWE.exact test on 100K SNP data?
Hi everyone,
I'm using the function 'HWE.exact' of 'genetics' package to compute p-values of
the HWE test. My data set consists of ~600 subjects (cases and controls) typed
at ~ 10K SNP markers; the test is applied separately to cases and controls. The
genotypes are stored in a list of 'genotype' objects, all.geno, and p-values are
calculated inside the loop over all
2008 May 05
1
genotypes simulation
Hello,
I am having a really hard time finding a good article about simulating
genotypes of cases and controls at a disease locus using R.
If you can point me to, or guide me towards, more information, it
would be helpful.
thanks,
Claire
2008 Jan 21
2
reordering huge data file
Dear R-experts,
My problem is how to handle a 10GB data file containing genotype data. The file is in a particular format (Illumina final report) and needs to be altered and merged with phenotype data for further analysis.
PERL seems to be a frequently used solution for this type of work; however, I am inclined to think it should be doable with R.
How do I open a text-file, line by line,
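A minimal sketch of reading a text file a few lines at a time through an open connection, so the whole file never has to be held in memory. The tab-separated layout and the three sample lines are assumptions made up for illustration, not the actual Illumina final report format.

```r
# Create a small stand-in file; the real input would be the large report.
path <- tempfile(fileext = ".txt")
writeLines(c("sample1\trs1\tAA",
             "sample2\trs1\tAB",
             "sample3\trs1\tBB"), path)

con <- file(path, open = "r")   # an open connection keeps its read position between calls
n_lines <- 0
repeat {
  chunk <- readLines(con, n = 2)               # read up to 2 lines; use a much larger n in practice
  if (length(chunk) == 0) break                # nothing left to read
  fields <- strsplit(chunk, "\t", fixed = TRUE)  # split each line into its fields
  # ... alter the fields or write them out here ...
  n_lines <- n_lines + length(fields)
}
close(con)
n_lines
```

Reading in chunks rather than one line per call keeps the number of `readLines` calls low, which usually matters more for speed than the parsing itself.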
2006 Apr 06
4
Reshaping genetic data from long to wide
Bottom Line Up Front: How does one reshape genetic data from long to wide?
I currently have a lot of data. About 180 individuals (some
probands/patients, some parents, rare siblings) and SNP data from 6000 loci
on each. The standard formats seem to be something along the lines of Famid,
pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2,
locus2Allele1, locus2Allele2, etc
In other
2005 Apr 05
2
cat bailing out in a for loop
Dear All,
I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42
SNPs. I am using the function HWE.exact from the package "genetics".
In order not to do a lot of coding "by hand", I have a for loop that
goes through each column (each column is one SNP) and gives me the
p.value for HWE.exact. Unfortunately some SNPs have reached fixation and
HWE.exact requires a
2007 Jan 21
2
efficient code. how to reduce running time?
Hi,
I am new to R.
Even though I've got my code to run and do what it needs to,
it is taking forever and I can't use it like this.
I was wondering if you could help me find ways to make the code run
faster.
Here is my code.
the data set is a bunch of 0s and 1s in a data.frame.
What I am doing is this.
I pick a column and make up a new column Y with values associated with that
2013 Oct 03
1
prcomp - surprising structure
Hello,
I did a pca with over 200000 snps for 340 observations (ids). If I plot the
eigenvectors (called rotation in prcomp) 2, 3 and 4 (e.g. plot(rotation[,2])),
I see a strange "column" in my data (see attachment). I
suspect it is an artefact (but of what?).
Suggestion:
I used prcomp this way: prcomp (mat), where mat is a matrix with the column
means already subtracted followed by a
2011 Jan 03
0
Using PCA to correct p-values from snpMatrix
Hi R-help folks,
I have been doing some single SNP association work using snpMatrix. This works
well, but produces a lot of false positives, because of population structure in
my data. I would like to correct the p-values (which snpMatrix gives me) for
population structure, possibly using principal component analysis (PCA).
My data is complicated, so here's a simple example of what
2013 Jul 02
2
Recoding variables based on reference values in data frame
I'm new to R (previously used SAS primarily) and I have a genetics data
frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3,
...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of
the data is shown below:
SNP_ID      SNP1  SNP2  SNP3  SNP4
Maj_Allele  C     G     C     A
Min_Allele  T     A     T     G
ID1         CC    GG    CT    AA
ID2         CC    GG    CC    AA
ID3         CC    GG    nc    AA
2008 Aug 22
2
help needed for HWE.exact in library "genetics"
Hi,
I have genotype data for both cases and controls and would like to calculate the HW p-value. However, since the count of one genotype is 0, I got a weird result. Could someone help me figure it out, or confirm that it is right? Thanks a lot.
============
> library( "genetics" )
NOTE: THIS PACKAGE IS NOW OBSOLETE.
The R-Genetics project has developed an set of enhanced
2005 Apr 13
1
logistic regression weights problem
Hi All,
I have a problem with weighted logistic regression. I have a number of
SNPs and a case/control scenario, but not all genotypes are as
"guaranteed" as others, so I am using weights to downsample the
importance of individuals whose genotype has been heavily "inferred".
My data is quite big, but with a dummy example:
> status <- c(1,1,1,0,0)
> SNPs <-
2009 Jan 13
3
problem with Geneland
I performed these steps:
library(Geneland)
set.seed(1)
data <- simdata(nindiv=200,
coord.lim=c(0,1,0,1) ,
number.nuclei=5 ,
allele.numbers=rep(10,20),
IBD=FALSE,
npop=2,
give.tess.grid=FALSE)
geno <- data$genotypes
coord <- t(data$coord.indiv)
path.mcmc <-
2011 Apr 14
1
integer and floating-point storage
I note that "current implementations of R use 32-bit integers for integer
vectors," but I am working with large arrays that contain integers from 0
to 3, so they could be stored as unsigned 8-bit integers. Can R do this?
(FYI -- This is for storing minor-allele counts for genetic studies.
There are 0, 1 or 2 minor alleles and 3 would represent missing.)
It is theoretically possible
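A minimal sketch of one option in base R: the `raw` type stores one byte per element, so values 0 to 3 fit with a quarter of the memory of an integer vector. The simulated counts and the convention of decoding 3 as missing are taken from the description above; note that raw vectors support neither `NA` nor arithmetic, so they have to be converted back before analysis.

```r
set.seed(1)
counts <- sample(0:3, 1e6, replace = TRUE)   # integer vector, 4 bytes per element

packed <- as.raw(counts)                     # raw vector, 1 byte per element

object.size(counts)                          # roughly 4 MB
object.size(packed)                          # roughly 1 MB

# Convert back to integers for computation, decoding 3 as missing.
unpacked <- as.integer(packed)
unpacked[unpacked == 3L] <- NA
mean(unpacked, na.rm = TRUE)                 # mean minor-allele count
```

Matrices of raw values work the same way, for example `matrix(packed, nrow = 1000)`.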
2010 Feb 12
1
"drop if missing" command?
This will probably seem very simple to experienced R programmers:
I am doing a snp association analysis and am at the model-fitting stage. I
am using the Stats package's "drop1" with the following code:
##geno is the dataset
## the dependent variable (casectrln) is dichotomous and coded 0,1
## rs743572_2 is one of the snps (which is coded 0,1,2 for the 3 genotypes)
2011 Dec 09
1
minor allele frequency comparison
Hi all,
We are using two methods to identify SNPs. One is based on resequencing
the genome and aligning the reads to the sequenced genome to identify SNPs
(data available for 44 individuals). Another is based on SNP array with
selected loci (30000 loci, 870 individuals). I want to compare the results
from the resequencing based minor allele frequency and Array based minor
allele frequency.
2007 Feb 05
3
RSNPper SNPinfo and making it handle a vector
If I run an analysis which generates statistical tests on many SNPs I would
naturally want to get more details on the most significant SNPs. Directly
from within R one can get the information by loading RSNPper (from
Bioconductor) and simply issuing a command SNPinfo(2073285). Unfortunately,
the command cannot handle a vector and therefore only wants to do one at a
time.
I tried the lapply and
2006 Jun 20
2
multi-dimension array of raw
I would like to store and manipulate large sets of marker genotypes
compactly using "raw" data arrays. This works fine for vectors or
matrices, but I run into the error shown in the example below as soon
as I try to use 3 dimensional arrays (eg. animal x marker x allele).
> a <- array(as.raw(1:6),c(2,3))
> a
[,1] [,2] [,3]
[1,] 01 03 05
[2,] 02 04 06
>
2012 May 23
3
applying cbind (or any function) across all components in a list
#If I have two lists as follows
a1<- array(1:6, dim=c(2,3))
a2<- array(7:12, dim=c(2,3))
l1<- list(a1,a2)
a3<- array(1:4, dim=c(2,2))
a4<- array(5:8, dim=c(2,2))
l2<- list(a3,a4)
#how can I create a new list with the mean across all arrays within the
list, so all components are included? As an example for [[1]];
cbind((l1[[1]][,1]+l2[[1]][,1])/2,