similar to: Getting SNPS from PLINK to R

Displaying 20 results from an estimated 1000 matches similar to: "Getting SNPS from PLINK to R"

2011 Jun 21
4
Re: Getting SNPS from PLINK to R
I am using PLINK on a large SNP dataset with a .map and a .ped file. I want to get some sort of file, say a list of all the SNPs that PLINK says I have. Any ideas on how to do this? -- Thanks, Jim.
2013 Nov 08
1
SNPRelate: Plink conversion
Hi, Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK to GDS conversion. It seems as though the SNPs are imported as "samples" and, in turn, the samples are recognised as SNPs: > snpgdsSummary("chr2L") Some values of snp.position are invalid (should be > 0)! Some values of
2006 Jun 05
3
Fastest way to do HWE.exact test on 100K SNP data?
Hi everyone, I'm using the function 'HWE.exact' of 'genetics' package to compute p-values of the HWE test. My data set consists of ~600 subjects (cases and controls) typed at ~ 10K SNP markers; the test is applied separately to cases and controls. The genotypes are stored in a list of 'genotype' objects, all.geno, and p-values are calculated inside the loop over all
2008 May 05
1
genotypes simulation
Hello, I am having a really hard time finding a good article about simulating genotypes of cases and controls at a disease locus using R. If you can point me to where I can find more information, that would be helpful. Thanks, Claire
2008 Jan 21
2
reordering huge data file
Dear R-experts, My problem is how to handle a 10GB data file containing genotype data. The file is in a particular format (Illumina final report) and needs to be altered and merged with phenotype data for further analysis. Perl seems to be a frequently used solution for this type of work; however, I am inclined to think it should be doable with R. How do I open a text file, line by line,
2006 Apr 06
4
Reshaping genetic data from long to wide
Bottom Line Up Front: How does one reshape genetic data from long to wide? I currently have a lot of data: about 180 individuals (some probands/patients, some parents, rare siblings) and SNP data from 6000 loci on each. The standard formats seem to be something along the lines of Famid, pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2, locus2Allele1, locus2Allele2, etc. In other
2005 Apr 05
2
cat bailing out in a for loop
Dear All, I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42 SNPs. I am using the function HWE.exact from the package "genetics". In order not to do a lot of coding "by hand", I have a for loop that goes through each column (each column is one SNP) and gives me the p-value from HWE.exact. Unfortunately, some SNPs have reached fixation and HWE.exact requires a
2007 Jan 21
2
efficient code. how to reduce running time?
Hi, I am new to R, and even though I have got my code to run and do what it needs to, it is taking forever and I can't use it like this. I was wondering if you could help me find ways to make the code run faster. Here is my code. The data set is a bunch of 0s and 1s in a data.frame. What I am doing is this: I pick a column and make up a new column Y with values associated with that
2013 Oct 03
1
prcomp - surprising structure
Hello, I did a PCA with over 200,000 SNPs for 340 observations (ids). If I plot eigenvectors 2, 3 and 4 (called rotation in prcomp), e.g. plot(rotation[,2]), I see a strange "column" in my data (see attachment). I suspect it is an artefact (but of what?). I used prcomp this way: prcomp(mat), where mat is a matrix with the column means already subtracted followed by a
2011 Jan 03
0
Using PCA to correct p-values from snpMatrix
Hi R-help folks, I have been doing some single SNP association work using snpMatrix. This works well, but produces a lot of false positives, because of population structure in my data. I would like to correct the p-values (which snpMatrix gives me) for population structure, possibly using principal component analysis (PCA). My data is complicated, so here's a simple example of what
2013 Jul 02
2
Recoding variables based on reference values in data frame
I'm new to R (previously used SAS primarily) and I have a genetics data frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3, ...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of the data is shown below:
SNP_ID      SNP1  SNP2  SNP3  SNP4
Maj_Allele  C     G     C     A
Min_Allele  T     A     T     G
ID1         CC    GG    CT    AA
ID2         CC    GG    CC    AA
ID3         CC    GG    nc    AA
2008 Aug 22
2
help needed for HWE.exact in library "genetics"
Hi, I have genotype data for both cases and controls and would like to calculate the HW p-value. However, since the count of one genotype is 0, I got a weird result. Would someone help me figure it out, or confirm that it's right? Thanks a lot. ============ > library( "genetics" ) NOTE: THIS PACKAGE IS NOW OBSOLETE. The R-Genetics project has developed an set of enhanced
2005 Apr 13
1
logistic regression weights problem
Hi All, I have a problem with weighted logistic regression. I have a number of SNPs and a case/control scenario, but not all genotypes are as "guaranteed" as others, so I am using weights to downsample the importance of individuals whose genotype has been heavily "inferred". My data is quite big, but with a dummy example: > status <- c(1,1,1,0,0) > SNPs <-
2009 Jan 13
3
problem with Geneland
I run the following steps: library(Geneland) set.seed(1) data <- simdata(nindiv=200, coord.lim=c(0,1,0,1) , number.nuclei=5 , allele.numbers=rep(10,20), IBD=FALSE, npop=2, give.tess.grid=FALSE) geno <- data$genotypes coord <- t(data$coord.indiv) path.mcmc <-
2011 Apr 14
1
integer and floating-point storage
I note that "current implementations of R use 32-bit integers for integer vectors," but I am working with large arrays that contain integers from 0 to 3, so they could be stored as unsigned 8-bit integers. Can R do this? (FYI -- This is for storing minor-allele counts for genetic studies. There are 0, 1 or 2 minor alleles and 3 would represent missing.) It is theoretically possible
2010 Feb 12
1
"drop if missing" command?
This will probably seem very simple to experienced R programmers: I am doing a snp association analysis and am at the model-fitting stage. I am using the Stats package's "drop1" with the following code: ##geno is the dataset ## the dependent variable (casectrln) is dichotomous and coded 0,1 ## rs743572_2 is one of the snps (which is coded 0,1,2 for the 3 genotypes)
2011 Dec 09
1
minor allele frequency comparison
Hi all, We are using two methods to identify SNPs. One is based on resequencing the genome and aligning the reads to the sequenced genome to identify SNPs (data available for 44 individuals). The other is based on a SNP array with selected loci (30000 loci, 870 individuals). I want to compare the resequencing-based minor allele frequency with the array-based minor allele frequency.
2007 Feb 05
3
RSNPper SNPinfo and making it handle a vector
If I run an analysis which generates statistical tests on many SNPs, I would naturally want to get more details on the most significant SNPs. Directly from within R one can get the information by loading RSNPper (from Bioconductor) and simply issuing a command SNPinfo(2073285). Unfortunately, the command cannot handle a vector and therefore only wants to do one at a time. I tried the lapply and
2006 Jun 20
2
multi-dimension array of raw
I would like to store and manipulate large sets of marker genotypes compactly using "raw" data arrays. This works fine for vectors or matrices, but I run into the error shown in the example below as soon as I try to use 3 dimensional arrays (eg. animal x marker x allele). > a <- array(as.raw(1:6),c(2,3)) > a [,1] [,2] [,3] [1,] 01 03 05 [2,] 02 04 06 >
2012 May 23
3
applying cbind (or any function) across all components in a list
#If I have two lists as follows a1<- array(1:6, dim=c(2,3)) a2<- array(7:12, dim=c(2,3)) l1<- list(a1,a2) a3<- array(1:4, dim=c(2,2)) a4<- array(5:8, dim=c(2,2)) l2<- list(a3,a4) #how can I create a new list with the mean across all arrays within the list, so all components are included? As an example for [[1]]; cbind((l1[[1]][,1]+l2[[1]][,1])/2,