thr3ads.net - similar to: "Getting SNPS from PLINK to R"

Displaying 20 results from an estimated 1000 matches similar to: "Getting SNPS from PLINK to R"

2011 Jun 21

Re: Getting SNPS from PLINK to R

I am using PLINK on a large SNP dataset with a .map and a .ped file. I want to get some sort of file, say a list of all the SNPs that PLINK says I have. Any ideas on how to do this? -- Thanks, Jim.
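Each line of a PLINK .map file describes one SNP in four whitespace-separated columns: chromosome, SNP identifier, genetic distance and base-pair position. The SNP list can therefore be read straight into R. This is a minimal sketch that assumes a standard four-column .map file. The toy file written to tempfile() and the column names are illustrative stand-ins for your own file.

```r
# Toy .map file so the sketch runs on its own; replace map_file with the
# path to your real .map file.
map_file <- tempfile(fileext = ".map")
writeLines(c("1 rs0001 0 10583",
             "1 rs0002 0 752566",
             "2 rs0003 0 1108138"), map_file)

# The four standard columns; SNP IDs are kept as character strings.
map <- read.table(map_file, header = FALSE,
                  col.names  = c("chr", "snp", "genetic_dist", "bp"),
                  colClasses = c("character", "character", "numeric", "numeric"))

snps <- map$snp                 # one entry per SNP, in file order
length(snps)                    # number of SNPs listed in the .map file

# Optionally save the list, one SNP ID per line.
out_file <- tempfile(fileext = ".txt")
writeLines(snps, out_file)
```

The genotype columns of the matching .ped file follow the same SNP order, so this list also labels them.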

2013 Nov 08

SNPRelate: Plink conversion

Hi, following my earlier posts about having problems performing a PCA, I have worked out what the problem is: it lies within the PLINK-to-GDS conversion. It seems as though the SNPs are imported as "samples" and, in turn, the samples are recognised as SNPs:
> snpgdsSummary("chr2L")
Some values of snp.position are invalid (should be > 0)! Some values of

2006 Jun 05

Fastest way to do HWE.exact test on 100K SNP data?

Hi everyone, I'm using the function 'HWE.exact' from the 'genetics' package to compute p-values of the HWE test. My data set consists of ~600 subjects (cases and controls) typed at ~10K SNP markers; the test is applied separately to cases and controls. The genotypes are stored in a list of 'genotype' objects, all.geno, and p-values are calculated inside the loop over all
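For speed, you can skip the per-SNP genotype objects entirely. Tabulate genotype counts once and run an exact test directly on the counts. This is a minimal sketch, in plain R, of the exact HWE test described by Wigginton, Cutler and Abecasis (2005). It is not the code behind HWE.exact, and hwe_exact_p is a hypothetical name. Counts are assumed to be given per SNP as (homozygote, heterozygote, other homozygote).

```r
# Exact test for Hardy-Weinberg equilibrium from genotype counts, using the
# heterozygote-count recurrence of Wigginton, Cutler & Abecasis (2005).
hwe_exact_p <- function(n_AA, n_Aa, n_aa) {
  n <- n_AA + n_Aa + n_aa                 # number of genotyped individuals
  n_homr <- min(n_AA, n_aa)               # homozygotes for the rarer allele
  n_rare <- 2 * n_homr + n_Aa             # copies of the rarer allele
  if (n == 0 || n_rare == 0) return(1)    # no data, or a monomorphic SNP
  probs <- numeric(n_rare + 1)            # probs[h + 1] = P(h heterozygotes)

  # Start near the most likely heterozygote count, with the same parity as n_rare.
  mid <- floor(n_rare * (2 * n - n_rare) / (2 * n))
  if (mid %% 2 != n_rare %% 2) mid <- mid + 1
  probs[mid + 1] <- 1

  # Walk downwards: each step removes two heterozygotes.
  h <- mid; homr <- (n_rare - mid) / 2; homc <- n - h - homr
  while (h >= 2) {
    probs[h - 1] <- probs[h + 1] * h * (h - 1) / (4 * (homr + 1) * (homc + 1))
    h <- h - 2; homr <- homr + 1; homc <- homc + 1
  }
  # Walk upwards: each step adds two heterozygotes.
  h <- mid; homr <- (n_rare - mid) / 2; homc <- n - h - homr
  while (h <= n_rare - 2) {
    probs[h + 3] <- probs[h + 1] * 4 * homr * homc / ((h + 2) * (h + 1))
    h <- h + 2; homr <- homr - 1; homc <- homc - 1
  }
  probs <- probs / sum(probs)
  # p-value: total probability of outcomes no more likely than the observed one.
  min(1, sum(probs[probs <= probs[n_Aa + 1]]))
}

# One row of counts per SNP (toy values): AA, Aa, aa.
counts <- rbind(snp1 = c(25, 50, 25), snp2 = c(50, 0, 50), snp3 = c(100, 0, 0))
p <- apply(counts, 1, function(x) hwe_exact_p(x[1], x[2], x[3]))
p
```

The heterozygote probabilities are built by a recurrence rather than from factorials. Each SNP therefore costs time proportional to its rare-allele count, and monomorphic SNPs simply return 1.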

2008 May 05

genotypes simulation

Hello, I am having a really hard time finding a good article about simulating genotypes of cases and controls at a disease locus using R. If you guys can point me to where I can find more information, that would be helpful. Thanks, Claire
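A common starting point is a single biallelic locus under a penetrance model. Population genotype frequencies follow Hardy-Weinberg proportions, and each genotype carries its own risk of disease. Cases are then drawn with probability proportional to frequency × penetrance, and controls with probability proportional to frequency × (1 − penetrance). This is a minimal sketch under those assumptions. simulate_cc is a hypothetical helper, and the allele frequency and penetrance values are illustrative only.

```r
# Simulate case/control genotypes (coded as 0/1/2 copies of the risk allele)
# at one disease locus under a simple penetrance model.
simulate_cc <- function(n_cases, n_controls, maf, penetrance) {
  # Population genotype frequencies under Hardy-Weinberg equilibrium.
  g_freq <- c((1 - maf)^2, 2 * maf * (1 - maf), maf^2)
  # P(genotype | affected) is proportional to P(genotype) * P(affected | genotype);
  # unaffected uses 1 - penetrance. sample() normalises 'prob' itself.
  p_case <- g_freq * penetrance
  p_ctrl <- g_freq * (1 - penetrance)
  data.frame(
    status = rep(c(1L, 0L), c(n_cases, n_controls)),
    geno   = c(sample(0:2, n_cases, replace = TRUE, prob = p_case),
               sample(0:2, n_controls, replace = TRUE, prob = p_ctrl))
  )
}

set.seed(42)
cc <- simulate_cc(500, 500, maf = 0.2, penetrance = c(0.01, 0.02, 0.04))
table(cc$status, cc$geno)
```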

2008 Jan 21

reordering huge data file

Dear R experts, my problem is how to handle a 10GB data file containing genotype data. The file is in a particular format (Illumina final report) and needs to be altered and merged with phenotype data for further analysis. Perl seems to be a frequently used solution for this type of work; however, I am inclined to think it should be doable with R. How do I open a text file, line by line,
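R can read a file incrementally through a connection. Open it once with file(), then call readLines() with n set to fetch the next block of lines; each call continues where the previous one stopped. This is a minimal sketch. process_in_chunks and its handler argument are hypothetical names, and the small tab-separated toy file stands in for the real report, whose header block you would still need to skip or parse yourself.

```r
# Read a large text file in blocks of lines and pass each block to a handler.
process_in_chunks <- function(path, chunk_size = 10000L, handler) {
  con <- file(path, open = "r")
  on.exit(close(con))                     # always release the connection
  n_done <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break        # end of file reached
    handler(lines)                        # e.g. parse, reshape, append to output
    n_done <- n_done + length(lines)
  }
  n_done
}

# Toy input: 25 tab-separated lines.
tmp <- tempfile()
writeLines(sprintf("sample%d\tSNP%d\tA\tG", 1:25, 1:25), tmp)

# The handler here only records how many lines each chunk contained.
counter <- new.env()
counter$sizes <- integer(0)
total <- process_in_chunks(tmp, chunk_size = 10L,
                           handler = function(x) counter$sizes <- c(counter$sizes, length(x)))
total
counter$sizes
```

Only one chunk is in memory at a time. Processed chunks can be written out as they go, for example with write.table(..., append = TRUE).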

2006 Apr 06

Reshaping genetic data from long to wide

Bottom Line Up Front: How does one reshape genetic data from long to wide? I currently have a lot of data: about 180 individuals (some probands/patients, some parents, a few siblings) and SNP data from 6000 loci on each. The standard formats seem to be something along the lines of Famid, pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2, locus2Allele1, locus2Allele2, etc. In other
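Base R's stats::reshape() converts between these layouts. With direction = "wide", each value of timevar (here the locus) becomes a set of columns named variable.locus. This is a minimal sketch on hypothetical toy data with one row per individual per locus. The column names pid, locus, allele1 and allele2 are assumptions about how the long data are laid out.

```r
# Long format: one row per individual per locus (hypothetical toy data).
long <- data.frame(
  pid     = c(1, 1, 2, 2),
  locus   = c("rs1", "rs2", "rs1", "rs2"),
  allele1 = c("A", "C", "G", "C"),
  allele2 = c("G", "T", "G", "C"),
  stringsAsFactors = FALSE
)

# Wide format: one row per individual, allele columns repeated per locus
# (allele1.rs1, allele2.rs1, allele1.rs2, allele2.rs2).
wide <- reshape(long, idvar = "pid", timevar = "locus",
                v.names = c("allele1", "allele2"), direction = "wide")
wide
```

Columns that are constant within an individual, such as the family, parent, sex and affection columns, are carried along once per row.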

2005 Apr 05

cat bailing out in a for loop

Dear All, I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42 SNPs. I am using the function HWE.exact from the package "genetics". In order not to do a lot of coding "by hand", I have a for loop that goes through each column (each column is one SNP) and gives me the p.value for HWE.exact. Unfortunately, some SNPs have reached fixation and HWE.exact requires a
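Two base-R safeguards keep a loop like this running:

- Drop SNPs with only one observed allele before testing.
- Wrap each test in tryCatch(), so an error for one SNP is recorded as NA instead of stopping the loop.

This is a minimal sketch. The "A/G"-style genotype strings are toy data, and toy_test is a hypothetical stand-in that errors on monomorphic input; it is not the genetics package's API.

```r
# Toy genotype data: one column per SNP, genotypes as "allele/allele" strings.
geno <- data.frame(
  snp1 = c("A/A", "A/G", "G/G", "A/G"),
  snp2 = c("C/C", "C/C", "C/C", "C/C"),   # fixed: only one allele observed
  snp3 = c("T/T", "T/C", NA,    "C/C"),
  stringsAsFactors = FALSE
)

# Count distinct alleles among the non-missing calls of each SNP.
n_alleles <- sapply(geno, function(g) {
  alleles <- unlist(strsplit(g[!is.na(g)], "/", fixed = TRUE))
  length(unique(alleles))
})
polymorphic <- names(geno)[n_alleles > 1]   # SNPs that are worth testing

# Hypothetical stand-in for a per-SNP test that fails on monomorphic input.
toy_test <- function(g) {
  if (length(unique(g[!is.na(g)])) < 2) stop("SNP is monomorphic")
  0.5
}

# tryCatch turns an error for one SNP into NA instead of aborting the loop.
p_values <- sapply(geno, function(g) tryCatch(toy_test(g), error = function(e) NA_real_))
p_values
```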

2007 Jan 21

efficient code. how to reduce running time?

Hi, I am new to R, and even though I've got my code to run and do what it needs to, it is taking forever and I can't use it like this. I was wondering if you could help me find ways to make the code run faster. Here is my code. The data set is a bunch of 0s and 1s in a data.frame. What I am doing is this: I pick a column and make up a new column Y with values associated with that

2013 Oct 03

prcomp - surprising structure

Hello, I did a PCA with over 200000 SNPs for 340 observations (IDs). If I plot eigenvectors 2, 3 and 4 (called rotation in prcomp, e.g. plot(rotation[,2])), I see a strange "column" in my data (see attachment). I suspect it is an artefact (but of what?). I used prcomp this way: prcomp(mat), where mat is a matrix with the column means already subtracted, followed by a
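prcomp() centres each column by default (center = TRUE), so pre-centring the matrix is harmless but redundant. When individuals are rows and SNPs are columns, the two parts of the result mean different things:

- rotation holds the loadings, with one row per SNP.
- x holds the scores, with one row per individual.

A plot of rotation[, 2] therefore shows SNPs, not individuals. This is a minimal sketch on simulated 0/1/2 genotypes; the matrix sizes are toy values chosen for illustration.

```r
set.seed(1)
n_ind <- 40; n_snp <- 200                 # toy sizes
# Rows are individuals, columns are SNPs coded 0/1/2.
mat <- matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3), nrow = n_ind)

pc <- prcomp(mat)          # center = TRUE and scale. = FALSE are the defaults
dim(pc$rotation)           # loadings: one row per SNP (200 x 40)
dim(pc$x)                  # scores: one row per individual (40 x 40)

# Individuals on the first two components:
# plot(pc$x[, 1], pc$x[, 2], xlab = "PC1", ylab = "PC2")
```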

2011 Jan 03

Using PCA to correct p-values from snpMatrix

Hi R-help folks, I have been doing some single-SNP association work using snpMatrix. This works well, but it produces a lot of false positives because of population structure in my data. I would like to correct the p-values (which snpMatrix gives me) for population structure, possibly using principal component analysis (PCA). My data is complicated, so here's a simple example of what
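A widely used variant of the EIGENSTRAT idea takes the top principal components of the genotype matrix as ancestry summaries. These components are added as covariates in each single-SNP logistic regression. This is a minimal sketch on simulated data with two hidden subpopulations that differ in allele frequencies and disease rate. It calls glm() directly rather than any snpMatrix function, and the choice of two PCs is an assumption.

```r
set.seed(2)
n <- 200
pop <- rep(1:2, each = n / 2)                        # hidden population label
# 50 SNPs whose allele frequencies differ between the two populations.
freq <- cbind(runif(50, 0.1, 0.5), runif(50, 0.3, 0.7))
geno <- sapply(1:50, function(j) rbinom(n, 2, freq[j, pop]))
status <- rbinom(n, 1, ifelse(pop == 1, 0.3, 0.6))  # disease rate differs by population

# Top principal components of the genotype matrix summarise ancestry.
pcs <- prcomp(geno)$x[, 1:2]

# Test one SNP without and with the PCs as covariates.
snp <- geno[, 1]
fit_raw <- glm(status ~ snp, family = binomial)
fit_adj <- glm(status ~ snp + pcs, family = binomial)
p_raw <- coef(summary(fit_raw))["snp", "Pr(>|z|)"]
p_adj <- coef(summary(fit_adj))["snp", "Pr(>|z|)"]
c(unadjusted = p_raw, pc_adjusted = p_adj)
```

In a real analysis, the PCs would be computed once and the adjusted model fitted for every SNP.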

2013 Jul 02

Recoding variables based on reference values in data frame

I'm new to R (previously used SAS primarily) and I have a genetics data frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3, ...) at 3000+ genetic locations (SNP1, SNP2, SNP3, ...). A small subset of the data is shown below:

SNP_ID      SNP1  SNP2  SNP3  SNP4
Maj_Allele  C     G     C     A
Min_Allele  T     A     T     G
ID1         CC    GG    CT    AA
ID2         CC    GG    CC    AA
ID3         CC    GG    nc    AA
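One way to recode genotype strings into minor-allele counts (0, 1 or 2) is to count how many characters of each genotype match that SNP's minor allele. This is a minimal sketch. It assumes the Min_Allele row has been pulled out into a named vector and the subject rows are kept as a data frame of character columns. count_minor and min_allele are hypothetical names, and "nc" (no call) becomes NA.

```r
# Subject rows from the example above, as character columns.
geno <- data.frame(
  SNP1 = c("CC", "CC", "CC"),
  SNP2 = c("GG", "GG", "GG"),
  SNP3 = c("CT", "CC", "nc"),
  SNP4 = c("AA", "AA", "AA"),
  row.names = c("ID1", "ID2", "ID3"),
  stringsAsFactors = FALSE
)
# Minor allele per SNP (the Min_Allele row).
min_allele <- c(SNP1 = "T", SNP2 = "A", SNP3 = "T", SNP4 = "G")

# Number of copies of 'allele' in each two-letter genotype; "nc" -> NA.
count_minor <- function(g, allele) {
  g[g == "nc"] <- NA
  nchar(g) - nchar(gsub(allele, "", g, fixed = TRUE))
}

# Apply column by column, pairing each SNP with its own minor allele.
counts <- mapply(count_minor, geno, min_allele[names(geno)])
rownames(counts) <- rownames(geno)
counts
```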

2008 Aug 22

help needed for HWE.exact in library "genetics"

Hi, I have genotype data for both cases and controls and would like to calculate the HWE p-value. However, since the count of one genotype is 0, I got a weird result. Could someone help me figure it out, or confirm that it's right? Thanks a lot.
============
> library( "genetics" )
NOTE: THIS PACKAGE IS NOW OBSOLETE. The R-Genetics project has developed an set of enhanced

2005 Apr 13

logistic regression weights problem

Hi All, I have a problem with weighted logistic regression. I have a number of SNPs and a case/control scenario, but not all genotypes are as "guaranteed" as others, so I am using weights to down-weight individuals whose genotype has been heavily "inferred". My data is quite big, but here is a dummy example:
> status <- c(1,1,1,0,0)
> SNPs <-

2009 Jan 13

problem whit Geneland

I ran the following steps:
library(Geneland)
set.seed(1)
data <- simdata(nindiv=200, coord.lim=c(0,1,0,1), number.nuclei=5, allele.numbers=rep(10,20), IBD=FALSE, npop=2, give.tess.grid=FALSE)
geno <- data$genotypes
coord <- t(data$coord.indiv)
path.mcmc <-

2011 Apr 14

integer and floating-point storage

I note that "current implementations of R use 32-bit integers for integer vectors," but I am working with large arrays that contain integers from 0 to 3, so they could be stored as unsigned 8-bit integers. Can R do this? (FYI -- This is for storing minor-allele counts for genetic studies. There are 0, 1 or 2 minor alleles and 3 would represent missing.) It is theoretically possible
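R has no 8-bit integer type with arithmetic, but its raw type stores one byte per element and converts with as.raw() and as.integer(). Because the codes 0 to 3 need only 2 bits, four of them can even share one byte. This is a minimal sketch using base R's bitwAnd(), bitwShiftL() and bitwShiftR(). pack2bit and unpack2bit are hypothetical helper names, and the coding (0, 1, 2 minor alleles, 3 = missing) follows the question.

```r
# Pack codes 0..3 four to a byte (first code in the lowest two bits).
pack2bit <- function(x) {
  stopifnot(all(x %in% 0:3))
  pad <- (-length(x)) %% 4                  # pad to a multiple of 4 codes
  m <- matrix(as.integer(c(x, rep(0L, pad))), nrow = 4)
  byte <- m[1, ] + bitwShiftL(m[2, ], 2L) + bitwShiftL(m[3, ], 4L) + bitwShiftL(m[4, ], 6L)
  as.raw(byte)
}

# Unpack n codes from a raw vector produced by pack2bit().
unpack2bit <- function(r, n) {
  b <- as.integer(r)
  m <- rbind(bitwAnd(b, 3L),
             bitwAnd(bitwShiftR(b, 2L), 3L),
             bitwAnd(bitwShiftR(b, 4L), 3L),
             bitwAnd(bitwShiftR(b, 6L), 3L))
  as.vector(m)[seq_len(n)]                  # column-major order restores the sequence
}

x <- c(0L, 1L, 2L, 3L, 3L, 0L, 1L)
r8 <- as.raw(x)                 # 1 byte per code instead of 4
packed <- pack2bit(x)           # 4 codes per byte: 2 bytes here
unpack2bit(packed, length(x))
```

Arithmetic is not defined on raw vectors, so convert back with as.integer() (or unpack) before computing allele counts or frequencies.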

2010 Feb 12

"drop if missing" command?

This will probably seem very simple to experienced R programmers: I am doing a SNP association analysis and am at the model-fitting stage. I am using the stats package's "drop1" with the following code:
##geno is the dataset
## the dependent variable (casectrln) is dichotomous and coded 0,1
## rs743572_2 is one of the snps (which is coded 0,1,2 for the 3 genotypes)
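The usual R counterpart of a "drop if missing" step is to subset on complete.cases(), or to use na.omit(). Restricting the check to the variables in the model avoids discarding rows that are missing only in unrelated columns. This is a minimal sketch on simulated data. The covariate age is a hypothetical addition so that the model has more than one term to drop.

```r
set.seed(1)
geno <- data.frame(casectrln  = rbinom(100, 1, 0.4),
                   rs743572_2 = sample(0:2, 100, replace = TRUE),
                   age        = rnorm(100, 50, 10))   # hypothetical covariate
geno$age[c(3, 17, 42)] <- NA                          # some missing values

# Keep only rows that are complete for the variables used in the model.
vars <- c("casectrln", "rs743572_2", "age")
geno_cc <- geno[complete.cases(geno[, vars]), vars]

fit <- glm(casectrln ~ rs743572_2 + age, family = binomial, data = geno_cc)
d1 <- drop1(fit, test = "Chisq")   # likelihood-ratio test for dropping each term
d1
```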

2011 Dec 09

minor allele frequency comparison

Hi all, we are using two methods to identify SNPs. One is based on resequencing the genome and aligning the reads to the sequenced genome to identify SNPs (data available for 44 individuals). The other is based on a SNP array with selected loci (30000 loci, 870 individuals). I want to compare the resequencing-based minor allele frequencies with the array-based minor allele frequencies.
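Once both datasets are coded as 0/1/2 counts of the same allele, the comparison has three steps:

1. Compute each SNP's allele frequency per platform as the column mean divided by 2, folded to the minor allele.
2. Align the two results on the SNP IDs they share.
3. Summarise the agreement, for example with a correlation or the distribution of differences.

This is a minimal sketch on simulated genotypes. maf_from_dosage and simulate_geno are hypothetical helpers, and the shared-ID and consistent-coding assumptions are yours to check.

```r
# Minor allele frequency per SNP from a 0/1/2 genotype matrix (SNPs in columns).
maf_from_dosage <- function(geno) {
  p <- colMeans(geno, na.rm = TRUE) / 2   # frequency of the counted allele
  pmin(p, 1 - p)                          # fold to the minor allele
}

# Toy data: the same 100 loci, 44 resequenced individuals and 870 on the array.
simulate_geno <- function(n, p, ids) {
  m <- sapply(p, function(pj) rbinom(n, 2, pj))
  colnames(m) <- ids
  m
}
set.seed(3)
true_p  <- runif(100, 0.05, 0.5)
snp_ids <- paste0("snp", 1:100)
reseq   <- simulate_geno(44, true_p, snp_ids)
array_g <- simulate_geno(870, true_p, snp_ids)[, 1:80]  # array covers a subset of loci

shared    <- intersect(colnames(reseq), colnames(array_g))
maf_reseq <- maf_from_dosage(reseq[, shared])
maf_array <- maf_from_dosage(array_g[, shared])

cor(maf_reseq, maf_array)
summary(maf_reseq - maf_array)
# plot(maf_array, maf_reseq); abline(0, 1)   # visual check against the identity line
```

Near a frequency of 0.5, sampling noise can make the two platforms disagree on which allele is the minor one. Folding hides such flips, so those SNPs deserve a closer look.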

2007 Feb 05

RSNPper SNPinfo and making it handle a vector

If I run an analysis which generates statistical tests on many SNPs, I would naturally want to get more details on the most significant SNPs. Directly from within R, one can get the information by loading RSNPper (from Bioconductor) and simply issuing a command such as SNPinfo(2073285). Unfortunately, the command cannot handle a vector and therefore only handles one SNP at a time. I tried the lapply and
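The general pattern for a function that accepts only one value per call is to call it once per element with lapply(), then combine the results. This is a minimal sketch. lookup_snp is a hypothetical stand-in that returns a one-row data frame; SNPinfo itself is not called here, and its return type is not assumed.

```r
# Hypothetical stand-in for a lookup that accepts only one SNP ID per call.
lookup_snp <- function(id) {
  stopifnot(length(id) == 1)
  data.frame(snp = paste0("rs", id), queried = TRUE, stringsAsFactors = FALSE)
}

ids <- c(2073285, 1042522, 7412)

# lapply() calls the scalar function once per ID; do.call(rbind, ...)
# stacks the one-row results into a single data frame.
info <- do.call(rbind, lapply(ids, lookup_snp))
info
```

If the real function returns something other than a one-row data frame, keep the list produced by lapply() and extract the fields you need from each element.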

2006 Jun 20

multi-dimension array of raw

I would like to store and manipulate large sets of marker genotypes compactly using "raw" data arrays. This works fine for vectors or matrices, but I run into the error shown in the example below as soon as I try to use 3-dimensional arrays (e.g. animal x marker x allele).
> a <- array(as.raw(1:6), c(2,3))
> a
     [,1] [,2] [,3]
[1,]   01   03   05
[2,]   02   04   06
>

2012 May 23

applying cbind (or any function) across all components in a list

#If I have two lists as follows
a1 <- array(1:6, dim=c(2,3))
a2 <- array(7:12, dim=c(2,3))
l1 <- list(a1,a2)
a3 <- array(1:4, dim=c(2,2))
a4 <- array(5:8, dim=c(2,2))
l2 <- list(a3,a4)
#how can I create a new list with the mean across all arrays within the list, so all components are included? As an example for [[1]];
cbind((l1[[1]][,1]+l2[[1]][,1])/2,
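Reduce() with `+` adds all arrays in a list element by element, so dividing by the list length gives their mean. Wrapping that in lapply() handles several lists at once. This is a minimal sketch that assumes the arrays within each list share dimensions, as a1/a2 and a3/a4 do; list_mean is a hypothetical helper. Averaging l1[[1]] with l2[[1]] element-wise would additionally require a1 and a3 to have the same shape, which a 2 x 3 and a 2 x 2 array do not.

```r
a1 <- array(1:6, dim = c(2, 3)); a2 <- array(7:12, dim = c(2, 3))
a3 <- array(1:4, dim = c(2, 2)); a4 <- array(5:8, dim = c(2, 2))
l1 <- list(a1, a2); l2 <- list(a3, a4)

# Element-wise mean of all arrays in one list (they must share dimensions).
list_mean <- function(l) Reduce(`+`, l) / length(l)

# One mean array per input list.
means <- lapply(list(l1, l2), list_mean)
means[[1]]   # 2 x 3, the element-wise mean of a1 and a2
means[[2]]   # 2 x 2, the element-wise mean of a3 and a4
```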
