I am using plink on a large SNP dataset with a .map and a .ped file. I want to get some sort of file, say a list of all the SNPs that plink says I have. Any ideas on how to do this?

--
Thanks,
Jim.
Hi,

If you go to this site:
http://pngu.mgh.harvard.edu/~purcell/plink/res.shtml#teach

and download the teaching.zip file, I think there was information in the Word document about reading plink data into R, though I am not 100% sure. I think a read.table("filename.ped", header=FALSE) command may be enough (a .ped file has no header row). The Word document is for plink beginners, so it may not be what you are looking for.

Tina

On Mon, Jun 20, 2011 at 11:32 PM, Jim Silverton <jim.silverton at gmail.com> wrote:
> I am using plink on a large SNP dataset with a .map and .ped file.
> I want to get some sort of file, say a list of all the SNPs that plink is
> saying that I have. Any ideas on how to do this?
>
> --
> Thanks,
> Jim.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
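Before importing a .ped file, it can help to confirm its shape. The following is a minimal sketch, assuming the standard PLINK text layout: no header row, six leading columns (family ID, individual ID, father, mother, sex, phenotype), then two allele columns per SNP. The files toy.ped and toy.map and everything in them are made up for illustration.

```shell
# Build a tiny, made-up .ped/.map pair (2 individuals, 3 SNPs).
cat > toy.ped <<'EOF'
FAM1 IND1 0 0 1 2 A A G G C T
FAM2 IND2 0 0 2 1 A C G T C C
EOF
cat > toy.map <<'EOF'
1 rs0001 0 1000
1 rs0002 0 2000
2 rs0003 0 3000
EOF

# Number of SNPs = number of lines in the .map file.
n_snps=$(awk 'END { print NR }' toy.map)

# Every .ped row should have 6 leading columns plus 2 allele columns per SNP.
expected=$((6 + 2 * n_snps))

# Count rows whose field count differs from the expected value.
bad_rows=$(awk -v n="$expected" 'NF != n { c++ } END { print c + 0 }' toy.ped)

echo "SNPs: $n_snps, expected columns: $expected, malformed rows: $bad_rows"
```

If the number of malformed rows is not zero, the .ped and .map files probably do not belong together, or the file uses a different delimiter or layout than assumed here.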
Hi Jim,

If you convert the .ped and .map files to binary plink files, the .bim file will tell you the name and the position of each SNP. This would be the easiest method.

Alternatively, packages like GenABEL and genetics have functions to read PLINK-formatted data into R for analysis.

Best wishes,
Natalie
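As a rough sketch of pulling SNP names and positions out of a .bim file from the command line: a .bim file has one line per SNP with six columns (chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2). The file name toy.bim, the output name snp_positions.txt, and the SNP entries are invented for illustration; a real .bim file would come from plink --make-bed.

```shell
# Made-up .bim content; real files are usually tab-delimited,
# which awk's default whitespace splitting also handles.
cat > toy.bim <<'EOF'
1 rs0001 0 1000 A C
1 rs0002 0 2000 G T
2 rs0003 0 3000 C T
EOF

# Write SNP ID, chromosome and base-pair position to a plain-text list.
awk '{ print $2, $1, $4 }' toy.bim > snp_positions.txt

cat snp_positions.txt
```

The resulting text file has one SNP per line and can be read into R with read.table() if needed.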
Resending to correct bad subject line...

On Mon, 20 Jun 2011, Jim Silverton wrote:

> I am using plink on a large SNP dataset with a .map and .ped file. I want
> to get some sort of file, say a list of all the SNPs that plink is saying
> that I have. Any ideas on how to do this?

All the SNPs you have are listed in the .map file.

An easy way to put the data into R, if there isn't too much, is to do this:

    plink --file whatever --out whatever --recodeA

That will make a file called whatever.raw, single-space delimited, consisting of minor allele counts (0, 1, 2, NA) that you can bring into R like this:

    data <- read.table("whatever.raw", sep=" ", header=TRUE)

If you have tons of data, you'll want to work with the compact binary format (four genotypes per byte):

    plink --file whatever --out whatever --make-bed

Then see David Duffy's reply. However, I'm not sure if R can work with the compact format in memory. It might expand those genotypes (minor allele counts) from two-bit integers to double-precision floats. What does read.plink() create in memory?

There is another package I've been meaning to look at that is supposed to help with the memory management problem for large genotype files:

http://cran.r-project.org/web/packages/ff/

I haven't used it yet, but I am hopeful. Maybe David Duffy or someone else here will know more about it.

If you have a lot of data, also consider chopping the data into pieces before loading it into R. That's what we do. With a 100-core system, I break the data into 100 files (I use the GNU/Linux "split" command and a few other tricks) and have all 100 cores run at once to analyze the data.

When I work with genotype data as allele counts using Octave, I store the data, both in files and in memory, as unsigned 8-bit integers, using 3 as the missing value. That's still inefficient compared to the PLINK system, but it is way better than using doubles.

Best,
Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
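The two suggestions above (the SNP list lives in the .map file; large data can be chopped into pieces) can be sketched from the command line roughly as follows. This is only an illustration, not necessarily the procedure Mike uses: the file names toy_list.map, snplist.txt and chunk_*, the SNP entries, and the chunk size of 2 are all made up. Column 2 of a .map file is the SNP ID.

```shell
# Made-up .map content (chromosome, SNP ID, genetic distance, position).
cat > toy_list.map <<'EOF'
1 rs0001 0 1000
1 rs0002 0 2000
1 rs0003 0 3000
2 rs0004 0 4000
2 rs0005 0 5000
EOF

# Column 2 is the SNP ID: this gives the list of SNPs plink sees.
awk '{ print $2 }' toy_list.map > snplist.txt

# Cut the list into chunks of 2 SNP IDs each (chunk_aa, chunk_ab, ...).
# A chunk size of 2 only makes sense for this toy data.
split -l 2 snplist.txt chunk_

# Each chunk could then be passed to plink's --extract option, e.g.
#   plink --file whatever --extract chunk_aa --recodeA --out whatever_aa
# (not run here, since it needs plink and real data).
wc -l chunk_*
```

Each resulting whatever_XX.raw file would then hold only the SNPs in its chunk, so the pieces can be loaded into R and analyzed separately.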