Displaying 20 results from an estimated 200 matches similar to: "Working with massive matrices in R"
2012 Oct 23
1
factor or character
Hi,
The program below works very well.
(snps = c('rs621782_G', 'rs8087639_G', 'rs8094221_T', 'rs7227515_A',
'rs537202_C'))
Selec = todos[ , colnames(todos) %in% snps]
head(Selec)
But I have a data set with 1,000 columns and I need to extract 70 of them to use
(like snps in the command above).
These 70 SNPs are in a file. So I create a file to extract them with
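One way the selection could be driven from a file is sketched below. The data frame, the temporary file and its one-name-per-line layout are all hypothetical stand-ins; only `todos`, `snps` and `Selec` come from the post. Reading the names with `scan(what = "character")` keeps them as character strings rather than factors, which may be the issue the subject line hints at.

```r
# Hypothetical stand-in for the poster's `todos` data frame
todos <- data.frame(
  id          = 1:3,
  rs621782_G  = c(0, 1, 2),
  rs8087639_G = c(1, 1, 0),
  rs999_A     = c(2, 0, 1)
)

# Hypothetical file holding one SNP name per line
snp_file <- tempfile(fileext = ".txt")
writeLines(c("rs621782_G", "rs8087639_G", "rs_not_present"), snp_file)

# Read the names as a plain character vector (not a factor)
snps <- scan(snp_file, what = "character", quiet = TRUE)

# Keep only names that really are columns of `todos`
keep  <- intersect(snps, colnames(todos))
Selec <- todos[, keep, drop = FALSE]

# Names listed in the file but absent from the data
setdiff(snps, colnames(todos))
```

`drop = FALSE` keeps the result a data frame even if only one SNP matches.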
2012 Mar 14
3
Needing a better solution to a lookup problem.
I have a solution (actually a few) to this problem, but none are computationally efficient enough to be useful. I'm hoping someone can enlighten me to a better solution.
I have a data frame of chromosome/position pairs (along with other data for the location). For each pair I need to determine if it is within a given data frame of ranges. I need to keep only the pairs that are within any of
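As a baseline for the lookup described here, a minimal base-R sketch follows. All column names (`chr`, `pos`, `start`, `end`) and the toy data are assumptions, not taken from the post. It joins pairs to ranges on chromosome and then filters, which is simple but can use a lot of memory when there are many ranges per chromosome; interval-join tools such as `GenomicRanges::findOverlaps()` or `data.table::foverlaps()` are likely to scale better.

```r
# Hypothetical chromosome/position pairs
pairs <- data.frame(
  chr = c("1", "1", "2", "2"),
  pos = c(150, 500, 75, 300),
  stringsAsFactors = FALSE
)

# Hypothetical ranges; start and end are treated as inclusive
ranges <- data.frame(
  chr   = c("1", "2", "2"),
  start = c(100, 50, 250),
  end   = c(200, 100, 260),
  stringsAsFactors = FALSE
)

# Tag each pair with its row number, then join to all ranges on the same chromosome
tagged <- cbind(pairs, row_id = seq_len(nrow(pairs)))
m <- merge(tagged, ranges, by = "chr")

# A pair is kept if it falls inside at least one range
hit_ids <- unique(m$row_id[m$pos >= m$start & m$pos <= m$end])
inside  <- pairs[sort(hit_ids), , drop = FALSE]
```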
2013 Nov 08
1
SNPRelate: Plink conversion
Hi,
Following my earlier posts about having problems performing a PCA, I have
worked out what the problem is. The problem lies within the PLINK to gds
conversion.
It seems as though the SNPs are imported as "samples" and in turn, the
samples are recognised as SNPs:
>snpgdsSummary("chr2L")
Some values of snp.position are invalid (should be > 0)!
Some values of
2014 Jul 21
1
Multiple versions of data in a package
Dear R-devel,
I am writing for help on how I should include parallel sets of data in
my package.
Brief summary: I am new to using data within packages. I want a user to
be able to specify one of two alternative versions of within-package
datasets to use, and I want to load just that one. I have a solution
that works, but it doesn't seem as simple as it should be from a user's
2007 Feb 05
3
RSNPper SNPinfo and making it handle a vector
If I run an analysis which generates statistical tests on many SNPs I would
naturally want to get more details on the most significant SNPs. Directly
from within R one can get the information by loading RSNPper (from
Bioconductor) and simply issuing a command SNPinfo(2073285). Unfortunately,
the command cannot handle a vector and therefore only wants to do one at a
time.
I tried the lapply and
2004 Feb 19
1
piece wise application of functions
Dear all,
After struggling for some time with *apply() and eval() without
success, I decided to ask for help.
I have 3 lists labeled with, each contains 3 different
interpolation functions with identical names:
> names(missgp0)
[1] "spl.1mb" "spl.2mb" "spl.5mb"
>
> names(missgp1)
[1] "spl.1mb" "spl.2mb" "spl.5mb"
>
>
2012 Aug 24
0
A question about GRAMMAR calculations in the FAM_MDR algorithm
Dear R developers:
I am a PhD candidate in the School of Public Health at Peking
University, majoring in genetic epidemiology. I am learning the FAM-MDR
algorithm, which is used to detect gene-gene and gene-environment
interactions in pedigree data. The code was written by Tom
Cattaert of the University of Liege. The algorithms and the sample datasets
are available at
2011 Jun 21
4
Re: Getting SNPS from PLINK to R
I am using PLINK on a large SNP dataset with a .map and a .ped file.
I want to get some sort of file, say a list of all the SNPs that PLINK is
saying that I have. Any ideas on how to do this?
--
Thanks,
Jim.
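A minimal sketch of one way to list the SNPs directly from the .map file, without any extra package. It assumes the standard four-column PLINK .map layout (chromosome, variant ID, genetic distance, base-pair position); the file contents and the column names given here are made up for illustration.

```r
# Hypothetical .map file in the standard four-column PLINK layout
map_file <- tempfile(fileext = ".map")
writeLines(c("1 rs123 0 1000",
             "1 rs456 0 2000"), map_file)

# The .map format has no header line, so column names are supplied here
map <- read.table(map_file, header = FALSE,
                  col.names = c("chr", "snp", "cm", "bp"),
                  stringsAsFactors = FALSE)

# The second column holds the SNP identifiers
snp_names <- map$snp
```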
2005 Apr 13
1
logistic regression weights problem
Hi All,
I have a problem with weighted logistic regression. I have a number of
SNPs and a case/control scenario, but not all genotypes are as
"guaranteed" as others, so I am using weights to downsample the
importance of individuals whose genotype has been heavily "inferred".
My data is quite big, but with a dummy example:
> status <- c(1,1,1,0,0)
> SNPs <-
2011 Jun 21
1
Getting SNPS from PLINK to R
snpMatrix package is quite nice (read.plink())
2006 Apr 06
4
Reshaping genetic data from long to wide
Bottom Line Up Front: How does one reshape genetic data from long to wide?
I currently have a lot of data. About 180 individuals (some
probands/patients, some parents, rare siblings) and SNP data from 6000 loci
on each. The standard formats seem to be something along the lines of Famid,
pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2,
locus2Allele1, locus2Allele2, etc
In other
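A minimal sketch of a long-to-wide reshape with base R's `reshape()`, assuming the long data has one row per individual per locus. The column names (`pid`, `locus`, `allele1`, `allele2`) and the toy values are hypothetical and only loosely follow the layout named in the post.

```r
# Hypothetical long-format data: one row per individual per locus
long <- data.frame(
  pid     = rep(c("p1", "p2"), each = 2),
  locus   = rep(c("locus1", "locus2"), times = 2),
  allele1 = c("A", "C", "A", "T"),
  allele2 = c("G", "C", "A", "T"),
  stringsAsFactors = FALSE
)

# One row per individual; allele columns are spread out per locus,
# named allele1.locus1, allele2.locus1, allele1.locus2, ...
wide <- reshape(long, idvar = "pid", timevar = "locus", direction = "wide")
```

With thousands of loci this produces two columns per locus, so the wide table grows quickly in width.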
2013 Oct 03
1
prcomp - surprising structure
Hello,
I did a pca with over 200000 snps for 340 observations (ids). If I plot the
eigenvectors (called rotation in prcomp) 2, 3 and 4 (e.g. plot(rotation[,2]))
I see a strange "column" in my data (see attachment). I
suggest it is an artefact (but of what?).
Suggestion:
I used prcomp this way: prcomp (mat), where mat is a matrix with the column
means already subtracted followed by a
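For reference, a small sketch of how `prcomp()` treats centering, using a random matrix as a stand-in for the SNP data. `prcomp()` centers columns by default, so centering by hand first should give the same loadings up to sign. Note also that `rotation` has one row per variable (here, per SNP), so `plot(rotation[, 2])` draws the loadings in column order of the input matrix.

```r
set.seed(1)

# Hypothetical stand-in: 20 observations (rows) by 5 variables (columns)
mat <- matrix(rnorm(20 * 5), nrow = 20)

# Subtract the column means by hand
centered <- scale(mat, center = TRUE, scale = FALSE)

p_default  <- prcomp(mat)                      # center = TRUE is the default
p_centered <- prcomp(centered, center = FALSE) # already centered by hand

# Loadings agree up to the sign of each component
all.equal(abs(p_default$rotation), abs(p_centered$rotation))

# One row of `rotation` per column of the input matrix
dim(p_default$rotation)
```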
2011 Dec 13
0
snpStats imputed SNP probabilities
Hi,
Does anybody know how to obtain the imputed SNP genotype probabilities from the snpStats package?
I am interested in using an imputation method implemented in R to be further used in a simulation study context.
I have found the snpStats package that seems to contain suitable functions to do so.
As far as I could find out from the package vignette examples and its help, it gives the
2010 Nov 09
0
haplotype and epistasis analysis using 3 or more SNPs?
Dear Mme/Mr.
Hope you are doing well. I am doing some genetic analysis using the R software and I am having difficulty finding out how to perform an interaction/epistasis analysis using 3 or more SNPs (= markers). (In the instruction manual, there is only an interaction/epistasis analysis with 2 markers.)
In addition can you please inform me how I can perform Haplotype analysis and if there is an
2020 Oct 29
1
R: sim1000G
Hi,
I am using the sim1000G R package to simulate data for case/control study.
I cannot figure out how to manipulate this code to be able to generate 10%
or 50% causal SNPs in R.
This is whole code provided as example on GitHub:
library(sim1000G)
vcf_file = "region-chr4-357-ANK2.vcf.gz" #nvariants = 442, ss=1000
vcf = readVCF( vcf_file, maxNumberOfVariants = 442 ,min_maf =
2011 Feb 03
1
bug in codetools/R CMD check?
Hi Mr Tierney,
I have noticed an error message from R 2.12.x's CMD check for a while (apparently Prof. Ripley completely rewrote CMD check in R 2.12+)
e.g.:
http://bioconductor.org/checkResults/2.7/bioc-LATEST/snpMatrix/lamb2-checksrc.html
----------------
* checking R code for possible problems ... NOTE
Warning: non-unique value when setting 'row.names': 'new'
Error in
2005 Mar 04
0
Is aggregate() what I need here?
I'm pretty new to R, and I've been given a script by a user who wants
some help with it. I know enough about the way R works to know that
this is a very inefficient way to do what the user wants (the
LSB_JOBINDEX stuff is added by me so that this can work on many
hundreds of input data files as LSF jobs - it's the nested loops I'm
really interested in):
2011 Jul 27
1
SNP Tables
Hello,
I have indicators for the presence or absence of SNPs in columns, plus a
category column (case/control). I would like to extract ONLY the tables and
the indices (SNPs) that give me 2 x 3 tables. Some give 2 x 2 tables when
one of the alleles is missing. The data look like the matrix snpmat below,
so the first SNP should give me the following table (aa=0, Aa=1 and AA=2):
aa
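A minimal sketch of one way to keep only the SNPs whose case/control table is 2 x 3. The matrix values and the `status` vector are invented for illustration; only the name `snpmat` and the 0/1/2 genotype coding come from the post.

```r
# Hypothetical genotype matrix coded aa = 0, Aa = 1, AA = 2
snpmat <- cbind(
  snp1 = c(0, 1, 2, 0, 1, 2),   # all three genotypes present
  snp2 = c(0, 1, 0, 1, 0, 1)    # genotype 2 never observed
)

# Hypothetical case/control indicator, one entry per row of snpmat
status <- c(0, 0, 0, 1, 1, 1)

# One status-by-genotype table per SNP
tabs <- lapply(colnames(snpmat), function(s) table(status, snpmat[, s]))
names(tabs) <- colnames(snpmat)

# Flag the tables that have exactly 2 rows and 3 columns
is_2x3 <- vapply(tabs, function(t) identical(dim(t), c(2L, 3L)), logical(1))

tabs_2x3 <- tabs[is_2x3]    # the tables themselves
idx_2x3  <- which(is_2x3)   # column indices of the SNPs that qualify
```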
2012 Mar 12
1
Speeding up lots of calls to GLM
Dear useRs,
First off, sorry about the long post. Figured it's better to give context
to get good answers (I hope!). Some time ago I wrote an R function that
will get all pairwise interactions of variables in a data frame. This
worked fine at the time, but now a colleague would like me to do this with
a much larger dataset. They don't know how many variables they are going to
have in the
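For context, the straightforward version of "one GLM per pair of variables" might look like the sketch below. The data, the variable names and the Gaussian family are assumptions, and this is the slow baseline rather than a speed-up: it makes one `glm()` call per pair, so the number of fits grows as choose(p, 2) with p variables.

```r
set.seed(42)

# Hypothetical data: response y and three predictors
df <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))

vars      <- setdiff(names(df), "y")
var_pairs <- combn(vars, 2, simplify = FALSE)

# Fit y ~ a + b + a:b for every pair (a, b)
fits <- lapply(var_pairs, function(p) {
  f <- reformulate(c(p, paste(p, collapse = ":")), response = "y")
  glm(f, data = df, family = gaussian())   # family is an assumption
})
names(fits) <- vapply(var_pairs, paste, character(1), collapse = ":")

# p-value of the interaction term: 4th coefficient row, 4th column
int_p <- vapply(fits, function(fit) coef(summary(fit))[4, 4], numeric(1))
```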
2011 Jul 14
2
R package: pbatR
Dear All,
Does anybody have experience with R package pbatR
(http://cran.r-project.org/web/packages/pbatR/index.html)? I am trying to
use it to analyze the family-based case-control data, but the package
totally doesn't work on my computer. I contacted the authors of the package,
but I haven't heard anything from them.
Following the package manual, I tried the simple example as below: