thr3ads.net - similar to: "Request for help on manipulation large data sets"

Displaying 20 results from an estimated 9000 matches similar to: "Request for help on manipulation large data sets"

2011 May 31

correlation matrix

Hi there, I wonder if there is a way of efficiently generating a correlation matrix of two expression matrices. I want to correlate miRNA and mRNA expression and used the following code:

## dat.mi: miRNA expression matrix, dat.m: mRNA expression matrix
nc <- nrow(dat.mi)
cor.mat <- data.frame(rep(NA,nrow(dat.m)))
pval.mat <- data.frame(rep(NA,nrow(dat.m)))
for(i in 1:nc) {
  cr <- vector()

2007 Oct 30

Plotting question: how to plot SNP location data?

Hello, I would like to plot specific SNPs with their exact locations on a chromosome. Based on my genotyping results I would like to separate these SNPs into three different categories (1, 2 and 3) and use different colours to represent these categories. The script below generates the sample data. I can plot these with the image function using the following:

val <- 1:3
samp <- sample(val,

2011 Dec 13

snpStats imputed SNP probabilities

Hi, Does anybody know how to obtain the imputed SNP genotype probabilities from the snpStats package? I am interested in an imputation method implemented in R that I can use in a simulation study. I have found the snpStats package, which seems to contain suitable functions to do so. As far as I could find out from the package vignette examples and its help, it gives the

2013 Nov 08

SNPRelate: Plink conversion

Hi, Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK-to-GDS conversion. It seems as though the SNPs are imported as "samples" and in turn, the samples are recognised as SNPs:

> snpgdsSummary("chr2L")
Some values of snp.position are invalid (should be > 0)!
Some values of

2020 Oct 08

2 D density plot interpretation and manipulating the data

Hello, I have a data frame like this:

> head(SNP)
               mean      var     sd
FQC.10090295 0.0327 0.002678 0.0517
FQC.10119363 0.0220 0.000978 0.0313
FQC.10132112 0.0275 0.002088 0.0457
FQC.10201128 0.0169 0.000289 0.0170
FQC.10208432 0.0443 0.004081 0.0639
FQC.10218466 0.0116 0.000131 0.0115
...

and I am creating a plot like this:

s <- ggplot(SNP, mapping = aes(x = mean, y = var))

2007 Feb 05

RSNPper SNPinfo and making it handle a vector

If I run an analysis which generates statistical tests on many SNPs, I would naturally want to get more details on the most significant SNPs. Directly from within R one can get the information by loading RSNPper (from Bioconductor) and simply issuing a command such as SNPinfo(2073285). Unfortunately, the command cannot handle a vector and therefore only processes one SNP at a time. I tried the lapply and

2011 Jan 03

Using PCA to correct p-values from snpMatrix

Hi R-help folks, I have been doing some single-SNP association work using snpMatrix. This works well, but produces a lot of false positives because of population structure in my data. I would like to correct the p-values (which snpMatrix gives me) for population structure, possibly using principal component analysis (PCA). My data is complicated, so here's a simple example of what

2020 Oct 09

2 D density plot interpretation and manipulating the data

My understanding is that this represents a bivariate normal approximation of the data, which uses the kernel density function to test for inclusion within a level set (please correct me). In order to exclude the outliers from these ellipses/contours, is it advisable to do something like this:

SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
Min. 1st Qu. Median Mean 3rd

2010 Feb 12

"drop if missing" command?

This will probably seem very simple to experienced R programmers: I am doing a SNP association analysis and am at the model-fitting stage. I am using the stats package's "drop1" with the following code:

## geno is the dataset
## the dependent variable (casectrln) is dichotomous and coded 0,1
## rs743572_2 is one of the SNPs (which is coded 0,1,2 for the 3 genotypes)

2012 Oct 23

factor or character

Hi, The program below works very well.

(snps = c('rs621782_G', 'rs8087639_G', 'rs8094221_T', 'rs7227515_A', 'rs537202_C'))
Selec = todos[ , colnames(todos) %in% snps]
head(Selec)

But I have a data set with 1,000 columns and I need to extract 70 of them to use (like snps in the command above). These 70 SNPs are in a file. So I create a file to extract them with

2011 Feb 03

bug in codetools/R CMD check?

Hi Mr Tierney, I have noticed an error message from R 2.12.x's CMD check for a while (apparently Prof. Ripley completely rewrote CMD check in R 2.12+), e.g.: http://bioconductor.org/checkResults/2.7/bioc-LATEST/snpMatrix/lamb2-checksrc.html

----------------
* checking R code for possible problems ... NOTE
Warning: non-unique value when setting 'row.names': ‘new’
Error in

2011 Dec 09

minor allele frequency comparison

Hi all, We are using two methods to identify SNPs. One is based on resequencing the genome and aligning the reads to the sequenced genome to identify SNPs (data available for 44 individuals). The other is based on a SNP array with selected loci (30000 loci, 870 individuals). I want to compare the resequencing-based minor allele frequency with the array-based minor allele frequency.

2007 Jan 21

efficient code. how to reduce running time?

Hi, I am new to R, and even though I've got my code to run and do what it needs to, it is taking forever and I can't use it like this. I was wondering if you could help me find ways to make the code run faster. Here is my code. The data set is a bunch of 0s and 1s in a data.frame. What I am doing is this: I pick a column and make up a new column Y with values associated with that

2012 Feb 23

creating a loop for multiple files

Hi all, I need help very urgently. I did stepwise logistic regression with 35 covariates, adding one SNP (out of 500000) at a time to get the best model for each one. My professor asked me to use this command:

outfiles <- paste(colnames(snps), ".txt", sep="") # list of output files for the best models
for(i in 1:ncol(snps)) {
  model <- glm (Pheno~var1+var2+var3+..(all

2011 Jul 27

SNP Tables

Hello, I have indicators for the presence or absence of SNPs in columns, and the category (case/control) in another column. I would like to extract ONLY the tables and the indices (SNPs) that give me 2 x 3 tables. Some give 2 x 2 tables when one of the alleles is missing. The data look like the matrix snpmat below, so the first SNP should give me the following table: (aa=0, Aa=1 and AA=2) aa

2012 Mar 14

Needing a better solution to a lookup problem.

I have a solution (actually a few) to this problem, but none are computationally efficient enough to be useful. I'm hoping someone can point me to a better solution. I have a data frame of chromosome/position pairs (along with other data for the location). For each pair I need to determine if it is within a given data frame of ranges. I need to keep only the pairs that are within any of

2010 May 28

how to use GenABEL genetic information??

Does anyone use the R library GenABEL? I am using it to calculate SNP interactions. I have a list of 100 SNPs, and I need to look at the interaction between each pair of SNPs in the list. My question is how to perform this in GenABEL. I want to use the "lm" function, but don't know how to use the SNP information. For example: result <- (lm(y~SNP1+SNP2+SNP1*SNP2)) the problem here

2020 Oct 09

2 D density plot interpretation and manipulating the data

Hi Abby, thank you for getting back to me and for this useful information. I'm trying to detect the outliers in my distribution based on mean and variance. Can I see that from the plot I provided? Would outliers be outside of the ellipses? If so, how do I extract those from my data frame, and based on which parameter? So I am trying to connect outliers based on what the plot is showing: s <-

2020 Oct 09

2 D density plot interpretation and manipulating the data

Hi Abby, Thanks for getting back to me, yes I believe I did that by doing this:

SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0     383     696     738    1170    1789

where get_density() is a function from here: https://slowkow.com/notes/ggplot2-color-by-density/ and keep only entries with density > 400

2005 Apr 05

cat bailing out in a for loop

Dear All, I am trying to calculate the Hardy-Weinberg Equilibrium p-value for 42 SNPs. I am using the function HWE.exact from the package "genetics". In order not to do a lot of coding "by hand", I have a for loop that goes through each column (each column is one SNP) and gives me the p-value from HWE.exact. Unfortunately some SNPs have reached fixation and HWE.exact requires a
