thr3ads.net - search: "snp"

Memory problem on a linux cluster using a large data set

2006 Dec 18

1

...here a way to change the settings or processor under R? I want to run the function Random Forest on my large data set; it should be able to cope with that amount. Perhaps someone has tried this before in R, or is Fortran a better choice? I added my R script down below. Best regards, Iris Kolder SNP <- read.table("file.txt", header=FALSE, sep="") # read in data file SNP[SNP==9]<-NA # change missing values from a 9 to a NA SNP$total.NAs = rowSums(is.na(SN # calculate the number of NA per row and adds a column with total NA...

help in R

2006 Apr 26

2

Hi, I can't understand where I am going wrong. Below is my code. I would really appreciate your help. Thanks. > genfile<-read.table("c:/tina/phd/bs871/hw/genfile.txt",skip=1) > > #read in SNP data > snp.dat <- as.matrix(genfile) > snp.name <- scan("c:/tina/phd/bs871/hw/genfile.txt",nline=1,what="character") Read 100 items > n.snp <- length(snp.name) > n.id <- 1 #number of fields for ids, sex and affection status > > ###form gntp using t...

2 x 3 Probability under the null

2011 Oct 27

3

I have a 2 x 3 matrix called snp and I want to compute the following probability: choose(sum(snp[,1]), snp[1,1]) * choose(sum(snp[,2]), snp[1,2]) * choose(sum(snp[,3]), snp[1,3])/choose(sum(snp), sum(snp[1,])) but I keep getting Infs and NaNs. Is there a function that can do this in R? -- Thanks, Jim.
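One possible way around the Infs and NaNs, sketched here as an assumption rather than as the answer given in the thread, is to evaluate the same expression on the log scale with base R's lchoose() and exponentiate once at the end. The helper name table_prob and the example counts are hypothetical.

```r
# Hypothetical helper: the probability from the post, computed on the
# log scale so that the individual choose() terms cannot overflow.
table_prob <- function(snp) {
  # log of choose(colsum_1, snp[1,1]) * choose(colsum_2, snp[1,2]) * choose(colsum_3, snp[1,3])
  log_num <- sum(lchoose(colSums(snp), snp[1, ]))
  # log of choose(sum(snp), sum(snp[1,]))
  log_den <- lchoose(sum(snp), sum(snp[1, ]))
  exp(log_num - log_den)
}

# Made-up counts, large enough that choose() on its own may overflow
snp <- matrix(c(1200, 3400,  950,
                1300, 3100, 1050), nrow = 2, byrow = TRUE)
p <- table_prob(snp)
```

Only the difference of the logs is exponentiated, so the intermediate values stay within floating-point range even when each binomial coefficient is astronomically large.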

Errors melt()ing data...

2008 Feb 28

1

Hi, I'm trying to melt() some data for subsequent cast()ing and am encountering errors. The overall process requires a couple of cast()s and melt()s. ########Start Session 1########## ## I have the data in a (fully) melted format and can cast it fine... > norm1[1:10,] Pool SNP Sample.Name variable value 1 1 rs1045485 CA0092 Height.1 0.003488853 2 1 rs1045485 CA0142 Height.2 0.333274200 3 1 rs1045485 CO0007 Height.2 0.396250961 4 1 rs1045485 CA0047 Height.2 0.535686831 5 1 rs1045485 CO0149 Height.2 0.296611673 6 1 rs1...

efficient code. how to reduce running time?

2007 Jan 21

2

...es ) { a <- anova(lm(newY~factor(newX[,i]))); F[i] <- a$`F value`[1]; } MSSid <- which (F == max(F)); # index of MSS (Most Significant Site) maxF = cbind(maxF,max(F)); } maxF; } # set the output file sink("/tmp/R.out.3932.100") # load the dataset snp = read.table(file("/tmp/msoutput.3932.100")) #print (snp); # pi: desired proportion of variation due to QTN pi = 0.05; print (paste("pi:", pi)); MAF = 0.05; print (paste("MAF:", MAF)); # S: number of segregating sites S = length(snp[1,]); # N: number of samples N = le...

Plotting question: how to plot SNP location data?

2007 Oct 30

0

Hello, I would like to plot specific SNPs with their exact locations on a chromosome. Based on my genotyping results I would like to separate these SNPs in three different categories: 1, 2 and 3 and use different colours to represent these categories. The script below generates the sample data. I can plot these with the image function usi...

Fw: Memory problem on a linux cluster using a large data set [Broadcast]

2007 Jan 10

1

...more sensitive to big-data issues and tracking down > unnecessary memory copying. > > > "cannot allocate vector size 1240 kb". I've searched through > > use traceback() or options(error=recover) to figure out where > this is actually occurring. > > > SNP <- read.table("file.txt", header=FALSE, sep="") # > read in data file > > This makes a data.frame, and data frames have several aspects > (e.g., automatic creation of row names on sub-setting) that > can be problematic in terms of memory use. Probably be...

Index out SNP position

2013 Jan 03

4

Dear R experts, I have 2 matrices, A and B. I am trying to index B against A: (1) find the rows of B that fall between columns 1 and 2 of A and put them into a new vector SNP. I wrote the code below, but I cannot think of the right way to do it. Could anyone help me with the code? Thanks, Jiang ---- A <- matrix(c(35838396,35838674,36003908,36004090,36150188,36151202,35838584,35838674,36003908,36003992), ncol = 2) B <- matrix(c(36003918,35838399,35838589,36262559),ncol...

Memory problem on a linux cluster using a large data set [Broadcast]

2006 Dec 21

1

...more sensitive to big-data issues and tracking down > unnecessary memory copying. > > > "cannot allocate vector size 1240 kb". I've searched through > > use traceback() or options(error=recover) to figure out where > this is actually occurring. > > > SNP <- read.table("file.txt", header=FALSE, sep="") # > read in data file > > This makes a data.frame, and data frames have several aspects > (e.g., automatic creation of row names on sub-setting) that > can be problematic in terms of memory use. Probably be...

splitting multiple data in one column into multiple rows with one entry per column

2009 Jul 26

1

Dear R colleagues, I annotated a list of single nucleotide polymorphisms (SNP) with the corresponding genes using biomaRt. The result is the following data.frame (pasted from R): snp ensembl_gene_id 1 rs8032583 2 rs1071600 ENSG00000101605 3 rs13406898 ENSG000001671...

Using PCA to correct p-values from snpMatrix

2011 Jan 03

0

Hi R-help folks, I have been doing some single SNP association work using snpMatrix. This works well, but produces a lot of false positives, because of population structure in my data. I would like to correct the p-values (which snpMatrix gives me) for population structure, possibly using principal component analysis (PCA). My data is complica...

automating regression or correlations for many variables

2011 Apr 04

1

Dear All, I have a large data frame with 10 rows and 82 columns. I want to apply the same function to all of the columns with a single command. e.g. zl <- lm (snp$a_109909 ~ snp$lat) will fit a linear model to the values in lat and a_109909. What I want to do is fit linear models for the values in each column against lat. I tried doing zl <- (snp[,2:82] ~ snp$lat[,1]) but got the following error message "Error in model.frame.default(formula = snp[,...
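A minimal sketch of one way to fit the same model to every column, assuming a data frame with a lat column and numeric response columns. The data below are made up, and the column name a_109910 is hypothetical; this is not necessarily the approach suggested in the thread.

```r
# Made-up data standing in for the poster's data frame: one 'lat' column
# plus two response columns (the real one is described as having 82 columns).
set.seed(1)
snp <- data.frame(lat = runif(10), a_109909 = rnorm(10), a_109910 = rnorm(10))

# Every column except 'lat' is treated as a response
response_cols <- setdiff(names(snp), "lat")

# Fit response ~ lat for each column; reformulate() builds the formula from strings
fits <- lapply(response_cols, function(col) {
  lm(reformulate("lat", response = col), data = snp)
})
names(fits) <- response_cols

# Collect the slope on 'lat' from every fitted model
slopes <- sapply(fits, function(fit) coef(fit)[["lat"]])
```

Keeping the fitted models in a named list means summary(fits[["a_109909"]]) or any other per-model inspection remains available afterwards.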

FW: Index out SNP position

2013 Jan 04

0

...ect. That way you are not modifying B. This will be faster than checking the order of the columns in A each time you process a line from B. > Ax <- t(apply(A, 1, function(x) c(min(x), max(x)))) > indx <- sapply(1:nrow(B), function(i) any(B[i]>Ax[,1] & B[i]<Ax[,2])) > SNP <- B[indx] > SNP [1] 36003918 35838399 35838589 -------------------- David C > From: JiangZhengyu [mailto:zhyjiang2006 at hotmail.com] > Sent: Friday, January 04, 2013 9:03 AM > To: dcarlson at tamu.edu > Subject: RE: [R] Index out SNP position > > Hi David, > >...

SNPRelate: Plink conversion

2013 Nov 08

1

Hi, Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK to gds conversion. It seems as though the SNPs are imported as "samples" and in turn, the samples are recognised as SNPs: >snpsgdsSummary("chr2L") Some values of snp.position are invalid (should be > 0)! Some values of snp.chromosome are invalid (should be finite and >=1)! Some of snp.allele are not standard! E.g,...

Order a data frame based on the order of another data frame

2012 Mar 05

1

Hi, I am trying to match the order of the rownames of a dataframe with the rownames of another dataframe (I can't simply sort both sets because I would have to change the order of many other connected datasets if I did that): Also, the second dataset (snp.matrix$fam) is a snp matrix slot: so for example: data_one: x y z sample_1110001 -0.3352623 -1.141462 -0.4032494 sample_1110005 0.1862424 0.015944 0.1329059 sample_1110420 0.1309120 0.004005596...

R coding to extract allele frequencies from NCBI for ALL alleles of one SNP?

2024 Nov 15

1

Dear All, The following code extracts from NCBI very nice output for ONE allele of a SNP (often the allele with the second largest frequency - usually termed the minor allele). It gives an average minor allele frequency from all NCBI sources (which is what I want, except I'd like the addition of data for all the other alleles of one SNP) plus a table of minor allele frequencies fro...

Replacing multiple elements in a vector !

2009 Oct 22

2

Hi, I have a vector with elements rs.id=c("rs100","rs101","rs102","rs103") And a dataframe "snp.id" 1 SNP_100 rs100 2 SNP_101 rs101 3 SNP_102 rs102 4 SNP_103 rs103 Task is to replace rs.id vector with corresponding "SNP_" ids in snp.id. Thanks in a...
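A lookup of this kind can be sketched with base R's match(). The column names snp and rs are assumptions, since the post does not show them, and this is an illustration rather than the reply given in the thread.

```r
rs.id <- c("rs100", "rs101", "rs102", "rs103")

# Column names 'snp' and 'rs' are assumptions; the post does not show them
snp.id <- data.frame(
  snp = c("SNP_100", "SNP_101", "SNP_102", "SNP_103"),
  rs  = c("rs100", "rs101", "rs102", "rs103"),
  stringsAsFactors = FALSE
)

# match() gives, for each element of rs.id, its row position in snp.id$rs
idx <- match(rs.id, snp.id$rs)

# Replace where a match exists; leave unmatched ids unchanged
new.id <- ifelse(is.na(idx), rs.id, snp.id$snp[idx])
```

Because match() returns NA for ids that are absent from the lookup table, the ifelse() step keeps those entries as they were instead of turning them into NA.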

2 D density plot interpretation and manipulating the data

2020 Oct 09

0

Hi Abby, Thanks for getting back to me, yes I believe I did that by doing this: SNP$density <- get_density(SNP$mean, SNP$var) > summary(SNP$density) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 383 696 738 1170 1789 where get_density() is function from here: https://slowkow.com/notes/ggplot2-color-by-density/ and keep only entries with density...

2 D density plot interpretation and manipulating the data

2020 Oct 09

3

...that from the plot I provided? Would outliers be > >> outside of ellipses? If so how do I extract those from my data frame, > >> based on which parameter? > >> > >> So I am trying to connect outliers based on what the plot is showing: > >> s <- ggplot(SNP, mapping = aes(x = mean, y = var)) > >> s <- s + geom_density_2d() + geom_point() + my.theme + ggtitle("SNPs") > >> > >> versus what is in the data: > >> > >> > head(SNP) > >> mean var sd > >> FQ...

Problems reshaping data with cast()

2008 Feb 07

1

Hi, I'm trying to cast() some data, but keep on getting the following error... > norm.all.melted.height <- transform(all.melted.height, + norm.height = value / ave(value, SNP, Pool, FUN = max) + ) Warning messages: 1: In FUN(X[[147L]], ...) : no non-missing arguments to max; returning -Inf 2: In FUN(X[[147L]], ...) : no non-missing arguments to max; returning -Inf 3: In FUN(X[[147L]], ...) : no non-missing arguments to max; retu...
