thr3ads.net - similar to: "data extraction"

Displaying 20 results from an estimated 10000 matches similar to: "data extraction"

editing a big file

2006 May 22

I have a file that has 90 columns and 20,000 rows, and it looks like this:

C/G CC GG CG
G/T GG TT GT
C/T CC TT CT
A/G AA GG AG
A/C AA CC AC
A/T AA TT AT

I want to write code that reads through each row, looks at the first column, and then replaces the other three columns: 12 if the genotype is the same as the first column, e.g. third column; 11 if it is a repeat of the first letter, like the
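Here is a minimal R sketch of the recoding. It assumes each row holds an allele pair such as "C/G" followed by genotype columns. The post is cut off, so the full coding is an assumption: 11 for two copies of the first allele, 12 for one copy of each, and 22 for two copies of the second allele. The column names `alleles` and `g1` to `g3` are hypothetical.

```r
# Hypothetical rows in the layout shown above: an allele pair such as
# "C/G" followed by three genotype columns.
geno <- data.frame(
  alleles = c("C/G", "G/T", "A/G"),
  g1 = c("CC", "GG", "AA"),
  g2 = c("GG", "TT", "GG"),
  g3 = c("CG", "GT", "AG"),
  stringsAsFactors = FALSE
)

# Recode one genotype against its allele pair by counting copies of the
# second allele. Assumed coding: 11 = two copies of the first allele,
# 12 = one of each, 22 = two copies of the second allele.
recode_genotype <- function(genotype, alleles) {
  a <- strsplit(alleles, "/", fixed = TRUE)[[1]]  # e.g. c("C", "G")
  g <- strsplit(genotype, "")[[1]]                # e.g. c("C", "G")
  c("11", "12", "22")[sum(g == a[2]) + 1]
}

# Apply the recoding row by row to each genotype column.
for (col in c("g1", "g2", "g3")) {
  geno[[col]] <- mapply(recode_genotype, geno[[col]], geno$alleles,
                        USE.NAMES = FALSE)
}
geno
```

For the full file, the same loop would run over all genotype columns after reading the data with read.table().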

managing data

2006 Jun 17

Dear mailing list, could someone kindly help me solve the following problem? I am trying to write code that will combine two tables, "x" and "y". The first column of each table is a unique identifier for the rows. The first column of table "x" is a subset of the first column of "y". I need to find the matching rows in both tables by looking at their
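One way to do this in base R is merge(), which matches rows on a key column. The tables and the column names `id`, `height` and `weight` below are hypothetical stand-ins for "x" and "y".

```r
# Hypothetical tables: the first column of each is a unique row ID,
# and the IDs in x are a subset of the IDs in y.
x <- data.frame(id = c("s2", "s4"), height = c(170, 182),
                stringsAsFactors = FALSE)
y <- data.frame(id = c("s1", "s2", "s3", "s4"), weight = c(60, 72, 65, 80),
                stringsAsFactors = FALSE)

# merge() matches rows on the shared "id" column. With the default
# all = FALSE it keeps only IDs present in both tables.
combined <- merge(x, y, by = "id")
combined
```

If x's row order matters, `y[match(x$id, y$id), ]` instead returns the matching rows of y in that order.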

splitting

2006 Jul 28

Dear mailing list, I have a big data frame and each element has two letters. I want to split them so that each element has one letter and the number of columns doubles. Can someone help with the code? Example input (three columns):

GG AG AG
CC CC CC
CC CC CC
AG
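A short base-R sketch using substr(). It assumes every element is exactly two letters. The data frame and the `_a` / `_b` column suffixes are made up for illustration.

```r
# Hypothetical input with three two-letter columns, as in the example.
geno <- data.frame(
  V1 = c("GG", "CC", "CC"),
  V2 = c("AG", "CC", "CC"),
  V3 = c("AG", "CC", "CC"),
  stringsAsFactors = FALSE
)

# For every column, substr() takes the first and the second letter.
# The two halves sit side by side, doubling the column count.
split_cols <- lapply(names(geno), function(nm) {
  col <- as.character(geno[[nm]])
  out <- data.frame(substr(col, 1, 1), substr(col, 2, 2),
                    stringsAsFactors = FALSE)
  names(out) <- paste0(nm, c("_a", "_b"))
  out
})
result <- do.call(cbind, split_cols)
result
```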

transposing a big data file

2006 May 09

I have a very big data set of 67 columns and 25,000 rows and would like to transpose it. The R help did not give enough information, because I am not a programmer and this is my first time using R. Can you give me some coding hints? I want to go from

AA TT GG GG CC
AA TT GG GG CC
AA TT GG GG CC
AA TT GG GG CC
AA TT GG GG CC

to

AA AA AA AA AA
TT TT TT TT TT
GG GG GG GG GG
GG GG GG GG GG
CC CC CC CC CC
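In R, t() transposes a matrix. Below is a small sketch on the example above. `m` is a hypothetical 5 x 5 stand-in for the real 25,000 x 67 table.

```r
# Small hypothetical version of the data: every row is AA TT GG GG CC.
m <- matrix(rep(c("AA", "TT", "GG", "GG", "CC"), times = 5),
            nrow = 5, byrow = TRUE)

# t() swaps rows and columns; on the real 25000 x 67 table the
# result would be 67 x 25000.
tm <- t(m)
tm
```

A data frame from read.table() can also be passed to t(). The result is a matrix, so wrap it in as.data.frame() if you need a data frame.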

replacing a factor value in a data frame

2005 Oct 28

Hi all, I have the following problem, and it's driving me mad. I have a data frame of factors from a genetic scan of SNPs. I DO have NAs in the data frame, which looks like:

  V4 V5 V6 V7 V8   V9 V10
1 TT GG TT AC AG   AG  TT
2 AT CC TT AA AA   AA  TT
3 AT CC TT AC AA <NA>  TT
4 TT CC TT AA AA   AA  TT
5 AT CG TT CC AA   AA  TT
6 TT CC TT AA AA   AA  TT
7 AT CC
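The post is cut off, so this is only a general sketch of two base-R ways to change values in factor columns without producing NAs. It uses a hypothetical two-column excerpt, and the replacements ("AT" to "TA", "AA" to "00") are arbitrary examples. For many columns the same steps can be looped over names(df). Converting the columns with as.character() first is often the simpler route.

```r
# Hypothetical factor columns with a missing value, as in the scan above.
snps <- data.frame(
  V4 = factor(c("TT", "AT", "AT")),
  V9 = factor(c("AG", "AA", NA))
)

# Renaming an existing value: edit the levels, not the data.
# Every "AT" in V4 becomes "TA", and NAs are untouched.
lev <- levels(snps$V4)
lev[lev == "AT"] <- "TA"
levels(snps$V4) <- lev

# Introducing a new value: add it as a level first, then assign.
# which() drops the NA comparisons, so the missing entry is skipped.
levels(snps$V9) <- c(levels(snps$V9), "00")
snps$V9[which(snps$V9 == "AA")] <- "00"
snps
```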

how to count "A","C","T","G" in each row in a big data.frame?

2013 Jan 09

Dear all, I have a data.frame like this: structure(list(name = c("Gga_rs10722041", "Gga_rs10722249", "Gga_rs10722565", "Gga_rs10723082", "Gga_rs10723993", "Gga_rs10724555", "Gga_rs10726238", "Gga_rs10726461", "Gga_rs10726774", "Gga_rs10726967", "Gga_rs10727581", "Gga_rs10728004",
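A possible base-R approach: paste each row together, split it into letters and count each base. This assumes the columns after `name` hold genotype strings such as "AG". The real columns are cut off in the post, so `s1` to `s3` are hypothetical.

```r
# Hypothetical data in the shape of the post: a SNP name column
# followed by genotype columns holding strings such as "AG".
snp <- data.frame(
  name = c("Gga_rs10722041", "Gga_rs10722249"),
  s1 = c("AA", "CT"),
  s2 = c("AG", "CC"),
  s3 = c("GG", "TT"),
  stringsAsFactors = FALSE
)
geno_cols <- setdiff(names(snp), "name")

# Paste a row's genotypes into one string, split it into single letters,
# and tabulate against fixed levels so absent bases are counted as 0.
count_bases <- function(row) {
  letters_in_row <- strsplit(paste(row, collapse = ""), "")[[1]]
  table(factor(letters_in_row, levels = c("A", "C", "T", "G")))
}
counts <- t(apply(snp[geno_cols], 1, count_bases))
cbind(snp["name"], counts)
```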

rho stat from a fasta sequence file

2012 Jan 16

Hi all, I have a sequence file (FASTA format) and want to calculate the rho statistic for dinucleotide abundance on my data. The code I use (with the seqinr library and the current working directory) is:

seq_info<-read.fasta("gene.txt")
rho(seq_info[1],2)

but it yields only the dinucleotides, not their rho values, i.e.:

> rho(seq_info[1],2)
aa ac ag at ca cc cg ct ga gc
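As an independent check, the usual dinucleotide relative abundance can be computed in base R without seqinr:

rho(XY) = f(XY) / (f(X) f(Y))

This is a sketch of that formula, not seqinr's implementation. `dinuc_rho` is a hypothetical helper, and it assumes a clean a/c/g/t sequence given as a vector of single characters.

```r
# Sketch of the dinucleotide relative abundance statistic,
# rho(XY) = f(XY) / (f(X) * f(Y)), from observed frequencies.
dinuc_rho <- function(seq_chars, alphabet = c("a", "c", "g", "t")) {
  seq_chars <- tolower(seq_chars)
  n <- length(seq_chars)
  k <- length(alphabet)

  # Mononucleotide frequencies f(X).
  f1 <- table(factor(seq_chars, levels = alphabet)) / n

  # Overlapping dinucleotides (positions 1-2, 2-3, ...) and f(XY).
  pairs  <- paste0(seq_chars[-n], seq_chars[-1])
  first  <- rep(alphabet, each = k)
  second <- rep(alphabet, times = k)
  words  <- paste0(first, second)   # "aa", "ac", ..., "tt"
  f2 <- table(factor(pairs, levels = words)) / (n - 1)

  rho <- as.vector(f2) / (as.vector(f1[first]) * as.vector(f1[second]))
  names(rho) <- words
  rho
}

r <- dinuc_rho(strsplit("acgtacgt", "")[[1]])
round(r, 3)
```

With seqinr itself, read.fasta() returns a list. `seq_info[[1]]` (double brackets) extracts the first sequence, whereas `seq_info[1]` is still a one-element list, so passing `seq_info[[1]]` to rho() may be worth trying. Also, a named vector prints its names on one line and its values on the line below.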

column to row

2006 Aug 14

Dear mailing list, I have data in two columns. How can I convert it to one row? Thank you in advance.

Input:
1 2
3 4
5 6
7 8
9 1

Output:
1 2 3 4 5 6 7 8 9 1
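R stores a matrix column by column, so reading the values row by row needs a transpose first. A minimal sketch, assuming the input pairs shown above (1 2, 3 4, ...):

```r
# The input as a two-column matrix, filled row by row.
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1), ncol = 2, byrow = TRUE)

# Transpose, then drop the dimensions: as.vector() reads column by
# column, which on t(m) means row by row of the original.
one_row <- as.vector(t(m))
one_row
# -> 1 2 3 4 5 6 7 8 9 1
```

If a literal one-row matrix is needed, `matrix(one_row, nrow = 1)` gives one.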

Sum of character vector

2009 Mar 30

Dear list, I am trying to count how many elements in a vector equal a certain value. The vectors are the columns of a data.frame, read in using read.table():

> dim(data)
[1] 2600  742
> data[1:5,1:5]
  SNP001 SNP002 SNP003 SNP004 SNP005
1     GG     AA     TT     TT     GG
2     GG     AA     TC     TT     GG
3     GG     AC     CC     TT     GG
4     AG     AA     TT     TT     GG
5
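Comparing a data frame with a single value gives a logical matrix, and summing the TRUE values counts the matches. Below is a sketch using the four complete rows and the first three columns shown above. The target value "GG" is just an example.

```r
# The four complete rows and first three columns from the post.
data <- data.frame(
  SNP001 = c("GG", "GG", "GG", "AG"),
  SNP002 = c("AA", "AA", "AC", "AA"),
  SNP003 = c("TT", "TC", "CC", "TT"),
  stringsAsFactors = FALSE
)

# data == "GG" is a logical matrix; colSums() counts TRUEs per column
# (na.rm = TRUE skips missing calls).
gg_per_snp <- colSums(data == "GG", na.rm = TRUE)
gg_per_snp

# For a single column, sum() over the logical vector does the same.
sum(data$SNP001 == "GG", na.rm = TRUE)
```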

replace string values with numbers

2012 Sep 26

Hi everyone, I have a data frame Gene with SNPs, e.g.

P1 P2 P3
CG CG GG
-- -- AC
-- AC CC
AC -- AC

I tried to replace all the GG values with 3:

Gene[Gene=="GG"]<-3

It always gives me:

Warning in `[<-.factor`(`*tmp*`, thisvar, value = 3) :
  invalid factor level, NAs generated

Does anyone know if there is anything wrong with my code? Thanks, Zhengyu
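The warning comes from assigning 3 into factor columns, where 3 is not one of the levels. Below is a minimal sketch of one fix: convert the columns to character first. The data frame rebuilds the example above, and `stringsAsFactors = TRUE` mimics how the data was presumably read.

```r
# The example table, read in as factors.
Gene <- data.frame(
  P1 = c("CG", "--", "--", "AC"),
  P2 = c("CG", "--", "AC", "--"),
  P3 = c("GG", "AC", "CC", "AC"),
  stringsAsFactors = TRUE
)

# Convert every column to character so any value can be assigned,
# then replace GG with 3 (stored as "3" in the character column).
Gene[] <- lapply(Gene, as.character)
Gene[Gene == "GG"] <- 3
Gene
```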

Filling in empty arrays/lists from using "paste" function

2009 Aug 25

Dear R users, I am trying to fill in arrays (5 different ones, according to distinct "id") from objects produced from the arbitrary data set below. a <-

Help needed!

2011 Apr 20

Hi everyone, I have a question. I am reading the source code of the package "ssfcov", shown below. I cannot find the source code of the function "myss2d" anywhere in the package. Can anyone give me a hint on how to find it? Thanks a lot! bv

> ssfcov
function (time, x, subject, nbasis = 5, centered = FALSE, noDiag = TRUE) {
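Functions that a package uses internally but does not export are not found by typing their name. Here is a sketch of two base-R ways to reach them. It uses stats' unexported `print.lm` as a stand-in, because ssfcov may not be installed; substitute `ssfcov:::myss2d` or `getAnywhere("myss2d")`. If myss2d is not in ssfcov's namespace at all, it may come from a package that ssfcov imports. In that case getAnywhere() reports where it is found, provided that package is loaded.

```r
# 1. The triple-colon operator looks inside a package's namespace,
#    including objects the package does not export.
f <- stats:::print.lm
is.function(f)

# 2. getAnywhere() searches attached packages, loaded namespaces and
#    registered S3 methods, and reports where the object was found.
found <- utils::getAnywhere("print.lm")
found$where
```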

Counting occurrences of a letter by a factor

2010 Sep 10

I'm trying to find a more elegant way of doing this. What I'm trying to accomplish is to count the frequency of letters (major/minor alleles) in a string, grouped by the factor levels in another column of my data frame. Ex.

> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L",
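A base-R sketch: split each genotype into its two letters, repeat each group label twice so the labels line up with the letters, and cross-tabulate. The grouping values after the first three are cut off in the post, so the `group` column here is hypothetical, as are the column names. It also assumes every non-missing genotype has exactly two letters.

```r
# Hypothetical data in the shape of the post.
DF <- data.frame(
  geno  = c("CC", "CC", NA, "CG", "GG", "GC"),
  group = c("L", "U", "L", "U", "L", "U"),
  stringsAsFactors = FALSE
)

# Drop missing genotypes, split each into letters, and repeat each
# group label twice so letters and groups line up.
ok      <- !is.na(DF$geno)
alleles <- unlist(strsplit(DF$geno[ok], ""))
groups  <- rep(DF$group[ok], each = 2)

# One row per group, one column per letter.
allele_counts <- table(groups, alleles)
allele_counts
```

`prop.table(allele_counts, 1)` turns the counts into within-group frequencies.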

overlay two histograms ggplot

2017 Dec 13

Hi all, how can I overlay these two histograms?

ggplot(gg, aes(gg$Alz, fill = gg$veg)) + geom_histogram(alpha = 0.2)
ggplot(tt, aes(tt$Cont, fill = tt$veg)) + geom_histogram(alpha = 0.2)

Thanks for any help! Elahe
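A common ggplot2 pattern is to stack both variables into one long data frame with a column recording where each value came from, then draw a single histogram layer. The data frames below are hypothetical stand-ins for `gg` and `tt`. The sketch refers to columns by bare name inside aes(), which is the usual ggplot2 style, rather than as gg$Alz.

```r
library(ggplot2)

# Hypothetical stand-ins for the two data frames in the question.
set.seed(1)
gg <- data.frame(Alz  = rnorm(100, mean = 10), veg = rep(c("a", "b"), 50))
tt <- data.frame(Cont = rnorm(100, mean = 12), veg = rep(c("a", "b"), 50))

# Stack both variables into one long data frame with a "source" column.
long <- rbind(
  data.frame(value = gg$Alz,  veg = gg$veg, source = "Alz"),
  data.frame(value = tt$Cont, veg = tt$veg, source = "Cont")
)

# One histogram layer; position = "identity" draws the two sets of bars
# on top of each other instead of stacking them.
p <- ggplot(long, aes(x = value, fill = source)) +
  geom_histogram(alpha = 0.4, position = "identity", bins = 30)
print(p)
```

The original `veg` grouping could be kept by adding `+ facet_wrap(~ veg)`.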

Determining variance components of classed covariates

2009 Jan 12

Hi - I am interested in estimating the variance components of the data below with respect to the response variable, Expression, in R. However, the covariates aren't independent, and each covariate also belongs to a class (the total variance explained by the covariates in each class is what I am most interested in). Very naively, I have tried to look at each individual covariate's variance like this >
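Sequential sums of squares from anova() depend on term order when covariates are correlated. One order-free alternative credits each class with the R-squared lost when all of its covariates are dropped from the full linear model. This is a swapped-in technique (a partial R-squared per class), not a formal variance-components model. All data, covariate names and classes below are hypothetical.

```r
# Hypothetical data: response "Expression" and four covariates in two
# classes; bmi is deliberately correlated with age, lane with plate.
set.seed(42)
n <- 200
age   <- rnorm(n, 50, 10)
bmi   <- 0.3 * age + rnorm(n, 25, 3)
plate <- rnorm(n)
lane  <- 0.5 * plate + rnorm(n)
d <- data.frame(
  Expression = 0.05 * age + 0.1 * bmi + 0.8 * plate + rnorm(n),
  age = age, bmi = bmi, plate = plate, lane = lane
)
classes <- list(clinical = c("age", "bmi"), batch = c("plate", "lane"))

# R-squared of a linear model of Expression on the given covariates.
r2 <- function(vars) {
  summary(lm(reformulate(vars, response = "Expression"), data = d))$r.squared
}

# Unique contribution of each class: full R^2 minus R^2 without it.
all_vars  <- unlist(classes, use.names = FALSE)
full_r2   <- r2(all_vars)
unique_r2 <- sapply(classes, function(v) full_r2 - r2(setdiff(all_vars, v)))
round(c(full = full_r2, unique_r2), 3)
```

Because the classes share variance, the unique contributions need not add up to the full R-squared. The remainder is the variance the classes explain jointly.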

error with adaboost: replacement has 186 rows, data has 62

2010 Mar 09

Hi all, when running

> AB.fit=adaboost(ylearn, xlearn, xtest, presel=0)

I got the following error:

Error in `[[<-.data.frame`(`*tmp*`, preds, value = c(4L, 6L, 6L, 6L, 3L, :
  replacement has 186 rows, data has 62

The data structure is attached below:

[1] "ylearn"
 [1] 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
[40] 1 1 1 1 1 1 1 0

error when using svm routine: Error in if (any(co)) { : missing value where TRUE/FALSE needed

2010 Mar 08

Hi, I ran into this error message with the following data set. Do you know how to resolve it? Thanks.

> data<-read.table("c://temp3//abc.csv", sep = ",", header=T)
> classwt<-c( 0.5806452, 0.4193548)
> y<-data[,1]
> x<-data[,2:ncol(data)]
> print(y)
 [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1
[36] 1 1 1 1 2 2 2 2 2 2
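The post does not show which package provides svm(). A frequent cause of "missing value where TRUE/FALSE needed" inside model code is missing values or constant columns in the predictors. Below is a hypothetical helper, `check_predictors`, that lists such columns before fitting. The toy data frame is made up.

```r
# Hypothetical helper: report predictor columns that contain NAs or
# that take only one distinct (non-missing) value.
check_predictors <- function(x) {
  x <- as.data.frame(x)
  has_na   <- vapply(x, anyNA, logical(1))
  constant <- vapply(x, function(col) length(unique(col[!is.na(col)])) <= 1,
                     logical(1))
  list(has_na = names(x)[has_na], constant = names(x)[constant])
}

# Toy predictors: "a" has a missing value, "b" is constant.
x <- data.frame(a = c(1, 2, NA), b = c(5, 5, 5), c = c(1, 2, 3))
res <- check_predictors(x)
res
```

If the model is e1071's svm(), note also that a numeric y gives regression. `y <- factor(data[, 1])` requests classification instead.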

questions about string handling

2010 Aug 05

Hi, I have a question about data handling. I have a dataset as follows:

ID   snp1 snp2 snp3
1001 0/0  1/1  1/1
1002 2/2  3/3  1/1
1003 4/4  3/3  2/2

I want to convert the dataset to the following format:

ID   snp1 snp2 snp3
1001 00   AA   AA
1002 GG
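When every code is a single digit, chartr() can translate digits to letters in one step, and gsub() can then remove the slash. The mappings 0 to "0", 1 to "A" and 2 to "G" follow the example output. The letters for 3 and 4 are cut off in the post, so "C" and "T" below are placeholders only.

```r
# Data as in the post; entries are "code/code" pairs.
d <- data.frame(
  ID   = c(1001, 1002, 1003),
  snp1 = c("0/0", "2/2", "4/4"),
  snp2 = c("1/1", "3/3", "3/3"),
  snp3 = c("1/1", "1/1", "2/2"),
  stringsAsFactors = FALSE
)

# Translate digits to letters (3 -> "C" and 4 -> "T" are placeholders),
# leave 0 as is, then drop the "/" separator.
recode <- function(x) gsub("/", "", chartr("1234", "AGCT", x), fixed = TRUE)

snp_cols <- c("snp1", "snp2", "snp3")
d[snp_cols] <- lapply(d[snp_cols], recode)
d
```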

Recoding variables based on reference values in data frame

2013 Jul 02

I'm new to R (previously used SAS primarily) and I have a genetics data frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3, ...) at 3000+ genetic locations (SNP1, SNP2, SNP3, ...). A small subset of the data is shown below:

SNP_ID     SNP1 SNP2 SNP3 SNP4
Maj_Allele C    G    C    A
Min_Allele T    A    T    G
ID1        CC   GG   CT   AA
ID2        CC   GG   CC   AA
ID3        CC   GG   nc   AA
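This sketch assumes the goal is the common 0/1/2 coding (the number of minor-allele copies), which the post does not state. It also assumes "nc" means a missing call. The allele and genotype values are the subset shown above. If the Maj_Allele / Min_Allele rows sit inside the same data frame, pull them out into a named vector like the hypothetical `mnr` first.

```r
# Minor alleles per SNP and genotypes from the subset above.
mnr <- c(SNP1 = "T", SNP2 = "A", SNP3 = "T", SNP4 = "G")
geno <- data.frame(
  SNP1 = c("CC", "CC", "CC"),
  SNP2 = c("GG", "GG", "GG"),
  SNP3 = c("CT", "CC", "nc"),
  SNP4 = c("AA", "AA", "AA"),
  row.names = c("ID1", "ID2", "ID3"),
  stringsAsFactors = FALSE
)

# Count copies of the minor allele in each genotype string;
# "nc" (assumed: no call) becomes NA.
count_minor <- function(g, minor) {
  out <- nchar(g) - nchar(gsub(minor, "", g, fixed = TRUE))
  out[g == "nc"] <- NA
  out
}

coded <- geno
for (s in names(geno)) coded[[s]] <- count_minor(geno[[s]], mnr[[s]])
coded
```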

strsplit for multiple columns

2009 Jun 03

Hi, I am trying to split multiple columns. One column works just fine, but I want to do it for multiple columns. Example:

> a
        ID V2 V3 V4 V5 V6 V7 V8 V9 V10
1 PBBA0644 -- GG AA -- AA -- AA GG  GG
2 PBBA1010 -- GG AA -- AA -- AA GG  GG
3 0127ATPR -- GG AA -- AA -- AA GG  GG
4 0128EHAB -- GG AA -- AG -- AA AG  GG
5 PBBA0829 -- GG AA -- AA -- AA GG  AG
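A single-column strsplit() can be extended to all genotype columns by wrapping it in a function and applying that with Map(). Below is a sketch on an excerpt of the data above (rows 1 and 4, columns V2, V3 and V6). Treating "--" as a missing call is an assumption, and `split_col` is a hypothetical helper.

```r
# Excerpt of the data above.
a <- data.frame(
  ID = c("PBBA0644", "0128EHAB"),
  V2 = c("--", "--"),
  V3 = c("GG", "GG"),
  V6 = c("AA", "AG"),
  stringsAsFactors = FALSE
)

# Split one column into an n x 2 matrix of letters; "--" becomes NA, NA
# because strsplit(NA, "") gives NA and NA[1:2] is c(NA, NA).
split_col <- function(x, name) {
  x <- as.character(x)
  x[x == "--"] <- NA
  m <- t(sapply(strsplit(x, ""), `[`, 1:2))
  colnames(m) <- paste0(name, c(".1", ".2"))
  m
}

geno_cols <- setdiff(names(a), "ID")
pieces <- Map(split_col, a[geno_cols], geno_cols)
result <- data.frame(ID = a$ID, do.call(cbind, pieces),
                     stringsAsFactors = FALSE)
result
```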
