similar to: data extraction

2006 May 22
1
editing a big file
I have a file that has 90 columns and 20,000 rows and looks like
  C/G CC GG CG
  G/T GG TT GT
  C/T CC TT CT
  A/G AA GG AG
  A/C AA CC AC
  A/T AA TT AT
I want to write code that reads through each row, first looks at the first column, and then replaces the other three columns: with 12 if it is the same as the first column (e.g. the third column), 11 if it is a repeat of the first letter, like the
2006 Jun 17
2
managing data
Dear mailing list, could someone kindly help me solve the following problem? I am trying to write code that will combine two tables, "x" and "y". The first columns of both tables are unique identifiers for the rows. The first column of table "x" is a subset of the first column of "y". I need to find the matching rows in both tables by looking on their
2006 Jul 28
1
splitting
Dear mailing list, I have a big data frame and each element in the matrix has two letters. I want to split those letters in two so that each element has one letter and the number of columns is doubled. Can someone help with the code? An example of what I want to split: input (three columns) GG AG AG CC CC CC CC CC CC AG
2006 May 09
1
transposing a big data file
I have a very big data set of 67 columns and 25,000 rows and would like to transpose it. The R help did not give me enough information because I am not a programmer and am a first-time R user. Can you give some hints on the code? I want to go from
  AA TT GG GG CC
  AA TT GG GG CC
  AA TT GG GG CC
  AA TT GG GG CC
  AA TT GG GG CC
to
  AA AA AA AA AA
  TT TT TT TT TT
  GG GG GG GG GG
  GG GG GG GG GG
  CC CC CC CC CC
2005 Oct 28
3
replacing a factor value in a data frame
Hi All, I have the following problem, that's driving me mad. I have a dataframe of factors, from a genetic scan of SNPs. I DO have NAs in the dataframe, which would look like:
    V4 V5 V6 V7 V8   V9 V10
  1 TT GG TT AC AG   AG  TT
  2 AT CC TT AA AA   AA  TT
  3 AT CC TT AC AA <NA>  TT
  4 TT CC TT AA AA   AA  TT
  5 AT CG TT CC AA   AA  TT
  6 TT CC TT AA AA   AA  TT
  7 AT CC
2013 Jan 09
4
how to count "A","C","T","G" in each row in a big data.frame?
Dear All, I have a data.frame like this: structure(list(name = c("Gga_rs10722041", "Gga_rs10722249", "Gga_rs10722565", "Gga_rs10723082", "Gga_rs10723993", "Gga_rs10724555", "Gga_rs10726238", "Gga_rs10726461", "Gga_rs10726774", "Gga_rs10726967", "Gga_rs10727581", "Gga_rs10728004",
2012 Jan 16
1
rho stat from a fasta sequence file
Hi all, I have a sequence file (fasta format) and want to calculate the rho statistic for dinucleotide abundance on my data. The code I use (using the seqinr library and the current working directory) is
  seq_info<-read.fasta("gene.txt")
  rho(seq_info[1],2)
but it yields only the dinucleotides, not their rho values, i.e.,
  > rho(seq_info[1],2)
  aa ac ag at ca cc cg ct ga gc
2006 Aug 14
3
column to row
Dear mailing list, I have data in two columns. How can I convert it to one row? Thank you in advance. Input: 1 2 3 4 5 6 7 8 9 1. Output: 1 2 3 4 5 6 7 8 9 1
2009 Mar 30
1
Sum of character vector
Dear list, I am trying to evaluate how many elements in a vector equal a certain value. The vectors are the columns of a data.frame, read in using read.table():
  > dim(data)
  [1] 2600 742
  > data[1:5,1:5]
    SNP001 SNP002 SNP003 SNP004 SNP005
  1     GG     AA     TT     TT     GG
  2     GG     AA     TC     TT     GG
  3     GG     AC     CC     TT     GG
  4     AG     AA     TT     TT     GG
  5
2012 Sep 26
3
replace string values with numbers
Hi everyone, I have a data frame Gene with SNPs, e.g.
  P1 P2 P3
  CG CG GG
  -- -- AC
  -- AC CC
  AC -- AC
I tried to replace all the GG with a value 3.
  Gene[Gene=="GG"]<-3
It always gives me:
  Warning in `[<-.factor`(`*tmp*`, thisvar, value = 3) : invalid factor level, NAs generated
Does anyone know if there is anything wrong with my code? Thanks, Zhengyu
2009 Aug 25
1
Filling in empty arrays/lists from using "paste" function
Dear R users, I am trying to fill in arrays (5 different according to distinct "id") from objects produced from arbitrary data set below. a <-
2011 Apr 20
3
Help needed!
Hi everyone, I have a question. I am reading the source code of the package "ssfcov". The source code is as follows. I cannot find the source code of the function "myss2d" anywhere in the package. Can anyone give me a hint on how to find it in the package? Thanks a lot!!bv > ssfcov function (time, x, subject, nbasis = 5, centered = FALSE, noDiag = TRUE) {
2010 Sep 10
4
Counting occurrences of a letter by a factor
I'm trying to find a more elegant way of doing this. What I'm trying to accomplish is to count the frequency of letters (major / minor alleles) in a string grouped by the factor levels in another column of my data frame. Ex. > DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L",
2017 Dec 13
1
overlay two histograms ggplot
Hi all, How can I overlay these two histograms?
  ggplot(gg, aes(gg$Alz, fill = gg$veg)) + geom_histogram(alpha = 0.2)
  ggplot(tt, aes(tt$Cont, fill = tt$veg)) + geom_histogram(alpha = 0.2)
thanks for any help! Elahe
2009 Jan 12
1
Determining variance components of classed covariates
Hi - I am interested in solving variance components for the data below with respect to the response variable, Expression, within R. However, the covariates aren't independent and they also have a class (the total variance explained by the covariates in that class is what I am most interested in). Very naively, I have tried to look at each individual covariate's variance like this >
2010 Mar 09
0
error with adaboost: replacement has 186 rows, data has 62
Hi, all, When running
  > AB.fit=adaboost(ylearn, xlearn, xtest, presel=0)
I got the following error:
  Error in `[[<-.data.frame`(`*tmp*`, preds, value = c(4L, 6L, 6L, 6L, 3L, : replacement has 186 rows, data has 62
The data structure is attached below:
  [1] "ylearn"
  [1] 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
  [40] 1 1 1 1 1 1 1 0
2010 Mar 08
0
error when using svm routine: Error in if (any(co)) { : missing value where TRUE/FALSE needed
Hi, I met with this error message with the following data set. Do you know how to resolve it? Thanks.
  > data<-read.table("c://temp3//abc.csv", sep = ",", header=T)
  > classwt<-c( 0.5806452, 0.4193548)
  > y<-data[,1]
  > x<-data[,2:ncol(data)]
  > print(y)
  [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1
  [36] 1 1 1 1 2 2 2 2 2 2
2010 Aug 05
2
questions about string handling
Hi, I have a question about data handling. I have a dataset as follows:
  ID    snp1  snp2  snp3
  1001  0/0   1/1   1/1
  1002  2/2   3/3   1/1
  1003  4/4   3/3   2/2
I want to convert the dataset to the following format:
  ID    snp1  snp2  snp3
  1001  00    AA    AA
  1002  GG
2013 Jul 02
2
Recoding variables based on reference values in data frame
I'm new to R (previously used SAS primarily) and I have a genetics data frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3, ...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of the data is shown below:
  SNP_ID      SNP1  SNP2  SNP3  SNP4
  Maj_Allele  C     G     C     A
  Min_Allele  T     A     T     G
  ID1         CC    GG    CT    AA
  ID2         CC    GG    CC    AA
  ID3         CC    GG    nc    AA
2009 Jun 03
1
strsplit for multiple columns
Hi, I am trying to split multiple columns. One column works just fine, but I want to do it for multiple columns??? Example
  > a
          ID V2 V3 V4 V5 V6 V7 V8 V9 V10
  1 PBBA0644 -- GG AA -- AA -- AA GG  GG
  2 PBBA1010 -- GG AA -- AA -- AA GG  GG
  3 0127ATPR -- GG AA -- AA -- AA GG  GG
  4 0128EHAB -- GG AA -- AG -- AA AG  GG
  5 PBBA0829 -- GG AA -- AA -- AA GG  AG