thr3ads.net - similar to: "Counting occurances of a letter by a factor"

Recoding variables based on reference values in data frame

2013 Jul 02

I'm new to R (previously used SAS primarily) and I have a genetics data frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3, ...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of the data is shown below:

SNP_ID      SNP1  SNP2  SNP3  SNP4
Maj_Allele  C     G     C     A
Min_Allele  T     A     T     G
ID1         CC    GG    CT    AA
ID2         CC    GG    CC    AA
ID3         CC    GG    nc    AA

data extraction

2006 Jun 30

Dear mailing list, I have a data set with 20,000 rows and 20 columns. I want to extract every 10th row only, for example the 10th, 20th, 30th, 40th, ..., 20000th. Can you please help me with how to do that? Thank you. An example is below. Input: AG GG GG AG CC CC CC CC CT CC CT CT GG GG GG GG CC CC CC CC GG GG GG GG CC CC CC CC GG CG CG GG GG GG GG GG *CC CC CC CC* AA AG AG AA AA AA AA AA GG AG AG GG GG AG AG
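A minimal sketch of one way to pull every 10th row in base R. The data frame `genotypes` is a hypothetical stand-in filled with random values, since the real file is not shown in full; only the `seq()` indexing is the point.

```r
# Hypothetical stand-in for the 20,000 x 20 genotype table in the post.
set.seed(1)
genotypes <- as.data.frame(
  matrix(sample(c("AA", "AG", "GG", "CC", "CT"), 20000 * 20, replace = TRUE),
         nrow = 20000, ncol = 20),
  stringsAsFactors = FALSE
)

# Row indices 10, 20, 30, ... up to the last multiple of 10.
keep <- seq(10, nrow(genotypes), by = 10)

# drop = FALSE keeps the result a data frame even with a single column.
every_tenth <- genotypes[keep, , drop = FALSE]

dim(every_tenth)
```

Indexing by a precomputed vector avoids a loop and keeps the original row names, which makes it easy to check that the right rows were selected.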

difference of two data frames

2008 Sep 14

Hello, I have 2 data frames, DF1 and DF2, where DF2 is a subset of DF1: DF1 = data.frame(V1 = 1:6, V2 = letters[1:6]) DF2 = data.frame(V1 = 1:3, V2 = letters[1:3]) How do I create a new data frame of the difference between DF1 and DF2, i.e. newDF = data.frame(V1 = 4:6, V2 = letters[4:6])? In my real data, the rows are not in order as in the example I provided. Thanks much, Joseph
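One possible base R approach is to build a key per row and keep the rows of DF1 whose key does not occur in DF2. This is a sketch, not necessarily what the thread settled on; the helper `row_key` is a hypothetical name, and DF1 is shuffled here only to mimic the unordered real data.

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6], stringsAsFactors = FALSE)
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3], stringsAsFactors = FALSE)

# Shuffle DF1 so the rows are not in order, as in the poster's real data.
DF1 <- DF1[c(4, 1, 6, 2, 5, 3), ]

# Hypothetical helper: paste all columns of a row into a single key.
# "\r" is used as a separator that is unlikely to appear in the data.
row_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))

# Keep the rows of DF1 that have no matching row in DF2.
newDF <- DF1[!(row_key(DF1) %in% row_key(DF2)), , drop = FALSE]

# Optional tidy-up: sort and reset row names.
newDF <- newDF[order(newDF$V1), ]
rownames(newDF) <- NULL
newDF
```

Because the comparison is on row contents rather than row position, the ordering of either data frame does not matter.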

Selecting cases from matrices stored in lists

2011 Aug 22

Hi, I have two lists (c and h - see below) containing matrices with similar cases but different values. I want to split these matrices into multiple matrices based on the values in h. So, I did the following: years<-c(1997:1999) for (t in 1:length(years)) { year=as.character(years[t]) h[[year]]<-sapply(colnames(h[[year]]), function(var)

replace string values with numbers

2012 Sep 26

Hi everyone, I have a data frame Gene with SNPs, e.g.

P1  P2  P3
CG  CG  GG
--  --  AC
--  AC  CC
AC  --  AC

I tried to replace all the GG with a value 3. Gene[Gene=="GG"]<-3 It always gives me: Warning in `[<-.factor`(`*tmp*`, thisvar, value = 3) : invalid factor level, NAs generated Does anyone know if there is anything wrong with my code? Thanks, Zhengyu
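The warning suggests the columns are factors, and "3" is not one of their existing levels. A minimal sketch of one workaround, assuming it is acceptable to store the genotypes as character strings: convert every column to character first, then replace. The data frame below is rebuilt from the four rows shown in the post.

```r
# Rebuild the example with factor columns to reproduce the situation.
Gene <- data.frame(
  P1 = c("CG", "--", "--", "AC"),
  P2 = c("CG", "--", "AC", "--"),
  P3 = c("GG", "AC", "CC", "AC"),
  stringsAsFactors = TRUE
)

# Convert every column from factor to character; Gene[] keeps the
# data frame structure while replacing its columns.
Gene[] <- lapply(Gene, as.character)

# The replacement now works because character vectors accept any string.
Gene[Gene == "GG"] <- "3"
Gene
```

If numeric codes are wanted for every genotype, a lookup vector applied per column would be a natural next step, but that goes beyond what the post asks.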

replacing a factor value in a data frame

2005 Oct 28

Hi All, I have the following problem, that's driving me mad. I have a dataframe of factors, from a genetic scan of SNPs. I DO have NAs in the dataframe, which would look like: V4 V5 V6 V7 V8 V9 V10 1 TT GG TT AC AG AG TT 2 AT CC TT AA AA AA TT 3 AT CC TT AC AA <NA> TT 4 TT CC TT AA AA AA TT 5 AT CG TT CC AA AA TT 6 TT CC TT AA AA AA TT 7 AT CC

managing data

2006 Jun 17

Dear mailing list, could someone kindly help me solve the following problem? I am trying to write code that will combine two tables, "x" and "y". The first columns of both tables are unique identifiers for the rows. The first column of table "x" is a subset of the first column of "y". I need to find the matching rows in both tables by looking on their

editing a big file

2006 May 22

I have a file that has 90 columns and 20,000 rows and looks like

C/G  CC  GG  CG
G/T  GG  TT  GT
C/T  CC  TT  CT
A/G  AA  GG  AG
A/C  AA  CC  AC
A/T  AA  TT  AT

I want to write code that will read through each row: it first looks at the first column and then replaces the three columns with 12 if it is the same as the first column e.g. third column 11 if it is a repeat of the first alphabet like the

strsplit for multiple columns

2009 Jun 03

Hi, I am trying to split multiple columns. One column works just fine, but I want to do it for multiple columns??? Example > a ID V2 V3 V4 V5 V6 V7 V8 V9 V10 1 PBBA0644 -- GG AA -- AA -- AA GG GG 2 PBBA1010 -- GG AA -- AA -- AA GG GG 3 0127ATPR -- GG AA -- AA -- AA GG GG 4 0128EHAB -- GG AA -- AG -- AA AG GG 5 PBBA0829 -- GG AA -- AA -- AA GG AG

reshaping column items into rows per unique ID

2018 Feb 25

Hi All, I have a data frame which looks like this: CustomerID DietType 1 a 1 c 1 b 2 f 2 a 3 j 4 c 4 c 4 f And I would like to reshape this so I can

speeding up "sum of squared differences" calculation

2013 Oct 21

All, I am using a sum of squared differences in the objective function of an optimization problem I am doing and I have managed to speed it up using the outer function versus the nested for loops, but my suspicion is that the calculation could be done even quicker. Please see the code below for a simple example. If anyone can point out a faster way I would appreciate it greatly. Thanks, Roger

Convert components of a list to separate columns in a data frame or matrix XXXX

2012 Jan 08

Hello everyone, What is the most efficient and simplest way to convert all components of a list to separate columns in a matrix? Is there an easy way to programmatically "pad" the length of the resulting shorter character vectors so that they can be easily combined into a data frame? I have the following code that stores the 2 components (of differing lengths) in the same character
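A minimal sketch of the padding idea, assuming the list holds character vectors of unequal length. The list `parts` and its contents are hypothetical, since the poster's code is cut off; assigning to `length()` pads a vector with NA.

```r
# Hypothetical list with two components of differing lengths.
parts <- list(first = c("a", "b", "c", "d"), second = c("x", "y"))

# Length of the longest component.
n <- max(lengths(parts))

# Extending a vector's length fills the new positions with NA.
padded <- lapply(parts, function(v) {
  length(v) <- n
  v
})

# Equal-length components combine cleanly into either structure.
as_df <- as.data.frame(padded, stringsAsFactors = FALSE)
as_matrix <- do.call(cbind, padded)
as_df
```

Padding with NA rather than an empty string keeps the missing positions distinguishable from genuinely empty values.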

reshape dataframe

2009 Mar 20

Hi, I have a large dataset on which I would like to do the following: x<-data.frame(id=c(1,2,3), snp1=c("AA","GG", "AG"),snp2=c("GG","AG","GG"),snp3=c("GG","AG","AA")) > x id snp1 snp2 snp3 1 1 AA GG GG 2 2 GG AG AG 3 3 AG GG AA And then

R string functions

2011 Jun 15

Hi, I have a string "GGGGGGCCCAATCGCAATTCCAATT". What I want to do is to count the percentage of each letter in the string. What string functions can I use to count the number of times each letter appears in the string? For example, the letter "A" appeared 6 times and the letter "T" appeared 5 times; how can I use a string function to get these numbers? Thanks, karena
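One common base R idiom for this is to split the string into single characters and tabulate them. The sketch below uses the string from the post; the variable names are illustrative.

```r
s <- "GGGGGGCCCAATCGCAATTCCAATT"

# Split into single characters; strsplit returns a list, so take element 1.
chars <- strsplit(s, "")[[1]]

# Count each distinct letter.
counts <- table(chars)

# Convert counts to percentages of the string length.
percent <- 100 * prop.table(counts)

counts
percent
```

For this string the counts are A = 6, C = 7, G = 7 and T = 5, which agrees with the A and T counts quoted in the question.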

divide column in a dataframe based on a character

2010 Oct 26

Hello, If I have a dataframe: example(data.frame) zz<-c("aa_bb","bb_cc","cc_dd","dd_ee","ee_ff","ff_gg","gg_hh","ii_jj","jj_kk","kk_ll") ddd <- cbind(dd, group = zz) and I want to divide the column named group by the "_", how would I do this? so instead of the first row being x
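A minimal sketch of splitting such a column on "_" in base R. The post builds `ddd` from `example(data.frame)`, which is not reproduced here; instead a small hypothetical data frame with an `id` column is assumed, and the new column names `group1` and `group2` are illustrative.

```r
zz <- c("aa_bb", "bb_cc", "cc_dd", "dd_ee", "ee_ff",
        "ff_gg", "gg_hh", "ii_jj", "jj_kk", "kk_ll")

# Hypothetical stand-in for the poster's ddd data frame.
ddd <- data.frame(id = seq_along(zz), group = zz, stringsAsFactors = FALSE)

# Split each value on the literal underscore; as.character guards
# against the column being a factor.
pieces <- strsplit(as.character(ddd$group), "_", fixed = TRUE)

# Every value has exactly two parts, so rbind gives a 2-column matrix.
split_cols <- do.call(rbind, pieces)

ddd$group1 <- split_cols[, 1]
ddd$group2 <- split_cols[, 2]
head(ddd, 3)
```

This relies on every entry containing exactly one underscore; entries with a different number of parts would need padding before `rbind`.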

question related to multiple regression

2010 Oct 09

Hi, I am conducting an association analysis of genotype and a phenotype such as cholesterol level as an outcome and the genotype as a regressor using multiple linear regression. There are 3 possibilities for the genotype AA, AG, GG. There are 5 people with the AA genotype, 100 with the AG genotype and 900 with the GG genotype. I coded GG genotype as 1, AG as 2 and AA as 3 and the p-value for the

Count occurances in integers (or strings)

2011 Jun 15

Hi, I have a dataframe column from which I want to calculate the number of 1's in each entry. Some column values could, for example, be "0001001000" and "11110000111". To get the number of occurrences from a string I use this: sum(unlist(strsplit(mydata[,"my_column"], "")) == "1") However, as my data is not in string form.. How do I convert
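The `sum(unlist(strsplit(...)))` call in the post adds up the 1's across the whole column rather than per entry. A sketch of a per-entry count, assuming the values can be treated as character strings (the vector `my_column` is a hypothetical stand-in holding the two example values):

```r
# Hypothetical column values taken from the post.
my_column <- c("0001001000", "11110000111")

# as.character covers factor input; numeric input would already have
# lost any leading zeros, which does not change the count of 1's.
x <- as.character(my_column)

# gregexpr finds every "1" in each string; regmatches extracts them,
# and lengths counts the matches per entry.
ones_per_entry <- lengths(regmatches(x, gregexpr("1", x, fixed = TRUE)))
ones_per_entry
```

Entries without any "1" yield a count of zero, because `regmatches` returns an empty character vector for them.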

reshaping column items into rows per unique ID

2018 Feb 25

I believe you need to spend time with an R tutorial or two: a data frame (presumably the "table" data structure you describe) can *not* contain "blanks" -- all columns must be the same length, which means NA's are filled in as needed. Also, 8e5 * 7e4 = 5.6e10, which almost certainly will not fit into any local version of R (maybe it would in some server version --

Can't understand syntax

2012 Jul 14

OK, I need help!! I've been searching, but I don't understand the logic of some of this data frame addressing syntax. What is this type of code called? test [["v3"]] [is.na(test[["v2"]])] <-10 #choose column v3 where column v2 is == 4 and replace with 10 and where is it documented? The code below works for what I want to do (find the non-missing value in a row),
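To illustrate what the quoted line does, here is a small sketch on a hypothetical data frame (the values are made up). `test[["v3"]]` extracts column v3 as a vector, the logical vector `is.na(test[["v2"]])` selects positions within it, and the assignment writes 10 into those positions. Note that the condition tests for missing values in v2, not for v2 == 4. This style is usually called subassignment (or indexed assignment) and is documented under `?Extract`.

```r
# Hypothetical data: v2 has missing values in rows 2 and 4.
test <- data.frame(v2 = c(4, NA, 7, NA), v3 = c(1, 2, 3, 4))

# Replace v3 with 10 wherever v2 is NA.
test[["v3"]][is.na(test[["v2"]])] <- 10

# The same operation written with $ instead of [[ ]] on a fresh copy.
test2 <- data.frame(v2 = c(4, NA, 7, NA), v3 = c(1, 2, 3, 4))
test2$v3[is.na(test2$v2)] <- 10

test
```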

Regular Expressions

2008 May 13

Hi R, Again stuck with regular expressions... Suppose, S=c("World_is_beautiful", "one_two_three_four","My_book") I need to extract the last but one element of the strings. So, my output should look like: Ans=c("is","three","My") gsub() can do this...but wondering how do I give the regular expression....
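One possible pattern, offered as a sketch rather than the answer given in the thread: capture the piece that sits immediately before the final underscore-delimited piece. `sub()` is enough here because each string needs only one replacement, and `perl = TRUE` is a choice made for predictable matching.

```r
S <- c("World_is_beautiful", "one_two_three_four", "My_book")

# ^(.*_)?   optional prefix ending in an underscore
# ([^_]+)   the last-but-one piece (captured as group 2)
# _[^_]+$   the final piece
Ans <- sub("^(.*_)?([^_]+)_[^_]+$", "\\2", S, perl = TRUE)
Ans

# Cross-check without a regular expression: split and index.
parts <- strsplit(S, "_", fixed = TRUE)
Ans2 <- vapply(parts, function(p) p[length(p) - 1], character(1))
```

Both approaches assume every string contains at least one underscore; a string without one is returned unchanged by `sub()` and would make the `vapply()` version fail.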
