Displaying 20 results from an estimated 10000 matches similar to: "Counting occurances of a letter by a factor"
2013 Jul 02
2
Recoding variables based on reference values in data frame
I'm new to R (previously used SAS primarily) and I have a genetics data
frame consisting of genotypes for each of 300+ subjects (ID1, ID2, ID3,
...) at 3000+ genetic locations (SNP1, SNP2, SNP3...). A small subset of
the data is shown below:
SNP_ID     SNP1 SNP2 SNP3 SNP4
Maj_Allele C    G    C    A
Min_Allele T    A    T    G
ID1        CC   GG   CT   AA
ID2        CC   GG   CC   AA
ID3        CC   GG   nc   AA
2006 Jun 30
3
data extraction
Dear mailing list, I have data with 20,000 rows and 20 columns. I
wanted to extract every 10th row only, for example the 10th, 20th, 30th, 40th, ..., 20000th.
Can you please help me with how to do that? Thank you.
The example is below.
Input:
AG GG GG AG
CC CC CC CC
CT CC CT CT
GG GG GG GG
CC CC CC CC
GG GG GG GG
CC CC CC CC
GG CG CG GG
GG GG GG GG
*CC CC CC CC*
AA AG AG AA
AA AA AA AA
GG AG AG GG
GG AG AG
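One possible way to pick every 10th row, sketched here on a small stand-in matrix (the object `x` and its size are assumptions for illustration, not the poster's data), is to index with a sequence of row numbers:

```r
# Hypothetical stand-in for the 20,000 x 20 data set: 40 rows, 4 columns
x <- matrix(seq_len(40 * 4), nrow = 40, ncol = 4)

# Row numbers 10, 20, 30, ... up to the last multiple of 10 in the data
idx <- seq(10, nrow(x), by = 10)

# drop = FALSE keeps the result a matrix even if only one row is selected
every10 <- x[idx, , drop = FALSE]
every10
```

The same indexing should work on a data frame read with read.table().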
2008 Sep 14
5
difference of two data frames
Hello
I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
DF1= data.frame(V1=1:6, V2= letters[1:6])
DF2= data.frame(V1=1:3, V2= letters[1:3])
How do I create a new data frame of the difference between DF1 and DF2
newDF=data.frame(V1=4:6, V2= letters[4:6])
In my real data, the rows are not in order as in the example I provided.
Thanks much
Joseph
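A minimal sketch of one approach, assuming both data frames have the same columns in the same order: build a key for each row by pasting its columns together and keep the rows of DF1 whose key does not appear in DF2. The names `key1` and `key2` are introduced here for illustration only.

```r
# Example data from the question
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# One key per row: all columns pasted together with an unlikely separator
key1 <- do.call(paste, c(DF1, sep = "\r"))
key2 <- do.call(paste, c(DF2, sep = "\r"))

# Keep the rows of DF1 whose key is absent from DF2; row order is irrelevant
newDF <- DF1[!(key1 %in% key2), , drop = FALSE]
newDF
```

Because the comparison is by row content rather than by position, it should not matter that the real data are unordered.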
2011 Aug 22
1
Selecting cases from matrices stored in lists
Hi,
I have two lists (c and h - see below) containing matrices with similar
cases but different values. I want to split these matrices into multiple
matrices based on the values in h. So, I did the following:
years<-c(1997:1999)
for (t in 1:length(years))
{
year=as.character(years[t])
h[[year]]<-sapply(colnames(h[[year]]), function(var)
2012 Sep 26
3
replace string values with numbers
Hi everyone, I have a data frame Gene with SNPs, e.g.
P1 P2 P3
CG CG GG
-- -- AC
-- AC CC
AC -- AC
I tried to replace all the GG with the value 3:
Gene[Gene=="GG"]<-3
It always gives me:
Warning in `[<-.factor`(`*tmp*`, thisvar, value = 3) :
invalid factor level, NAs generated
Does anyone know if there is anything wrong with my code? Thanks, Zhengyu
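The warning suggests the columns are factors, and a factor only accepts values that are already among its levels. A sketch of one workaround, assuming it is acceptable to store the genotypes as character strings, is to convert the columns first (the data frame below is rebuilt from the example in the question):

```r
# Rebuild the example; stringsAsFactors = TRUE reproduces the factor columns
Gene <- data.frame(P1 = c("CG", "--", "--", "AC"),
                   P2 = c("CG", "--", "AC", "--"),
                   P3 = c("GG", "AC", "CC", "AC"),
                   stringsAsFactors = TRUE)

# Convert every column to character; Gene[] keeps the data frame structure
Gene[] <- lapply(Gene, as.character)

# The replacement now works; 3 is stored as the string "3" in these columns
Gene[Gene == "GG"] <- 3
Gene
```

Columns that end up holding only numbers could be converted afterwards with as.numeric().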
2005 Oct 28
3
replacing a factor value in a data frame
Hi All,
I have the following problem, that's driving me mad.
I have a dataframe of factors, from a genetic scan of SNPs. I DO have
NAs in the dataframe, which would look like:
V4 V5 V6 V7 V8 V9 V10
1 TT GG TT AC AG AG TT
2 AT CC TT AA AA AA TT
3 AT CC TT AC AA <NA> TT
4 TT CC TT AA AA AA TT
5 AT CG TT CC AA AA TT
6 TT CC TT AA AA AA TT
7 AT CC
2006 Jun 17
2
managing data
Dear mailing list, could someone kindly help me solve the following problem?
I am trying to write code that will combine two tables "x" and "y". The
first columns of both tables are unique identifiers for the rows. The
first column of table "x" is a subset of the first column of "y". I need to
find the matching rows in both tables by looking at their
2006 May 22
1
editing a big file
I have a file that has 90 columns and 20,000 rows and looks like
C/G CC GG CG G/T GG TT GT C/T CC TT CT A/G AA GG AG A/C AA CC AC A/T AA
TT AT
I want to write code that will read through each row. It first looks
at the first column and then replaces the three columns with 12 if it is the
same as the first column, e.g. third column, 11 if it is a repeat of the first
letter, like the
2009 Jun 03
1
strsplit for multiple columns
Hi,
I am trying to split multiple columns. One column works just fine, but I
want to do it for multiple columns???
Example
> a
ID V2 V3 V4 V5 V6 V7 V8 V9 V10
1 PBBA0644 -- GG AA -- AA -- AA GG GG
2 PBBA1010 -- GG AA -- AA -- AA GG GG
3 0127ATPR -- GG AA -- AA -- AA GG GG
4 0128EHAB -- GG AA -- AG -- AA AG GG
5 PBBA0829 -- GG AA -- AA -- AA GG AG
2018 Feb 25
4
reshaping column items into rows per unique ID
Hi All
I have a data frame which looks like this:
CustomerID DietType
1 a
1 c
1 b
2 f
2 a
3 j
4 c
4 c
4 f
And I would like to reshape this so I can
2013 Oct 21
3
speeding up "sum of squared differences" calculation
All,
I am using a sum of squared differences in the objective function of an optimization problem I am doing and I have managed to speed it up using the outer function versus the nested for loops, but my suspicion is that the calculation could be done even quicker. Please see the code below for a simple example. If anyone can point out a faster way I would appreciate it greatly.
Thanks,
Roger
2012 Jan 08
2
Convert components of a list to separate columns in a data frame or matrix XXXX
Hello everyone,
What is the most efficient & simplest way to convert all components of a
list to separate columns in a matrix?
Is there an easy way to programmatically "pad" the length of the resulting
shorter character vectors so that they can be easily combined into a data
frame?
I have the following code that stores the 2 components (of differing
lengths) in the same character
2009 Mar 20
1
reshape dataframe
Hi,
I have a large dataset on which I would like to do the following:
x<-data.frame(id=c(1,2,3), snp1=c("AA","GG",
"AG"),snp2=c("GG","AG","GG"),snp3=c("GG","AG","AA"))
> x
id snp1 snp2 snp3
1 1 AA GG GG
2 2 GG AG AG
3 3 AG GG AA
And then
2011 Jun 15
4
R string functions
Hi,
I have a string "GGGGGGCCCAATCGCAATTCCAATT"
What I want to do is to count the percentage of each letter in the string,
what string functions can I use to count the number of each letter appearing
in the string?
For example, the letter "A" appeared 6 times, letter "T" appeared 5 times,
how can I use a string function to get these numbers?
thanks,
karena
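One way to do this in base R, shown as a sketch rather than the only option, is to split the string into single characters and tabulate them (the names `chars`, `counts` and `percent` are chosen here for illustration):

```r
s <- "GGGGGGCCCAATCGCAATTCCAATT"

# Split into single characters; strsplit() returns a list, so take element 1
chars <- strsplit(s, "")[[1]]

# Count each letter, then convert the counts to percentages of the total
counts <- table(chars)
percent <- 100 * prop.table(counts)

counts
percent
```

Here counts[["A"]] gives the number of times "A" appears, which matches the 6 mentioned in the question.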
2010 Oct 26
4
divide column in a dataframe based on a character
Hello,
If I have a dataframe:
example(data.frame)
zz<-c("aa_bb","bb_cc","cc_dd","dd_ee","ee_ff","ff_gg","gg_hh","ii_jj","jj_kk","kk_ll")
ddd <- cbind(dd, group = zz)
and I want to divide the column named group by the "_", how would I do this?
so instead of the first row being
x
2010 Oct 09
1
question related to multiple regression
Hi,
I am conducting an association analysis of genotype and a phenotype such as
cholesterol level as an outcome and the genotype as a regressor using
multiple linear regression. There are 3 possibilities for the genotype AA,
AG, GG. There are 5 people with the AA genotype, 100 with the AG genotype
and 900 with the GG genotype. I coded GG genotype as 1, AG as 2 and AA as 3
and the p-value for the
2011 Jun 15
2
Count occurances in integers (or strings)
Hi,
I have a dataframe column from which I want to calculate the number of
1's in each entry. Some column values could, for example, be
"0001001000" and "11110000111".
To get the number of occurrences from a string I use this:
sum(unlist(strsplit(mydata[,"my_column"], "")) == "1")
However, my data is not in string form. How do I convert
2018 Feb 25
0
reshaping column items into rows per unique ID
I believe you need to spend time with an R tutorial or two: a data frame
(presumably the "table" data structure you describe) can *not* contain
"blanks" -- all columns must be the same length, which means NA's are
filled in as needed.
Also, 8e5 * 7e4 = 5.6e10, which almost certainly will not fit into any
local version of R (maybe it would in some server version --
2012 Jul 14
3
Can't understand syntax
OK, I need help!!
I've been searching, but I don't understand the logic of some of this
dataframe addressing syntax.
What is this type of code called?
test[["v3"]][is.na(test[["v2"]])] <- 10 # choose column v3 where column v2 is NA and replace with 10
and where is it documented?
The code below works for what I want to do (find the non-missing value in a
row),
2008 May 13
3
Regular Expressions
Hi R,
Again stuck with regular expressions...
Suppose,
S=c("World_is_beautiful", "one_two_three_four","My_book")
I need to extract the last but one element of the strings. So, my output should look like:
Ans=c("is","three","My")
gsub() can do this...but wondering how do I give the regular expression....
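A sketch of two possible answers, assuming every string contains at least one underscore: a single sub() call with a Perl-style pattern that captures the second-to-last piece, and a strsplit() alternative that avoids the regular expression. The names `Ans2` and `parts` are introduced here for illustration.

```r
S <- c("World_is_beautiful", "one_two_three_four", "My_book")

# Pattern: an optional prefix ending in "_", then the piece to keep
# (captured), then "_" and a final piece containing no underscore.
Ans <- sub("^(?:.*_)?([^_]*)_[^_]*$", "\\1", S, perl = TRUE)

# Alternative without a regex: split on "_" and take the element before last
parts <- strsplit(S, "_", fixed = TRUE)
Ans2 <- vapply(parts, function(p) p[length(p) - 1], character(1))

Ans
Ans2
```

Both give the "is", "three", "My" asked for in the question. A string with no underscore would need separate handling in either version.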