Dear all,

I am doing a project on variant calling using R. I am working on a pileup file. There are 10 columns in my data frame, and I want to count the number of A, C, G and T in each row of column 9. An example of column 9 is given below:

.a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,,,,,.,
a,g,,t,
,,,,,.,^!.
,$,,,,.,.

This is a bit confusing for me, because all these characters are in one column. How can we scan each row and print the number of A, C, G and T for that row? Most of the rows contain . and , and other symbols, but we will ignore them. I just want to run a loop with a counter that counts the number of A, C, G and T in each row and gives output something like this:

A C G T
1 0 1 0
0 0 0 2
0 2 0 0
1 0 0 0
0 0 0 3

This output is for the first 5 rows of the example above.

I am new to R. Can you please help me? I will be very thankful to you.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote:

> This is a bit confusing for me as these characters are in one column
> and how can we scan them for each row to print number of A,C,G and T
> for each row.

Seems a bit clunky, but this does the job (first the data):

txt <- ".a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,,,,,.,
a,g,,t,
,,,,,.,^!.
,$,,,,.,."
txtvec <- readLines(textConnection(txt))

Now the clunky solution. It basically subtracts 1 from the number of "fragments" that result from splitting on each letter in turn. It could be made prettier with a function that did the job.

data.frame(A = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "a"), length), "-", 1)),
           C = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "c"), length), "-", 1)),
           G = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "g"), length), "-", 1)),
           T = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "t"), length), "-", 1)))

            A C G T
.a,g,,      1 0 1 0
.t,t,,      0 0 0 2
.,c,c,      0 2 0 0
.,a,,,      1 0 0 0
.,t,t,t     0 0 0 2
.c,,g,^!.   0 1 1 0
.g,ggg.^!,  0 0 4 0
.$,,,,,.,   0 0 0 0
a,g,,t,     1 0 1 1
,,,,,.,^!.  0 0 0 0
,$,,,,.,.   0 0 0 0

This has the advantage that the input data end up as row names, which was a surprise.

If you wanted to count "A" and "a" as equivalent, then the split argument should be "a|A".

David Winsemius, MD
West Hartford, CT
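One caveat about the split-and-count code above: strsplit() drops a trailing empty string, so a letter at the very end of a string is not counted. That is why the .,t,t,t row shows 2 T's rather than the 3 in the requested output. Counting regex matches directly with gregexpr() avoids the problem. Here is a minimal sketch; the helper name count_matches is made up for illustration and is not part of any package.

```r
# Count how many times `pattern` matches in each element of `x`.
# gregexpr() returns -1 for a string with no match, so only positive
# match positions are counted.
count_matches <- function(x, pattern) {
  vapply(gregexpr(pattern, x), function(m) sum(m > 0), integer(1))
}

txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")

# The split-based count misses the trailing "t" ...
split_count <- length(strsplit(".,t,t,t", split = "t")[[1]]) - 1
split_count  # 2

# ... while counting matches finds all three
count_matches(".,t,t,t", "t")

counts <- data.frame(A = count_matches(txtvec, "[Aa]"),
                     C = count_matches(txtvec, "[Cc]"),
                     G = count_matches(txtvec, "[Gg]"),
                     T = count_matches(txtvec, "[Tt]"))
counts
```

For these five rows, the result matches the table requested in the original question.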
On Fri, Jul 1, 2011 at 12:47 PM, Bansal, Vikas <vikas.bansal at kcl.ac.uk> wrote:
> [...] I want to count the number of A,C,G and T in each row for column 9.

Read the lines into L. Then, for each of a, c, g and t in turn, remove every other character and count the characters left in each string:

Lines <- ".a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,,,,,.,
a,g,,t,
,,,,,.,^!.
,$,,,,.,."
L <- readLines(textConnection(Lines))
data.frame(a = nchar(gsub("[^a]", "", L)),
           c = nchar(gsub("[^c]", "", L)),
           g = nchar(gsub("[^g]", "", L)),
           t = nchar(gsub("[^t]", "", L)))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
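The remove-everything-else idea above can be wrapped in a small function that loops over the four bases instead of repeating the gsub() call. This sketch assumes you want upper- and lower-case letters counted together; in pileup, upper case marks a mismatch on the forward strand and lower case one on the reverse strand. The name base_counts is hypothetical.

```r
# For each base, delete every character that is not that base (either case)
# and use the number of characters left as the count.
base_counts <- function(x, bases = c("A", "C", "G", "T")) {
  x <- as.character(x)  # also accepts a factor column
  counts <- sapply(bases, function(b) {
    keep_only <- paste0("[^", b, tolower(b), "]")  # e.g. "[^Aa]"
    nchar(gsub(keep_only, "", x))
  })
  # sapply() returns a plain vector when x has length 1, so rebuild the matrix
  as.data.frame(matrix(counts, ncol = length(bases),
                       dimnames = list(NULL, bases)))
}

L <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
base_counts(L)
```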
Dear David,

Thanks for your reply. I tried your code and it runs, but as I mentioned in my mail, I am working on a pileup file. I used this command to read the pileup file into a data frame, mydf:

mydf=read.table("Case2.pileup",fill=T,sep="\t")

Now the problem is that it has 10 columns, and I have to count the number of A, C, G and T in the 9th column. In your mail we input the data by typing it into txt. How should I input my data (column 9 of the data frame mydf) that way, when there are thousands of rows?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Friday, July 01, 2011 11:25 PM
To: Bansal, Vikas
Cc: r-help at r-project.org
Subject: Re: [R] For help in R coding
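Nothing needs to be typed in. Pass the column itself, mydf[, 9], wherever txtvec appears. Two read.table() defaults can get in the way with pileup data. First, the base-quality column, and the character after each ^, can contain # and quote characters, which read.table() treats as comment and quote markers unless told otherwise. Second, R versions before 4.0.0 turn text columns into factors. The sketch below reads two made-up lines in the 10-column layout with those defaults switched off. The sample lines and the gsub()-based counting are illustrative assumptions; they are not taken from Case2.pileup.

```r
# Two made-up 10-column pileup lines; column 10 (base qualities) deliberately
# contains '#' and "'".
pileup_txt <- paste("chr10\t135344109\tG\tG\t36\t0\t60\t6\t.a,g,,\tII#III",
                    "chr10\t135344110\tC\tC\t36\t0\t60\t6\t.t,t,,\tII'III",
                    sep = "\n")

# quote = "" and comment.char = "" stop '#' and quotes being interpreted;
# stringsAsFactors = FALSE keeps the base column as plain text.
mydf <- read.table(text = pileup_txt, sep = "\t", fill = TRUE,
                   quote = "", comment.char = "", stringsAsFactors = FALSE)

bases <- mydf[, 9]  # the read-base column
counts <- data.frame(A = nchar(gsub("[^Aa]", "", bases)),
                     C = nchar(gsub("[^Cc]", "", bases)),
                     G = nchar(gsub("[^Gg]", "", bases)),
                     T = nchar(gsub("[^Tt]", "", bases)))
counts
```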
Dear David,

it is showing this error:

data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit,
+ split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="c|C"),
Error: unexpected ',' in:
"data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit, split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5],"
> length) , "-", 1)),G = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="g|G"),
Error: unexpected ')' in "length)"
> length) , "-", 1)),T = unlist(lapply( lapply( sapply(mydf[,5], strsplit, split="t|T"),
Error: unexpected ')' in "length)"

What should I do?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Saturday, July 02, 2011 2:07 AM
To: Bansal, Vikas
Subject: Re: [R] For help in R coding

On Jul 1, 2011, at 8:01 PM, Bansal, Vikas wrote:

> [...] but how I should input my data from dataframe mydf using txt command
> because there are thousands of rows?

Just send mydf[ , 9] as the argument in place of txtvec.

David Winsemius, MD
West Hartford, CT
On Jul 1, 2011, at 9:18 PM, Bansal, Vikas wrote:

> Dear David,
>
> it is showing this error-

Looks like a syntax error rather than a semantic error.

> data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit,
> + split="a|A"), length) , "-", 1)),C =
> unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="c|C"),
> Error: unexpected ',' in:
> "data.frame(A = unlist(lapply( lapply( sapply(, strsplit,

There seems to be a missing object for the first argument of sapply...? You should supply str(mydf[,5]), or at least see whether the error occurs on mydf[1:20, 5], and supply str() of that if the error persists.

--
David.

David Winsemius, MD
West Hartford, CT
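Looking at the transcript again, the syntax error most likely comes from the doubled opening parenthesis in sapply((mydf[,5], strsplit, ...). In R, (a, b) is not a valid expression, so the parser stops at the first comma. The sketch below shows this with parse(), then writes the count once as a function so the four columns cannot drift apart. The name count_bases and the sample strings are hypothetical. The transcript indexes column 5 while the question says column 9; use whichever column holds the read bases in your file.

```r
# The doubled "(" from the transcript, in miniature: "(x, strsplit, ...)"
# is not a valid R expression, so parsing fails at the first comma.
bad <- "sapply((x, strsplit, split = 'c|C'))"
err <- tryCatch(parse(text = bad), error = function(e) conditionMessage(e))
err

# Write the count once as a function instead of repeating it per base.
count_bases <- function(x) {
  x <- as.character(x)  # factors (read.table's default before R 4.0.0) become text
  patterns <- c(A = "[Aa]", C = "[Cc]", G = "[Gg]", T = "[Tt]")
  m <- vapply(patterns,
              function(p) lengths(regmatches(x, gregexpr(p, x))),
              integer(length(x)))
  # matrix() keeps the one-row case as a 1 x 4 table
  as.data.frame(matrix(m, nrow = length(x),
                       dimnames = list(NULL, names(patterns))))
}

# Stand-in for the read-base column of mydf, stored as a factor.
bases <- factor(c(".a,g,,", ".,t,t,t", "A,c,G"))
count_bases(bases)
```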
> If you wanted to count "A" and "a" as equivalent, then the split
> argument should be "a|A"

As you mentioned, if I want to count A and a together I should split like this. But can I also count . and , using

data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split=".|,"), length) , "-", 1)),

I tried it, but it is not working. It gives output, but in some places it shows more . and , than there are, and in other places it does not calculate anything and just shows 0.

David Winsemius, MD
West Hartford, CT
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
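The split=".|," attempt goes wrong because split is a regular expression. An unescaped . there matches any character, so every character becomes a split point. Inside a character class such as [.,], the dot is literal. In pileup, . and , are reads that match the reference on the forward and reverse strand. The sketch below uses the gsub() remove-everything-else trick from Gabor's reply. It first drops each ^ together with the mapping-quality character that follows it, since that character can itself be a . or a ,.

```r
txtvec <- c(".a,g,,", ".$,,,,,.,", ",,,,,.,^!.")

# Unescaped ".", as in split = ".|,", is a wildcard: every character
# becomes a split point, so ".a,g,," gives 5 instead of 4.
wrong <- length(strsplit(".a,g,,", split = ".|,")[[1]]) - 1
wrong

# Drop "^" plus its mapping-quality character, then keep only literal
# dots and commas and count them.
clean <- gsub("\\^.", "", txtvec)
ref_matches <- nchar(gsub("[^.,]", "", clean))
ref_matches
```

Alternatively, strsplit(x, ".", fixed = TRUE) treats the split string literally.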
Say you have the strings x in a matrix:

x<-c('a,..gGGtTaac<!T','caaGGTT,,.!!@CC')
x<-matrix(x)

Remove all punctuation:

x1<-gsub('[[:punct:]]','',x)
x1

Convert all letters to lowercase:

x2<-gsub('(\\w*)','\\L\\1',x1,perl=TRUE)
x2

Now, for each row, split the string and table it, applying over all rows of the matrix:

apply(x2,1,function(x) table(strsplit(x,'')))

HTH,
Daniel

--
View this message in context: http://r.789695.n4.nabble.com/For-help-in-R-coding-tp3639413p3642655.html
Sent from the R help mailing list archive at Nabble.com.
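Here are two refinements to the table() approach above, assuming the strings are samtools-style pileup base columns. First, [[:punct:]] removes the ^ and $ markers but not the mapping-quality character after each ^, which can be a letter. It also leaves the bases of indels such as +2ag, which are not base calls at that position. Second, table() on a plain character vector omits letters that do not occur, so rows can come out with different lengths. Converting to a factor with fixed levels gives every row the same four columns. The names strip_pileup and tabulate_bases are hypothetical, and N and * are simply ignored.

```r
# Remove pileup markup that is not a base call at this position:
#   "^" plus the following mapping-quality character (start of a read),
#   "$" (end of a read), and indels written as +<n><bases> or -<n><bases>.
strip_pileup <- function(s) {
  s <- gsub("\\^.", "", s)
  s <- gsub("$", "", s, fixed = TRUE)
  # Indel lengths vary, so remove them one at a time using the stated length.
  repeat {
    m <- regexpr("[+-][0-9]+", s)
    if (m == -1) break
    len_text <- substr(s, m + 1, m + attr(m, "match.length") - 1)
    end <- m + attr(m, "match.length") - 1 + as.integer(len_text)
    s <- paste0(substr(s, 1, m - 1), substr(s, end + 1, nchar(s)))
  }
  s
}

# Count A/C/G/T (either case) in one string, always returning all four.
tabulate_bases <- function(s) {
  chars <- toupper(strsplit(strip_pileup(s), "")[[1]])
  table(factor(chars, levels = c("A", "C", "G", "T")))
}

x <- c(".a,g,,", "^Ga,+2ag.t$", ".,-1c,,")
t(sapply(x, tabulate_bases))
```

In the second string, the G after ^ is a mapping quality and ag belongs to an insertion, so only the a and the t are counted.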
Dear all,

I have one problem and did not find any solution. Please, I want your help.

I have two data frames and I want to concatenate them. But the thing is, the two data frames are like this.

First data frame:

V1 V2        A C G T
10 135344109 0 0 1 0
10 135344110 0 1 0 0
10 135344111 0 0 1 0
10 135344112 0 0 1 0
10 135344113 0 0 1 0
10 135344114 1 0 0 0
10 135344115 1 0 0 0
10 135344116 0 0 0 1
10 135344117 0 1 0 0
10 135344118 0 0 0 1

Second data frame:

V1 V2        A C G T
10 135344111 1 0 1 0
10 135344113 0 0 1 0
10 135344109 0 3 1 0
10 135344114 1 0 0 0
10 145344115 1 0 0 0
10 135344116 1 0 0 1
10 132344117 0 1 0 0
10 135344118 0 0 0 1
10 135344110 0 1 0 0

Now I have to create a new data frame that inserts columns 3, 4, 5 and 6 of the second data frame into the first data frame wherever the value in the second column (V2) is the same in both data frames. So the output (new data frame) should be:

V1 V2        A C G T A C G T
10 135344109 0 0 1 0 0 3 1 0
10 135344110 0 1 0 0 0 1 0 0
10 135344111 0 0 1 0 1 0 1 0
10 135344113 0 0 1 0 0 0 1 0
10 135344114 1 0 0 0 1 0 0 0
10 135344116 0 0 0 1 1 0 0 1
10 135344118 0 0 0 1 0 0 0 1

If you look at the output, the second-column values

V2
135344109
135344110
135344111
135344113
135344114
135344116
135344118

are the ones common to both input data frames.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Monday, July 04, 2011 12:11 AM
To: Bansal, Vikas
Cc: Dennis Murphy; r-help at r-project.org
Subject: Re: [R] For help in R coding

On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote:

> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Sunday, July 03, 2011 7:08 PM
>
> the code is same i just want to add a condition so that it should
> check that if in column 3, the character is A then make number of A
> equal to total number of . and ,
>
> Should I explain better or can you please tell me which thing is not
> clear?

My second posting today had a solution.

> --
> David.
>>
>> Can you please help me how to use this if condition in your coding
>> or we can also do it by using some other condition rather than if
>> condition?

David Winsemius, MD
West Hartford, CT
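The condition quoted above adds the number of . and , to the count of whatever reference base appears in column 3 (for example, to A when the reference is A). It can be written without an explicit if, because . and , both mean "same as the reference". This sketch assumes the reference base is in column 3 and the read bases are in column 9, as in the 10-column pileup discussed earlier. counts_with_ref is a hypothetical name, and this is not the solution David refers to.

```r
counts_with_ref <- function(ref, bases) {
  ref   <- toupper(as.character(ref))
  bases <- gsub("\\^.", "", as.character(bases))  # drop read-start marker + MAPQ char
  nt <- c("A", "C", "G", "T")
  # Explicit mismatches: keep only this base (either case) and count what is left.
  out <- sapply(nt, function(b) nchar(gsub(paste0("[^", b, tolower(b), "]"), "", bases)))
  out <- matrix(out, nrow = length(bases), dimnames = list(NULL, nt))
  # "." and "," are reads that agree with the reference base.
  ref_matches <- nchar(gsub("[^.,]", "", bases))
  for (b in nt) {
    is_ref <- ref == b
    out[is_ref, b] <- out[is_ref, b] + ref_matches[is_ref]
  }
  as.data.frame(out)
}

# Made-up stand-in for columns 3 and 9 of the pileup data frame.
mydf <- data.frame(V3 = c("G", "C", "A"),
                   V9 = c(".a,g,,", ".t,t,,", ".$,,,,,.,"),
                   stringsAsFactors = FALSE)
counts_with_ref(mydf$V3, mydf$V9)
```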
Dear Vikas,

Have a look at ?merge()

Best regards,

Thierry

> -----Original message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On behalf of Bansal, Vikas
> Sent: Tuesday 5 July 2011 16:51
> To: David Winsemius
> CC: r-help at r-project.org
> Subject: Re: [R] For help in R coding
>
> I have two data frames and I want to concatenate them.
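For the merge question, here is a sketch that rebuilds the two example data frames from the mail and joins them on V1 and V2. Both tables have V1 = 10 throughout, so joining on V2 alone would match the same rows. By default merge() keeps only rows whose keys appear in both tables and sorts the result by the key columns. suffixes renames the A/C/G/T columns that appear in both tables. The names df1, df2 and the suffixes .1/.2 are arbitrary choices.

```r
df1 <- data.frame(V1 = 10,
                  V2 = 135344109:135344118,
                  A = c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0),
                  C = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0),
                  G = c(1, 0, 1, 1, 1, 0, 0, 0, 0, 0),
                  T = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 1))

df2 <- data.frame(V1 = 10,
                  V2 = c(135344111, 135344113, 135344109, 135344114, 145344115,
                         135344116, 132344117, 135344118, 135344110),
                  A = c(1, 0, 0, 1, 1, 1, 0, 0, 0),
                  C = c(0, 0, 3, 0, 0, 0, 1, 0, 1),
                  G = c(1, 1, 1, 0, 0, 0, 0, 0, 0),
                  T = c(0, 0, 0, 0, 0, 1, 0, 1, 0))

# Inner join on the key columns; non-key columns present in both tables
# get the suffixes .1 (from df1) and .2 (from df2).
both <- merge(df1, df2, by = c("V1", "V2"), suffixes = c(".1", ".2"))
both
```

The seven rows and their values match the output table in the question, with the A/C/G/T columns from df1 followed by those from df2.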
Yes sir. I have already looked at merge(), but as I am new to R, I was not able to understand its arguments, that is, how I should write code for the logic I gave in my previous mail.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: ONKELINX, Thierry [Thierry.ONKELINX at inbo.be]
Sent: Tuesday, July 05, 2011 3:53 PM
To: Bansal, Vikas; David Winsemius
Cc: r-help at r-project.org
Subject: RE: [R] For help in R coding

Dear Vikas,

Have a look at ?merge()

Best regards,

Thierry
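Since the arguments are the confusing part, here is a short sketch of the ones that matter here. by names the key column or columns when they have the same name in both tables; by.x and by.y are only needed when the names differ. all.x = TRUE keeps every row of the first table and fills in NA where the second table has no match. The tiny data frames are made up for illustration.

```r
first  <- data.frame(V2 = c(1, 2, 3), A = c(0, 1, 0))
second <- data.frame(pos = c(3, 1, 9), A = c(5, 6, 7))

# Key columns with different names: use by.x and by.y.
# Default (inner join): only keys present in both tables, here 1 and 3.
inner <- merge(first, second, by.x = "V2", by.y = "pos")
inner

# all.x = TRUE also keeps key 2 from `first`, with NA for the missing match.
left <- merge(first, second, by.x = "V2", by.y = "pos", all.x = TRUE)
left
```

The clashing column A comes back as A.x and A.y, the default suffixes.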
Hi I sorted out a little bit- I am using this code- vi=(m1 <- merge(blaa, daf, by.x = "V2", by.y = "V2")) (m2 <- merge(daf, blaa, by.x = "V2", by.y = "V2")) results are also coming fine. but i dont know i got another code- stopifnot(as.character(m1[,1]) == as.character(m2[,1]), all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ]), dim(merge(m1, m2, by = integer(0))) == c(36, 10)) Can you tell me what this code will do???? Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: ONKELINX, Thierry [Thierry.ONKELINX at inbo.be] Sent: Tuesday, July 05, 2011 3:53 PM To: Bansal, Vikas; David Winsemius Cc: r-help at r-project.org Subject: RE: [R] For help in R coding Dear Vikas, Have at look at ?merge() Best regards, Thierry> -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens Bansal, Vikas > Verzonden: dinsdag 5 juli 2011 16:51 > Aan: David Winsemius > CC: r-help at r-project.org > Onderwerp: Re: [R] For help in R coding > > Dear all, > > I have one problem and did not find any solution.Please I want your help. 
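The stopifnot() call appears to come from the Examples section of ?merge (the authors/books example), where it serves as a self-test and changes nothing: it stops with an error unless (1) m1 and m2 have the same key values in the same order, (2) the non-key columns agree once m2's columns are put in m1's order by name, and (3) the Cartesian product of m1 and m2 (by = integer(0)) has 36 rows and 10 columns, numbers that only hold for the help-page data. It is not needed for the merge itself, and by = "V2" is equivalent to by.x = "V2", by.y = "V2". When the two frames share all column names, as here, merge() adds .x/.y suffixes and swaps them when the arguments are swapped, so check 2 as written would compare the wrong columns. A minimal sketch of the same three checks adapted to this case, using a few rows copied from the two posted data frames (the subsets are chosen only for illustration):

```r
# A few rows from each posted data frame (names blaa/daf as in the thread)
blaa <- data.frame(V1 = 10, V2 = c(135344109, 135344110, 135344112),
                   A = c(0, 0, 0), C = c(0, 1, 0), G = c(1, 0, 1), T = c(0, 0, 0))
daf  <- data.frame(V1 = 10, V2 = c(135344110, 135344109, 145344115),
                   A = c(0, 0, 1), C = c(1, 3, 0), G = c(0, 1, 0), T = c(0, 0, 0))

m1 <- merge(blaa, daf, by = "V2")  # blaa's columns get ".x", daf's get ".y"
m2 <- merge(daf, blaa, by = "V2")  # same join, so the suffixes are swapped

# Check 1: both joins found the same key values in the same (sorted) order
stopifnot(as.character(m1[, 1]) == as.character(m2[, 1]))

# Check 2 (adapted): blaa's columns end in ".x" in m1 but in ".y" in m2
x_cols <- grep("\\.x$", names(m1), value = TRUE)
y_cols <- sub("\\.x$", ".y", x_cols)
stopifnot(all(as.matrix(m1[x_cols]) == as.matrix(m2[y_cols])))

# Check 3 (adapted): by = integer(0) gives the Cartesian product, i.e.
# nrow(m1) * nrow(m2) rows and ncol(m1) + ncol(m2) columns
cj <- merge(m1, m2, by = integer(0))
stopifnot(dim(cj) == c(nrow(m1) * nrow(m2), ncol(m1) + ncol(m2)))
```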
If I understand correctly, you want to keep the rows from each table that have common values in the second column. In that case, merge() will do the job, as in this example. Say you have these data frames:

> frame1
  x  A  C  G
  1  0 -1  2
  2 -1  0 -1
  3  0  0 -1
  4  1  1 -1
  5  0  1  0
  6 -1  1 -1
  7  0 -1  1
  8  0  0 -1
  9  1 -3  0
 10  0  0  0
> frame2
  x  A  C  G
  2  0  1  0
  4 -1  0 -1
  6  1  0  0
  8 -1 -1 -1
 10 -1  0 -1
 12  1 -1  0
 14  0 -2  0
 16  0 -2  0
 18  1  0 -1
 20  0 -1  2

and you want to combine these tables, keeping the rows whose values of column x appear in both, to get something like this:

  x  A  C  G  A  C  G
  2 -1  0 -1  0  1  0
  4  1  1 -1 -1  0 -1
  6 -1  1 -1  1  0  0
  8  0  0 -1 -1 -1 -1
 10  0  0  0 -1  0 -1

which shows that the values of x common to the two tables are 2, 4, 6, 8 and 10. The merge command to do this is:

> merge(frame1, frame2, by = "x")
  x A.x C.x G.x A.y C.y G.y
  2  -1   0  -1   0   1   0
  4   1   1  -1  -1   0  -1
  6  -1   1  -1   1   0   0
  8   0   0  -1  -1  -1  -1
 10   0   0   0  -1   0  -1

which is the same as the desired output above except for the column names. When the two data frames being merged share column names, R appends a suffix to those names so you can tell which data frame each column came from. Here, columns ending in .x come from the first data frame (frame1) and columns ending in .y come from the second (frame2).

I hope this helps!

Adrienne

On Tue, Jul 5, 2011 at 11:02 AM, Bansal, Vikas <vikas.bansal at kcl.ac.uk> wrote:
>
> Yes sir. I have already looked at merge(),
> but as I am new to R, I was not able to understand the argument, i.e. how I should write code for the logic I gave in my previous mail.
-- 
Adrienne Wootten
Graduate Research Assistant
State Climate Office of North Carolina
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University
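A self-contained version of Adrienne's example, with frame1 and frame2 typed in from the printed tables. The suffixes argument of merge() (default c(".x", ".y")) lets you pick more descriptive names; the suffix strings used below are only illustrative.

```r
# frame1 and frame2 re-typed from the printed tables in the reply above
frame1 <- data.frame(x = 1:10,
                     A = c(0, -1, 0, 1, 0, -1, 0, 0, 1, 0),
                     C = c(-1, 0, 0, 1, 1, 1, -1, 0, -3, 0),
                     G = c(2, -1, -1, -1, 0, -1, 1, -1, 0, 0))
frame2 <- data.frame(x = (1:10) * 2L,
                     A = c(0, -1, 1, -1, -1, 1, 0, 0, 1, 0),
                     C = c(1, 0, 0, -1, 0, -1, -2, -2, 0, -1),
                     G = c(0, -1, 0, -1, -1, 0, 0, 0, -1, 2))

# Inner join on x: only x = 2, 4, 6, 8, 10 occur in both frames.
# The shared non-key names A, C, G get the default suffixes ".x" and ".y".
m <- merge(frame1, frame2, by = "x")
print(m)

# Same join with illustrative suffixes instead of ".x"/".y"
m_named <- merge(frame1, frame2, by = "x", suffixes = c(".frame1", ".frame2"))
names(m_named)
```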
Yes, this is perfect. But can I use the and (&) operator so that rows are matched only when two columns both have the same value? For example, with the data from your previous mail:

merge(frame1,frame2,by="x&G")

so that the output is:

 x  G A.x C.x A.y C.y
 4 -1   1   1  -1   0
 8 -1   0   0  -1  -1

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: wootten.adrienne at gmail.com [wootten.adrienne at gmail.com] On Behalf Of Adrienne Wootten [amwootte at ncsu.edu]
Sent: Tuesday, July 05, 2011 4:49 PM
To: Bansal, Vikas
Cc: r-help at r-project.org
Subject: Re: [R] For help in R coding
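merge() has no "&" syntax inside by. Instead, by accepts a character vector of column names, and a row pair is kept only when all of the listed columns match, which is the AND condition asked about. A minimal sketch using frame1 and frame2 from Adrienne's example (re-typed here so the block runs on its own):

```r
frame1 <- data.frame(x = 1:10,
                     A = c(0, -1, 0, 1, 0, -1, 0, 0, 1, 0),
                     C = c(-1, 0, 0, 1, 1, 1, -1, 0, -3, 0),
                     G = c(2, -1, -1, -1, 0, -1, 1, -1, 0, 0))
frame2 <- data.frame(x = (1:10) * 2L,
                     A = c(0, -1, 1, -1, -1, 1, 0, 0, 1, 0),
                     C = c(1, 0, 0, -1, 0, -1, -2, -2, 0, -1),
                     G = c(0, -1, 0, -1, -1, 0, 0, 0, -1, 2))

# Keep only rows where BOTH x and G are equal in the two frames.
# Only the remaining shared columns (A, C) receive ".x"/".y" suffixes.
m_xG <- merge(frame1, frame2, by = c("x", "G"))
print(m_xG)  # rows x = 4 and x = 8, both with G = -1
```

For the two pileup-derived data frames earlier in the thread, the analogous call would presumably be merge(blaa, daf, by = c("V1", "V2")), assuming V1 holds the chromosome and V2 the position, as in samtools pileup output.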