thr3ads.net - R help - [R] Mismatch distribution [Jan 2019]


Myriam Croze

2019-Jan-22 01:28 UTC

[R] Mismatch distribution

Hello!

I need your help. I am trying to calculate the pairwise differences between
sequences from several FASTA files.
For each of my DNA alignments (FASTA files), I would like to calculate the
pairwise differences and then:
1. combine the data from all files into one data set and plot one histogram
(mismatch distribution);
2. calculate the mean for each number of differences across all files and
again make a mismatch distribution plot.
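One possible reading of goal 2 is: for each file, tabulate the relative frequency of 0, 1, 2, ... pairwise differences, then average those frequencies across files. A minimal base-R sketch under that assumption follows; the `per_file` list is hypothetical example input (one integer vector of pairwise differences per alignment), not data from this thread.

```r
# Hypothetical input: pairwise differences for each alignment file.
per_file <- list(locus1 = c(1L, 2L, 1L), locus2 = c(0L, 2L, 2L))

# Use the same range of mismatch counts (0..max_k) for every file.
max_k <- max(unlist(per_file))

# Relative frequency of each mismatch count in one file; factor() with
# explicit levels keeps a zero for counts that never occur in that file.
rel_freq <- function(d) {
  as.numeric(table(factor(d, levels = 0:max_k))) / length(d)
}

# cbind() always yields a matrix: rows = mismatch counts, columns = files.
freq_by_file <- do.call(cbind, lapply(per_file, rel_freq))

# Average across files (row-wise) to get the mean mismatch distribution.
mean_freq <- setNames(rowMeans(freq_by_file), 0:max_k)

# Draw to a temporary PDF so the sketch also runs non-interactively;
# remove pdf()/dev.off() to plot on screen instead.
pdf(tempfile(fileext = ".pdf"))
barplot(mean_freq, xlab = "Number of pairwise differences",
        ylab = "Mean relative frequency", main = "Mean mismatch distribution")
invisible(dev.off())
```

Averaging with rowMeans() gives every file the same weight regardless of how many sequences it contains; pooling all differences first (goal 1) instead weights each file by its number of sequence pairs.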

Here is the script that I wrote:

library("pegas")
library("seqinr")
library("ggplot2")

Files <- list.files(pattern="fas")
nb_files <- length(Files)

for (i in 1:nb_files) {
    Dist <- as.numeric(dist.gene(read.dna(Files[i], "fasta"), method = "pairwise",
                                 pairwise.deletion = FALSE, variance = FALSE))

    Data <- merge(Data, Dist, by=c("x"), all=T)
}

hist(Data, prob=TRUE)
lines(density(Data), col="blue", lwd=2)
However, the script does not work, and I do not know what to change to make
it work.
Thanks in advance for your help.

Myriam

-- 
Myriam Croze, PhD
Postdoctoral researcher
Division of EcoScience,
Ewha Womans University
Seoul, South Korea

Email: myriam.croze07 at gmail.com

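For reference, the loop in this post cannot run as posted: `Data` is used in `merge(Data, ...)` before it is ever created (R reports "object 'Data' not found" unless an object of that name already exists), `merge()` performs a database-style join on key columns rather than appending values, and `hist()` needs a numeric vector, not a data frame. Also, `read.dna()` and `dist.gene()` are functions of the ape package, which pegas attaches when loaded; `dist.gene()` is meant for genotype matrices, whereas for aligned DNA `ape::dist.dna(x, model = "N")` returns the number of differing sites for each pair. The sketch below shows the usual R pattern for goal 1 (collect one vector per file in a list, then `unlist()`), using only base R so it is self-contained. `read_fasta_simple()` and `pairwise_mismatches()` are hypothetical helpers, the two toy files are written to `tempdir()`, and every file is assumed to be an alignment (equal-length sequences).

```r
# Sketch of goal 1: pool the pairwise differences of several aligned FASTA
# files and plot a single mismatch distribution (base R only).

# Toy input: two small aligned FASTA files in a temporary directory.
demo_dir <- file.path(tempdir(), "mismatch_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c(">s1", "ACGTACGT", ">s2", "ACGTACGA", ">s3", "TCGTACGA"),
           file.path(demo_dir, "locus1.fas"))
writeLines(c(">s1", "GGCCTTAA", ">s2", "GGCCTTAA", ">s3", "GACCTAAA"),
           file.path(demo_dir, "locus2.fas"))

# Hypothetical helper: parse a FASTA file into a named character vector,
# joining sequences that are split over several lines.
read_fasta_simple <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]                 # drop blank lines
  is_header <- startsWith(lines, ">")
  ids <- sub("^>\\s*", "", lines[is_header])
  grp <- cumsum(is_header)                      # index of the current header
  seqs <- vapply(split(lines[!is_header], grp[!is_header]),
                 paste, character(1), collapse = "")
  setNames(toupper(seqs), ids)
}

# Hypothetical helper: number of differing sites for every pair of sequences,
# in combn() order (1-2, 1-3, ..., 2-3, ...). Gaps and ambiguity codes are
# simply compared as characters here.
pairwise_mismatches <- function(seqs) {
  chars <- strsplit(seqs, "")
  stopifnot(length(chars) >= 2,
            length(unique(lengths(chars))) == 1)  # must be an alignment
  pairs <- combn(length(chars), 2)
  apply(pairs, 2, function(p) sum(chars[[p[1]]] != chars[[p[2]]]))
}

# Anchored pattern: only file names ending in .fas or .fasta.
files <- list.files(demo_dir, pattern = "\\.fas(ta)?$", full.names = TRUE)

# One vector per file in a list (no merge() into an undefined object),
# then pool everything with unlist().
per_file <- lapply(files, function(f) pairwise_mismatches(read_fasta_simple(f)))
names(per_file) <- basename(files)
pooled <- unlist(per_file, use.names = FALSE)

# Integer-centred breaks give one bar per mismatch count. Drawn to a
# temporary PDF so it runs non-interactively; drop pdf()/dev.off() to plot
# on screen.
pdf(tempfile(fileext = ".pdf"))
hist(pooled, breaks = seq(-0.5, max(pooled) + 0.5, by = 1), freq = FALSE,
     xlab = "Number of pairwise differences", main = "Pooled mismatch distribution")
invisible(dev.off())
```

With ape installed, the body of the `lapply()` call could instead be `as.vector(dist.dna(read.dna(f, format = "fasta"), model = "N"))`, which assumes the file reads in as an aligned DNAbin matrix.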

Bert Gunter

2019-Jan-22 02:08 UTC


[R] Mismatch distribution

"Do not work" does not work (in providing sufficient info). See the posting
guide linked below for how to post an intelligible question.

HOWEVER, I suspect you would do better posting on the Bioconductor list,
where they are much more likely to know what "fasta" files look like and
might even have software already developed to do what you want. You could
well be trying to reinvent wheels.

Cheers,
Bert


> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Boris Steipe

2019-Jan-22 02:52 UTC


[R] Mismatch distribution

Myriam -

This is the right list in principle: all the packages you use are CRAN
packages, not Bioconductor packages.

However, I am at a loss as to how you wrote your code: both pegas and seqinr
have "read.<something>()" functions, but neither has read.dna(); similarly,
both pegas and seqinr have "dist.<something>()" functions, but neither has
dist.gene(). Did you just extrapolate those function names and parameters from
other function calls?

In any case: please start from a minimal, reproducible example that comes close
to what you are trying to achieve, then post again. Here are the three URLs we
usually recommend to get things started. Use a small number of small example
files, don't nest your expressions until you are sure they produce what you
think they do, and take it step by step.

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
http://adv-r.had.co.nz/Reproducibility.html
https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)
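As a concrete instance of that advice, here is a minimal sketch that keeps the input inline (so nobody needs the original files) and inspects each intermediate result before anything is nested. The sequences are made up, the third one deliberately shorter, and the sketch assumes exactly one sequence line per header.

```r
# Inline toy input: anyone can run this without the original files.
fasta_lines <- c(">s1", "ACGTACGT", ">s2", "ACGTACGA", ">s3", "TCGTAC")

# Step 1: locate headers and confirm the number of sequences.
is_header <- startsWith(fasta_lines, ">")
ids <- sub("^>", "", fasta_lines[is_header])
print(ids)

# Step 2: check sequence lengths before computing any distance;
# site-by-site differences only make sense for an alignment.
seq_lengths <- setNames(nchar(fasta_lines[!is_header]), ids)
print(seq_lengths)
is_aligned <- length(unique(seq_lengths)) == 1
if (!is_aligned) {
  message("Sequences have different lengths: align them first.")
}
```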

Cheers,
B

-


