On Tue, Oct 12, 2010 at 8:54 PM, Eric Hu <Eric.Hu at gilead.com> wrote:
> Hi,
>
> I am trying to see if I can use R to perform more rigorous regression
> analysis. I wonder if the fingerprint package is able to handle pipeline
> pilot fingerprints (ECFC6 etc) now.
Currently no. Does Pipeline Pilot output its ECFPs in a standard
format? If so, can you send me an example file? (Assuming it outputs
the fingerprint for a single molecule on a single row, you could
implement your own line parser and supply it via the lf argument in
fp.read. See cdk.lf, moe.lf or bci.lf for examples.)
The other issue is how one evaluates similarity between
variable-length feature fingerprints, such as ECFPs. One approach is
to map the features into a fixed-length bit string. Another approach
is to just look at intersections and unions of features to evaluate
the Tanimoto score. It seems to me that the former leads to loss of
resolution and that the latter could lead to generally low Tanimoto
scores.
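
To make the two approaches concrete, here is a rough Python sketch (not
R, and not what Pipeline Pilot or the fingerprint package actually
does). The feature IDs, the function names, and the use of CRC32 modulo
the bit-string length as the folding hash are all assumptions made for
illustration only.

```python
import zlib


def tanimoto_sets(a, b):
    """Tanimoto on raw feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty fingerprints: define the score as 0.0 (a convention).
        return 0.0
    return len(a & b) / len(union)


def fold(features, n_bits=1024):
    """Map each feature to a bit position in a fixed-length bit string.

    Assumption: CRC32 of the feature's string form, modulo n_bits.
    Returns the set of on-bit positions rather than a dense bit vector.
    """
    return {zlib.crc32(str(f).encode("utf-8")) % n_bits for f in features}


def tanimoto_folded(a, b, n_bits=1024):
    """Tanimoto on the folded (fixed-length) representation."""
    return tanimoto_sets(fold(a, n_bits), fold(b, n_bits))


# Hypothetical feature IDs for two molecules sharing two of six features.
mol_a = [101, 202, 303, 404]
mol_b = [101, 202, 505, 606]

raw = tanimoto_sets(mol_a, mol_b)         # 2 shared / 6 distinct features
folded = tanimoto_folded(mol_a, mol_b)    # depends on hash collisions
extreme = tanimoto_folded(mol_a, mol_b, n_bits=1)  # everything collides
```

With a single bit every feature collides, so the folded score is 1.0
regardless of how different the molecules are; that is the loss of
resolution in its extreme form. With a realistic length the folded
score can never fall below what the shared features alone would give
on the folded bits, but distinct features may still collide and push
it up.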
Do you know what Pipeline Pilot does?
--
Rajarshi Guha
NIH Chemical Genomics Center