David Wild Update June 2006

From Chemical Informatics and Cyberinfrastructure Collaboratory

Web Service Infrastructure

  • New services
    • DTP Database Screen Search
    • Protein Database
    • ToxTree
    • New VOTABLES services (TAB->VOTABLE, VOTABLE->TAB)
    • Distributed Drug Discovery in progress
  • Decision to use tab files (for the moment)
    • Mk.1 Tab File, Mk.2 VOTABLES, Mk.3 CML?
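The TAB->VOTABLE and VOTABLE->TAB conversions listed above can be illustrated with a minimal sketch. This is not the code behind the services. The function names are hypothetical, every column is assumed to be a plain string, the first row is assumed to hold column names, and the XML namespace declarations a full VOTable document would carry are omitted.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tab_to_votable(tab_text):
    """Convert tab-delimited text (first row = column names) to VOTable XML."""
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    header, records = rows[0], rows[1:]

    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name in header:
        # Assumption: every column is declared as a variable-length string.
        ET.SubElement(table, "FIELD", name=name, datatype="char", arraysize="*")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for record in records:
        tr = ET.SubElement(tabledata, "TR")
        for value in record:
            ET.SubElement(tr, "TD").text = value
    return ET.tostring(votable, encoding="unicode")


def votable_to_tab(votable_xml):
    """Convert VOTable XML (as produced above) back to tab-delimited text."""
    root = ET.fromstring(votable_xml)
    header = [field.get("name") for field in root.iter("FIELD")]
    lines = ["\t".join(header)]
    for tr in root.iter("TR"):
        # Empty cells have no text node, so fall back to an empty string.
        lines.append("\t".join(td.text or "" for td in tr.iter("TD")))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "id\tsmiles\n1\tCCO\n2\tc1ccccc1\n"
    print(tab_to_votable(sample))
```

Keeping the tab file as the interchange format and converting only at the visualization step means the other services never need to parse XML, which fits the "tab files for the moment" decision.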

Workflows

  • Implemented key HTS Data Analysis workflow (Workflow 2):
    SCREEN SEARCH -> FILTER -> (TOXTREE) -> DIVKMEANS -> VOTABLES -> VOPLOT
  • Taverna 1.4 now available including visual workflow builder
  • Pilot project with Plale/Gannon to illustrate how our Taverna framework can be easily integrated with other environments
  • See Workflows for more information

Data Mining of DTP Database

  • Major review of current work, and identification of areas of opportunity
  • Characterization of database in terms of diversity, compound profiles, similarity to HTS datasets
  • Collaboration with Faming Zhang

Visualization and Interaction Tools

  • First portlet application in Gridsphere - BCI clustering portlet
  • Developments to PubChemSR - similarity searching
  • Usability experiment on PubChem/Chmoogle/ChemDB completed. Developing a wider package of usability experiments (outside grant...)

Methods

  • Demonstrated ability to cluster PubChem in 5-6 hours on AVIDD (20 procs)
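For readers unfamiliar with the approach, the idea behind divisive k-means can be sketched as below. This is a generic bisecting scheme on small numeric tuples, not the DivKmeans program used for the PubChem run. The function names, the deterministic seeding, and the rule of always splitting the largest cluster are assumptions made for illustration.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=20):
    """Split points into two groups with plain k-means (k = 2)."""
    # Assumed seeding: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers = [c0, c1]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [mean(groups[0]), mean(groups[1])]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return groups


def divisive_kmeans(points, k):
    """Start with one cluster and repeatedly bisect the largest one."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break
        target = max(splittable, key=len)
        left, right = two_means(target)
        if not left or not right:
            break  # the chosen cluster cannot be split further
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for cluster in divisive_kmeans(data, 2):
        print(cluster)
```

Because each step only runs a two-way split on one cluster, the cost per step stays small compared with running full k-means with a large k.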

Outreach

  • Began collaborating with Michigan MACE
  • Potential connection with OpenEye
  • Potential connection with Jake Chen
  • ACS presentation accepted for September
  • Microsoft presentation September/October
  • Publication (Wild, Wiggins) in Drug Discovery Today, May 2006, on chemoinformatics education (in addition to JCIM article)
  • Publication (Wild, Wu, Zaharevitz) on DTP Data Mining, submit July
  • Publication on Web Service Infrastructure, submit July

Six Month Outlook

  • Summary so far
    • Implemented 12 chemoinformatics web services (each service can provide multiple functions)
    • Implemented 2 drug discovery related workflows using these services
    • Several prototype visualization / interaction tools
    • Started mining the DTP Tumor Cell Line Set
    • Demonstrated feasibility of clustering entire PubChem dataset
    • Begun collaborations with PMR group, Michigan MACE, Faming Zhang, DTP
    • Posters at spring ACS, presentations at fall ACS and Microsoft, 2 papers in preparation
  • Must have
    • Scientific successes using workflows
      • Faming Zhang Kinase collaboration
      • DTP data mining with PDB
      • Analysis of MLSCN data in PubChem (Workflow 2)
    • Expanded Workflow 2
      • Add in 2D structure viewer to VOPlot
      • ToxTree
      • Any QSAR models available
      • PDBBind
    • 2 more workflows
    • A non-trivial portlet interface talking to multiple workflow streams
    • OSCAR success with PMR collaboration (Matt Stahl concurs)
    • Publications on DTP Data Mining and Web Service Infrastructure
    • Some success with Michigan collaboration (PDBBind, Workshop, Chemoinformatics Course)
  • Nice to have
    • Execution environment for Taverna, including the ability to wrap workflows as web services and possibly BPEL support
    • R statistical and QSAR services
    • Something with a .NET interface
    • Scientific Workflow using reaction database from Distributed Drug Discovery
    • Publication on DivKmeans

Points for discussion

  • Use of OEChem and potential for NIH buyout
  • Joint workshop with MACE