CICC Web Resources

From Chemical Informatics and Cyberinfrastructure Collaboratory

Introduction

This page consolidates all of our Web Service Infrastructure, Web Portal Development and other Web Clients, Workflows, and downloadable software. In summary:

  • A Web Service provides an online capability that is invokable through a machine-readable XML interface.
  • Web Service clients provide example human user interfaces to these services. Services can have more than one client interface. Clients can be aggregated into portals.
  • Workflows are clients to collections of services that are executed in a particular order chosen by the workflow author. A workflow typically encodes a scientific use case. We build workflows of services with both Taverna (free software, examples below) and Pipeline Pilot.

Web Service Clients

We first list example client interfaces, since these are the most user-friendly. The services that they invoke are listed afterward.

Web Portal

We are developing JSR 168-compliant portals and portlet components to provide user interfaces to Web Services and workflows. Our portal is available from http://www.chembiogrid.org/gridsphere and includes publicly available services (i.e., no login required).

Standalone Web Client Interfaces

We have developed numerous standalone Web interfaces to our services. These can be aggregated into portals.

As a general rule, these are placed under http://www.chembiogrid.org/cheminfo/, so you can browse the listing there for services. Some specific examples include

  • PubDock Docking Database - an interface to the results of docking (using Fred) a drug-like subset of PubChem into a series of targets. The aim is to eventually perform docking runs against 1700 targets for which a binding site is known. Currently we have results for seven targets. The database stores the best ligand pose, the protein target, and the components of four scoring functions implemented in Fred.
  • Pub3D 3D structure Database - an interface to the 3D structures of nearly 10M PubChem compounds, obtained using MMFF94. We currently only store a single low energy conformer for each structure.
  • PubChem Frequent Hitters - get a summary of the PubChem assays in which a set of compounds is active or inactive. This service does not depend on any underlying model of promiscuous compounds; it simply reports the assays in which a given compound is active, along with the number of assays in which it is inactive.
  • Scripps MLSCN Toxicity Predictions - obtain toxicity predictions based on a random forest ensemble using the Scripps MLSCN cytotoxicity data. The model is actually an ensemble of 10 random forest models, which allows us to avoid a severely imbalanced classification problem.
  • NTP DTP Anti-Cancer Activity Predictions - obtain predictions of whether a compound will exhibit anti-cancer activity against the first 40 cell lines from the NCI DTP. Each cell line is modeled by a single random forest model.
  • Generate RSS feeds for PubChem queries. We currently provide feeds for synonym searches and docking searches. For the synonym feed, one can enter a synonym and get a feed where each item is a PubChem compound that contains the specified synonym. The docking feed returns a list of PubChem compounds that match the user's specifications in terms of score values, score functions, etc. Each feed contains 2D information (and 3D if available), which can be viewed in Bioclipse or Jmol.
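Feeds like these can be read by any RSS reader or by a short script. The following is a minimal sketch, assuming a plain RSS 2.0 layout (channel, item, title, link). The sample feed text, its titles, and its links are made up for illustration and may not match what the actual feeds return.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; the real feeds may carry different fields.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example synonym feed</title>
    <item>
      <title>Compound A</title>
      <link>http://example.org/compound/A</link>
    </item>
    <item>
      <title>Compound B</title>
      <link>http://example.org/compound/B</link>
    </item>
  </channel>
</rss>"""


def feed_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]
```

Calling `feed_items(SAMPLE_FEED)` yields one pair per compound entry; in practice the text would come from the feed URL rather than from a constant.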

Varuna Web Client Examples

We are developing a system called Varuna to demonstrate the feasibility of a large-scale, cyberinfrastructure-enabled computational chemistry database. This initial prototype is specific to the Jaguar program, but later versions will be fully generalized. The following example clients are for file conversion and information extraction.

Clients for several more Varuna services are under development (see Web Service Infrastructure).

  • File operation (converting file formats for chemical computation)

http://129.79.139.29/filecon/Default.aspx

  • Result analysis (extracting useful information from chemical computation)

http://129.79.139.29/utilityclient/Default.aspx

VOTables Clients

VOTables is an XML representation for tabular data originally developed by the National Virtual Observatory community. VOTable services create and manipulate these tables, which can be imported and exported as Excel spreadsheets.
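To make the format concrete, here is a minimal sketch that writes and reads a table using the core VOTable elements (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD). It is not the code behind our services: the function names are hypothetical, every column is declared as text, and the XML namespace that real VOTable documents normally carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(field_names, rows):
    """Build a minimal VOTable document (as a string) from tabular rows."""
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name in field_names:
        # Simplification: declare every column as a variable-length string.
        ET.SubElement(table, "FIELD", name=name, datatype="char", arraysize="*")
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


def votable_to_rows(votable_text):
    """Read column names and cell text back out of a namespace-free VOTable."""
    root = ET.fromstring(votable_text)
    names = [field.get("name") for field in root.iter("FIELD")]
    rows = [[td.text or "" for td in tr.iter("TD")] for tr in root.iter("TR")]
    return names, rows
```

Because each row is just a list of cells, the same structure maps directly onto a spreadsheet row, which is what makes import and export straightforward.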

CICC Web Services

The following links connect to WSDL application interfaces, which can be used to generate client code for both Workflows and Web Applications. Note these Web Services include service wrappers around both commercial (BCI and OpenEye) applications and free code. The former are licensed and available to Indiana University researchers.
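Client code generated from a WSDL ultimately exchanges SOAP messages with the service. As a rough illustration of what such a request looks like, the sketch below builds a SOAP 1.1 envelope with the Python standard library. The operation name, service namespace, and parameter in the usage example are hypothetical placeholders; the real names must be taken from the WSDL of the service being called, and nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(operation, service_ns, params):
    """Serialize a SOAP 1.1 request for one operation with simple parameters."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element lives in the service's own namespace.
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and namespace, for illustration only.
EXAMPLE_REQUEST = build_soap_request(
    "getStructure", "urn:example:service", {"cid": 1}
)
```

A WSDL toolkit hides this step, but seeing the message helps when debugging a client or checking what a workflow tool actually posts to the service.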

Database Services

More details on these services can be found at Databases_Projects.

    • Local NIH Database
    • Local PubChem and derivative databases. These services are essentially wrapped queries. Naturally, there may be queries that you'd like to see but that are not present. If so, let me know.
      • PubChem Structure (Usage) Provides methods to get PubChem Compound information.
      • PubChem Synonyms (Usage) Provides methods to get synonyms given a compound or substance ID. SMILES support is coming.
      • PubChem Derived properties (Usage) Get calculated properties (SLogP and SMRef) given a compound ID. Can also search via exact values and ranges (but this is very slow at the moment).
      • PubChem Docking results (Usage) Provides methods to get the docked structures (for a given target) for PubChem compounds based on CID, sorted score values, or SMARTS patterns. Ligands are returned in SDF format. Currently only the ligand structures are accessible; the actual score values are coming soon.
      • PubChem 3D structures (Usage) This service provides access to MMFF94-optimized 3D structures for PubChem compounds. Structures are returned in SD format and can be accessed by CID or by SMARTS patterns.
    • Distributed Drug Discovery Database
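Several of the services above return structures in SD (SDF) format, in which each record ends with a line containing `$$$$` and data items are introduced by `> <NAME>` header lines. The following is a minimal sketch of splitting such a response into records and reading its data items. The sample text is a made-up, simplified record pair, and the `CID` data item is an assumption about what a response might carry, not a description of the actual service output.

```python
import re

# Made-up, simplified SD text with two single-atom records.
SAMPLE_SDF = "\n".join([
    "example-1", "  made-up record", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END", "> <CID>", "1", "", "$$$$",
    "example-2", "  made-up record", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END", "> <CID>", "2", "", "$$$$",
])

DATA_HEADER = re.compile(r"^>.*<([^>]+)>")


def split_sdf(sdf_text):
    """Split SD text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def sdf_data_fields(record):
    """Collect '> <NAME>' data items from one SD record into a dict."""
    fields, name = {}, None
    for line in record.splitlines():
        header = DATA_HEADER.match(line)
        if header:
            name = header.group(1)
            fields[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line closes the current data item
            else:
                fields[name].append(line)
    return {key: "\n".join(value) for key, value in fields.items()}
```

For real work a cheminformatics toolkit is the better choice, since it also parses the atom and bond blocks; this sketch only handles the record framing.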

Commercial Application Services

  • OpenEye: These URLs are private and are available on request.
    • FRED Docking: this service is the basis for the public results reported in PubDock.
    • FILTER Property Calculation and Filtering: used to filter PubChem to identify drug-like molecules.
    • OMEGA 2D-3D Conversion
  • BCI: these services are private and available on request.
    • Various BCI Clustering services (Usage)

Cheminformatics Services

  • OSCAR3 Web service: invokes the OSCAR3 chemical informatics text analysis application to extract chemical information from text, which is summarized in an XML document. This service is based on the OSCAR3 application developed by Peter Murray-Rust's group at Cambridge.

Statistics/Math Web Services

If there are algorithms or features you'd like to see, or if you find bugs, go to the SourceForge page and submit a feature request or bug report.

Varuna Computational Chemistry Services

These services allow the submission and querying of Jaguar Quantum Mechanics and Molecular Mechanics data in our Varuna Database.

  • The file operation utilities convert between file formats frequently used for quantum mechanics and molecular mechanics computation. These web services accept the input files and return the converted result in a string array. Some services require Open Babel. http://129.79.139.29/FileOper/FileOperation.asmx
  • The upload web services help users interact with Avidd, Varuna, and a local PC over the SFTP or SCP protocol, using the open-source SharpSSH library. These web services accept the input files, submit them to Avidd, and store the information in Varuna. http://129.79.139.29/RemoteOper/Service.asmx

Workflows

We are developing computational workflows using our web service infrastructure and the open-source Taverna workflow tool. The emphasis is on developing workflows which encapsulate important processes in chemoinformatics and drug design, which use diverse kinds of information together in novel ways, and which are of demonstrated scientific merit.

Below are descriptions of some of the workflows that we have developed, along with example output.

A Taverna Tutorial

A simple, "getting started" tutorial for Taverna is available from http://communitygrids.blogspot.com/2006/01/getting-started-with-taverna.html. A movie of this is available from http://www.chembiogrid.org/presentations/Movies/TavernaStrContDemo.avi.

Workflow 1 - Finding relationships between compounds and proteins

NIH SIM SEARCH -> FILTER -> OMEGA -> FRED -> JMOL/HTML

Examples of workflow output

This workflow performs a similarity search on the NIH DTP Human Tumor data, filters the results based on pharmacokinetic properties (FILTER), converts them to 3D (OMEGA), docks them into a pre-defined protein (FRED), and visualizes the results (JMOL). It opens up various possibilities, including:

  • Finding structures in the DTP that are similar to existing ligands for tumor-related proteins from the PDB, and correlating docking scores with cell-line assay results. This can suggest hypotheses about which proteins are involved in which tumors.
  • Testing the possible effectiveness of DTP compounds in other areas (e.g. Alzheimer's disease - see Alzheimer's Workflow) by docking structures to PDB proteins from that therapeutic area.
  • Integration of this workflow with other tools such as Sentient Desktop - see example of using Workflow 1 with Alzheimer's disease in Sentient.
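The arrow notation used for Workflow 1 describes a chain in which each service consumes the previous service's output. The sketch below shows that idea in a few lines. The stages are stand-ins that only record their own name on each item; in a real workflow each one would be a call to the corresponding web service, and Taverna, not hand-written code, does the wiring. All function and variable names here are hypothetical.

```python
from functools import reduce


def run_workflow(stages, data):
    """Feed data through the stages in order; each output is the next input."""
    return reduce(lambda current, stage: stage(current), stages, data)


def make_stage(name):
    """Return a stand-in stage that appends its name to every item's history."""
    def stage(items):
        return [{**item, "steps": item["steps"] + [name]} for item in items]
    return stage


# Stage names follow the arrow diagram; the behavior is a placeholder only.
WORKFLOW_1 = [
    make_stage(name) for name in ("NIH SIM SEARCH", "FILTER", "OMEGA", "FRED")
]
```

Running `run_workflow(WORKFLOW_1, [{"id": "query-1", "steps": []}])` returns the item with all four stage names recorded in order, which mirrors how a compound passes from the similarity search through to docking.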

Workflow 2 - HTS data organization and flagging

NIH SCREEN RETRIEVE -> FILTER -> TOXICITY FLAG -> SERIES GENERATION (Divkm) -> VISUALIZATION (VOPlot, 2Dviewer)

Example of Workflow Output

An AVI movie of this workflow running in Taverna is available from here.

This workflow demonstrates how screening data can be flagged and organized for human analysis. The compounds and data values for a particular screen are retrieved, and then are filtered to remove compounds with reactive groups, etc. ToxTree is used to flag the potential toxicities of compounds. Divkmeans is used to add a column of cluster numbers. Finally, the results are visualized using VOPlot and the 2D viewer applet.

Sample Workflows - Parts of the Above, Prototypes, or Test Workflows

An example of Workflow 2 for HTS data organization and flagging

ToxTreeBrief of ToxTreeServer

ToxTreeVerbose of ToxTreeServer

Stand Alone Programs and Packages

These programs are generally available as source code and in some cases have not been fully packaged. Some of these utilize or access PubChem and our derived databases via web services.

  • PubDock plugin for Chimera, which allows you to visualize the contents of the docking database within Chimera. It uses the web service interface to the docking database to retrieve structure information, and also utilizes the CDK 2D structure diagram service.