CICC Web Resources
From Chemical Informatics and Cyberinfrastructure Collaboratory
Introduction
This page consolidates all of our Web Service Infrastructure, Web Portal Development and other Web Clients, Workflows, and downloadable software. In summary,
- A Web Service provides an online capability that is invokable through a machine-readable XML interface.
- Web Service clients provide example human user interfaces to these services. Services can have more than one client interface. Clients can be aggregated into portals.
- Workflows are clients to collections of services that are executed in a particular order chosen by the workflow author. A workflow typically encodes a scientific use case. We build workflows of services with both Taverna (free software, examples below) and Pipeline Pilot.
Web Service Clients
We first list example client interfaces since these are the most user friendly. Services that they invoke follow.
Web Portal
We are developing JSR 168 compliant portals and portlet components to provide user interfaces to Web Services and workflows. Our portal is available from http://www.chembiogrid.org/gridsphere and includes publicly available services (i.e. no login required).
Standalone Web Client Interfaces
We have developed numerous standalone Web interfaces to our services. These can be aggregated into portals.
As a general rule, these are placed under http://www.chembiogrid.org/cheminfo/, so you can browse the listing there for services. Some specific examples include
- PubDock Docking Database - an interface to the results of docking (using Fred) a drug-like subset of PubChem into a series of targets. The aim is to eventually perform docking runs against 1700 targets for which a binding site is known. Currently we have results for seven targets. The database stores the best ligand pose, the protein target, and the components of four scoring functions implemented in Fred.
- Pub3D 3D structure Database - an interface to the 3D structures of nearly 10M PubChem compounds, obtained using MMFF94. We currently only store a single low energy conformer for each structure.
- PubChem Frequent Hitters - get a summary of the PubChem assays in which a set of compounds are active and inactive. This service does not depend on any underlying model of promiscuous compounds and simply reports the assays in which a given compound is active (as well as the number of assays in which it is inactive).
- Scripps MLSCN Toxicity Predictions - obtain toxicity predictions based on a random forest ensemble using the Scripps MLSCN cytotoxicity data. The model is actually an ensemble of 10 random forest models, which allows us to avoid a severely imbalanced classification problem.
- ToxTree toxicity hazard predictions. This is based on a decision tree algorithm
- NCI DTP Anti-Cancer Activity Predictions - obtain predictions of whether a compound will exhibit anti-cancer activity against the first 40 cell lines from the NCI DTP. Each cell line is modeled by a single random forest model.
- Ames Mutagenicity Predictions - predict whether a molecule will be mutagenic or not as measured by the Ames test, based on a random forest model
- Feature selection for OLS models - identify a subset of features, out of a larger pool, that will lead to a good OLS model
- Automated model generation for a given set of descriptors and dependent variable
- Pharmacokinetic Parameter Calculator - calculate pharmacokinetic parameters for drug like compounds
- Kemo - a natural language interface to PubChem
- Generate RSS feeds for PubChem queries. We currently provide feeds for synonym searches and docking searches. For the former, one can enter a synonym and get a feed in which each item is a PubChem compound that contains the specified synonym. The docking feed returns a list of PubChem compounds that match the user's specifications in terms of score values, score functions, etc. Each feed contains 2D information (and 3D if available), which can be viewed in Bioclipse or Jmol.
- Statistical Model Downloads - download predictive models in the R binary format. Allows you to download a predictive model built in R and load it into your local R session
- Miscellaneous cheminformatics web service clients - includes TPSA, 2D structure diagrams, and 2D similarity
Varuna Web Client Examples
We are developing a system called Varuna to demonstrate the feasibility of a large scale cyberinfrastructure enabled computational chemistry database. This initial prototype is specific to the Jaguar program, but later versions will be fully generalized. The following example clients are for file conversion and information extraction.
Clients for several more Varuna services are under development (see Web Service Infrastructure).
- File operation (converting file formats for chemical computation)
http://129.79.139.29/filecon/Default.aspx
- Result analysis (extracting useful information from chemical computation)
http://129.79.139.29/utilityclient/Default.aspx
VOTables Clients
VOTables is an XML representation for tabular data originally developed by the National Virtual Observatory community. VOTable services create and manipulate these tables, which can be imported and exported as Excel spreadsheets.
- Sample VOPlot applet (demo currently requires Internet Explorer running on Windows PC)
- Combine VOTables and R web services to generate an OLS model from data in an Excel file
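To make the VOTable format described above more concrete, the following is a minimal sketch of how tabular data could be serialized as a VOTable document using only the Python standard library. The function name `rows_to_votable` and the column names are hypothetical and are not part of our VOTable services. The output is simplified: it omits the XML namespace and most optional metadata defined by the VOTable standard.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(field_defs, rows):
    """Serialize rows as a simplified VOTable document (illustrative only).

    field_defs: list of (column name, VOTable datatype) pairs.
    rows: iterable of row tuples, one value per field.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name, datatype in field_defs:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            # Variable-length strings are declared as character arrays.
            attrs["arraysize"] = "*"
        ET.SubElement(table, "FIELD", attrs)
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# Hypothetical column names and example rows, for illustration only.
example = rows_to_votable(
    [("cid", "int"), ("smiles", "char")],
    [(2244, "CC(=O)Oc1ccccc1C(=O)O"), (702, "CCO")],
)
```

Each spreadsheet column maps to one FIELD element and each spreadsheet row maps to one TR element, which is why import from and export to Excel is straightforward.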
CICC Web Services
The following links connect to WSDL application interfaces, which can be used to generate client code for both Workflows and Web Applications. Note these Web Services include service wrappers around both commercial (BCI and OpenEye) applications and free code. The former are licensed and available to Indiana University researchers.
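In practice, client code is usually generated from the WSDL by a toolkit. As a rough illustration of what such a generated client sends over the wire, the sketch below builds a SOAP 1.1 request envelope by hand with the Python standard library. The service namespace, operation name, and parameter names are placeholders chosen for this example; the real values must be taken from the WSDL of the service being called.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(service_ns, operation, params):
    """Build a SOAP 1.1 request envelope as a string (illustrative only).

    service_ns, operation and the keys of params are assumptions here;
    a real client reads them from the service's WSDL.
    """
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameters, not taken from any CICC WSDL.
request_xml = build_soap_request(
    "urn:example:similarity",
    "getSimilarity",
    {"smiles1": "CCO", "smiles2": "CCN"},
)
```

The resulting string would be sent as the body of an HTTP POST to the service endpoint; the response is another envelope whose Body contains the return value.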
Database Services
More details on these services can be found at Databases_Projects.
- Local NIH Database
- Local PubChem and derivative databases. These services are essentially wrapped queries. Naturally there may be queries that you'd like to see but are not present. If so let me know.
- PubChem Structure (Usage) Provides methods to get PubChem Compound information
- PubChem Synonyms (Usage) Provides methods to get synonyms given a compound or substance ID. SMILES support coming
- PubChem Derived properties (Usage) Get calculated properties (SLogP and SMRef) given a compound ID. Can also search via exact values and ranges (but this is very slow at the moment).
- PubChem Docking results (Usage) Provides methods to get the docked structures (for a given target) for PubChem compounds based on CID, sorted score values or SMARTS patterns. Ligands are returned in SDF format. Currently only the ligand structures are accessible; the actual score values are coming soon.
- PubChem 3D structures (Usage) This service provides access to MMFF94 optimized 3D structures for PubChem compounds. Structures are returned in SD format and can be accessed by CID or by SMARTS patterns
- Distributed Drug Discovery Database
Commercial Application Services
- OpenEye: These URLs are private and are available on request.
- FRED Docking: this service is the basis for the public results reported in PubDock.
- FILTER Property Calculation and Filtering: used to filter PubChem to identify drug-like molecules.
- OMEGA 2D-3D Conversion
- BCI: these services are private and available on request.
- Various BCI Clustering services (Usage)
Cheminformatics Services
- VOTables: these are service interfaces for creating and manipulating VOTables instances. Sample clients are in the previous section.
- OSCAR3 Web service: invokes OSCAR3 chemical informatics text analysis application to extract chemical information from text, which is summarized in an XML document. This service is based on the OSCAR3 application developed by Peter Murray Rust's group at Cambridge.
- Other Cambridge/WWMM Services These services were developed in collaboration with Peter Murray Rust's group at Cambridge.
- ToxTree services
- MACCS keys
- CDK web services (Rajarshi Guha). You can access individual services via PHP clients here. If there are algorithms or features you'd like to see, or if you find bugs, go to the Sourceforge page and submit a feature request or bug report.
- Molecular Similarity (Usage)
- Pair wise 2D similarity
- 2D similarity matrix
- 3D similarity
- Distance moments for use in 3D similarity calculations
- Molecular Descriptors (Usage)
- TPSA
- XLogP
- Surface area
- Arbitrary descriptors
- 2D Structure Diagrams (Usage)
- Druglikeness Methods (Usage)
- Cheminformatics Utility Methods (Usage)
- Fingerprints
- Molecular weight and formulae
- SDF to PDB/CML conversion
- 2D coordinate generation
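As an illustration of what the pairwise 2D similarity and similarity matrix services listed above compute, here is a minimal sketch using the Tanimoto coefficient on fingerprints represented as sets of "on" bit positions. The choice of the Tanimoto coefficient is an assumption made for this example (it is a common choice for 2D fingerprints); this page does not state which metric the services use, and the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints.

    Each fingerprint is an iterable of the bit positions that are set.
    Returns 0.0 when both fingerprints are empty (a convention chosen here).
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def similarity_matrix(fingerprints):
    """Symmetric matrix of pairwise Tanimoto coefficients."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Toy fingerprints for illustration; real ones come from a fingerprint service.
fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
matrix = similarity_matrix(fps)
```

Two fingerprints that share two of four distinct bits, as in the first two toy fingerprints, have a Tanimoto coefficient of 0.5.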
Statistics/Math Web Services
If there are algorithms or features you'd like to see or if you find bugs go to the Sourceforge page and submit a feature request or bug report.
- Sampling distributions (Usage) - Sample from various well known distributions (normal, exponential, Weibull, etc.)
- Linear Regression (Usage)
- CNN Regression (Usage)
- RF Regression (Usage)
- LDA Classification (Usage)
- K-means Clustering (Usage)
- Feature Selection (Usage) - Stepwise and exhaustive feature selection for linear regression models
- Model Generation (Usage) - Automated model generation for a given set of descriptors. Currently will not do feature selection
- t-Test (Usage)
- XY Plots (Usage) - Generates simple scatter plots
- Histogram Plots (Usage) - Generates simple histogram plots
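For readers unfamiliar with what a linear regression service returns, the sketch below fits an ordinary least squares (OLS) line to one descriptor and one dependent variable in plain Python. It is only an illustration of the underlying calculation under the simplifying assumption of a single predictor; it does not reproduce the interface of the services listed above, and the function name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns a dict with the slope, the intercept and R squared.
    Assumes x and y have equal length and that x is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared compares residual variation with total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return {"slope": slope, "intercept": intercept, "r_squared": r_squared}


# Toy data lying exactly on the line y = 1 + 2x.
fit = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
```

With several descriptors the same idea generalizes to solving the normal equations, which is what feature selection repeats for each candidate subset of descriptors.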
Varuna Computational Chemistry Services
These services allow the submission and querying of Jaguar Quantum Mechanics and Molecular Mechanics data in our Varuna Database.
- The file operation utilities convert file formats frequently used in quantum mechanics and molecular mechanics computations. These web services accept the input files and return the converted result in a string array. Some services require Open Babel. http://129.79.139.29/FileOper/FileOperation.asmx
- The result analysis utilities accept computed results from the Jaguar and ADF packages and return the frequency and geometry optimization information. http://129.79.139.29/Utilites/Service.asmx
- The database query service helps the user locate project information in the Varuna database. This service also includes executing commands on Avidd. http://129.79.139.29/RemoteOper/RemoteCmd.asmx
- The upload web service helps users interact with Avidd, Varuna and the local PC through the SFTP or SCP protocol, using the open source SharpSSH library. These web services accept the input files, submit them to Avidd and store the information in Varuna. http://129.79.139.29/RemoteOper/Service.asmx
Workflows
We are developing computational workflows using our web service infrastructure and the open-source Taverna workflow tool. The emphasis is on developing workflows which encapsulate important processes in chemoinformatics and drug design, which use diverse kinds of information together in novel ways, and which are of demonstrated scientific merit.
Below are descriptions of some of the workflows that we have developed, along with example output.
A Taverna Tutorial
A simple, "getting started" tutorial for Taverna is available from http://communitygrids.blogspot.com/2006/01/getting-started-with-taverna.html. A movie of this is available from http://www.chembiogrid.org/presentations/Movies/TavernaStrContDemo.avi.
Workflow 1 - Finding relationships between compounds and proteins
NIH SIM SEARCH -> FILTER -> OMEGA -> FRED -> JMOL/HTML
This workflow performs a similarity search on the NIH DTP Human Tumor data, filters the results based on pharmacokinetic properties (FILTER), converts them to 3D (OMEGA), docks them into a pre-defined protein (FRED) and visualizes the result (JMOL). This workflow opens up various possibilities, including:
- Finding structures in the DTP that are similar to existing ligands for tumor-related proteins from the PDB, and correlating docking scores with cell-line assay results. The results can then be used to form hypotheses about which proteins are involved in which tumors.
- Testing the possible effectiveness of DTP compounds in other areas (e.g. Alzheimer's disease - see Alzheimer's Workflow) by docking structures to PDB proteins from that therapeutic area.
- Integration of this workflow with other tools such as Sentient Desktop - see example of using Workflow 1 with Alzheimer's Disease in Sentient.
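The chaining idea behind Workflow 1 can be sketched in a few lines of Python: each stage consumes the output of the previous one, in the order chosen by the workflow author. The stage functions below are hypothetical stand-ins that return placeholder records; they do not call the actual similarity search, FILTER, OMEGA or FRED services, and the molecular weight cutoff is an arbitrary value chosen for illustration.

```python
from functools import reduce


def run_workflow(stages, initial_input):
    """Run the stages in order, feeding each output to the next stage."""
    return reduce(lambda data, stage: stage(data), stages, initial_input)


# Hypothetical stand-ins for the service calls in Workflow 1.
def similarity_search(query_smiles):
    # Placeholder hits; a real search would query the NIH DTP data.
    return [
        {"id": "cmpd-1", "query": query_smiles, "mol_weight": 320.0},
        {"id": "cmpd-2", "query": query_smiles, "mol_weight": 710.0},
    ]


def property_filter(hits, max_weight=500.0):
    # Arbitrary cutoff for illustration, not the criteria used by FILTER.
    return [h for h in hits if h["mol_weight"] <= max_weight]


def convert_to_3d(hits):
    return [dict(h, has_3d=True) for h in hits]


def dock(hits):
    return [dict(h, docked=True) for h in hits]


docked = run_workflow(
    [similarity_search, property_filter, convert_to_3d, dock], "CCO"
)
```

In Taverna the same ordering is expressed graphically, with each stage bound to a Web Service operation instead of a local function.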
Workflow 2 - HTS data organization and flagging
NIH SCREEN RETRIEVE -> FILTER -> TOXICITY FLAG -> SERIES GENERATION (Divkm) -> VISUALIZATION (VOPlot, 2Dviewer)
An AVI movie of this workflow running in Taverna is available from here.
This workflow demonstrates how screening data can be flagged and organized for human analysis. The compounds and data values for a particular screen are retrieved, and then are filtered to remove compounds with reactive groups, etc. ToxTree is used to flag the potential toxicities of compounds. Divkmeans is used to add a column of cluster numbers. Finally, the results are visualized using VOPlot and the 2D viewer applet.
Sample Workflows - Some Part of the Above, Prototypes, or Test Workflows
An example of Workflow 2 for HTS data organization and flagging
ToxTreeVerbose of ToxTreeServer
Stand Alone Programs and Packages
These programs are generally available as source code and in some cases have not been fully packaged. Some of these utilize or access PubChem and our derived databases via web services.
- PubchemSR, a .NET program for PC's to allow access to PubChem and integration with Microsoft Excel
- SMILES to 3D coordinate generator. The generator consists of two parts. The first program converts SMILES to a set of rough 3D coordinates (using stochastic proximity embedding). The second program optimizes the rough coordinates using MMFF94. Currently the program is not packaged for an easy install, but it is easy to build and requires the Scons tool.
- R packages. These can be installed directly from CRAN using install.packages()
- Fingerprint manipulation supporting CDK, BCI and MOE formats - http://cran.r-project.org/src/contrib/Descriptions/fingerprint.html
- Integration of the CDK with R - http://cran.r-project.org/src/contrib/Descriptions/rcdk.html
- Accessing PubChem from within R - http://cran.r-project.org/src/contrib/Descriptions/rpubchem.html
- PubDock plugin for Chimera which allows you to visualize the contents of the docking database within Chimera. Uses the web service interface to the docking database to retrieve structure information. Also utilizes the CDK 2D structure diagram service.
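The stochastic proximity embedding step mentioned for the SMILES to 3D coordinate generator above can be illustrated with a small sketch. The idea is to pick random pairs of points repeatedly and nudge them so that their current distance moves toward a target distance, while the step size shrinks over time. This is a generic, simplified version written for this page: it takes a full target distance matrix as input, whereas a real coordinate generator would derive distance bounds from the molecular graph. The function name, parameters and default values are assumptions, not the generator's actual code.

```python
import math
import random


def spe_embed(target, dim=3, steps=20000, lam_start=1.0, lam_end=0.01, seed=0):
    """Embed points so pairwise distances approach target[i][j].

    target: square matrix of desired distances (at least two points).
    Returns a list of coordinate lists, one per point.
    """
    n_points = len(target)
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
              for _ in range(n_points)]
    for step in range(steps):
        # The learning rate decreases linearly from lam_start to lam_end.
        lam = lam_start - (lam_start - lam_end) * step / (steps - 1)
        i, j = rng.sample(range(n_points), 2)
        current = math.dist(coords[i], coords[j])
        # Move both points along their connecting line, half the change each.
        scale = lam * 0.5 * (target[i][j] - current) / (current + 1e-9)
        for k in range(dim):
            delta = scale * (coords[i][k] - coords[j][k])
            coords[i][k] += delta
            coords[j][k] -= delta
    return coords


# Four points that should all end up one unit apart (a regular tetrahedron).
unit = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
points = spe_embed(unit)
```

The resulting coordinates are only rough, which is why the generator follows this step with an MMFF94 optimization.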
