Visualization of Structure-Activity Relationships of Large-Scale Chemical Databases



Similarity and diversity sampling and compound clustering are fundamental problems in database analysis, and their solutions could greatly benefit computer-aided drug design and other bioinformatics applications. The goal of this project is to develop a program package for visualizing, analyzing, and managing large biochemical databases. The package will include a web-based graphical interface and built-in cross-links connecting data, structures, and reactions, so that it can be widely used in similarity and diversity applications in drug design.

The core of the package is a visualization algorithm that we have studied extensively in previous work. In this project, we will extend the algorithm to handle very large databases, improve how accurately the low-dimensional (2D or 3D) mapping retains the original structure-activity relationships, and speed up generation of the 2D mapping by using advanced numerical techniques such as domain decomposition, fast Fourier transforms, and preconditioning. Further, based on the 2D mapping of the database generated by the visualization algorithm, we will develop efficient algorithms for sampling similar and diverse chemical compounds.
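The proposal does not state the objective or solver behind the visualization algorithm, so the following is only a minimal sketch under stated assumptions, not the group's method. It assumes the 2D map is found by minimizing a raw stress (distance-preservation) function with plain gradient descent, then uses max-min (farthest-point) selection on the map as one common diversity-sampling heuristic and nearest-neighbour lookup as a simple similarity-sampling stand-in. The compound names, descriptor vectors, and functions (`project_2d`, `stress`, `maxmin_select`, `nearest`) are hypothetical and illustrative only.

```python
import math
import random

# Hypothetical toy data: each "compound" is a short descriptor vector.
# These values are illustrative, not real chemical descriptors.
COMPOUNDS = {
    "A1": [1.0, 0.9, 0.1, 0.0],
    "A2": [0.9, 1.0, 0.0, 0.1],
    "A3": [1.0, 1.0, 0.1, 0.1],
    "B1": [0.0, 0.1, 1.0, 0.9],
    "B2": [0.1, 0.0, 0.9, 1.0],
    "B3": [0.1, 0.1, 1.0, 1.0],
}


def pairwise_distances(points):
    """Full Euclidean distance matrix (O(n^2) memory; fine for small n)."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(coords, target):
    """Raw stress: sum over pairs of (map distance - original distance)^2."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def project_2d(descriptors, iters=500, lr=0.01, seed=0):
    """Assumed stand-in: map vectors to 2D by gradient descent on raw stress."""
    rng = random.Random(seed)
    target = pairwise_distances(descriptors)
    n = len(descriptors)
    coords = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    for _ in range(iters):
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = coords[i][0] - coords[j][0]
                dy = coords[i][1] - coords[j][1]
                dist = math.hypot(dx, dy)
                if dist == 0.0:
                    continue  # coincident points: gradient direction undefined
                # d/dy_i of (dist - d_ij)^2 = 2 (dist - d_ij) (y_i - y_j) / dist
                coeff = 2.0 * (dist - target[i][j]) / dist
                grad[i][0] += coeff * dx
                grad[i][1] += coeff * dy
                grad[j][0] -= coeff * dx
                grad[j][1] -= coeff * dy
        for i in range(n):
            coords[i][0] -= lr * grad[i][0]
            coords[i][1] -= lr * grad[i][1]
    return coords, target


def maxmin_select(coords, k, start=0):
    """Diversity sampling: greedily add the point farthest from the chosen set."""
    chosen = [start]
    min_d = [math.dist(c, coords[start]) for c in coords]
    while len(chosen) < min(k, len(coords)):
        candidates = [i for i in range(len(coords)) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_d[i])
        chosen.append(nxt)
        min_d = [min(m, math.dist(c, coords[nxt])) for m, c in zip(min_d, coords)]
    return chosen


def nearest(coords, query, k):
    """Similarity sampling: the k compounds closest to `query` on the map."""
    others = [i for i in range(len(coords)) if i != query]
    return sorted(others, key=lambda i: math.dist(coords[i], coords[query]))[:k]


if __name__ == "__main__":
    names = list(COMPOUNDS)
    coords, target = project_2d(list(COMPOUNDS.values()))
    print("stress:", round(stress(coords, target), 4))
    print("diverse pair:", [names[i] for i in maxmin_select(coords, 2)])
    print("nearest to A1:", [names[i] for i in nearest(coords, 0, 2)])
```

The full distance matrix and the all-pairs loop in every iteration cost O(n^2) time and memory, which is exactly what makes a naive version like this impractical for very large databases. That cost is the bottleneck the numerical techniques listed above (domain decomposition, fast Fourier transforms, preconditioning) are intended to reduce.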

We will carry out this project in close collaboration with Prof. Tamar Schlick at New York University and two medicinal scientists at the Merck Research Laboratory in New Jersey. The Merck scientists will provide real drug datasets throughout the project and will also apply our package in their drug discovery efforts at Merck, which will help establish the commercial utility of the program package.