Visualization of Structure-Activity Relationships in
Large-Scale Chemical Databases
Similarity sampling, diversity sampling, and compound clustering are fundamental research problems in database analysis.
Their solutions have great potential to benefit computer-aided drug design and other bioinformatics applications. The goal of
this project is to develop a program package for visualizing, analyzing, and managing large biochemical databases. The package
will include a web-based graphical interface and built-in cross-links connecting data, structures, and reactions, so that it
can be widely used in similarity and diversity applications in drug design.
The core of the package is a visualization algorithm that we have studied extensively in our previous work. In this
project, we will extend the algorithm to handle very large databases, improve the accuracy with which the
low-dimensional (2D or 3D) mapping retains the original structure-activity relationships, and speed up the generation
of the mapping by using advanced numerical techniques
(such as domain decomposition, fast Fourier transforms, and preconditioning). Further, based on the low-dimensional mapping of the
database generated by the visualization algorithm, we will develop efficient algorithms for sampling similar and diverse
chemical compounds.
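
As a rough illustration of the mapping-then-sampling idea, the sketch below projects compound descriptors to 2D and then selects diverse and similar compounds in the projected plane. It is not the visualization algorithm of this project: the projection used here is ordinary PCA computed with an SVD, the diversity step is a generic greedy MaxMin selection, and the function names (project_2d, maxmin_diverse, nearest_similar) and the random descriptor matrix are all hypothetical, introduced only for this example.

```python
import numpy as np


def project_2d(descriptors):
    """Map each row (one compound) to 2D using PCA via SVD.

    Assumption: PCA stands in for the project's own mapping algorithm.
    """
    centered = descriptors - descriptors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def maxmin_diverse(points, k, start=0):
    """Greedy MaxMin selection of k mutually distant points.

    Each new point is the one farthest from its nearest already-chosen point.
    """
    k = min(k, len(points))
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = np.linalg.norm(points - points[start], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(
            nearest, np.linalg.norm(points - points[nxt], axis=1)
        )
    return chosen


def nearest_similar(points, query, k):
    """Return the indices of the k points closest to points[query]."""
    dist = np.linalg.norm(points - points[query], axis=1)
    order = np.argsort(dist)
    return [int(i) for i in order if i != query][:k]


# Hypothetical data: 200 compounds, 16 descriptors each.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 16))

coords = project_2d(descriptors)
diverse = maxmin_diverse(coords, k=10)
similar = nearest_similar(coords, query=0, k=5)
print("diverse subset:", diverse)
print("compounds most similar to compound 0:", similar)
```

Working in the projected plane keeps each distance evaluation cheap, but the selections are only as faithful as the mapping itself, which is why the accuracy of the mapping matters for the sampling step.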
We will carry out this project in close collaboration with Prof. Tamar Schlick at New York University and two medicinal
scientists at the Merck Research Laboratory in New Jersey. The Merck scientists will provide real drug datasets throughout
the project. They will also deploy our package in their drug discovery efforts at Merck, which will ensure the commercial
utility of the program package.