1st Edition
Clustering in Bioinformatics and Drug Discovery
With a DVD of color figures, Clustering in Bioinformatics and Drug Discovery provides an expert guide on extracting the most pertinent information from pharmaceutical and biomedical data. It offers a concise overview of common and recent clustering methods used in bioinformatics and drug discovery.
Setting the stage for subsequent material, the first three chapters of the book introduce statistical learning theory, exploratory data analysis, clustering algorithms, different types of data, graph theory, and various clustering forms. In the following chapters on partitional, cluster sampling, and hierarchical algorithms, the book provides readers with enough detail to obtain a basic understanding of cluster analysis for bioinformatics and drug discovery. The remaining chapters cover more advanced methods, such as hybrid and parallel algorithms, as well as issues that arise with specific types of data, including asymmetry and ambiguity, along with validation measures and visualization.
This book explores the application of cluster analysis in the areas of bioinformatics and cheminformatics as they relate to drug discovery. Clarifying the use and misuse of clustering methods, it helps readers understand the relative merits of these methods and evaluate results so that useful hypotheses can be developed and tested.
Introduction
History
Bioinformatics and Drug Discovery
Statistical Learning Theory and Exploratory Data Analysis
Clustering Algorithms
Computational Complexity
Data
Types
Normalization and Scaling
Transformations
Formats
Data Matrices
Measures of Similarity
Proximity Matrices
Symmetric Matrices
Dimensionality, Components, Discriminants
Graph Theory
Clustering Forms
Partitional
Hierarchical
Mixture Models
Sampling
Overlapping
Fuzzy
Self-Organizing
Hybrids
Partitional Algorithms
K-Means
Jarvis–Patrick
Spectral Clustering
Self-Organizing Maps
Cluster Sampling Algorithms
Leader Algorithms
Taylor–Butina Algorithm
Hierarchical Algorithms
Agglomerative
Divisive
Hybrid Algorithms
Self-Organizing Tree Algorithm
Divisive Hierarchical K-Means
Exclusion Region Hierarchies
Biclustering
Asymmetry
Measures
Algorithms
Ambiguity
Discrete Valued Data Types
Precision
Ties in Proximity
Measure Probability and Distributions
Algorithm Decision Ambiguity
Overlapping Clustering Algorithms Based on Ambiguity
Validation
Validation Measures
Visualization
Example
Large Scale and Parallel Algorithms
Leader and Leader-Follower Algorithms
Taylor–Butina
K-Means and Variants
Examples
Appendices
Bibliography
A Glossary and Exercises appear at the end of each chapter.
Biography
John D. MacCuish is the founder and president of Mesa Analytics & Computing, Inc. He has co-authored several software patents and has worked on many image processing, data mining, and statistical modeling applications, including IRS fraud detection, credit card fraud detection, and automated reasoning systems for drug discovery.
Norah E. MacCuish is the chief science officer of Mesa Analytics & Computing, Inc., where she acts as a consultant in the areas of drug design and compound acquisition and as a developer of commercial chemical information software products. She earned her Ph.D. in theoretical physical chemistry from Cornell University.
John trained in computer science and has been involved with data mining and statistical analysis; Norah trained as a theoretical physical chemist and has mostly worked for pharmaceutical companies on drug discovery. They run a company that merges their fields, and it is that overlap that they describe here. They explain how cluster analysis, an exploratory data analysis tool, is used in bioinformatics and cheminformatics as they relate to drug discovery. The goal is for practitioners to be aware of the relative merits of clustering methods with the data they have at hand.
—SciTech Book News, February 2011
… In this volume, the authors present sufficient options so that the user can choose the appropriate method for their data. … Practitioners in the pharmaceutical industry need an expert guide, which the authors of this book provide, to extract the most information from their data. Those of us who learned their clustering from Anderberg, Sokal and Sneath, and Willett now have a valuable additional resource suitable for the 21st century.
—From the Foreword by John Bradshaw, Barley, Hertfordshire, UK