1st Edition

Clustering in Bioinformatics and Drug Discovery

    244 Pages 63 B/W Illustrations
    by CRC Press

    With a DVD of color figures, Clustering in Bioinformatics and Drug Discovery provides an expert guide on extracting the most pertinent information from pharmaceutical and biomedical data. It offers a concise overview of common and recent clustering methods used in bioinformatics and drug discovery.

    Setting the stage for subsequent material, the first three chapters of the book introduce statistical learning theory, exploratory data analysis, clustering algorithms, different types of data, graph theory, and various clustering forms. In the following chapters on partitional, cluster sampling, and hierarchical algorithms, the book provides readers with enough detail to obtain a basic understanding of cluster analysis for bioinformatics and drug discovery. The remaining chapters cover more advanced methods, such as hybrid and parallel algorithms, as well as issues that arise with specific types of data, including asymmetry and ambiguity, along with validation measures and visualization.

    This book explores the application of cluster analysis in the areas of bioinformatics and cheminformatics as they relate to drug discovery. Clarifying the use and misuse of clustering methods, it helps readers understand the relative merits of these methods and evaluate results so that useful hypotheses can be developed and tested.

    Introduction
    History
    Bioinformatics and Drug Discovery
    Statistical Learning Theory and Exploratory Data Analysis
    Clustering Algorithms
    Computational Complexity

    Data
    Types
    Normalization and Scaling
    Transformations
    Formats
    Data Matrices
    Measures of Similarity
    Proximity Matrices
    Symmetric Matrices
    Dimensionality, Components, Discriminants
    Graph Theory

    Clustering Forms
    Partitional
    Hierarchical
    Mixture Models
    Sampling
    Overlapping
    Fuzzy
    Self-Organizing
    Hybrids

    Partitional Algorithms
    K-Means
    Jarvis–Patrick
    Spectral Clustering
    Self-Organizing Maps

    Cluster Sampling Algorithms
    Leader Algorithms
    Taylor–Butina Algorithm

    Hierarchical Algorithms
    Agglomerative
    Divisive

    Hybrid Algorithms
    Self-Organizing Tree Algorithm
    Divisive Hierarchical K-Means
    Exclusion Region Hierarchies
    Biclustering

    Asymmetry
    Measures
    Algorithms

    Ambiguity
    Discrete Valued Data Types
    Precision
    Ties in Proximity
    Measure Probability and Distributions
    Algorithm Decision Ambiguity
    Overlapping Clustering Algorithms Based on Ambiguity

    Validation
    Validation Measures
    Visualization
    Example

    Large Scale and Parallel Algorithms
    Leader and Leader-Follower Algorithms
    Taylor–Butina
    K-Means and Variants
    Examples

    Appendices

    Bibliography

    A Glossary and Exercises appear at the end of each chapter.

    Biography

    John D. MacCuish is the founder and president of Mesa Analytics & Computing, Inc. He has co-authored several software patents and has worked on many image processing, data mining, and statistical modeling applications, including IRS fraud detection, credit card fraud detection, and automated reasoning systems for drug discovery.

    Norah E. MacCuish is the chief science officer of Mesa Analytics & Computing, Inc., where she acts as a consultant in the areas of drug design and compound acquisition and as a developer of commercial chemical information software products. She earned her Ph.D. in theoretical physical chemistry from Cornell University.

    John trained in computer science and has been involved with data mining and statistical analysis; Norah trained as a theoretical physical chemist and has mostly worked for pharmaceutical companies on drug discovery. They run a company that merges their fields, and it is that overlap that they describe here. They explain how cluster analysis, an exploratory data analysis tool, is used in bioinformatics and cheminformatics as they relate to drug discovery. The goal is for practitioners to be aware of the relative merits of clustering methods with the data they have at hand.
    SciTech Book News, February 2011

    … In this volume, the authors present sufficient options so that the user can choose the appropriate method for their data. … Practitioners in the pharmaceutical industry need an expert guide, which the authors of this book provide, to extract the most information from their data. Those of us who learned their clustering from Anderberg, Sokal and Sneath, and Willett now have a valuable additional resource suitable for the 21st century.
    —From the Foreword by John Bradshaw, Barley, Hertfordshire, UK