Quality-based distance measures and applications to clustering

Darin M. Taverna, Marcel Brun, Edward R. Dougherty, Yidong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When analyzing biological data sets, a common approach is to partition the data into clusters. Examples include finding a subset of genes with co-regulated expression among experiments, grouping similar disease phenotypes, or implicating regions of genetic variation in disease. The ability to separate the data into subsets depends upon the structure of the distribution of points and the choice of clustering algorithm. Furthermore, the biological relevance of the clustering results is biased by the variation among the data points themselves. We introduce a mathematical quality-based distance metric that allows all data, regardless of error, to be included in the analysis without introducing a cutoff. This removes the need to exclude points or to change the dimensionality. The advantage of this approach is shown by clustering simulated data with added noise.
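
The abstract does not state the form of the metric. As a rough illustration of the general idea only, the sketch below assumes each measurement carries a quality score in [0, 1] and down-weights low-quality dimensions instead of discarding them. The function name, the product weighting, and the normalisation are all hypothetical choices made for this example and are not taken from the paper.

```python
import math

def quality_weighted_distance(x, y, qx, qy):
    """Hypothetical quality-weighted Euclidean distance (illustration only).

    x, y   : sequences of measurements of equal length
    qx, qy : per-measurement quality scores in [0, 1] (1 = fully trusted)
    """
    if not (len(x) == len(y) == len(qx) == len(qy)):
        raise ValueError("all inputs must have the same length")
    # Assumption: weight each dimension by the joint quality of both measurements.
    weights = [a * b for a, b in zip(qx, qy)]
    total = sum(weights)
    if total == 0:
        # No trusted dimension, so there is no evidence that the points differ.
        return 0.0
    # Normalise by the total weight so that low-quality points are not
    # pulled together merely because their weights are small.
    squared = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    return math.sqrt(squared / total)
```

With this weighting, a measurement of quality 0 contributes nothing to the distance, yet the point itself stays in the data set and keeps its full dimensionality, which is the behaviour the abstract describes as avoiding a cutoff.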

Original language: English (US)
Title of host publication: 2006 IEEE/NLM Life Science Systems and Applications Workshop, LiSA 2006
State: Published - 2006
Externally published: Yes
Event: 2006 IEEE/NLM Life Science Systems and Applications Workshop, LiSA 2006 - Bethesda, MD, United States
Duration: Jul 13 2006 to Jul 14 2006

Publication series

Name: 2006 IEEE/NLM Life Science Systems and Applications Workshop, LiSA 2006

Other

Other: 2006 IEEE/NLM Life Science Systems and Applications Workshop, LiSA 2006
Country/Territory: United States
City: Bethesda, MD
Period: 7/13/06 to 7/14/06

ASJC Scopus subject areas

  • Health (social science)
  • Assessment and Diagnosis
  • General Medicine
  • Health Information Management
  • Electrical and Electronic Engineering
  • Human-Computer Interaction
  • Computer Science Applications
  • Signal Processing
