TY - GEN
T1 - Quality-based distance measures and applications to clustering
AU - Taverna, Darin M.
AU - Brun, Marcel
AU - Dougherty, Edward R.
AU - Chen, Yidong
PY - 2006
Y1 - 2006
N2 - When analyzing biological data sets, a common approach is to partition the data into clusters. Examples of this include finding a subset of genes with co-regulated expression among experiments, grouping similar disease phenotypes, or implicating regions of genetic variation in disease. The ability to separate the data into subsets depends upon the structure of the distribution of points and the choice of clustering algorithm. Furthermore, the biological relevance of the clustering results is biased by the variation among the data points themselves. We introduce a mathematical quality-based distance metric which will allow all data, regardless of its error, to be included in analysis without the need to introduce a cutoff. This removes the need to exclude points or to change the dimensionality. The advantage of this approach is shown by clustering simulated data with added noise.
UR - http://www.scopus.com/inward/record.url?scp=42749104065&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=42749104065&partnerID=8YFLogxK
U2 - 10.1109/LSSA.2006.250390
DO - 10.1109/LSSA.2006.250390
M3 - Conference contribution
AN - SCOPUS:42749104065
SN - 1424402786
SN - 9781424402786
T3 - 2006 IEEE/NLM Life Science Systems and Applications Workshop, LiSA 2006
BT - 2006 IEEE/NLM Life Science Systems and Applications Workshop, LiSA 2006
T2 - 2006 IEEE/NLM Life Science Systems and Applications Workshop, LiSA 2006
Y2 - 13 July 2006 through 14 July 2006
ER -