Assessing and quantifying clusteredness: The OPTICS Cordillera

Rusch, Thomas and Hornik, Kurt and Mair, Patrick (2016) Assessing and quantifying clusteredness: The OPTICS Cordillera. Discussion Paper Series / Center for Empirical Research Methods, 2016/1. WU Vienna University of Economics and Business, Vienna.

Abstract

Data representations in low dimensions, such as results from unsupervised dimensionality reduction methods, are often visually interpreted to find clusters of observations. To identify clusters, the result must be appreciably clustered. This property of a result may be called "clusteredness". When judged visually, the appreciation of clusteredness is highly subjective. In this paper we suggest an objective way to assess clusteredness in data representations. We provide a definition of clusteredness that captures important aspects of a clustered appearance. We characterize these aspects and define the extremes rigorously. For this characterization of clusteredness we suggest an index to assess the degree of clusteredness, coined the OPTICS Cordillera. It makes only weak assumptions and is a property of the result, invariant for different partitionings or cluster assignments. We provide bounds and a normalization for the index, and prove that it represents the aspects of clusteredness. Our index is parsimonious with respect to mandatory parameters but also flexible by allowing optional parameters to be tuned. The index can be used as a descriptive goodness-of-clusteredness statistic or to compare different results. For illustration we use a data set of handwritten digits which are very differently represented in two dimensions by various popular dimensionality reduction results. Empirically, observers had a hard time visually judging the clusteredness in these representations, but our index provides a clear and easy characterisation of the clusteredness of each result. (authors' abstract)

Item Type: Paper
Keywords: clusteredness / index / dimensionality reduction / clustering / unsupervised learning
Depositing User: ePub Administrator
Date Deposited: 07 Jan 2016 11:53
Last Modified: 19 Jan 2016 10:20
FIDES Link: https://bach.wu.ac.at/d/research/results/74283/
URI: http://epub.wu.ac.at/id/eprint/4789
