Space structure and clustering of categorical data.

Authors: Yuhua Qian, Feijiang Li, Jiye Liang, Bing Liu, Chuangyin Dang

Abstract: Learning from categorical data plays a fundamental role in areas such as pattern recognition, machine learning, data mining, and knowledge discovery. To effectively discover the group structure inherent in a set of categorical objects, many categorical clustering algorithms have been developed in the literature, among which k-modes-type algorithms are highly representative because of their good performance. Nevertheless, there is still much room for improving their clustering performance in comparison with clustering algorithms for numeric data. This gap may arise from the fact that categorical data lack a clear space structure like that of numeric data. To address this issue, we propose a novel data-representation scheme for categorical data, which maps a set of categorical objects into a Euclidean space. Based on this data-representation scheme, a general framework for space-structure-based categorical clustering algorithms (SBC) is designed. This framework, together with the application of two kinds of dissimilarities, leads to two versions of the SBC-type algorithms. To verify the performance of the SBC-type algorithms, we use four representative k-modes-type algorithms as references. Experiments show that the proposed SBC-type algorithms significantly outperform the k-modes-type algorithms.

Index Terms— Categorical data, clustering, dissimilarity, k-modes-type algorithms, space structure.

Sat Jul 02 00:00:00 CST 2016