A generalized multi-aspect distance metric for mixed-type data clustering

(2023) A generalized multi-aspect distance metric for mixed-type data clustering. Pattern Recognition. p. 12. ISSN 0031-3203

Full text not available from this repository.

Abstract

Distance calculation is straightforward when working with pure categorical or pure numerical data sets. Defining a unified distance to improve the clustering performance for a mixed data set composed of nominal, ordinal, and numerical attributes is very challenging due to the attributes' different natures. In this study, we proposed a new measure of distance for a mixed-type data set that regards inter-attribute information and intra-attribute information depending on the type of attributes. In this regard, entropy and Jensen-Shannon divergence concepts were used to exploit the inter-attribute information of categorical-categorical and categorical-numerical attributes, respectively. Also, a modified version of Mahalanobis distance was proposed to consider the intra- and inter-attribute information of numerical attributes. We also introduced a unified framework based on mutual information to control attributes' contribution to distance measurement. The proposed distance in conjunction with spectral clustering was extensively evaluated concerning various categorical, numerical, and mixed-type benchmark data sets, and the results demonstrated the efficacy of the proposed method. © 2023 Elsevier Ltd. All rights reserved.

Item Type: Article
Keywords: Clustering; Mixed data; Ordinal and nominal attribute; Inter-dependency; Intra-attribute information; Mutual information; association; algorithm; Computer Science; Engineering
Page Range: p. 12
Journal or Publication Title: Pattern Recognition
Journal Index: ISI
Volume: 138
Identification Number: https://doi.org/10.1016/j.patcog.2023.109353
ISSN: 0031-3203
Depositing User: Ms. Nahid Ziaei (ناهید ضیائی)
URI: http://eprints.mui.ac.ir/id/eprint/26599
