Computing Reviews, the leading online review service for computing literature.

A rapid hybrid clustering algorithm for large volumes of high dimensional data
Rathore P., Kumar D., Bezdek J., Rajasegarar S., Palaniswami M. IEEE Transactions on Knowledge and Data Engineering 31(4): 641-654, 2019. Type: Article

Date Reviewed: Mar 10 2020

FensiVAT is a rapid hybrid clustering algorithm that identifies clusters in large datasets with many instances (N) and many features (p) per instance. It improves on popular algorithms that rely on random sampling, such as clustering large applications (CLARA) using k-means, clustering using representatives (CURE), and clustering with improved visual assessment of tendency (clusiVAT), and on algorithms that rely on dimensionality reduction by projecting the data onto a lower-dimensional space, such as CLIQUE and PROCLUS. These approaches suffer from space and/or time complexity issues. FensiVAT integrates techniques for random projection and the visual assessment of cluster tendency (VAT). It randomly projects the dataset into a lower-dimensional space, samples the projected data with a scheme called maximin and random sampling (MMRS), and aggregates multiple distances using principal component analysis (PCA) and linear discriminant analysis (LDA). The authors’ ten-step algorithm includes input, dataset generation in downspace, near-MMRS sampling, reduced image (iVAT) generation, application of VAT/iVAT to distance matrices, clustering, and extension in downspace. They apply FensiVAT to the US Census 1990, KDD CUP, FOREST, MiniBooNE, MNIST, and ACT datasets. In these experiments, FensiVAT is an order of magnitude faster than clusiVAT and several orders of magnitude faster than the other approaches, without compromising accuracy. This well-written paper has 55 references and will interest the big data community.
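To make the first two ingredients named in the review concrete, here is a minimal Python sketch of random projection into a downspace followed by maximin and random sampling. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian projection matrix, the proportional allocation of random picks, and the function names random_projection, maximin_indices, and mmrs_sample are all hypothetical choices made for this example. The VAT/iVAT imaging and aggregation steps are not shown.

```python
import numpy as np

def random_projection(X, q, rng):
    """Project N x p data onto q dimensions with a Gaussian random matrix (assumed choice)."""
    p = X.shape[1]
    T = rng.standard_normal((p, q)) / np.sqrt(q)
    return X @ T

def maximin_indices(Y, k, rng):
    """Farthest-first traversal: pick k mutually distant rows of Y."""
    first = int(rng.integers(Y.shape[0]))
    chosen = [first]
    # Distance from every point to its closest already-chosen point.
    d = np.linalg.norm(Y - Y[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(Y - Y[nxt], axis=1))
    return np.array(chosen)

def mmrs_sample(Y, k, n, rng):
    """Pick k maximin seeds, group points by nearest seed, then draw
    roughly n points at random, proportionally to group size (assumed rule)."""
    seeds = maximin_indices(Y, k, rng)
    dists = np.linalg.norm(Y[:, None, :] - Y[seeds][None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    picks = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue  # can only happen when seeds coincide
        take = min(members.size, int(np.ceil(n * members.size / Y.shape[0])))
        picks.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(picks)

# Toy data: two well-separated Gaussian blobs in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 50)),
               rng.normal(8.0, 1.0, (200, 50))])
Y = random_projection(X, q=5, rng=rng)        # downspace data
sample = mmrs_sample(Y, k=4, n=40, rng=rng)   # indices of sampled rows
print(Y.shape, sample.size)
```

Because each group's share is rounded up, the sample can hold slightly more than n points. In a fuller pipeline, the pairwise distance matrix of the sampled rows would be the input to a VAT-style reordering.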

Reviewer: Anoop Malaviya	Review #: CR146925 (2007-0171)

Clustering (H.3.3 ... )

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud^®