Computing Reviews, the leading online review service for computing literature.


Regularized non-negative matrix factorization for identifying differentially expressed genes and clustering samples: a survey
Liu J., Wang D., Gao Y., Zheng C., Xu Y., Yu J. IEEE/ACM Transactions on Computational Biology and Bioinformatics 15(3): 974-987, 2018. Type: Article

Date Reviewed: Apr 5 2019

This paper surveys the use of non-negative matrix factorization (NMF), a well-known dimensionality reduction technique, in bioinformatics “for identifying differentially expressed genes and clustering samples.” The idea behind NMF is to factorize a given matrix with no negative elements into two matrices that also have no negative elements. The factor matrices can be of significantly lower dimension than the original matrix, which makes the method practically interesting. Another reason NMF is popular is its applicability in data analysis: “in the real world, many data are always non-negative.”

Given an m-by-n matrix X, the task of NMF is to find a non-negative m-by-r basis matrix A and a non-negative r-by-n coefficient matrix Y such that their product approximates the given matrix: X ≈ AY. To characterize the quality of the approximation, error functions based on some distance measure between matrices are introduced. NMF can then be formulated as an optimization problem: minimize the error of approximating X by AY. Because A and Y are optimized jointly, the problem is non-convex, and all current NMF methods “converge to only a local minimum.” The basic NMF algorithm provides the simplest and most intuitive solution, but many practical applications require more efficient and elaborate versions.

In the context of gene expression data, the rows of the given matrix X “correspond to expression levels of genes and the columns correspond to samples, and each entry corresponds to the expression level of a given gene in a given sample.” Hence, X holds the expression levels of m genes in n samples. The columns of A define metagenes, and the columns of Y represent the metapattern of the corresponding samples. The paper reviews existing NMF algorithms that improve on the basic one and are suitable for gene expression analysis and sample clustering.
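To make the formulation concrete, here is a minimal sketch of the basic algorithm. It assumes the squared Frobenius (Euclidean) error function and the well-known Lee-Seung multiplicative update rules; the survey covers many variants, and this is not presented as a specific algorithm from the paper. The function names `nmf` and `cluster_samples` are hypothetical, and assigning each sample to the metagene with the largest coefficient in its column of Y is one common clustering heuristic, assumed here.

```python
import numpy as np


def nmf(X, r, n_iter=500, eps=1e-10, seed=0):
    """Basic NMF sketch: minimize ||X - AY||_F^2 with multiplicative updates.

    X: non-negative m-by-n matrix (rows = genes, columns = samples).
    Returns A (m-by-r basis, columns = metagenes) and Y (r-by-n coefficients).
    """
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("X must be non-negative")
    m, n = X.shape
    rng = np.random.default_rng(seed)
    # Random positive initialization; different seeds may reach different local minima.
    A = rng.random((m, r)) + eps
    Y = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative; eps avoids division by zero.
        Y *= (A.T @ X) / (A.T @ A @ Y + eps)
        A *= (X @ Y.T) / (A @ Y @ Y.T + eps)
    return A, Y


def cluster_samples(Y):
    """Hypothetical heuristic: label each sample (column of Y) by its dominant metagene."""
    return np.argmax(Y, axis=0)
```

For example, `A, Y = nmf(X, r=2)` followed by `cluster_samples(Y)` yields one cluster label per sample. Because the problem is non-convex, rerunning with a different `seed` can give a different local minimum, which is consistent with the limitation the review notes.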
The algorithms are classified into three categories: sparse NMF, graph NMF, and generalized NMF. In gene expression analysis, data may contain redundant information, so dense basis and coefficient matrices are not adequate. This is why sparse NMF algorithms were introduced: they incorporate constraints to obtain sparse solutions. Issues such as intuitiveness, efficiency, and robustness under sparsity constraints are addressed. Despite their advantages, sparse NMF methods do not perform well in discovering the “geometric and discriminating structure of the data space, which is ... useful for gene expression analysis.” Graph NMF methods have been developed to address this problem. One direction for future work is a method that simultaneously addresses geometric structure, discriminative power, robustness, and sparsity (that is, one that combines the advantages of sparse and graph NMF methods). Generalized NMF methods are discussed in less detail than the previous ones; the paper only briefly mentions their characteristics.

The last part of the paper presents the results of experiments the authors carried out on tumor datasets to analyze the performance of NMF and its extended variants. The goal was twofold: “to identify differentially expressed genes and to cluster samples.” The results show that the improved algorithms perform better than the basic NMF algorithm. Another experiment analyzes “the ColoRectal Cancer (CRC) data in the TCGA dataset.” The paper ends with a discussion of future work, mentioning in particular the need for global optimization techniques (since current methods find only locally optimal solutions); improved scalability; “a deep understanding of the clustering capability of NMF”; and applications in diverse areas. The intended audience of the survey seems to be experts; it is not well suited to nonspecialists who want a general overview of the field.
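The graph NMF idea mentioned in the review can be sketched in the same style. The example below is an illustration modeled on graph-regularized NMF (GNMF), not a method taken from the survey: it adds the penalty lam * tr(Y L Y^T), where L = D - W is the Laplacian of a k-nearest-neighbour graph over samples, so that samples that are close in the original space receive similar coefficient columns in Y. The helper names `knn_graph` and `graph_nmf`, the 0/1 edge weights, and the default `lam` are hypothetical choices made for this sketch.

```python
import numpy as np


def knn_graph(X, k=1):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix W over samples (columns of X)."""
    S = np.asarray(X, dtype=float).T          # one row per sample
    sq = np.sum(S ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * S @ S.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbour
    n = S.shape[0]
    W = np.zeros((n, n))
    nearest = np.argsort(dist, axis=1)[:, :k]
    for i in range(n):
        W[i, nearest[i]] = 1.0
    return np.maximum(W, W.T)                 # symmetrize the graph


def graph_nmf(X, r, W, lam=1.0, n_iter=1000, eps=1e-10, seed=0):
    """Sketch: minimize ||X - AY||_F^2 + lam * tr(Y L Y^T), with L = D - W."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    D = np.diag(W.sum(axis=1))                # degree matrix of the sample graph
    rng = np.random.default_rng(seed)
    A = rng.random((m, r)) + eps
    Y = rng.random((r, n)) + eps
    for _ in range(n_iter):
        A *= (X @ Y.T) / (A @ Y @ Y.T + eps)
        # Compared with basic NMF, the lam*Y@W and lam*Y@D terms pull the
        # coefficient columns of neighbouring samples toward each other.
        Y *= (A.T @ X + lam * Y @ W) / (A.T @ A @ Y + lam * Y @ D + eps)
    return A, Y
```

Setting `lam=0` recovers the basic multiplicative updates, which makes the effect of the graph term easy to compare on the same data.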

Reviewer: Temur Kutsia	Review #: CR146516 (1906-0253)

Biology And Genetics (J.3 ... )

Constrained Optimization (G.1.6 ... )

Data Mining (H.2.8 ... )

Pattern Matching (F.2.2 ... )


Other reviews under "Biology And Genetics":	Date

Discovering the secrets of DNA Friedland P., Kedes L. Communications of the ACM 28(11): 1164-1186, 1985. Type: Article	May 1 1986

The formation of three-dimensional biological structures: computer uses and future needs Levinthal C. Computer culture: the scientific, intellectual, and social impact of the computer (, New York,1801984. Type: Proceedings	Sep 1 1986

Computer techniques in neuroanatomy Capowski J., Plenum Press, New York, NY, 1989. Type: Book (9789780306432637)	Nov 1 1990


Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®