High-dimensional data analysis often reduces to statistical regression with variable selection, that is, choosing the variables that yield the most stable results across a wide variety of situations: as the number of variables grows, the space of possible models explodes, and the interactions among variables become hard to predict. The least absolute shrinkage and selection operator (LASSO) addresses this by shrinking the coefficients of many variables exactly to zero, effectively removing them from the model, which makes it particularly useful for sparse, high-dimensional problems.
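To make the shrinkage-to-zero behavior concrete, here is a minimal sketch of LASSO fitting by cyclic coordinate descent (the first of the algorithms the paper studies) on synthetic data; the data, the value of λ, and the iteration count are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator: closed-form solution of the
    one-dimensional LASSO subproblem."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for
    min_b  (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                        # running residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]          # remove variable j's contribution
            rho = X[:, j] @ r / n        # partial correlation with residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]          # add it back with the new coefficient
    return b

# Toy data: only the first two of five variables actually matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(100)
b = lasso_cd(X, y, lam=0.5)
print(np.round(b, 2))  # coefficients of the irrelevant variables are exactly 0
```

The relevant coefficients survive (shrunk toward zero by roughly λ), while the irrelevant ones are set exactly to zero, which is the selection property the paper builds on.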
This paper performs a comparative study of the strengths and weaknesses of algorithms for high-dimensional data analysis. It analyzes five algorithms implementing the LASSO method: coordinate descent (CD), CD with active shooting, majorization-minimization using local quadratic approximation (MM-LQA), the alternating direction method of multipliers (ADMM), and the fast iterative shrinkage-thresholding algorithm (FISTA). The paper highlights the factors affecting their performance and their convergence toward stable results. It starts with a description of each algorithm, in both mathematical and pseudocode terms. It then presents a method, developed by the authors, to compare their sensitivity; this method measures the performance of each algorithm in terms of the number of iterations required to converge to a stable solution, the corresponding computation time, and the value of the objective function at convergence. The next step presents the results in detailed tables and convergence plots for each algorithm, across different parameter settings and numbers of iterations; no clear winner emerges, as each algorithm performs best in different situations. The same methodology is then applied to a real-world scenario (cancer biomarker discovery), with results presented at the same level of detail. A final table summarizes the strengths and weaknesses of each algorithm.
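The authors' sensitivity comparison can be pictured as a small harness that runs a solver and records the three measures they report: iterations to convergence, computation time, and the final objective value. The sketch below uses FISTA as the example solver; the stopping rule, tolerance, and test data are my assumptions, not the authors' settings.

```python
import time
import numpy as np

def lasso_objective(X, y, b, lam):
    """(1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n = X.shape[0]
    return 0.5 / n * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

def fista(X, y, lam, tol=1e-8, max_iter=5000):
    """FISTA for the LASSO; returns (solution, iterations, objective, seconds)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the gradient
    b = np.zeros(p)
    z, t = b.copy(), 1.0
    start = time.perf_counter()
    for k in range(1, max_iter + 1):
        grad = -X.T @ (y - X @ z) / n
        step = z - grad / L
        b_new = np.sign(step) * np.maximum(np.abs(step) - lam / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = b_new + (t - 1.0) / t_new * (b_new - b)
        converged = np.max(np.abs(b_new - b)) < tol  # stopping rule (an assumption)
        b, t = b_new, t_new
        if converged:
            break
    return b, k, lasso_objective(X, y, b, lam), time.perf_counter() - start

# Synthetic sparse problem: 5 true signals among 50 variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
beta = np.zeros(50)
beta[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
y = X @ beta + 0.1 * rng.standard_normal(200)

b, iters, obj, secs = fista(X, y, lam=0.1)
print(f"iterations={iters}  objective={obj:.4f}  time={secs:.4f}s")
```

Running the same harness over each of the five algorithms, parameter settings, and datasets yields exactly the kind of table the paper reports, and makes clear why no single winner emerges: the three measures trade off differently per solver.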
The paper's strength is its objectivity: as a comparative study, it does not need to argue the merits of one algorithm over the others.