Finding frequently occurring patterns in transactional databases, such as products that customers often purchase together, can be extremely useful for identifying bestselling products and products to promote together. However, “frequency is not always the best measure to find interesting patterns” since it treats all transactions the same. For instance, bread and milk may be frequently sold together, but a frequency count ignores the quantity of bread and milk sold in each transaction. Also, frequently purchased products may not be as profitable as those purchased less often. For instance, bottles of wine and jars of caviar may sell less often than loaves of bread and gallons of milk, but may generate higher profits.
High-utility pattern or itemset (HUI) mining generalizes the problem of frequent pattern mining: it identifies patterns, or sets of items, that meet certain criteria, for example, being frequently sold together or generating a high profit. A criterion such as “frequently sold together” or “generating a high profit” is called a utility and is specified by a decision maker to indicate the relative importance of an itemset within a given database. When specifying a utility such as “generating a high profit,” a decision maker also needs to provide a utility threshold quantifying what it means to generate “high profit” (for instance, is a $10 profit high?). This book provides comprehensive treatment of the research on, and advances in, HUI mining.
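To make the idea concrete, here is a minimal sketch of profit-based utility, written for this review and not taken from the book. It assumes that the utility of an itemset is the sum, over every transaction containing all of its items, of quantity times unit profit. The product names, profits, quantities, and function names are all hypothetical, and the brute-force enumeration is for illustration only; the algorithms surveyed in the book exist precisely to avoid it.

```python
from itertools import combinations

# Hypothetical unit profits per item, in dollars (an assumption for illustration).
UNIT_PROFIT = {"bread": 1, "milk": 2, "wine": 15, "caviar": 40}

# Hypothetical transactions: each maps an item to the quantity purchased.
TRANSACTIONS = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"bread": 3, "milk": 1},
    {"wine": 1, "caviar": 1},
    {"bread": 1, "wine": 2},
]


def itemset_support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if all(item in t for item in itemset))


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit of `itemset` over the transactions containing it."""
    total = 0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * unit_profit[item] for item in itemset)
    return total


def mine_high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force search: keep every itemset whose utility reaches `min_utility`."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    print(itemset_support(("bread", "milk"), TRANSACTIONS))               # 3
    print(itemset_utility(("bread", "milk"), TRANSACTIONS, UNIT_PROFIT))  # 14
    print(mine_high_utility_itemsets(TRANSACTIONS, UNIT_PROFIT, 30))
```

In this toy data, bread and milk are the most frequent pair (three of five transactions) yet yield only $14, so a $30 threshold excludes them, while caviar and wine appear together once and yield $55.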
The book consists of 12 chapters. The first chapter begins with a survey of the HUI mining field: the HUI mining problem; the main techniques employed by HUI mining algorithms for exploring the search space of sets of items in a database; extensions to these algorithms for overcoming limitations such as handling dynamic databases; and research opportunities related to the application of pattern mining algorithms, enhancing their performance, and extending pattern mining to include more complex data and patterns. Additional chapters (4, 5, 6, 9, 10, and 11) describe the HUI-Miner, HUI-Miner*, and CHUI-Mine algorithms, which are “several orders of magnitude faster” and consume significantly less memory than other state-of-the-art HUI mining algorithms. These chapters also cover metaheuristics-based methods, which offer efficient ways of searching “very large search spaces to find near-optimal solutions in reasonable time.” Visual analytics is presented in chapter 12 as an additional aid to decision making.
While HUI mining allows a decision maker to incorporate his or her notion of utility into the pattern mining process, choosing a minimum utility threshold value can be a challenging task. The top-k high-utility itemset (THUI) mining problem is a variant of the HUI problem in which the decision maker specifies the desired number k of HUIs rather than a minimum utility threshold. Chapter 2 systematically reviews and analyzes THUI mining methods, comparing their performance and outlining future research opportunities.
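The difference between a threshold and a count can be sketched in a few lines. The example below is an illustration written for this review, not one of the THUI methods analyzed in chapter 2: it assumes a profit-based utility (quantity times unit profit), uses hypothetical data and function names, and enumerates every itemset exhaustively, which practical algorithms avoid.

```python
import heapq
from itertools import combinations

# Hypothetical unit profits and transactions (item -> quantity purchased).
UNIT_PROFIT = {"bread": 1, "milk": 2, "wine": 15, "caviar": 40}
TRANSACTIONS = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"bread": 3, "milk": 1},
    {"wine": 1, "caviar": 1},
    {"bread": 1, "wine": 2},
]


def all_itemset_utilities(transactions, unit_profit):
    """Map every itemset that occurs at least once to its total utility."""
    items = sorted({item for t in transactions for item in t})
    utilities = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            containing = [t for t in transactions if all(i in t for i in itemset)]
            if containing:
                utilities[itemset] = sum(
                    t[i] * unit_profit[i] for t in containing for i in itemset
                )
    return utilities


def top_k_itemsets(transactions, unit_profit, k):
    """Return the k itemsets with the highest utility, best first."""
    utilities = all_itemset_utilities(transactions, unit_profit)
    return heapq.nlargest(k, utilities.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    for itemset, utility in top_k_itemsets(TRANSACTIONS, UNIT_PROFIT, 3):
        print(itemset, utility)
```

With k = 3 on this toy data, the result is caviar and wine together ($55), wine ($45), and caviar ($40). The utility of the k-th itemset acts as an implicit threshold that the decision maker never has to guess.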
HUI mining algorithms have issues with scalability when dealing with big data and when identifying high-utility sequences in quantitative databases where the sequential ordering of itemsets is important (for instance, “sequences of purchases made by customers over a long period of time”). Chapter 3 covers HUI pattern mining for big data; recent advances in parallel and scalable HUI pattern mining; and open problems and future directions. It also discusses the problem of high-utility sequence (HUS) mining, techniques for pruning the HUS search space, HUS mining algorithms, and extensions that overcome some of their limitations.
Beyond the utility, or relative importance, of patterns, one can also consider how regularly they occur. High-utility regular itemset (HURI) mining algorithms are used for discovering sets of items that have high utility and occur regularly in a database, for instance, customers buying highly profitable sets of products regularly. Chapter 7 discusses high-utility irregular itemset (HUII) mining, an alternative for identifying high-utility sets of items with irregular occurrence, for instance, sets of products that yield high profits even if customers do not purchase them regularly.
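One way to picture the regular/irregular distinction is sketched below. The review does not state how the book measures regularity, so this example makes an assumption: an itemset is treated as regular when the largest gap between its consecutive occurrences (counting the start and end of the database as boundaries) does not exceed a user-chosen maximum period. The data, the `max_period` parameter, and all function names are hypothetical.

```python
# Hypothetical unit profits and transactions (item -> quantity purchased).
UNIT_PROFIT = {"bread": 1, "milk": 2, "wine": 15, "caviar": 40}
TRANSACTIONS = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2},
    {"bread": 3, "milk": 1},
    {"wine": 1, "caviar": 1},
    {"bread": 1, "wine": 2},
]


def occurs_in(itemset, transaction):
    """True when the transaction contains every item of the itemset."""
    return all(item in transaction for item in itemset)


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over the transactions containing the itemset."""
    return sum(
        transaction[item] * unit_profit[item]
        for transaction in transactions
        if occurs_in(itemset, transaction)
        for item in itemset
    )


def max_gap(itemset, transactions):
    """Largest distance between consecutive occurrences (assumed regularity measure)."""
    previous = 0  # position 0 stands for the start of the database
    largest = 0
    for position, transaction in enumerate(transactions, start=1):
        if occurs_in(itemset, transaction):
            largest = max(largest, position - previous)
            previous = position
    # Also count the gap between the last occurrence and the end of the database.
    return max(largest, len(transactions) - previous)


def classify(itemset, transactions, unit_profit, min_utility, max_period):
    """Return 'regular' or 'irregular' for high-utility itemsets, else None."""
    if utility(itemset, transactions, unit_profit) < min_utility:
        return None
    return "regular" if max_gap(itemset, transactions) <= max_period else "irregular"


if __name__ == "__main__":
    for candidate in [("bread", "milk"), ("caviar", "wine"), ("bread",)]:
        print(candidate, classify(candidate, TRANSACTIONS, UNIT_PROFIT, 10, 2))
```

Under these assumptions, with a minimum utility of $10 and a maximum period of 2, bread and milk form a regular high-utility itemset, caviar and wine form an irregular one, and bread alone falls below the utility threshold.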
When mining data, however, one needs to be mindful of how sensitive and important information is handled. Chapter 8 provides a comprehensive survey of privacy-preserving utility mining concepts, techniques, and algorithms, which deal with hiding sensitive information; challenges and research opportunities are also covered.
Researchers will find this comprehensive treatment of HUI mining invaluable not only for understanding the state of the art, but also for gaining new insights into additional research opportunities. Besides market basket analysis, HUI mining has also been applied to website clickstream analysis, stock market analysis, and bioinformatics. Academics, graduate students, and practitioners interested in HUI mining applications will find this book to be a great resource and can experiment with the algorithms using the SPMF open-source data mining software (http://www.philippe-fournier-viger.com/spmf).