Computing Reviews

Discovering user behavioral features to enhance information search on big data
Cassavia N., Masciari E., Pulice C., Saccà D. ACM Transactions on Interactive Intelligent Systems 7(2): 1-33, 2017. Type: Article
Date Reviewed: 11/10/17

Cassavia et al. reduce theory to practice in this paper. All too often, the disciplines we work in exist only in the abstract, as theories or dense algorithms. The authors set a great example by showing how their modeling works in practice, which makes for an interesting perspective.

The context is large-scale social systems where much of the content is user-originated and interactions (for example, sharing, posting, hearting, and so on) also leave a trail of data. In this context, how might the search experience be adjusted to take into consideration search keywords, term clustering, and user influence? The authors take a perspective on this challenge, proposing a process, formulas, and an architecture that support such a system.

One of the core aspects of their solution is the discovery layer, which includes: (1) clustering algorithms, (2) author influence filtering, and (3) data enrichment. One of the challenges in creating a more effective search experience is successfully using user-generated content as a seed for taxonomy and synonym development. The open question is how much of what users write belongs in the bag of words a system should use to enrich content. Cassavia et al. propose that social influence is a reasonable filter. The assumption is that those with the most influence know the words and qualities best; essentially, the influencers are the experts.
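To make the influence-as-filter idea concrete, here is a minimal sketch. It does not reproduce the paper's formulas or architecture: the function name, the influence scores, the 0.5 threshold, and the whitespace tokenizer are all assumptions chosen for illustration. It only shows the general shape of the idea, in which terms are collected solely from authors whose influence clears a threshold, and the most frequent of those terms become candidates for enriching content.

```python
from collections import Counter

def enrichment_terms(posts, influence, min_influence=0.5, top_k=3):
    """Return candidate enrichment terms drawn only from influential authors.

    posts: list of (author, text) pairs (hypothetical input shape).
    influence: dict mapping author -> score in [0, 1] (assumed precomputed).
    """
    counts = Counter()
    for author, text in posts:
        # Author influence filtering: ignore content from low-influence authors.
        if influence.get(author, 0.0) < min_influence:
            continue
        # Naive whitespace tokenization; a real system would do far more.
        counts.update(text.lower().split())
    # Keep the most frequent terms as the enrichment vocabulary.
    return [term for term, _ in counts.most_common(top_k)]

posts = [
    ("ana", "graph search ranking"),
    ("bob", "cheap pills online"),
    ("ana", "graph clustering"),
    ("cy", "search ranking graph"),
]
influence = {"ana": 0.9, "bob": 0.1, "cy": 0.7}
terms = enrichment_terms(posts, influence)
print(terms)
```

In this toy data, the low-influence author's vocabulary never reaches the enrichment set, which is exactly the property the reviewed approach relies on, and also the source of the inertia concern raised later in this review.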

While implementing the approach, the authors tackle the challenges of assembling the various datasets, running them through a pipeline, and adjusting the user experience based on the resulting models, formulas, and algorithms. The current trend is toward real-time processing, even for things that are not time sensitive. The challenge that practitioners face is finding the best way to deal with varying data shapes and velocities. The authors describe a staging area where a more traditional extract, transform, load (ETL) path sets the data up for effective processing. This is where things get interesting: the clustering will change over time, from the new keywords employed by users to the new set of influencers. This enables a platform to constantly update the content enrichment so that it stays true to emerging developments in volume (that is, popularity of a topic), language, and influencers. One of the troubles in allowing the community to drive the expertise is that it reflects the current state of the community. Influence tends to carry inertia, which often suppresses up-and-coming or equally important but underrepresented thinkers.

Finally, the authors offer an end-to-end story of how to think about enriching search experiences with user-created artifacts; they one-up all of the theorists and formula and algorithm junkies by reducing their ideas to practice, helping blend more advanced concepts with practitioner concerns. Everyone interested in this domain, even those attracted to a single aspect of it, will enjoy the work. Moreover, this work offers an opportunity to understand those aspects in a more complete context.

Reviewer:  Brian D. Goodman Review #: CR145652 (1801-0012)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™