Computing Reviews
Scalable and efficient data analytics and mining with Lemonade
dos Santos W., Avelar G., Ribeiro M., Guedes D., Meira Jr. W. Proceedings of the VLDB Endowment 11(12): 2070-2073, 2018. Type: Article
Date Reviewed: May 21 2019

It is unreasonable to assume that data analysis experts are adept at computer programming. However, many tools that a data analyst may want to use require significant programming skills. Moreover, with the amount of information available to be analyzed reaching gazillions of bytes, processing power and security are major concerns. Lemonade is a new platform that fills these gaps by providing a web interface for mining nontrivial amounts of data over the cloud. If you stumbled upon Lemonade while looking for cost-effective data analysis tools, and your programming skills are dwarfed by your data’s massive scale and sensitivity, this paper will serve you well.

Popular tools fall short in one aspect or another. While Apache Hadoop, Apache Spark, and COMPSs require a reasonable amount of programming knowledge, visual data flow tools like RapidMiner, Orange, and KNIME fall short when it comes to processing data at scale. Web-based tools charge exorbitant fees. Though Microsoft Azure Machine Learning Studio (MAML) and CrowdFlows come close, the former charges heavy licensing fees and the latter lacks quality of service (QoS) and authentication, authorization, and accounting (AAA) features. This paper briefly describes the functionality of the seven components that make up the Lemonade platform and that integrate to provide this elegant feature set. It also presents a use case, available through a demo account, on distinguishing fake news from regular news items.

The platform’s demo flow, which fact-checking agencies may find useful, consists of two parts: data processing followed by model creation and validation. The demo also allows users to plug additional boxes into the flow or modify the execution parameters to visualize the impact of these changes. The demo uses Kaggle’s “Getting Real about Fake News” dataset mixed with sources from real news pieces and fake news: text and metadata scraped from 244 websites tagged by Chrome’s BS Detector extension. The main features of a news piece used in the classifier are its title and the text of its body.
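To make the two-part flow concrete, here is a minimal sketch in plain Python. It is not Lemonade's interface or the authors' implementation: Lemonade users assemble such a flow visually, without writing code. The function names, the choice of a naive Bayes classifier, and the toy records are all assumptions made up for illustration. The review does not say which classifier the demo uses.

```python
import math
import re
from collections import Counter


def preprocess(title, body):
    """Part 1 (data processing): merge title and body, lowercase, tokenize."""
    return re.findall(r"[a-z']+", f"{title} {body}".lower())


def train(examples):
    """Part 2a (model creation): fit a multinomial naive Bayes model.

    `examples` is a list of (tokens, label) pairs.
    """
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in examples:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for t in tokens:
            if t in vocab:  # ignore words never seen during training
                score += math.log((word_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Part 2b (validation): fraction of held-out items labeled correctly."""
    hits = sum(predict(model, tokens) == label for tokens, label in examples)
    return hits / len(examples)


# Invented toy records (title, body, label); NOT taken from the Kaggle dataset.
TRAIN = [
    ("Shocking miracle cure", "doctors hate this secret miracle trick", "fake"),
    ("Secret plot exposed", "shocking secret conspiracy they hide from you", "fake"),
    ("Council approves budget", "the city council voted to approve the annual budget", "real"),
    ("Central bank holds rates", "the central bank voted to keep interest rates unchanged", "real"),
]
HELD_OUT = [
    ("Miracle trick exposed", "a shocking secret cure", "fake"),
    ("Budget vote", "the council voted on interest rates", "real"),
]

model = train([(preprocess(t, b), y) for t, b, y in TRAIN])
held_out_accuracy = accuracy(model, [(preprocess(t, b), y) for t, b, y in HELD_OUT])
print(f"held-out accuracy: {held_out_accuracy}")
```

Swapping the tokenizer or the smoothing constant and rerunning the validation step is the code-level analogue of what the demo lets users do by plugging in boxes or changing execution parameters.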

Lemonade’s idea is to be accessible to users with varying skills in distributed processing, allowing nonexperts to perform data analytics tasks in a scalable and secure setting. The platform was used successfully on several projects in Brazilian institutions, confirming that domain experts could use the platform for extracting insight from large volumes of sensitive data.

Reviewer: Subash Tirupachur Comerica
Review #: CR146575 (1908-0313)
Data Mining (H.2.8 ...)
Security and Protection (D.4.6)