Computing Reviews
Social media data processing infrastructure by using Apache Spark big data platform: Twitter data analysis
Podhoranyi M., Vojacek L.  CCIOT 2019 (Proceedings of the 2019 4th International Conference on Cloud Computing and Internet of Things, Tokyo, Japan,  Sep 20-22, 2019) 1-6. 2019. Type: Proceedings
Date Reviewed: Mar 17 2021

What would we do without social media? What would the world look like if there weren’t continuous data streams? Looking back at history, the first big breakthrough came around 1440, when Johannes Gutenberg introduced his printing press. This was the beginning of information spreading around the world. At first, this happened very slowly, but in the early 17th century (from 1605), a new and interesting idea appeared: the newspaper. Information could now spread more quickly. When Gauss, Weber, Wheatstone, and Morse introduced their various telegraphs, it became clear that information could spread even faster over a pair of wires. After the discoveries of Tesla and Marconi led to the radio, information could spread at nearly the speed of light. However, this communication was largely one-directional, and its flow was very limited: the reader or listener had little or no opportunity to create content, respond, or quickly spread news. Citizens band (CB) radio, for example, allowed two-way communication but never became very popular. Still, all of these inventions were only a prelude to what we have today.

An increase in the intensity of information flow has been observed since the 1970s, when it became clear that the idea of a global market was not a dream but a reality. Starting from Sydney, through Tokyo, Bombay, Frankfurt, Paris, and London, and reaching the New York Stock Exchange, the markets operated almost around the clock, with a constant flow of data about changes in share prices. It was only a matter of time before this situation became the norm, though in different dimensions.

This is possible thanks to the Internet: one of its applications, social media, has taken over the world, generating flows of information. Different social media services generate data streams of “information with different levels of sensitivity, validity, and accuracy.” The main contribution of this paper is an architecture that is able to process Twitter’s data streams.

The authors propose a five-component system: (1) data ingestion based on Apache Flume; (2) data storage on the Hadoop Distributed File System (HDFS), where tweets are broken into separate blocks and distributed to nodes; (3) a data warehouse, Apache Hive with HiveQL, to store data in the form of a table for further analysis; (4) a resource manager for job scheduling, Yet Another Resource Negotiator (YARN); and (5) the Apache Spark processing engine.
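
As a rough illustration of the storage step in (2), the idea of breaking data into blocks and spreading them over nodes can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names, node names, and record-based block size are hypothetical, and real HDFS splits files by byte size and places replicas according to its own policy rather than by simple round-robin.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Cut a list of records into consecutive blocks of at most block_size items."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def assign_round_robin(blocks, nodes):
    """Map each node name to the indices of the blocks it would hold.

    Round-robin placement is an assumption made for illustration only.
    """
    placement = defaultdict(list)
    for index, _block in enumerate(blocks):
        placement[nodes[index % len(nodes)]].append(index)
    return dict(placement)


# Hypothetical stand-ins for ingested tweets and cluster nodes.
tweets = [f"tweet-{n}" for n in range(10)]
blocks = split_into_blocks(tweets, block_size=4)
placement = assign_round_robin(blocks, ["node-a", "node-b"])

for node, block_ids in placement.items():
    print(node, "holds blocks", block_ids)
```

The point of the sketch is only that no single node needs to hold the whole dataset, which is what lets a later processing engine work on the blocks in parallel.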

Twitter data is readily available through application programming interface (API) access. As an experiment, the authors apply the word frequency method (n-grams) to two datasets: 1,000 tweets with the keyword “flood” (completed on April 10, 2018) and 10,000 tweets with the keyword “flood” (completed on April 25, 2018). The proposed architecture works very well “to uncover the content ... in the tweets.”
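
To make the word frequency (n-gram) idea concrete, here is a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation: the tokenizer, the function names, and the three sample tweets are assumptions invented for illustration, and a distributed version would compute the same counts per block and then merge them.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_counts(texts, n):
    """Count every run of n consecutive tokens across all texts.

    Keys are tuples of length n; texts shorter than n tokens contribute nothing.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical sample tweets; the datasets in the paper are not reproduced here.
sample_tweets = [
    "Flood warning issued for the river valley",
    "Flood warning extended as the river rises",
    "Roads closed after flash flood",
]

unigrams = ngram_counts(sample_tweets, 1)
bigrams = ngram_counts(sample_tweets, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Ranking the most frequent unigrams and bigrams in this way gives a quick view of what a keyword-filtered set of tweets is about, which is the kind of content summary the experiment aims for.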

It should be noted that processing social media data is not trivial, and the paper is a novel attempt to show how the Twitter data stream can be processed with the Apache Spark big data platform.

Reviewer:  Dominik Strzalka Review #: CR147218 (2106-0160)
Social Networking (H.3.4 ... )
Architectures (H.5.4 ... )
Other Architecture Styles (C.1.3 )
General (C.0 )

Reproduction in whole or in part without permission is prohibited.   Copyright © 2000-2022 ThinkLoud, Inc.