Computing Reviews
Mining the social Web: analyzing data from Facebook, Twitter, LinkedIn, and other social media sites
Russell M., O’Reilly Media, Inc., Sebastopol, CA, 2011. 360 pp. Type: Book
Date Reviewed: Jan 18 2012

This nice introduction to social data mining methods is essentially a manual. It describes, step by step, tools and techniques for accessing data from social networks and extracting information from it. It is intended to be read while working on a computer connected to the Internet, since the content is full of example code and has many hyperlinks as references.

Another characteristic of the book is that it is open-source oriented, encouraging collaboration for extending and improving its examples. In general, it is easy to read, with several examples, and is informally and humorously written. In several cases, it provides ideas for combining techniques so that readers can pose their own questions and implement solutions. However, it is not a purely technical text. The author frequently discusses the role of social networking in everyday life and highlights the wealth of information that can be extracted from it. The basic idea is that the social Web is a graph of people, activities, events, and concepts, which is getting richer with social data.

The target audience is made up of data miners and computer scientists. As a prerequisite, readers should have experience with the programming language Python, although readers who are less advanced in programming but have at least a basic background can also use the book. It covers a wide range of topics at an introductory level, and each chapter is self-contained and can be read separately; however, chapters that cover the same social network should be studied together.

The material is organized into ten chapters. While this may seem like a small number, there is so much information in the hyperlinks that it takes quite a long time to explore each chapter’s various tools. (Since these very useful hyperlinks are hidden in the text, it would have been helpful to include a list of them for each chapter.)

Chapter 1 provides guidelines for installing Python, and then describes tools for the collection and manipulation of data from Twitter. It then presents lexical diversity and frequency analysis using the Natural Language Toolkit. Visualization tools address questions about the subject of current discussions and the extraction of relationships.
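
Lexical diversity and frequency analysis are simple enough to illustrate in a few lines. The following is a minimal sketch using only the Python standard library rather than the Natural Language Toolkit; the sample texts and function names are hypothetical and are not taken from the book.

```python
from collections import Counter


def lexical_diversity(tokens):
    """Ratio of unique tokens to total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def top_terms(tokens, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


# Hypothetical sample texts standing in for collected tweets.
tweets = [
    "mining the social web with python",
    "python makes mining twitter data easy",
]

# Naive whitespace tokenization; real tweets need more careful handling.
tokens = [word.lower() for tweet in tweets for word in tweet.split()]

print(lexical_diversity(tokens))
print(top_terms(tokens, 2))
```

A value close to 1.0 means that almost every word is distinct, while a low value points to a repetitive vocabulary.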

Chapter 2 discusses microformats, a set of simple, open-data formats built upon existing and widely adopted standards. There is a very interesting discussion about their role and a list of existing technologies. The rest of the chapter provides examples of how to use microformats such as Extensible Hypertext Markup Language (XHTML) Friends Network (XFN), Geo, and hRecipe.
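
To give a flavor of how XFN data can be read, the sketch below pulls relationship values out of the rel attribute of links using Python's standard html.parser module. It is an illustration under stated assumptions, not the book's code: the sample HTML, the example.com URLs, and the class name are hypothetical.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collect (href, [xfn values]) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Keep only rel tokens that are XFN values (drops e.g. "nofollow").
        rels = set((attr.get("rel") or "").split()) & XFN_VALUES
        if rels and attr.get("href"):
            self.links.append((attr["href"], sorted(rels)))


# Hypothetical blogroll markup.
SAMPLE_HTML = """
<ul>
  <li><a href="http://example.com/alice" rel="friend met">Alice</a></li>
  <li><a href="http://example.com/bob" rel="co-worker">Bob</a></li>
  <li><a href="http://example.com/about" rel="nofollow">About</a></li>
</ul>
"""

parser = XFNParser()
parser.feed(SAMPLE_HTML)
for href, rels in parser.links:
    print(href, rels)
```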

Chapter 3 is an introduction to the analysis of mailboxes. It deals mainly with mbox files, the document-oriented database CouchDB, and how it can be used to manage and analyze data. Very interesting open-source tools for visualizing mail data are presented, such as the SIMILE timeline.
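
The first step of such an analysis, reading an mbox file, can be sketched with Python's standard mailbox module. This is a minimal sketch and not the book's code: the addresses, subjects, and helper names are hypothetical, a small mbox file is generated on the fly so that the example is self-contained, and CouchDB is left out entirely.

```python
import mailbox
import os
import tempfile
from collections import Counter
from email.message import EmailMessage


def build_sample_mbox(path):
    """Write a tiny mbox file with hypothetical messages."""
    box = mailbox.mbox(path)
    samples = [
        ("alice@example.com", "Quarterly numbers"),
        ("bob@example.com", "Re: Quarterly numbers"),
        ("alice@example.com", "Lunch?"),
    ]
    for sender, subject in samples:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = "list@example.com"
        msg["Subject"] = subject
        msg.set_content("hello")
        box.add(msg)
    box.close()  # close() flushes pending changes to disk


def count_senders(path):
    """Count messages per From header in an mbox file."""
    box = mailbox.mbox(path)
    try:
        return Counter(msg["From"] for msg in box)
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    mbox_path = os.path.join(tmp, "sample.mbox")
    build_sample_mbox(mbox_path)
    counts = count_senders(mbox_path)

print(counts.most_common())
```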

Chapters 4 and 5 return to Twitter. Chapter 4 presents several ways to analyze relationships; quantify concepts such as similarity, influence, and friendship; discover cliques; and represent relationships visually. Chapter 5 focuses on the content of the discussions and the entities in the tweets. CouchDB is used for the collection of tweets, and there are several ideas and examples for answering questions and analyzing and visualizing tweets. There are also very interesting examples concerning the comparison of different databases and the visualization of community structures.
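
Clique discovery, one of the chapter 4 topics, can be illustrated with the basic Bron-Kerbosch algorithm. The sketch below is a plain-Python version chosen for illustration; it is not claimed to be the method or library the book uses, and the account names and mutual-follow edges are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def maximal_cliques(graph):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed nodes.
        if not p and not x:
            yield sorted(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


# Hypothetical mutual-follow relationships.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]
graph = build_graph(edges)
cliques = sorted(maximal_cliques(graph))
print(cliques)
```

The basic algorithm is exponential in the worst case, so it is only practical for small or sparse graphs.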

Chapter 6 discusses how LinkedIn differs from other social networks. It introduces basic clustering methodologies for professional network data. The visualization techniques and the use of geographical information are especially interesting here.
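
As an illustration of what basic clustering of professional data can look like, the following sketch groups job titles greedily by Jaccard similarity of their words. The choice of technique, the threshold, and the job titles are assumptions made for this example; the review does not state which algorithms the chapter implements.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(titles, threshold=0.3):
    """Assign each title to the first cluster whose first member is similar enough."""
    clusters = []
    for title in titles:
        tokens = title.lower().split()
        for cluster in clusters:
            representative = cluster[0].lower().split()
            if jaccard(tokens, representative) >= threshold:
                cluster.append(title)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append([title])
    return clusters


# Hypothetical job titles from a professional network export.
titles = [
    "Senior Software Engineer",
    "Software Engineer",
    "Data Scientist",
    "Senior Data Scientist",
    "Product Manager",
]

clusters = greedy_cluster(titles)
for cluster in clusters:
    print(cluster)
```

The result depends on input order and on the threshold, which is a known weakness of greedy approaches of this kind.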

The subject of chapter 7 is more general: fundamental text mining theories and techniques. It is essentially a chapter that can be used in combination with any other chapter for the analysis of unstructured text. The chapter discusses term frequency-inverse document frequency (tf-idf), the vector space model, cosine similarity and the ways of visualizing similarity, and the concept of collocation.
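
The relationship between tf-idf, the vector space model, and cosine similarity can be sketched in plain Python. The weighting below (term frequency normalized by document length, multiplied by the logarithm of the inverse document frequency) is one common variant chosen as an assumption; the sample documents and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical documents.
docs = [
    "social web mining with python",
    "mining the social web",
    "cooking pasta with tomato sauce",
]

vectors = tfidf_vectors(docs)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_related, sim_unrelated)
```

The first two documents share three weighted terms, so their similarity comes out higher than that of the first and third, which share only one.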

Chapter 8 introduces natural language processing (NLP) as applied to unstructured data from blogs. Basic concepts are discussed, including end-of-sentence detection, tokenization, part-of-speech tagging, chunking, entity extraction, document summarization, and quality of analysis.
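
The first two stages of such a pipeline, end-of-sentence detection and tokenization, can be approximated with regular expressions. This is a deliberately naive sketch and an assumption of this example, not the approach taken in the book; abbreviations such as "Dr." would be split incorrectly, which is why dedicated NLP toolkits exist. The sample text and function names are hypothetical.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split into words (keeping simple contractions) and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


# Hypothetical blog text.
text = "Social data is rich. Can we mine it? Yes, we can!"

sentences = split_sentences(text)
tokens = [tokenize(sentence) for sentence in sentences]
for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, sentence_tokens)
```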

Chapter 9 is devoted to Facebook. It discusses the complex nature of the network, including the wealth of information available and its powerful application programming interfaces (APIs), and different ways of analyzing and visualizing data.

Finally, chapter 10 rather briefly presents the semantic Web, relevant models and languages (such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL)), and the logical reasoning system FuXi.

In conclusion, this book for Python programmers offers a nice introduction to techniques and methods for mining data from social networks. Researchers and teachers will find it useful for assigning projects to students.

Reviewer: Lefteris Angelis
Review #: CR139785 (1206-0569)
Reviewer Selected
Web 2.0 (H.3.4 ... )
Data Mining (H.2.8 ... )
Social Networking (H.3.4 ... )
General (H.3.0 )
Data Storage Representations (E.2 )
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®