Computing Reviews
Data wrangling with Python : tips and tools to make your life easier
Kazil J., Jarmul K., O’Reilly Media, Inc., 2016. 508 pp. Type: Book (978-1-491948-81-1)
Date Reviewed: Sep 22 2016

“Wrangle” is a new version of an old word that has evolved in some interesting and revealing ways. Originally, to wrangle meant to argue or dispute. For example, you might wrangle with your neighbor over a property line. The meaning drifted over time so that one might wrangle something to turn it in a different direction or bring order to it. This is not incongruous with the original meaning, as wrangling with your neighbor might lead to a resolution. In the Old West, the term was applied to cattle drivers who would wrangle the herd, keeping it orderly. By this point, wrangling had come to mean a turning toward orderliness. Now, data wrangling is the process of converting messy, poorly structured data into more useful structured formats that facilitate analysis with software tools. One can see the parallels between turning a chaotic herd of cattle into an orderly herd and turning chaotic data into orderly data. Occasionally, things do seem to make sense. Data wrangling is becoming increasingly popular because the bulk of the data used in data analysis and data visualization efforts does not come in a structured format, or does not come in a format suited to the software tools available for the intended use.

This book is about data wrangling using the Python programming language, and it may be the most appropriately titled book I have seen in a long time. The words are not in the title to sell books. Rather, they actually describe the content. It is definitely about data wrangling and all that this term entails, and it is about the Python libraries that can be used to facilitate the task of data wrangling.

After the initial pleasantries, the book gets underway with an introduction to Python programming. I stumbled a bit when I read this part. The authors state: “This book is definitely not for experienced Python programmers who already know which libraries and techniques to use for their data wrangling.” And yet, their introduction to the language is more like a review of key features than an introduction to the language. This was puzzling because somebody with no background in Python might struggle a bit trying to get through the book. Plenty of code examples are provided. But if a code example does not work for some reason, or hits a snag on a particular dataset, the inexperienced programmer might be at a loss as to what to do.

Nonetheless, after this brief introduction to Python, the focus turns to data wrangling. This part, the bulk of the book, covers finding data and parsing common data formats such as CSV, JSON, and XML, as well as document formats such as Excel and PDF. The chapters that follow cover formatting or reformatting the parsed data, cleaning it up, and preparing it for data analysis or data visualization, both of which are also covered. It really is a soup-to-nuts treatment of data wrangling.
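To give a flavor of what that parse-then-clean workflow looks like, here is a minimal sketch of my own. It is not taken from the book. It uses only the Python standard library, and the town data, field names, and helper functions (parse_csv, parse_json, parse_xml, clean, wrangle) are invented for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same kind of record in three formats.
CSV_TEXT = "name,population\n Springfield ,30000\nShelbyville,\n"
JSON_TEXT = '[{"name": "Ogdenville", "population": "12000"}]'
XML_TEXT = "<towns><town name='North Haverbrook' population='8000'/></towns>"


def parse_csv(text):
    """Read CSV text into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    """Read a JSON array of objects into a list of dicts."""
    return json.loads(text)


def parse_xml(text):
    """Read each <town> element's attributes into a dict."""
    root = ET.fromstring(text)
    return [dict(town.attrib) for town in root.iter("town")]


def clean(record):
    """Strip stray whitespace and convert population to int (None if missing)."""
    name = record["name"].strip()
    raw = (record.get("population") or "").strip()
    population = int(raw) if raw.isdigit() else None
    return {"name": name, "population": population}


def wrangle():
    """Parse all three sources, then normalize every record the same way."""
    raw = parse_csv(CSV_TEXT) + parse_json(JSON_TEXT) + parse_xml(XML_TEXT)
    return [clean(record) for record in raw]


if __name__ == "__main__":
    for row in wrangle():
        print(row)
```

The point of the sketch is the shape of the work: each format gets its own small parser, and a single cleaning step brings the results into one consistent structure ready for analysis. Excel and PDF would need third-party libraries and are left out here.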

Along the way, more Python features and libraries are introduced as needed, and the intent of the authors becomes clearer. Earlier in the introduction, they say: “Most people do not master programming; instead they master the process of getting unstuck.” This reflects a growing position on programming: one does not need to master a language, only to know enough to get started and how to search online for answers when stuck. Since many of the people who program today are not full-time programmers but use a programming language in the course of other work, this perspective makes sense.

For someone interested in using Python for data wrangling, this is the book to get. For someone interested in data wrangling in general, it is still the book to get, as Python is one of the best languages for the task. For someone interested in learning the Python programming language, there are better choices. Even though the authors state that the book is not for experienced Python programmers (with some qualifications), I would suggest that the book is a good choice for experienced Python programmers as well. The qualification really says that the book is not for you if you already know everything in the book. But this qualification goes without saying, and experienced Python programmers will learn a lot about data wrangling from this book.

Reviewer:  J. M. Artz Review #: CR144785 (1612-0865)
  Featured Reviewer  
 
Python (D.3.2 ... )
General (D.1.0)
Reference (A.2)
Other reviews under "Python": Date
Practical Python
Hetland M., APress, LP, 2002.  648, Type: Book (9781590590065)
Mar 28 2003
Python programming: an introduction to computer science
Zelle J., Franklin B, 2003. Type: Book (9781887902991)
Dec 2 2004
Foundations of Python network programming
Goerzen J., APress, LP, Berkeley, CA, 2004.  512, Type: Book (9781590593714)
Dec 26 2004

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®