
Pandas (the name comes from the term “panel data”) is an open-source Python library for manipulating large datasets. Typical tasks include data cleaning, analysis, and visualization. Pandas extends Python with two new data types: Series and DataFrame. A DataFrame represents rectangular (row-and-column) data, whereas a Series represents a single column of a DataFrame.
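The relationship between the two types can be illustrated with a minimal sketch. The column names and values here are made up for illustration and do not come from the book:

```python
import pandas as pd

# Hypothetical data: three items with made-up prices.
df = pd.DataFrame({
    "item": ["a", "b", "c"],
    "price": [1.5, 2.0, 3.25],
})

# Selecting one column of a DataFrame yields a Series.
prices = df["price"]

print(type(df).__name__)      # DataFrame
print(type(prices).__name__)  # Series
print(df.shape)               # (rows, columns)
print(prices.mean())          # average of the "price" column
```

The Series keeps the same row index as the DataFrame it was taken from, so values in the column stay aligned with their rows.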
Pandas was originally developed by Wes McKinney, who started working on it in 2008. It became open source in 2009. It was developed as a tool to clean up, analyze, and present large datasets in a readable and relevant format. Pandas is now one of the major data science libraries.
A quick skim through Chen’s Pandas for Everyone reveals a logical flow of information. It covers the major functionalities of the library in order of increasing complexity. The book is divided into five parts.
Part 1 (chapters 1 through 5) begins with how to import the Pandas library and explains the basic operations for loading and plotting data. Part 2 (chapters 6 through 8) covers data processing with Pandas, including assembly and normalization. Part 3 (chapters 9 through 12) discusses the different data types and includes an introductory chapter on how missing data is represented in Pandas. Part 4 (chapters 13 through 18) builds on the earlier chapters on data manipulation. It covers different techniques for data modeling and serves as an introduction to machine learning. Part 5 is not a summary of the book; rather, it consists of two useful closing chapters. Chapter 19 enumerates the libraries and frameworks that complement Pandas, and chapter 20 lists the most interesting resources for learning Pandas, including major conferences and podcasts.
Pandas for Everyone is a comprehensive tutorial on the Pandas library. It is clear that the author made every effort to cover the library thoroughly. The book begins with basic functionality and progresses gradually to more complex uses of the library.
The writing style is direct, with short definitions and concise arguments. There are no practice exercises per se, but each concept is explained with code examples. Compared with similar titles on the market, it offers no additional features. Even so, the book is a great guide to learning Pandas for Python programmers. For those already familiar with the library, it is also a useful reference.