Pivot Tables
Pivot tables are a powerful tool for summarizing and analyzing data. In Pandas, the pivot_table() method allows you to group data and calculate aggregated values in a tabular format. This tutorial demonstrates how to create and customize pivot tables for various use cases.
Creating a Pivot Table
To create a pivot table, specify the index, columns, and values parameters. The default aggregation is the mean, but you can specify other functions using the aggfunc parameter. Here’s an example:
import pandas as pd
# Create a sample DataFrame
data = {
"City": ["Chennai", "Chennai", "Madurai", "Madurai", "Trichy"],
"Month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
"Sales": [200, 250, 150, 180, 90]
}
df = pd.DataFrame(data)
# Create a pivot table
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc="sum")
print(pivot)
Output
| Month | Jan | Feb |
|---|---|---|
| Chennai | 200 | 250 |
| Madurai | 150 | 180 |
| Trichy | 90 | NaN |
Explanation: The pivot_table() method groups data by the City column and aggregates sales for each Month. Missing values are displayed as NaN.
Customizing Aggregation
You can use different aggregation functions like sum, mean, or count to customize the pivot table. Here’s an example:
# Use multiple aggregation functions
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc=["sum", "mean"])
print(pivot)
Output
| aggfunc | sum | mean | ||
|---|---|---|---|---|
| Month | Jan | Feb | Jan | Feb |
| Chennai | 200 | 250 | 200.0 | 250.0 |
| Madurai | 150 | 180 | 150.0 | 180.0 |
| Trichy | 90 | NaN | 90.0 | NaN |
Explanation: By using the aggfunc parameter, you can apply multiple aggregation functions to the pivot table. In this example, both sum and mean are calculated for sales grouped by city and month.
Filling Missing Values
Missing values in pivot tables can be replaced using the fill_value parameter. Here’s an example:
# Replace missing values with 0
pivot = pd.pivot_table(df, index="City", columns="Month", values="Sales", aggfunc="sum", fill_value=0)
print(pivot)
Output
| Month | Jan | Feb |
|---|---|---|
| Chennai | 200 | 250 |
| Madurai | 150 | 180 |
| Trichy | 90 | 0 |
Explanation: The fill_value parameter replaces missing values in the pivot table with the specified value, such as 0 in this example.
Key Takeaways
- Pivot Tables: Use
pivot_table()to group and summarize data. - Custom Aggregations: Apply custom or multiple aggregation functions using
aggfunc. - Handle Missing Data: Replace missing values with
fill_value. - Insights: Pivot tables simplify data analysis by organizing data into a readable format.