Replacing Values in Pandas
The replace() method in Pandas substitutes values in a DataFrame or Series. It accepts a single value, a list of values, or a dictionary that maps old values to new ones, which covers more complex replacements. This tutorial demonstrates the key use cases for replace().
Replacing Single Values
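Replace a specific value in one column by calling replace() on that column, or across the entire DataFrame by calling it on the DataFrame itself. The example below works on a single column: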
import pandas as pd
# Create a sample DataFrame
data = {
    "Name": ["Karthick", "Durai", "Praveen"],
    "City": ["Chennai", "Coimbatore", "Chennai"]
}
df = pd.DataFrame(data)
# Replace 'Chennai' with 'Chennai Metro'
df["City"] = df["City"].replace("Chennai", "Chennai Metro")
print(df)
Output:
| Name | City |
|---|---|
| Karthick | Chennai Metro |
| Durai | Coimbatore |
| Praveen | Chennai Metro |
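Explanation: The replace() method substitutes every cell in the City column whose value is exactly 'Chennai' with 'Chennai Metro'. By default it matches whole cell values, not substrings, and it returns a new Series, which is why the result is assigned back to df["City"].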
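The example above calls replace() on a single column. As a minimal sketch of the DataFrame-wide case, the snippet below calls replace() on the whole DataFrame. The `Office` column and the names `df_all` and `replaced_all` are hypothetical, added only so that the value appears in more than one column.

```python
import pandas as pd

# Hypothetical sample data: "Office" is an extra column added for illustration.
df_all = pd.DataFrame({
    "Name": ["Karthick", "Durai", "Praveen"],
    "City": ["Chennai", "Coimbatore", "Chennai"],
    "Office": ["Chennai", "Chennai", "Madurai"],
})

# Calling replace() on the DataFrame checks every column, not just "City".
replaced_all = df_all.replace("Chennai", "Chennai Metro")

# replace() returns a new object by default, so df_all itself is unchanged.
print(replaced_all)
```

Because the call returns a new DataFrame, the original is only updated if the result is assigned back, as the column-level example does.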
Replacing Multiple Values
Replace several values in a single call by passing a list or a dictionary. The example below uses a dictionary:
# Replace multiple values using a dictionary
replacement_dict = {
    "Chennai Metro": "Chennai",
    "Coimbatore": "Kovai"
}
df["City"] = df["City"].replace(replacement_dict)
print(df)
Output:
| Name | City |
|---|---|
| Karthick | Chennai |
| Durai | Kovai |
| Praveen | Chennai |
Explanation: The dictionary in replace() maps values to their replacements, converting 'Chennai Metro' to 'Chennai' and 'Coimbatore' to 'Kovai'.
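The dictionary form is shown above. The list form can be sketched as follows, assuming a standalone Series named `cities` (a hypothetical name) that holds the same values. The replacement label "Tamil Nadu" is also just an illustrative choice.

```python
import pandas as pd

# Hypothetical standalone Series mirroring the City column at this point.
cities = pd.Series(["Chennai Metro", "Coimbatore", "Chennai Metro"], name="City")

# Two lists of equal length: each value in the first list is replaced
# by the value at the same position in the second list.
paired = cities.replace(["Chennai Metro", "Coimbatore"], ["Chennai", "Kovai"])

# A list plus a single scalar: every listed value becomes that one value.
collapsed = cities.replace(["Chennai Metro", "Coimbatore"], "Tamil Nadu")

print(paired)
print(collapsed)
```

Paired lists depend on position, so a dictionary is usually easier to read and maintain when each value has its own replacement. The list-plus-scalar form is convenient when many values should collapse into one.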
Replacing Missing Values
Use replace() to handle missing values (e.g., NaN) in a DataFrame. Here’s an example:
import numpy as np
# Add missing values
df.loc[1, "City"] = np.nan
# Replace NaN with 'Unknown'
df["City"] = df["City"].replace(np.nan, "Unknown")
print(df)
Output:
| Name | City |
|---|---|
| Karthick | Chennai |
| Durai | Unknown |
| Praveen | Chennai |
Explanation: The replace() method substitutes NaN values with 'Unknown' in the City column, so that column no longer contains missing values.
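One way to confirm the result is to count the remaining missing values with isna(). The sketch below does that on a hypothetical standalone Series named `cities`, and also compares the result with fillna(), the pandas method dedicated to filling missing values. For this simple case the two calls should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical standalone Series with one missing value.
cities = pd.Series(["Chennai", np.nan, "Chennai"], name="City")

# Fill the missing entry with replace(), as in the tutorial.
via_replace = cities.replace(np.nan, "Unknown")

# fillna() is the dedicated method for the same task.
via_fillna = cities.fillna("Unknown")

# Compare the two results and count any missing values that remain.
print(via_replace.equals(via_fillna))
print(via_replace.isna().sum())
```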
Key Takeaways
- Versatility: The replace() method handles single values, multiple values, and missing data efficiently.
- Customization: Use dictionaries for complex replacements and lists for bulk replacements.
- Missing Data: Replace NaN values to ensure clean and complete datasets.
- Scalability: replace() works seamlessly on large datasets with diverse replacement needs.