Keeping Your Data Clean: Deleting Rows in Pandas DataFrames
Libraries:
- pandas: This is a Python library specifically designed for data analysis and manipulation. It offers powerful tools for working with DataFrames, which are essentially two-dimensional tables with labeled axes.
Process:
- Import pandas: You'll need to import the pandas library at the beginning of your Python script. Here's how to do it:
import pandas as pd
Delete Rows: Here's how you can delete the first three rows:
- Using iloc: The iloc indexer lets you access rows and columns of a DataFrame by integer position. To delete the first three rows, you can use slicing notation with iloc:
df = df.iloc[3:] # This selects rows from index 3 (fourth row) onwards
Explanation:
- df: This refers to your existing DataFrame.
- .iloc[3:]: This part selects rows by position. The 3 specifies the starting position (the fourth row, since counting starts at 0), and the colon with nothing after it means every row from there to the end, so the first three rows are excluded.
Important Note:
- Slicing with iloc does not modify the original DataFrame; it returns a new object. The original data disappears only because the result is assigned back to the name df. If you want to keep the original DataFrame available, assign the result of df.iloc[3:] to a new variable instead.
Example:
import pandas as pd
# Create a sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E']}
df = pd.DataFrame(data)
# Print the original DataFrame
print(df)
# Delete the first three rows
df_modified = df.iloc[3:]
# Print the DataFrame after deleting rows
print(df_modified)
This code first creates a DataFrame with two columns, then selects everything after the first three rows using iloc, and finally prints both the original and modified DataFrames.
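One detail the output above hints at: the remaining rows keep their original index labels (3 and 4) rather than being renumbered from 0. The following is a minimal sketch, assuming a fresh 0-based index is wanted; reset_index(drop=True) is one common way to get it, and the name df_reset is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E']})

# Slicing keeps the original index labels of the remaining rows
df_modified = df.iloc[3:]
print(df_modified.index.tolist())  # [3, 4]

# drop=True discards the old labels instead of adding them as a column
df_reset = df_modified.reset_index(drop=True)
print(df_reset.index.tolist())  # [0, 1]
```

Whether to reset the index depends on whether the old labels carry meaning for later steps.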
import pandas as pd
# Create a sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E']}
df = pd.DataFrame(data)
# Print the original DataFrame
print("Original DataFrame:")
print(df)
# Delete the first three rows (Option 1: Reassigning the result to df)
df = df.iloc[3:] # df now refers to the sliced result; the full data is no longer reachable through df
# Print the DataFrame after deleting rows (using the reassigned df)
print("\nDataFrame after deleting first 3 rows (reassigned to df):")
print(df)
# Option 2: Creating a new DataFrame and keeping the original
df = pd.DataFrame(data) # Recreate the full DataFrame, since Option 1 replaced df
df_new = df.iloc[3:] # Store the result under a new name; df is left unchanged
# Print the DataFrame after deleting rows (using the new DataFrame)
print("\nDataFrame after deleting first 3 rows (new DataFrame):")
print(df_new)
- Import pandas: We import the pandas library as pd.
- Create DataFrame: We create a sample DataFrame df with two columns (col1 and col2).
- Print Original DataFrame: We print the original DataFrame to see the initial data.
- Delete Rows (Option 1): We assign the result of df.iloc[3:] back to df, so df now holds only the last two rows.
- Delete Rows (Option 2): We assign the result of df.iloc[3:] to a new variable, df_new, and leave df as it is.
- Print New DataFrame: We print the newly created df_new to demonstrate how you can work with the trimmed data without affecting the original DataFrame.
This code provides two options: replacing df with the trimmed result, or storing the trimmed result in a new DataFrame. You can choose the approach that best suits your needs.
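If the new DataFrame will be edited later, it can help to make its independence explicit. The sketch below assumes that goal and adds .copy() to the slice; the multiplication is only there to show that a change to df_new leaves df alone.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E']})

# .copy() makes df_new an independent DataFrame rather than a slice of df
df_new = df.iloc[3:].copy()

# Editing df_new does not touch df
df_new['col1'] = df_new['col1'] * 10

print(df_new['col1'].tolist())  # [40, 50]
print(df['col1'].tolist())      # [1, 2, 3, 4, 5]
```

Without .copy(), some pandas versions may emit a SettingWithCopyWarning when a sliced result is modified, so the explicit copy is a reasonable default in that situation.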
Using boolean indexing with index.isin():
import pandas as pd
# Create a sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E']}
df = pd.DataFrame(data)
# Delete the first three rows using boolean indexing
df_filtered = df[~df.index.isin(range(3))] # Select rows where index is not in range(3)
# Print the DataFrame after deleting rows
print(df_filtered)
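- We create a boolean mask using ~df.index.isin(range(3)). This produces a boolean array where True indicates rows to keep (those whose index label is not 0, 1, or 2).
- We pass this mask to df[...] to filter the DataFrame and keep only the desired rows. Because isin() matches index labels rather than positions, this removes the first three rows only when the DataFrame has the default 0-based integer index.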
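For comparison, here is a minimal sketch of the same deletion done with the DataFrame.drop() method itself. It assumes the index labels are unique, since drop() removes every row that carries a given label; the names df_dropped and df_inplace are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E']})

# df.index[:3] holds the labels of the first three rows, whatever they are
df_dropped = df.drop(index=df.index[:3])
print(df_dropped['col1'].tolist())  # [4, 5]

# inplace=True changes the DataFrame it is called on and returns None
df_inplace = df.copy()
df_inplace.drop(index=df_inplace.index[:3], inplace=True)
print(df_inplace['col1'].tolist())  # [4, 5]
```

Taking the labels from df.index[:3] rather than hard-coding 0, 1, 2 keeps the call correct when the index is not the default integer range.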
Using .tail() (for illustrative purposes):
# .tail(n) keeps the last n rows, so keeping len(df) - 3 rows drops the first three
df_kept = df.tail(n=len(df) - 3) # Keep all rows except the first three
Note: .tail() is typically used to retrieve the last N rows, not for deleting. However, keeping the last len(df) - 3 rows has the same effect as dropping the first three. Calling .head() with the same argument would instead keep the first len(df) - 3 rows and drop the last three.
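Both .head() and .tail() also accept a negative n, which is a shorter way to say "everything except". A small sketch of how the two behave on the sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['A', 'B', 'C', 'D', 'E']})

# tail(-3): all rows except the first three
df_without_first = df.tail(-3)
print(df_without_first['col1'].tolist())  # [4, 5]

# head(-3): all rows except the last three
df_without_last = df.head(-3)
print(df_without_last['col1'].tolist())  # [1, 2]
```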
Remember to choose the method that best suits your needs and coding style.