Extracting Data with Ease: How to Get the Last N Rows in a pandas DataFrame (Python)
Methods to Extract Last N Rows:
There are two primary methods to achieve this in pandas:
- tail() method: This is the most straightforward approach. It takes an optional argument n (number of rows, default 5) and returns the last n rows of the DataFrame.
import pandas as pd
# Sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
# Get the last 2 rows
last_two_rows = df.tail(2)
print(last_two_rows)
- Slicing with iloc: This method offers more flexibility for integer-based indexing. You can use negative indexing to select rows from the end.
# Get the last 3 rows using iloc
last_three_rows = df.iloc[-3:] # Select rows from -3 (inclusive) to the end
print(last_three_rows)
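When n comes from a variable, the two methods differ in one edge case that is easy to miss: because -0 equals 0, df.iloc[-0:] selects every row, while df.tail(0) returns an empty DataFrame. A minimal sketch, reusing the sample data from above (the variable names tail_result and iloc_result are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']})

n = 0
tail_result = df.tail(n)    # zero rows requested, so the result is empty
iloc_result = df.iloc[-n:]  # -0 == 0, so this is df.iloc[0:], i.e. every row

print(len(tail_result))  # 0
print(len(iloc_result))  # 5
```

If n can legitimately be zero, tail() is the safer choice of the two.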
Key Points:
- Both methods return a new DataFrame containing the last n rows.
- If n is greater than the total number of rows, all rows are returned.
- tail() is generally preferred for readability, while iloc provides more control over indexing.
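The behavior for an oversized n can be checked directly. The sketch below assumes the same five-row sample data and asks each method for 10 rows; the names big_tail and big_iloc are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']})

# Request more rows than the DataFrame contains
big_tail = df.tail(10)
big_iloc = df.iloc[-10:]  # out-of-range slice bounds are clipped, not an error

print(big_tail.equals(df))  # True
print(big_iloc.equals(df))  # True
```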
Additional Considerations:
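- Negative n: tail() does not raise an error when n is negative. Instead, df.tail(-n) returns every row except the first n, so df.tail(-2) on a five-row DataFrame returns the last three rows. If a negative value would be a mistake in your code, validate n before calling tail().
- Resetting Index (Optional): The resulting DataFrame keeps the original index labels, so they will not start from 0 (e.g., 3, 4 for the last two rows of a five-row DataFrame). To renumber the index from 0, use: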
last_two_rows = df.tail(2).reset_index(drop=True) # Drop the old index
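To see what resetting the index changes, the following sketch prints the index labels before and after the call. It assumes the same sample data as above; renumbered is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']})

last_two_rows = df.tail(2)
print(list(last_two_rows.index))  # [3, 4]: labels carried over from df

# drop=True discards the old labels instead of keeping them as a new column
renumbered = last_two_rows.reset_index(drop=True)
print(list(renumbered.index))     # [0, 1]
```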
By understanding these methods, you can effectively extract the last N rows of data from your pandas DataFrames for further analysis or manipulation.
Example 1: Using tail() with Error Handling
import pandas as pd
# Sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
def get_last_n_rows(df, n):
    """
    Retrieves the last n rows of a DataFrame using tail(), guarding against a negative n.
    Args:
        df (pandas.DataFrame): The DataFrame to extract from.
        n (int): The number of rows to get.
    Returns:
        pandas.DataFrame: The last n rows, or all rows if n is negative
        or exceeds the number of rows.
    """
    # tail() does not raise for a negative n (it drops the first |n| rows),
    # so the check has to be explicit rather than a try/except.
    if n < 0:
        print("n cannot be negative. Returning all rows.")
        return df.copy()  # Return a copy so the caller cannot modify the original
    if n > len(df):
        print(f"n ({n}) exceeds total number of rows ({len(df)}). Returning all rows.")
    return df.tail(n)  # tail() already returns all rows when n is too large
# A negative n triggers the guard, so all rows are returned
all_rows = get_last_n_rows(df, -2)
print(all_rows)
Example 2: Using iloc with Resetting Index
import pandas as pd
# Sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
# Get the last 2 rows using iloc and reset the index
last_two_rows = df.iloc[-2:].reset_index(drop=True)
print(last_two_rows)
These examples demonstrate how to handle potential errors and customize the output according to your needs.
Using query (for Conditional Selection):
If you need the last N rows that satisfy a specific condition, you can filter with query first and then call tail() on the result:
import pandas as pd
# Sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
# Get the last 2 rows where col1 is greater than 2
last_two_filtered = df.query("col1 > 2").tail(2)
print(last_two_filtered)
This approach retrieves the last N rows that meet a given condition.
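The same selection can also be written with a boolean mask instead of query. The sketch below assumes the same sample data and additionally shows why the order of the two steps matters, using an illustrative "odd values" condition that is not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']})

# Boolean-mask equivalent of df.query("col1 > 2").tail(2)
last_two_filtered = df[df['col1'] > 2].tail(2)
print(last_two_filtered)

# Filter first, then tail: the last two rows among all odd values (3 and 5)
filter_then_tail = df[df['col1'] % 2 == 1].tail(2)

# Tail first, then filter: only the odd values within the last two rows (5)
last_two = df.tail(2)
tail_then_filter = last_two[last_two['col1'] % 2 == 1]

print(list(filter_then_tail['col1']))  # [3, 5]
print(list(tail_then_filter['col1']))  # [5]
```

Filtering before tail() answers "the last N matching rows", while filtering afterwards answers "which of the last N rows match".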
Using List Comprehension (Less Efficient):
For smaller DataFrames, you can use a list comprehension to build a list containing the last N rows and then convert it back to a DataFrame:
import pandas as pd
# Sample DataFrame
data = {'col1': [1, 2, 3, 4, 5], 'col2': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
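# Get the last 3 rows using a list comprehension
n = 3
start = max(len(df) - n, 0)  # Clamp so an n larger than the DataFrame still works
rows = [df.iloc[i] for i in range(start, len(df))]  # Temporary list of row Series
last_three_rows = pd.DataFrame(rows)  # Create a new DataFrame from the list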
print(last_three_rows)
Important Note: This method is generally less efficient for larger DataFrames as it involves creating a temporary list. It's recommended to stick with tail() or iloc for most cases.
Choose the method that best suits your specific needs based on readability, efficiency, and whether conditional selection is required.