How to Get the Row Count of a Pandas DataFrame in Python
- Using the len() function: This is the simplest way to get the row count. The len() function works on many sequence-like objects in Python, including DataFrames, where it returns the number of rows. Here's an example:
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'col1':[1,2,3], 'col2':['a','b','c']})
# Get the number of rows of the dataframe
number_of_rows = len(df)
# Print the number of rows
print(number_of_rows)
This code will output:
3
- Using the shape attribute: The shape attribute of a DataFrame is a tuple of the form (rows, columns) that holds the dimensions of the DataFrame. You can access the number of rows by indexing the tuple at position 0. Here's an example:
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'col1':[1,2,3], 'col2':['a','b','c']})
# Get the number of rows of the dataframe
number_of_rows = df.shape[0]
# Print the number of rows
print(number_of_rows)
This code will output:
3
Using len() is generally more concise for getting the row count, but shape can be useful if you also need the column count of the DataFrame.
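As a minimal sketch of that second case, the shape tuple can be unpacked to get both counts at once. The variable names n_rows and n_cols are chosen here for illustration only.

```python
import pandas as pd

# Same sample DataFrame as in the examples above
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape

print(n_rows)  # 3
print(n_cols)  # 2
```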
Example 1: Using the len() function
import pandas as pd
# Create a sample DataFrame
data = {'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']}
df = pd.DataFrame(data)
# Get the number of rows
number_of_rows = len(df)
# Print the number of rows
print("Number of rows using len():", number_of_rows)
Number of rows using len(): 3
Explanation:
- Import pandas: This line imports the pandas library, which provides data structures and tools for data analysis.
- Create DataFrame: We create a DataFrame named df with two columns (col1 and col2) and three rows of data.
- Get row count: The len(df) call returns the length of the DataFrame, which is the number of rows.
- Print result: We print the number of rows along with a clear message for better readability.
Example 2: Using the shape attribute
import pandas as pd
# Create a sample DataFrame (same as Example 1)
data = {'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']}
df = pd.DataFrame(data)
# Get the number of rows (accessing the first element of the shape tuple)
number_of_rows = df.shape[0]
# Print the number of rows
print("Number of rows using shape:", number_of_rows)
Number of rows using shape: 3
Explanation:
- Import pandas: Same as in Example 1.
- Create DataFrame: We build the same DataFrame as in Example 1 for consistency.
- Get row count: The df.shape attribute returns a tuple containing the number of rows (at index 0) and columns (at index 1). Here, we extract the number of rows using df.shape[0].
These examples demonstrate two effective ways to retrieve the row count of a Pandas DataFrame in Python. You can choose the method that best suits your coding style and preferences.
Using df[column_name].count() (Not Recommended):
This method seems like it should work, but it has a caveat. It counts the number of non-null (not missing) values in a specific column. If you have missing values (NaNs) in your DataFrame, this won't accurately reflect the actual row count. Here's an example:
import pandas as pd
import numpy as np # for creating NaNs
# Create a sample DataFrame with NaNs
data = {'col1': [1, 2, 3, np.nan], 'col2': ['a', 'b', 'c', None]}
df = pd.DataFrame(data)
# Get (inaccurate) row count using a column with NaNs
number_of_rows = df['col1'].count()
# Print the (potentially incorrect) number of rows
print("Number of rows using df[column].count() (might be inaccurate):", number_of_rows)
Number of rows using df[column].count() (might be inaccurate): 3
As you can see, count() reports 3 because the NaN in col1 is skipped, even though the DataFrame actually has 4 rows. Using this value as a row count is therefore misleading whenever the chosen column contains missing values.
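To make the difference concrete, here is a small sketch that puts count() next to len() on the same DataFrame. The variable names are illustrative only.

```python
import pandas as pd
import numpy as np

# Same DataFrame as above: the last row has a missing value in both columns
df = pd.DataFrame({'col1': [1, 2, 3, np.nan], 'col2': ['a', 'b', 'c', None]})

total_rows = len(df)                 # counts every row, missing values or not
non_null_col1 = df['col1'].count()   # counts only the non-null values in col1
per_column = df.count()              # non-null count for each column

print(total_rows)                    # 4
print(non_null_col1)                 # 3
print(total_rows - non_null_col1)    # 1 missing value in col1
```

If the goal is to find out how many values are missing, count() is the right tool. If the goal is the number of rows, len(df) or df.shape[0] is the safer choice.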
Looping through the DataFrame (Very Inefficient):
This approach is highly discouraged due to its inefficiency. It iterates through each row of the DataFrame, counting them one by one. This can be slow for large DataFrames. Here's an example (avoid using this in practice):
import pandas as pd
# Create a sample DataFrame
data = {'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']}
df = pd.DataFrame(data)
# Initialize a counter
number_of_rows = 0
# Loop through each row and increment the counter
for _ in df.iterrows():  # iterrows() builds a Series for every row, which is what makes this slow
    number_of_rows += 1
# Print the number of rows
print("Number of rows using loop (inefficient):", number_of_rows)
While this code technically works, it's much slower than the other methods, especially for large datasets.
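The gap can be measured with the standard-library timeit module. This is a rough sketch, not a benchmark: the DataFrame size (5,000 rows), the repeat count, and the helper name count_with_loop are arbitrary choices made here, and the actual timings depend on your machine and pandas version.

```python
import timeit

import pandas as pd

# A larger DataFrame so the difference is measurable
df = pd.DataFrame({'col1': range(5_000)})


def count_with_loop(frame):
    """Count rows by iterating over them (hypothetical helper, for comparison only)."""
    count = 0
    for _ in frame.iterrows():
        count += 1
    return count


# Both approaches return the same number
loop_count = count_with_loop(df)
len_count = len(df)

# Time each approach over a few repetitions
loop_seconds = timeit.timeit(lambda: count_with_loop(df), number=3)
len_seconds = timeit.timeit(lambda: len(df), number=3)

print("Rows (loop):", loop_count, "Rows (len):", len_count)
print("Loop time:", loop_seconds, "len() time:", len_seconds)
```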
In conclusion, stick to len(df) or df.shape[0] for getting the row count of a DataFrame. Both are efficient and accurate, and they are the recommended approaches.
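As a quick check, the sketch below compares len(df), len(df.index), and df.shape[0] on a full, a filtered, and an empty DataFrame. The filter conditions are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})

filtered = df[df['col1'] > 1]   # keeps 2 rows
empty = df[df['col1'] > 10]     # keeps 0 rows

# Collect the three row counts for each DataFrame
counts = [(len(f), len(f.index), f.shape[0]) for f in (df, filtered, empty)]

print(counts)  # [(3, 3, 3), (2, 2, 2), (0, 0, 0)]
```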