Looping Over Rows in Pandas DataFrames: A Guide

2024-06-24

Using iterrows():

This is the most common method. It iterates through each row of the DataFrame and returns a tuple containing two elements:

  • The index of the row
  • A Pandas Series object containing the values for that row

Here's an example:

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data)

for index, row in df.iterrows():
  print(f"Index: {index}, Name: {row['Name']}, Age: {row['Age']}")
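One caveat worth knowing: because iterrows() packs each row into a single Series, values from columns with different dtypes may be upcast to a common dtype. The sketch below uses a small hypothetical DataFrame (the column names count and price are made up for illustration) to show an integer column coming back as a float.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column
mixed = pd.DataFrame({'count': [1, 2], 'price': [1.5, 2.5]})

# Take the first (index, row) pair produced by iterrows()
_, first_row = next(mixed.iterrows())

print(mixed['count'].dtype)   # dtype of the column inside the DataFrame
print(first_row.dtype)        # dtype of the row Series
print(first_row['count'])     # the integer value, now stored as a float
```

If exact dtypes matter for your logic, itertuples() is usually the safer choice, since it does not build a Series per row.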

Using positional indexing (iloc):

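You can use positional indexing (iloc) to access rows by their position (zero-based). This is convenient when you only need specific rows at known positions. Calling iloc once per row in a loop over the whole DataFrame adds lookup overhead on every iteration, so it is not a fast way to visit every row.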

for i in range(len(df)):
  row = df.iloc[i]
  # Access elements like with iterrows()
  print(f"Index: {i}, Name: {row['Name']}, Age: {row['Age']}")

Using label-based indexing (loc):

Similar to positional indexing, you can use label-based indexing (loc) to access rows by their index labels.

for index in df.index:
  row = df.loc[index]
  # Access elements like with iterrows()
  print(f"Index: {index}, Name: {row['Name']}, Age: {row['Age']}")
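With the default integer index, positions and labels happen to coincide, which hides the difference between iloc and loc. The following sketch assumes a hypothetical variant of the DataFrame with string labels ('a', 'b', 'c') to make the distinction visible.

```python
import pandas as pd

# Same data as above, but with hypothetical string index labels
labeled = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]},
    index=['a', 'b', 'c'],
)

by_position = labeled.iloc[1]   # second row, counted from zero
by_label = labeled.loc['b']     # row whose index label is 'b'

# Both lookups reach the same row through different addressing schemes
print(by_position['Name'], by_label['Name'])
```

Here labeled.iloc['b'] or labeled.loc[1] would raise an error, because each accessor only understands its own kind of key.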

Choosing the right method:

  • Use iterrows() for general iteration where you need both the index and the row values.
  • Use iloc if you only need rows at specific, known positions.
  • Use loc if you want to access rows by their index labels.

Additional methods:

  • itertuples(): This method returns namedtuples for each row, which can be useful if you prefer attribute-based access.
  • apply(): This method allows you to apply a function to each row of the DataFrame.

Remember, iterating row-by-row might not always be the most efficient approach. Consider vectorized operations using Pandas functionalities whenever possible.




In the iterrows() example above, the loop visits each row in the DataFrame df. The iterrows() method yields a tuple for each row, where:

  • index is the index label of the row (e.g., 0, 1, 2 in this case)
  • row is a Pandas Series containing the values for that row

Inside the loop, we access the name and age values using row['Name'] and row['Age'].

In the iloc example above, the loop runs from 0 to the length of the DataFrame minus 1, because positions are zero-based. Inside the loop:

  • i represents the position of the row (0, 1, 2)
  • row is obtained using df.iloc[i], which retrieves the row at position i
  • We then access elements like name and age in the same way as with iterrows().

In the loc example above, the loop iterates over df.index instead. Inside the loop:

  • index represents the index label itself (e.g., 0, 1, 2)
  • row is obtained using df.loc[index], which retrieves the row with that label



Using itertuples():

This method iterates over the DataFrame rows and returns namedtuples instead of Series objects. Namedtuples are ordinary tuples whose fields can also be read by name, so they offer attribute-based access (row.Name) as well as positional access (row[1]), which can be convenient for some use cases.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data)

for row in df.itertuples(index=True, name='Pandas'):  # These are the default values, written out explicitly
  print(f"Index: {row.Index}, Name: {row.Name}, Age: {row.Age}")

Use this method when:

  • You prefer attribute-based access (e.g., row.Name instead of row['Name']).
  • You are working with large DataFrames where speed matters (itertuples() avoids building a Series object for every row, so it is typically much faster than iterrows()).
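Attribute access only works for column names that are valid Python identifiers. As a rough illustration, the sketch below uses a hypothetical column called 'First Name' and a made-up tuple name 'Person'; itertuples() replaces the invalid field name with a positional one.

```python
import pandas as pd

# Hypothetical DataFrame with a column name that contains a space
people = pd.DataFrame({'First Name': ['Alice', 'Bob'], 'Age': [25, 30]})

rows = list(people.itertuples(index=True, name='Person'))
first = rows[0]

# 'First Name' is not a valid identifier, so it becomes a positional field
print(first._fields)

# Valid names keep attribute access; positional indexing always works
print(first.Age, first[1])
```

Renaming such columns before iterating (for example to first_name) keeps the attribute names readable.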

Using apply():

This method allows you to apply a custom function to each row of the DataFrame. The function you define operates on each row individually.

def process_row(row):
  # Your custom logic to process each row (e.g., calculations)
  return f"Name: {row['Name']}, Age: {row['Age'] * 2}"

result = df.apply(process_row, axis=1)  # Apply function to each row (axis=1)
print(result)
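
Use this method when:

  • You need to perform specific calculations or transformations on each row.
  • You want a more concise way to express row-wise operations than an explicit loop. Note that apply() with axis=1 still calls your function once per row, so it is not a vectorized operation.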
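A common pattern is to store the result of apply() as a new column. The sketch below assumes the same Name/Age data used throughout; the function age_in_months and the column name AgeInMonths are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

def age_in_months(row):
    # Receives one row as a Series and returns a single value
    return row['Age'] * 12

# axis=1 passes rows (not columns) to the function; the result is a Series
# aligned with df's index, so it can be assigned directly as a column
df['AgeInMonths'] = df.apply(age_in_months, axis=1)
print(df)
```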

Important Note:

While iterating over rows can be useful, it's generally recommended to leverage vectorized operations offered by Pandas whenever possible. Vectorized operations are often more efficient, especially for large DataFrames. These operations work on entire columns or the whole DataFrame at once, avoiding the need for explicit loops.
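To make the contrast concrete, here is a minimal sketch of vectorized equivalents for the kinds of per-row work shown earlier, using the same Name/Age data. The variable names (doubled_age, labels, older_than_24) are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

# Arithmetic on a whole column at once, with no explicit loop
doubled_age = df['Age'] * 2

# Build the same strings that process_row produced, column-wise
labels = 'Name: ' + df['Name'] + ', Age: ' + doubled_age.astype(str)

# Filter rows with a boolean mask instead of checking each row in a loop
older_than_24 = df[df['Age'] > 24]

print(labels.tolist())
print(older_than_24)
```

Each line operates on entire columns, which is why this style tends to scale better than row-by-row iteration as the DataFrame grows.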

