Example Codes for Logical Operators and Boolean Indexing in Pandas
Logical Operators for Boolean Indexing in Pandas
Boolean indexing is a powerful technique in Pandas for filtering dataframes based on specific conditions. It involves creating a boolean mask, which is a series of True and False values that correspond to each row in the dataframe. This mask is then used to select rows that meet the specified criteria.
Logical operators are essential for constructing these boolean masks. Here are the primary ones:
and (&):
- Usage: condition1 & condition2
- Purpose: Returns True if both condition1 and condition2 are True.
or (|):
- Usage: condition1 | condition2
- Purpose: Returns True if either condition1 or condition2 is True.
not (~):
- Usage: ~condition
- Purpose: Reverses the boolean value of condition. If condition is True, ~condition is False, and vice versa.
Example:
import pandas as pd
# Create a sample dataframe
data = {'col1': [1, 2, 3, 4],
'col2': ['A', 'B', 'C', 'D']}
df = pd.DataFrame(data)
# Boolean indexing using logical operators
# Select rows where col1 is greater than 2 and col2 is 'C'
result = df[(df['col1'] > 2) & (df['col2'] == 'C')]
print(result)
Output:
   col1 col2
2     3    C
In this example:
- df['col1'] > 2 creates a boolean mask where True marks the rows whose col1 value is greater than 2.
- The comparison on col2 creates a second mask in the same way.
- The & operator combines these masks, selecting only the rows where both conditions are True.
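To make the intermediate masks visible, the following sketch rebuilds the same sample dataframe and inspects each mask before combining them. The names mask_col1, mask_col2, and combined are illustrative and not part of the original example; the col2 value 'C' is chosen to match the row shown in the output above.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': ['A', 'B', 'C', 'D']})

# Each comparison yields a boolean Series aligned with the dataframe's index
mask_col1 = df['col1'] > 2        # illustrative name
mask_col2 = df['col2'] == 'C'     # illustrative name

# & combines the two masks element by element
combined = mask_col1 & mask_col2

print(mask_col1.tolist())
print(mask_col2.tolist())
print(combined.tolist())

# Indexing with the combined mask keeps only the rows marked True
print(df[combined])
```

Storing masks in named variables like this can also make long filters easier to read and reuse.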
Additional notes:
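- Wrap each comparison in parentheses. The & and | operators bind more tightly than comparison operators such as > and ==, so the parentheses control the order of operations.
- Use &, |, and ~ rather than the Python keywords and, or, and not, which raise a ValueError when applied to a Series.
- For more complex conditions, consider using functions like isin, between, and query.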
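The note about parentheses is easy to overlook, so here is a small sketch of what happens without them. It uses a standalone integer Series named s (an illustrative name, not from the examples above) and captures the exceptions instead of letting them stop the script.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])  # illustrative data

# Correct: each comparison is parenthesized before the masks are combined
ok = (s > 1) & (s < 4)
print(ok.tolist())

# Without parentheses, & is evaluated before the comparisons, so Python
# computes `1 & s` first and then attempts a chained comparison, which fails
try:
    s > 1 & s < 4
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = exc

# The Python keyword `and` needs a single True/False value, not a Series
try:
    (s > 1) and (s < 4)
    keyword_error = None
except ValueError as exc:
    keyword_error = exc

print(type(unparenthesized_error).__name__)
print(type(keyword_error).__name__)
```

In both failing cases pandas refuses to reduce a whole Series to a single truth value, which is why the element-wise operators are needed.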
Example Codes for Logical Operators and Boolean Indexing in Pandas
Basic Example:
import pandas as pd
# Create a sample DataFrame
data = {'A': [1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd'],
'C': [True, False, True, False]}
df = pd.DataFrame(data)
# Boolean indexing using 'and' and 'or'
result1 = df[(df['A'] > 2) & (df['B'] == 'c')]
result2 = df[(df['A'] == 1) | (df['C'] == True)]
print(result1)
print(result2)
Explanation:
- result1 selects rows where column 'A' is greater than 2 and column 'B' is equal to 'c'.
- result2 selects rows where column 'A' is equal to 1 or column 'C' is True.
Using not operator:
# Select rows where column 'C' is False
# 'C' already holds booleans, so it can be negated directly
result3 = df[~df['C']]
print(result3)
- The ~ operator negates the condition, selecting rows where 'C' is not True (i.e., False).
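The ~ operator can also negate a whole compound condition, provided the condition is wrapped in its own parentheses. The sketch below, which rebuilds the sample DataFrame so it runs on its own, compares that form with the equivalent one given by De Morgan's law. The names not_both and either_fails are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd'],
                   'C': [True, False, True, False]})

# Negate the combined condition: rows that do NOT satisfy both tests
not_both = df[~((df['A'] > 2) & (df['B'] == 'c'))]

# Equivalent form by De Morgan's law: at least one test fails
either_fails = df[(df['A'] <= 2) | (df['B'] != 'c')]

print(not_both)
print(not_both.equals(either_fails))
```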
Using isin function:
# Select rows where column 'B' is 'a' or 'c'
result4 = df[df['B'].isin(['a', 'c'])]
print(result4)
- The isin function checks whether each value in a Series is contained in a specified list.
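Combining isin with ~ gives a "not in" filter. A minimal sketch, assuming the same sample columns as above (the name excluded is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

# Keep rows whose 'B' value is NOT in the list
excluded = df[~df['B'].isin(['a', 'c'])]
print(excluded)
```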
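Using between function:
# Select rows where column 'A' is between 2 and 4 (inclusive)
result5 = df[df['A'].between(2, 4)]
print(result5)
- The between function checks whether each value falls within a range. Both endpoints are included by default.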
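As a quick check on what between does, the sketch below compares it with the same range written out using comparison operators. The names with_between and with_operators are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

# between(2, 4) includes both endpoints by default
with_between = df[df['A'].between(2, 4)]

# The same filter spelled out with two comparisons and &
with_operators = df[(df['A'] >= 2) & (df['A'] <= 4)]

print(with_between)
print(with_between.equals(with_operators))
```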
Using query method:
# Select rows using a query string
result6 = df.query("A > 2 and B == 'c'")
print(result6)
- The query method allows you to filter the DataFrame using a query string, which can be more concise for complex conditions.
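Query strings can also refer to Python variables by prefixing them with @, which avoids building the string by hand. A minimal sketch, where threshold and letters are illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

threshold = 2            # illustrative variable
letters = ['c', 'd']     # illustrative variable

# @name looks up a variable from the surrounding Python scope
result = df.query("A > @threshold and B in @letters")
print(result)
```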
query Method:
- Purpose: Provides a more SQL-like syntax for filtering dataframes.
- Example:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'c', 'd']})
result = df.query("A > 2 and B == 'c'")
loc and iloc Indexing:
- Purpose: Directly access rows and columns using integer or label-based indexing.
- Example:
# Using integer-based indexing (iloc)
result = df.iloc[2:]  # Select rows starting from position 2
# Using label-based indexing (loc)
result = df.loc[df['A'] > 2]  # Select rows where 'A' is greater than 2
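One reason to prefer loc for boolean filtering is that it can select columns in the same step. The sketch below shows this, along with one way to use a mask with iloc by converting it to a NumPy array first. The names subset and by_position are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

# loc accepts a boolean mask for rows plus a list of column labels
subset = df.loc[df['A'] > 2, ['B']]
print(subset)

# iloc is position-based, so the mask is passed as a plain boolean array
by_position = df.iloc[(df['A'] > 2).to_numpy()]
print(by_position)
```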
where Method:
- Purpose: Replaces values in a DataFrame based on a boolean condition. Values are kept where the condition is True and replaced where it is False.
- Example:
# Replace values in column 'A' with 0 where 'B' is 'c'
df['A'] = df['A'].where(df['B'] != 'c', 0)
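Because where keeps values when the condition is True, the condition often has to be written in inverted form. The companion method mask replaces values when the condition is True instead, which can read more naturally. A minimal sketch comparing the two (s_where and s_mask are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

# where: keep 'A' where the condition is True, otherwise use 0
s_where = df['A'].where(df['B'] != 'c', 0)

# mask: replace 'A' with 0 where the condition is True
s_mask = df['A'].mask(df['B'] == 'c', 0)

print(s_where.tolist())
print(s_mask.tolist())
```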
Custom Functions:
- Purpose: Create tailored filtering logic for specific use cases.
- Example:
def is_even(x):
    return x % 2 == 0
result = df[df['A'].apply(is_even)]  # Select rows where 'A' is even
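When the custom logic is simple arithmetic, the same mask can usually be built without apply by operating on the whole column at once. The sketch below compares the two approaches on the sample data; with_apply and vectorized are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd']})

def is_even(x):
    return x % 2 == 0

# Row-by-row: the function is called once per value
with_apply = df[df['A'].apply(is_even)]

# Column-wise: the arithmetic and comparison run on the whole Series
vectorized = df[df['A'] % 2 == 0]

print(with_apply)
print(with_apply.equals(vectorized))
```

apply remains useful when the logic cannot be expressed with column-wise operations.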
Vectorized Operations:
- Purpose: Leverage NumPy's vectorization for efficient operations on arrays.
- Example:
import numpy as np
# Create a boolean mask using NumPy
mask = np.logical_and(df['A'] > 2, df['B'] == 'c')
result = df[mask]
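For two conditions, np.logical_and gives the same mask as the & operator. Where NumPy can help is in combining a variable number of conditions, for example with np.logical_and.reduce. The sketch below is one way to do that; the list name conditions and the particular conditions chosen are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd'],
                   'C': [True, False, True, False]})

# A list of boolean masks, which could be built dynamically
conditions = [df['A'] > 1, df['B'] != 'b', df['C']]

# Reduce the list to a single mask that is True only where all are True
mask = np.logical_and.reduce([c.to_numpy() for c in conditions])
result = df[mask]
print(result)

# For comparison, the same mask written with &
same_mask = (df['A'] > 1) & (df['B'] != 'b') & df['C']
print(same_mask.tolist() == mask.tolist())
```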
Choosing the Right Method:
- Simplicity and Readability: For basic filtering, logical operators or the query method might be sufficient.
- Performance: For large datasets or complex conditions, vectorized operations (including the logical operators themselves) can offer speed advantages over row-by-row custom functions.
- Flexibility: Custom functions provide the most flexibility but require more coding effort.
- Specific Use Cases: Consider the nature of your data and the filtering requirements.