Python: Search DataFrame Columns Containing a String
Import pandas library:
import pandas as pd
Create a sample DataFrame:
df = pd.DataFrame({'A': [1,2,3], 'B': ['apple', 'banana', 'cherry'], 'C': ['cat', 'dog', 'fish']})
Find columns using list comprehension:
You can achieve this using a list comprehension that iterates through the DataFrame's columns (df.columns) and checks whether the desired string is present in each column name using the in operator.
string_to_find = 'A'
matching_columns = [col for col in df.columns if string_to_find in col]
print(matching_columns)
This code will output:
['A']
Explanation:
- The list comprehension [col for col in df.columns if string_to_find in col] creates a new list.
- It iterates through each column name (col) in df.columns.
- The if condition checks whether string_to_find is present within the column name (col).
- If the condition is true, the column name is added to the new list.
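The list comprehension above assumes every column name is a string. If a DataFrame has a non-string label such as the integer 2024, the expression 'A' in 2024 raises a TypeError. The sketch below wraps the same idea in a hypothetical helper, find_columns (not a pandas function), that converts each label with str() and optionally ignores case. The sample columns are made up for illustration.

```python
import pandas as pd


def find_columns(df, substring, case_sensitive=True):
    """Hypothetical helper: return column labels whose str() form contains substring."""
    needle = substring if case_sensitive else substring.lower()
    matches = []
    for col in df.columns:
        label = str(col)  # guards against non-string labels such as integers
        if not case_sensitive:
            label = label.lower()
        if needle in label:
            matches.append(col)  # keep the original label, not the converted string
    return matches


df = pd.DataFrame({'A': [1, 2, 3], 'apple_A': [4, 5, 6], 2024: [7, 8, 9]})
print(find_columns(df, 'A'))                        # ['A', 'apple_A']
print(find_columns(df, 'a', case_sensitive=False))  # ['A', 'apple_A']
print(find_columns(df, '20'))                       # [2024]
```

Returning the original label (rather than its string form) means the result can still be used to index the DataFrame, for example df[find_columns(df, '20')].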
Alternatively, using the filter method:
matching_columns = df.filter(like=string_to_find)
print(matching_columns)
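This method filters the DataFrame itself and returns a new DataFrame containing only the columns whose names contain the specified string. The match is a plain, case-sensitive substring check.
Both methods find the same columns, but they return different things: the list comprehension gives you a list of column names, while filter gives you a DataFrame with those columns' data. Choose the approach that fits what you need to do next.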
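Because df.filter(like=...) returns a DataFrame rather than a list of names, you can read the matching names from its columns attribute when you need a list comparable to the list comprehension's. The short sketch below also illustrates that like= is case-sensitive.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['apple', 'banana', 'cherry'], 'C': ['cat', 'dog', 'fish']})
string_to_find = 'A'

subset = df.filter(like=string_to_find)  # a DataFrame holding only the matching columns
names = subset.columns.tolist()          # a plain list of the matching column names

print(type(subset).__name__)  # DataFrame
print(names)                  # ['A']

# like= is case-sensitive, so a lowercase search matches none of these columns
print(df.filter(like='a').columns.tolist())  # []
```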
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1,2,3], 'B': ['apple', 'banana', 'cherry'], 'C': ['cat', 'dog', 'fish']})
# Find columns using list comprehension
string_to_find = 'A'
matching_columns_list_comp = [col for col in df.columns if string_to_find in col]
print("Columns using list comprehension:", matching_columns_list_comp)
# Find columns using filter method
matching_columns_filter = df.filter(like=string_to_find)
print("Columns using filter method:")
print(matching_columns_filter)
Columns using list comprehension: ['A']
Columns using filter method:
   A
0  1
1  2
2  3
As you can see, both methods identify column "A", whose name contains the string "A". You can change the string_to_find variable to search for different strings in your column names.
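To see that the two approaches select the same data, you can index the DataFrame with the list produced by the list comprehension and compare it with the result of filter. This is a small check using the same sample data, relying on DataFrame.equals, which compares shape, values, and dtypes.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['apple', 'banana', 'cherry'], 'C': ['cat', 'dog', 'fish']})
string_to_find = 'A'

names = [col for col in df.columns if string_to_find in col]
selected = df[names]                       # select the matching columns by name
filtered = df.filter(like=string_to_find)  # the same selection via DataFrame.filter

print(selected.equals(filtered))  # True
```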
Using regular expressions (regex):
This method allows for more complex matching patterns beyond just a simple string search.
import pandas as pd
import re
# Sample DataFrame
df = pd.DataFrame({'A': [1,2,3], 'B': ['apple', 'banana', 'cherry'], 'Price_C': [10, 20, 30]})
# Find columns with names starting with 'A'
pattern = r'^A.*' # r for raw string, .* matches any character sequence
matching_columns_regex = df.filter(regex=pattern)
print(matching_columns_regex)
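Explanation:
- df.filter(regex=pattern) keeps the columns whose names match the regular expression. pandas tests each name with a regex search, so an unanchored pattern can match anywhere in the name.
- The pattern ^A.* matches any column name starting with "A" (^ anchors the match at the beginning, .* matches any following characters). Since a search only needs the start to match, ^A on its own selects the same columns.
- The import re line is not strictly required here, because df.filter(regex=...) compiles the pattern itself. re.compile(pattern) is useful when you want to reuse or test the pattern outside of pandas.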
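A few more patterns make the flexibility of regex= concrete. The patterns below are illustrative choices, not taken from the original: a suffix match, an exact match on either of two names, and a case-insensitive match using the inline (?i) flag. The last line shows the equivalent of the "starts with A" filter written with re.compile and a list comprehension, assuming all column names are strings.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['apple', 'banana', 'cherry'], 'Price_C': [10, 20, 30]})

# Illustrative patterns, each applied with df.filter(regex=...)
starts_with_a = df.filter(regex=r'^A').columns.tolist()          # ['A']
ends_with_c = df.filter(regex=r'_C$').columns.tolist()           # ['Price_C']
exactly_a_or_b = df.filter(regex=r'^(A|B)$').columns.tolist()    # ['A', 'B']
price_any_case = df.filter(regex=r'(?i)price').columns.tolist()  # ['Price_C']

# The same "starts with A" check using a pre-compiled pattern and re.search semantics
pattern = re.compile(r'^A')
compiled_match = [col for col in df.columns if pattern.search(col)]  # ['A']

print(starts_with_a, ends_with_c, exactly_a_or_b, price_any_case, compiled_match)
```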
Using str.contains:
This method offers a more concise way to check for string containment within column names.
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({'A': [1,2,3], 'B': ['apple', 'banana', 'cherry'], 'Price_C': [10, 20, 30]})
# Find columns with names containing 'ice'
string_to_find = 'ice'
matching_columns_str_contains = df.columns[df.columns.str.contains(string_to_find)]
print(matching_columns_str_contains)
Explanation:
- df.columns returns an Index containing the column names.
- .str.contains(string_to_find) checks each column name for the string and returns a boolean mask. By default, string_to_find is interpreted as a regular expression.
- Indexing df.columns with that mask extracts the matching column names.
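Because str.contains treats its argument as a regular expression by default, characters such as "." behave as wildcards. The sketch below uses made-up column names to show the difference, along with the regex=False and case=False parameters of str.contains.

```python
import pandas as pd

# Hypothetical column names chosen to show how regex and case handling differ
df = pd.DataFrame({'price.usd': [1, 2], 'priceXusd': [3, 4], 'Qty': [5, 6]})

# Default (regex=True): '.' is a regex wildcard, so it also matches the 'X'
as_regex = df.columns[df.columns.str.contains('price.usd')].tolist()

# regex=False: the string is matched literally
as_literal = df.columns[df.columns.str.contains('price.usd', regex=False)].tolist()

# case=False: case-insensitive match
any_case = df.columns[df.columns.str.contains('qty', case=False)].tolist()

print(as_regex)    # ['price.usd', 'priceXusd']
print(as_literal)  # ['price.usd']
print(any_case)    # ['Qty']
```

If the string you search for comes from user input, passing regex=False avoids surprises from special regex characters.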
These methods provide additional flexibility in searching for columns based on specific string patterns or containment within names. Choose the method that best suits the complexity of your search criteria.