Pandas Datetime: How to Get Month and Year Separately
Understanding the Libraries:
- Python: The general-purpose programming language used for this code.
- Pandas: A powerful Python library for data analysis and manipulation. It provides the DataFrame structure for storing and working with tabular data.
- Datetime: A Python module for handling dates and times. In Pandas, the dt accessor is used to access datetime attributes of a column.
Steps Involved:
Import Libraries:
import pandas as pd
Create Sample Data (Optional):
dates = pd.to_datetime(['2023-05-12', '2024-02-01', '2022-11-25'])
df = pd.DataFrame({'date_column': dates})
This code creates a DataFrame df with a column named date_column containing sample datetime values.
Extract Month and Year: There are two common approaches:
Method 1: Using dt.month and dt.year Attributes:
df['month'] = df['date_column'].dt.month
df['year'] = df['date_column'].dt.year
- df['date_column'].dt accesses the datetime attributes of date_column.
- .month extracts the month as an integer (1-12).
- .year extracts the year as an integer.
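Method 2: Using dt.to_period('M'):
periods = df['date_column'].dt.to_period('M')
df['month'], df['year'] = periods.dt.month, periods.dt.year
- .dt.to_period('M') converts the datetime column to periods with a frequency of 'M' (month), such as 2023-05.
- The .dt accessor on the resulting period column exposes the same month and year attributes. Each attribute has to be read explicitly, because the accessor object on its own is not column data.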
Explanation:
- Both methods achieve the same result of adding two new columns named month and year to the DataFrame df.
- Method 1 is more explicit and easier to understand, especially for beginners.
- Method 2 takes a detour through monthly periods, which is mainly worthwhile when you also want a combined year-month value; it might be less readable at first glance.
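As a quick sanity check, the sketch below computes month and year both directly from the datetime column and via a monthly period, then compares the values. The names direct, via_period, and same are illustrative only.

```python
import pandas as pd

dates = pd.to_datetime(['2023-05-12', '2024-02-01', '2022-11-25'])
df = pd.DataFrame({'date_column': dates})

# Approach 1: read the attributes straight from the datetime column
direct = pd.DataFrame({
    'month': df['date_column'].dt.month,
    'year': df['date_column'].dt.year,
})

# Approach 2: convert to a monthly period first, then read the same attributes
periods = df['date_column'].dt.to_period('M')
via_period = pd.DataFrame({
    'month': periods.dt.month,
    'year': periods.dt.year,
})

# Compare values only; the integer dtypes may differ between the two routes
same = (direct.to_numpy() == via_period.to_numpy()).all()
print(same)
```

Comparing the underlying arrays rather than the DataFrames sidesteps any difference in integer width between the two routes.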
Additional Considerations:
- If your date column holds formatted strings (e.g., "YYYY-MM-DD") rather than datetime values, convert them with pd.to_datetime before applying these methods; the dt accessor only works on datetime-like data.
- You can modify the column names (month and year) to suit your preferences.
- These methods can be applied to any Pandas Series containing datetime data.
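To illustrate the first point, here is a minimal sketch, assuming a hypothetical column raw_date of text values in which one entry is not a valid date. Passing errors='coerce' turns unparseable entries into NaT, and the nullable Int64 dtype is one way to keep month and year as whole numbers when some values are missing.

```python
import pandas as pd

# Hypothetical input: dates stored as text, with one invalid entry
df = pd.DataFrame({'raw_date': ['2023-05-12', '2024-02-01', 'not a date']})

# Convert text to datetime; invalid entries become NaT instead of raising
df['date_column'] = pd.to_datetime(df['raw_date'], errors='coerce')

# With NaT present, dt.month and dt.year return floats (NaN for missing rows),
# so cast to the nullable integer dtype to keep whole numbers
df['month'] = df['date_column'].dt.month.astype('Int64')
df['year'] = df['date_column'].dt.year.astype('Int64')

print(df)
```

Whether to coerce, drop, or raise on bad entries depends on your data; coercion is used here only to keep the example short.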
import pandas as pd
# Sample data (optional)
dates = pd.to_datetime(['2023-05-12', '2024-02-01', '2022-11-25'])
df = pd.DataFrame({'date_column': dates})
# Extract month and year
df['month'] = df['date_column'].dt.month # Month as integer (1-12)
df['year'] = df['date_column'].dt.year # Year as integer
print(df)
Output:
  date_column  month  year
0  2023-05-12      5  2023
1  2024-02-01      2  2024
2  2022-11-25     11  2022
import pandas as pd
# Sample data (optional)
dates = pd.to_datetime(['2023-05-12', '2024-02-01', '2022-11-25'])
df = pd.DataFrame({'date_column': dates})
# Extract month and year via a monthly period
periods = df['date_column'].dt.to_period('M')
df['month'], df['year'] = periods.dt.month, periods.dt.year
print(df)
Output:
  date_column  month  year
0  2023-05-12      5  2023
1  2024-02-01      2  2024
2  2022-11-25     11  2022
Both methods produce the same output, giving you separate columns for month and year extracted from the original datetime column. Choose the one that best suits your preference or coding style.
Using strftime for Formatting:
import pandas as pd
# Sample data (optional)
dates = pd.to_datetime(['2023-05-12', '2024-02-01', '2022-11-25'])
df = pd.DataFrame({'date_column': dates})
# Extract month and year using strftime format codes
df['month'] = df['date_column'].dt.strftime('%m') # Month as zero-padded string (01-12)
df['year'] = df['date_column'].dt.strftime('%Y') # Year as full string (YYYY)
print(df)
Output:
  date_column month  year
0  2023-05-12    05  2023
1  2024-02-01    02  2024
2  2022-11-25    11  2022
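This method uses the dt.strftime function to format the datetime values into strings containing just the month or year information. Because the results are strings rather than integers, convert them (or use dt.month and dt.year) if you need numeric comparisons or arithmetic. You can customize the format codes within strftime for different output styles (e.g., '%b' for abbreviated month names).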
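A few other format codes, sketched on the same sample data. The column names month_abbr, year_month, and month_name are illustrative. Note that '%b' follows the active locale, whereas dt.month_name() returns English names unless a locale is passed.

```python
import pandas as pd

dates = pd.to_datetime(['2023-05-12', '2024-02-01', '2022-11-25'])
df = pd.DataFrame({'date_column': dates})

# Abbreviated month name (locale-dependent)
df['month_abbr'] = df['date_column'].dt.strftime('%b')
# Combined year-month label such as '2023-05'
df['year_month'] = df['date_column'].dt.strftime('%Y-%m')
# Full month name without going through strftime
df['month_name'] = df['date_column'].dt.month_name()

print(df)
```

A zero-padded label like '%Y-%m' sorts correctly as text, which makes it a convenient grouping key.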
Using dt.isocalendar for Calendar Week Information:
This approach is less common. It returns the ISO year, ISO week number, and day of the week rather than the month, so the month can only be estimated from the week number:
import pandas as pd
# Sample data (optional)
dates = pd.to_datetime(['2023-05-12', '2024-02-01', '2022-11-25'])
df = pd.DataFrame({'date_column': dates})
# Extract year, week number within year, and day of week
df[['year', 'week', 'weekday']] = df['date_column'].dt.isocalendar()
# Rough month estimate from the ISO week number (not exact; see the note below)
df['approximate_month'] = (df['week'] - 1) // 4 + 1
print(df)
Output:
  date_column  year  week  weekday  approximate_month
0  2023-05-12  2023    19        5                  5
1  2024-02-01  2024     5        4                  2
2  2022-11-25  2022    47        5                 12
Here, dt.isocalendar extracts the ISO year, the week number within that year, and the day of the week (1 = Monday, 7 = Sunday). The month estimate subtracts 1 from the week number, integer-divides by 4, and adds 1. Because months are longer than four weeks, the estimate drifts as the year goes on: it gives 12 for 2022-11-25 (a November date) and 13 or more for weeks 49 and later. The ISO year can also differ from the calendar year near year boundaries; for example, 2021-01-01 belongs to ISO year 2020, week 53. Use dt.month and dt.year whenever you need exact values.
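The year-boundary caveat can be seen on a couple of dates chosen for illustration (they are not part of the sample data above): the ISO year reported by dt.isocalendar() can differ from the calendar year reported by dt.year.

```python
import pandas as pd

# Illustrative dates close to a year boundary
dates = pd.Series(pd.to_datetime(['2021-01-01', '2024-12-30']))

iso = dates.dt.isocalendar()  # DataFrame with columns: year, week, day

comparison = pd.DataFrame({
    'date': dates,
    'calendar_year': dates.dt.year,
    'iso_year': iso['year'],
    'iso_week': iso['week'],
    'month': dates.dt.month,  # exact month, independent of ISO weeks
})
print(comparison)
```

2021-01-01 falls in the last ISO week of 2020, and 2024-12-30 already belongs to the first ISO week of 2025, so the two year columns disagree on both rows.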
Choose the method that best suits your requirements and the level of precision you need for the extracted month and year values.