Python for Time Series Analysis: Exploring Rolling Averages with NumPy
Importing libraries and sample data:
import numpy as np
# Sample data (you can replace this with your actual time series data)
data = np.random.randint(1, 100, 10) # Create random data of size 10
Window size for averaging:
The window size determines how many data points go into each rolling average value. A larger window smooths the data more but can obscure short-lived changes.
window_size = 3
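To make the smoothing trade-off concrete, the sketch below compares two window sizes on a synthetic noisy series. The signal, noise level, seed, and the helper names trailing_mean and roughness are assumptions chosen for illustration; the helper averages full windows only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumed) so the run is repeatable
t = np.linspace(0, 4 * np.pi, 200)
noisy = np.sin(t) + rng.normal(0, 0.5, t.size)  # slow sine wave plus noise

def trailing_mean(x, w):
    """Average of every full window of w consecutive points."""
    return np.convolve(x, np.ones(w) / w, mode="valid")

def roughness(x):
    """Mean absolute step between neighbours; lower means smoother."""
    return np.mean(np.abs(np.diff(x)))

rough_raw = roughness(noisy)
rough_small = roughness(trailing_mean(noisy, 3))
rough_large = roughness(trailing_mean(noisy, 15))

print("raw series:", rough_raw)
print("window 3:  ", rough_small)
print("window 15: ", rough_large)
```

The larger window should give the lowest roughness value, at the cost of reacting more slowly to real changes in the signal.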
Implementing the rolling average function:
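Here, we'll use a loop-based approach to calculate the moving average for each data point. Each output value is the average of the current point and up to window_size - 1 points before it (a trailing window), so the first few values are averaged over fewer points. Vectorized methods are usually more efficient for larger datasets.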
def moving_average(data, window_size):
    # Check if window size is larger than data length (to avoid errors)
    if window_size > len(data):
        raise ValueError("Window size cannot be larger than data length")
    # Initialize moving average array with zeros
    moving_avg = np.zeros(len(data))
    # Iterate through the data and calculate average for each window
    for i in range(len(data)):
        # Define the window based on the current index
        window_data = data[max(0, i - window_size + 1):i + 1]
        # Calculate the average of the window data
        moving_avg[i] = np.mean(window_data)
    return moving_avg
Calculating and printing the results:
# Calculate moving average using the function
moving_avg = moving_average(data, window_size)
# Print original data and moving average
print("Original data:", data)
print("Moving average:", moving_avg)
This code will output the original data and the corresponding moving average values.
Key points:
- The moving_average function iterates through the data, taking a window of up to window_size points that ends at each data point.
- It calculates the average of the points in that window and assigns it to the position of the window's last point (a trailing window, not a centered one). The first window_size - 1 values are averaged over fewer points because the window is not yet full.
- This process is repeated for all data points, resulting in a new array with the moving average values.
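A small worked example with values that are easy to check by hand can make the windowing rule concrete. This is a minimal sketch: the function name trailing_average and the sample values are assumptions for illustration, and the windowing rule is copied from the loop-based function described above.

```python
import numpy as np

def trailing_average(values, window_size):
    """Same windowing rule as the loop-based function described above."""
    out = np.zeros(len(values))
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1):i + 1]
        out[i] = np.mean(window)
    return out

values = np.array([10, 20, 30, 40, 50])
result = trailing_average(values, 3)

for i, avg in enumerate(result):
    window = values[max(0, i - 2):i + 1]
    print(i, window.tolist(), avg)
# index 0 averages [10]          -> 10.0
# index 1 averages [10, 20]      -> 15.0
# index 2 averages [10, 20, 30]  -> 20.0
# index 3 averages [20, 30, 40]  -> 30.0
# index 4 averages [30, 40, 50]  -> 40.0
```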
While this is a basic implementation using a loop, NumPy also offers vectorized operations using functions like np.convolve for potentially faster calculations on bigger datasets. Additionally, Pandas offers a more convenient rolling().mean() method for time series analysis tasks.
import numpy as np
# Sample data
data = np.random.randint(1, 100, 10)
# Window size
window_size = 3
# Vectorized approach using convolution
def moving_average_vectorized(data, window_size):
    # Check if window size is larger than data length
    if window_size > len(data):
        raise ValueError("Window size cannot be larger than data length")
    # Create weights for averaging (ones for the window size)
    weights = np.ones(window_size) / window_size
    # Perform convolution to calculate moving average efficiently
    moving_avg = np.convolve(data, weights, mode='valid')
    return moving_avg
# Calculate moving average using vectorized function
moving_avg_vectorized = moving_average_vectorized(data, window_size)
# Print original data and both moving averages (loop and vectorized)
print("Original data:", data)
print("Moving average (loop):", moving_average(data, window_size))
print("Moving average (vectorized):", moving_avg_vectorized)
This code showcases two ways to calculate the moving average:
- Loop-based approach: Explained above.
- Vectorized approach: This method uses np.convolve, which computes a weighted sum over the data with the given weights (equal weights of 1/window_size here). It's generally more efficient for larger datasets.
The two methods agree on every full window, but their outputs are not identical. With mode='valid', np.convolve returns only the len(data) - window_size + 1 averages taken over full windows, while the loop also returns the first window_size - 1 values, which are averaged over partial windows.
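The relationship between the two outputs can be checked directly. The sketch below uses made-up values and re-implements the loop's windowing rule inline so that it runs on its own; the variable names are assumptions for illustration.

```python
import numpy as np

values = np.array([4.0, 8.0, 6.0, 2.0, 10.0, 12.0])
w = 3

# Trailing averages with partial windows at the start (the loop's rule).
loop_style = np.array([values[max(0, i - w + 1):i + 1].mean()
                       for i in range(len(values))])

# Averages over full windows only.
valid = np.convolve(values, np.ones(w) / w, mode="valid")

print(len(loop_style), len(valid))             # 6 4
print(np.allclose(loop_style[w - 1:], valid))  # True
```

Dropping the first window_size - 1 entries of the loop result aligns it with the convolution result.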
Using scipy.ndimage.uniform_filter1d:
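SciPy's ndimage module offers a function called uniform_filter1d that computes a uniform (equal-weight) moving average. It's a convenient and efficient way to calculate rolling averages. Three details matter: the window is centered on each point by default, the mode argument controls how the edges are padded (there is no 'valid' mode, so the output always has the same length as the input), and the output has the same dtype as the input unless told otherwise, so integer data should be converted to float first.
import numpy as np
from scipy.ndimage import uniform_filter1d
# Sample data
data = np.random.randint(1, 100, 10)
# Window size
window_size = 3
# Calculate moving average using uniform_filter1d
# (cast to float so the averages are not truncated to integers;
# mode='nearest' pads the edges by repeating the first and last values)
moving_avg_scipy = uniform_filter1d(data.astype(float), size=window_size, mode='nearest')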
# Print original data and moving average
print("Original data:", data)
print("Moving average (scipy):", moving_avg_scipy)
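Because uniform_filter1d centers its window and keeps the input dtype, its output does not line up index-for-index with a trailing average. The sketch below, using made-up values and an odd window size, shows both effects and how the interior of the centered result matches the full-window averages from np.convolve.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

values = np.array([1, 2, 6, 4, 5, 9, 7])
w = 3  # odd window size, so the window is symmetric around each point

# Integer input gives integer output, so fractional averages are lost.
as_int = uniform_filter1d(values, size=w, mode="nearest")

# Float input keeps the fractions.
centered = uniform_filter1d(values.astype(float), size=w, mode="nearest")

# Reference: averages over full windows only.
valid = np.convolve(values, np.ones(w) / w, mode="valid")

# For an odd window, centered[k] covers values[k - w//2 : k + w//2 + 1],
# so the interior of the centered result lines up with the 'valid' result.
interior = centered[w // 2 : len(values) - w // 2]

print(as_int.dtype, centered.dtype)
print(np.allclose(interior, valid))
```

The edge values of the centered result depend on the padding mode, so they should be interpreted with care or discarded.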
Using Pandas rolling.mean() (if applicable):
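If you're working with time series data in a Pandas Series or DataFrame, the rolling() method returns a window object whose mean() method calculates rolling averages. This approach is designed for time series analysis and offers extras such as control over incomplete windows and missing values (min_periods), centered windows (center=True), and time-based windows on a datetime index. With the default settings, the first window_size - 1 results are NaN because those windows are incomplete. Expanding windows are available through the separate expanding() method.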
import pandas as pd
# Sample data as pandas Series (assuming you have time series data)
data = pd.Series(np.random.randint(1, 100, 10), index=pd.date_range('2023-01-01', periods=10))
# Window size
window_size = 3
# Calculate moving average using pandas rolling.mean()
moving_avg_pandas = data.rolling(window=window_size).mean()
# Print original data and moving average
print("Original data:")
print(data)
print("Moving average (pandas):")
print(moving_avg_pandas)
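The rolling() options change how the edges of the series are handled. The sketch below, using made-up values, compares the default behaviour with min_periods=1 (which averages whatever points are available, like the loop-based function) and center=True.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50],
              index=pd.date_range("2023-01-01", periods=5))

# NaN until the window is full.
default = s.rolling(window=3).mean()
# Averages whatever points are available at the start.
partial = s.rolling(window=3, min_periods=1).mean()
# Window centered on each point, so both ends are NaN.
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({"default": default,
                    "partial": partial,
                    "centered": centered}))
```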
Recursive approach (for understanding, might not be most efficient):
This method uses recursion to calculate each moving average from the previous one: when the window slides forward by one position, a new point enters and the oldest point leaves, so the average changes by (new - oldest) / window_size. It returns the averages over full windows only. It's less efficient than the other methods and is limited by Python's recursion depth (1000 calls by default), so it only works for short series, but it can be helpful for understanding the logic behind the calculation.
def moving_average_recursive(data, window_size, i=None, result=None):
    # First call: validate the input and seed the result with the first full window
    if result is None:
        if window_size > len(data):
            raise ValueError("Window size cannot be larger than data length")
        result = np.zeros(len(data) - window_size + 1)
        result[0] = np.mean(data[:window_size])
        i = window_size
    # Base case: reached end of data
    if i >= len(data):
        return result
    # Slide the window one step: data[i] enters, data[i - window_size] leaves
    k = i - window_size + 1
    result[k] = result[k - 1] + (data[i] - data[i - window_size]) / window_size
    # Recursively call for the next data point
    return moving_average_recursive(data, window_size, i + 1, result)
# Sample data
data = np.random.randint(1, 100, 10)
# Window size
window_size = 3
# Calculate moving average using recursion
moving_avg_recursive = moving_average_recursive(data.copy(), window_size)
# Print original data and moving average
print("Original data:", data)
print("Moving average (recursive):", moving_avg_recursive)
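Python limits recursion depth (1000 calls by default), so a recursive version cannot handle long series. A minimal non-recursive sketch of the same running idea uses cumulative sums: the sum of any window is the difference between two cumulative sums. The function name moving_average_cumsum is an assumption for illustration.

```python
import numpy as np

def moving_average_cumsum(values, window_size):
    """Averages over full windows, computed from cumulative sums."""
    values = np.asarray(values, dtype=float)
    if not 1 <= window_size <= len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    # csum[k] holds the sum of the first k values (csum[0] is 0).
    csum = np.concatenate(([0.0], np.cumsum(values)))
    # Sum of values[i:i + window_size] is csum[i + window_size] - csum[i].
    return (csum[window_size:] - csum[:-window_size]) / window_size

values = np.arange(1, 5001)      # longer than the default recursion limit
result = moving_average_cumsum(values, 3)
print(result[:3], len(result))   # [2. 3. 4.] 4998
```

For very long series of large values, cumulative sums can accumulate floating-point error, so np.convolve or Pandas may be the safer choice there.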
These are some alternate methods for calculating rolling averages in Python. Choose the approach that best suits your needs based on factors like data size, efficiency requirements, and whether you're working with NumPy arrays or Pandas DataFrames.
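When efficiency is the deciding factor, it is worth measuring rather than guessing. The sketch below times a plain loop against np.convolve on synthetic data; the array size, window size, seed, and function names are assumptions for illustration.

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
values = rng.integers(1, 100, 5_000).astype(float)
w = 25

def loop_valid(x, w):
    """Plain Python loop over full windows."""
    return np.array([x[i:i + w].mean() for i in range(len(x) - w + 1)])

def convolve_valid(x, w):
    """Vectorized version using np.convolve."""
    return np.convolve(x, np.ones(w) / w, mode="valid")

# Both functions should agree before their speed is compared.
same = np.allclose(loop_valid(values, w), convolve_valid(values, w))

t_loop = timeit.timeit(lambda: loop_valid(values, w), number=3)
t_conv = timeit.timeit(lambda: convolve_valid(values, w), number=3)
print("results match:", same)
print(f"loop: {t_loop:.4f}s  convolve: {t_conv:.4f}s")
```

Timings depend on the machine, the array size, and the window size, so the comparison is best repeated on your own data.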