Python for Time Series Analysis: Exploring Rolling Averages with NumPy

2024-06-21

Importing libraries and sample data:

import numpy as np

# Sample data (you can replace this with your actual time series data)
data = np.random.randint(1, 100, 10)  # Create random data of size 10

Window size for averaging:

The window size determines how many data points are included in the calculation for each rolling average value. A larger window smooths the data more but can blur or delay sharp, short-lived changes.

window_size = 3
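
As a rough illustration of that trade-off, the sketch below averages a made-up step signal with two window sizes. The `step` array and the `trailing_mean` helper are hypothetical names introduced only for this example, and the helper keeps full windows only.

```python
import numpy as np

def trailing_mean(values, window_size):
    # Average over every full window (hypothetical helper for this illustration)
    weights = np.ones(window_size) / window_size
    return np.convolve(values, weights, mode="valid")

# Made-up signal with one sharp jump from 0 to 10
step = np.array([0, 0, 0, 0, 10, 10, 10, 10], dtype=float)

narrow = trailing_mean(step, 2)  # the jump is spread over one intermediate value
wide = trailing_mean(step, 4)    # the jump is spread over three intermediate values

print("window 2:", narrow)
print("window 4:", wide)
```

With the wider window the jump turns into a gradual ramp, which is the smoothing effect described above.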

Implementing the rolling average function:

Here, we'll use a loop-based approach to calculate the moving average for each data point. There are other methods using vectorization that can be more efficient for larger datasets.

def moving_average(data, window_size):
  # Check if window size is larger than data length (to avoid errors)
  if window_size > len(data):
    raise ValueError("Window size cannot be larger than data length")

  # Initialize moving average array with zeros
  moving_avg = np.zeros(len(data))

  # Iterate through the data and calculate average for each window
  for i in range(len(data)):
    # Define the window based on the current index
    window_data = data[max(0, i - window_size + 1):i + 1]
    # Calculate the average of the window data
    moving_avg[i] = np.mean(window_data)

  return moving_avg

Calculating and printing the results:

# Calculate moving average using the function
moving_avg = moving_average(data, window_size)

# Print original data and moving average
print("Original data:", data)
print("Moving average:", moving_avg)

This code will output the original data and the corresponding moving average values.

Key points:

The moving_average function iterates through the data, considering a trailing window of up to window_size points that ends at the current data point.
It calculates the average of the data points within the window and assigns that value to the last point of the window. For the first window_size - 1 points, fewer values are available, so the average is taken over a shorter window.
This process is repeated for all data points, resulting in a new array of the same length as the input.
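
To make these points concrete, here is a small worked example on a fixed input. It is a sketch that re-implements the same windowing under the hypothetical name `trailing_average`, so it can be run on its own.

```python
import numpy as np

def trailing_average(values, window_size):
    # Same windowing as the loop-based function: the window ends at index i
    # and is shorter than window_size for the first few points
    out = np.zeros(len(values))
    for i in range(len(values)):
        start = max(0, i - window_size + 1)
        out[i] = np.mean(values[start:i + 1])
    return out

example = np.array([1, 2, 3, 4, 5])
result = trailing_average(example, 3)
print(result)  # values: 1.0, 1.5, 2.0, 3.0, 4.0
```

The first two outputs average one and two points respectively; from the third output on, each value is the mean of a full three-point window.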

While this is a basic implementation using a loop, NumPy also supports vectorized calculations through functions like np.convolve, which are usually faster on bigger datasets. Additionally, Pandas offers a more convenient rolling(...).mean() call for time series analysis tasks.

import numpy as np

# Sample data
data = np.random.randint(1, 100, 10)

# Window size
window_size = 3

# Vectorized approach using convolution
def moving_average_vectorized(data, window_size):
  # Check if window size is larger than data length
  if window_size > len(data):
    raise ValueError("Window size cannot be larger than data length")

  # Create weights for averaging (ones for the window size)
  weights = np.ones(window_size) / window_size

  # Perform convolution to calculate moving average efficiently
  moving_avg = np.convolve(data, weights, mode='valid')

  return moving_avg

# Calculate moving average using vectorized function
moving_avg_vectorized = moving_average_vectorized(data, window_size)

# Print original data and both moving averages (loop and vectorized)
print("Original data:", data)
print("Moving average (loop):", moving_average(data, window_size))
print("Moving average (vectorized):", moving_avg_vectorized)

This code showcases two ways to calculate the moving average:

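Loop-based approach: explained above. It returns one value per data point, averaging over shorter windows at the start.
Vectorized approach: this method uses np.convolve, which computes a weighted sum over the data using the defined weights (1 / window_size at each position, so every point in the window counts equally). It's generally more efficient for larger datasets.

The two methods agree wherever a full window is available, but their outputs differ in length: with mode='valid' the convolution returns only the len(data) - window_size + 1 full-window averages, while the loop also returns partial-window averages for the first window_size - 1 points. The vectorized approach is usually faster on large arrays.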
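
The relationship between the two outputs can be checked directly. The sketch below uses a seeded generator and hypothetical variable names (`values`, `loop_avg`, `conv_avg`) so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so the run is repeatable
values = rng.integers(1, 100, size=10)  # made-up sample data
window_size = 3

# Loop-style trailing average: one value per input point,
# with partial windows at the start
loop_avg = np.array([
    values[max(0, i - window_size + 1):i + 1].mean()
    for i in range(len(values))
])

# Convolution with mode='valid': full windows only
conv_avg = np.convolve(values, np.ones(window_size) / window_size, mode="valid")

print(loop_avg.shape, conv_avg.shape)  # (10,) (8,)

# Dropping the first window_size - 1 loop values lines the two results up
print(np.allclose(loop_avg[window_size - 1:], conv_avg))  # True
```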

Using scipy.ndimage.uniform_filter1d:

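SciPy's ndimage module offers a function called uniform_filter1d that applies a uniform (equal-weight) filter along one axis. It's a convenient and efficient way to calculate rolling averages, with two differences from the functions above: the window is centered on each point rather than trailing, and the output always has the same length as the input, with the edges handled by a padding mode such as 'reflect' or 'nearest'.

import numpy as np
from scipy.ndimage import uniform_filter1d

# Sample data, cast to float: with integer input the output is also integer,
# so the averages would lose their fractional part
data = np.random.randint(1, 100, 10).astype(float)

# Window size
window_size = 3

# Calculate a centered moving average; 'nearest' pads the edges by repeating
# the first and last values (uniform_filter1d has no 'valid' mode)
moving_avg_scipy = uniform_filter1d(data, size=window_size, mode='nearest')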

# Print original data and moving average
print("Original data:", data)
print("Moving average (scipy):", moving_avg_scipy)
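
To see how a centered window lines up with full-window averages, the following sketch compares uniform_filter1d with np.convolve on a short made-up array. It assumes SciPy is installed; the variable names are illustrative only.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data, already float

# Centered window of size 3; the edges are padded by repeating the end values
centered = uniform_filter1d(values, size=3, mode="nearest")

# Full-window averages from convolution, for comparison
full = np.convolve(values, np.ones(3) / 3, mode="valid")

print(centered)
print(full)

# The interior of the centered result matches the full-window averages
print(np.allclose(centered[1:-1], full))
```

Only the first and last values depend on the padding mode, so the choice of mode matters mainly for short series or large windows.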

Using Pandas rolling.mean() (if applicable):

If you're working with time series data in a Pandas Series or DataFrame, the rolling() method returns a window object whose mean() calculates rolling averages. By default the result is NaN until a full window is available; the min_periods parameter relaxes that requirement, and expanding() provides a cumulative window instead of a fixed-size one.

import pandas as pd

# Sample data as pandas Series (assuming you have time series data)
data = pd.Series(np.random.randint(1, 100, 10), index=pd.date_range('2023-01-01', periods=10))

# Window size
window_size = 3

# Calculate moving average using pandas rolling.mean()
moving_avg_pandas = data.rolling(window=window_size).mean()

# Print original data and moving average
print("Original data:")
print(data)
print("Moving average (pandas):")
print(moving_avg_pandas)
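
The effect of these options is easiest to see side by side. The sketch below uses a short made-up Series; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data

# Default: NaN until a full window of 3 values is available
strict = series.rolling(window=3).mean()

# min_periods=1: average whatever is available, like the loop-based function
partial = series.rolling(window=3, min_periods=1).mean()

# Expanding window: average of everything seen so far
running = series.expanding().mean()

print(pd.DataFrame({"strict": strict, "partial": partial, "expanding": running}))
```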

Recursive approach (for understanding, might not be most efficient):

This method uses recursion to update the moving average one step at a time: each new average is the previous one plus the difference between the point entering the window and the point leaving it, divided by the window size. It returns one value per full window, like the convolution approach. It's less efficient than the other methods and is limited by Python's recursion depth on long series, but it can be helpful for understanding the logic behind the calculation.

def moving_average_recursive(data, window_size, i=None, result=None):
  # First call: validate the input and seed the result with the first full window
  if result is None:
    if window_size > len(data):
      raise ValueError("Window size cannot be larger than data length")
    result = np.zeros(len(data) - window_size + 1)
    result[0] = np.mean(data[:window_size])
    i = window_size

  # Base case: reached end of data
  if i >= len(data):
    return result

  # Calculate new moving average from the previous value: add the point
  # entering the window and remove the point leaving it
  delta = (data[i] - data[i - window_size]) / window_size
  result[i - window_size + 1] = result[i - window_size] + delta

  # Recursively call for the next data point
  return moving_average_recursive(data, window_size, i + 1, result)

# Sample data
data = np.random.randint(1, 100, 10)

# Window size
window_size = 3

# Calculate moving average using recursion
moving_avg_recursive = moving_average_recursive(data.copy(), window_size)

# Print original data and moving average
print("Original data:", data)
print("Moving average (recursive):", moving_avg_recursive)
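
An incremental update can also be written as a plain loop: keep a running sum, add the point entering the window and subtract the one leaving it. This avoids Python's recursion limit on long series. The following is a minimal sketch; the name `moving_average_running` is hypothetical, and it returns full-window averages only.

```python
import numpy as np

def moving_average_running(values, window_size):
    # Hypothetical helper: full-window averages via a running sum
    if not 1 <= window_size <= len(values):
        raise ValueError("Window size must be between 1 and the data length")

    values = np.asarray(values, dtype=float)
    out = np.empty(len(values) - window_size + 1)

    # Sum of the first full window
    window_sum = values[:window_size].sum()
    out[0] = window_sum / window_size

    for i in range(window_size, len(values)):
        # Add the point entering the window, subtract the one leaving it
        window_sum += values[i] - values[i - window_size]
        out[i - window_size + 1] = window_sum / window_size

    return out

values = np.array([4, 8, 6, 2, 10, 12])
running_avg = moving_average_running(values, 3)
print(running_avg)
```

Each step costs a constant amount of work regardless of the window size, although floating-point rounding can accumulate slowly over very long series.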

These are some alternate methods for calculating rolling averages in Python. Choose the approach that best suits your needs based on factors like data size, efficiency requirements, and whether you're working with NumPy arrays or Pandas DataFrames.

python numpy time-series
