Python for Data Smoothing: Exploring Moving Averages with NumPy and SciPy
Here's how to calculate a moving average in Python using NumPy and SciPy:
NumPy's convolve function:
This method is efficient for calculating moving averages. It uses convolution to slide a window of a specified size over the data and calculates the mean of each window.
import numpy as np
# Sample data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Window size for moving average
window_size = 3
# Create a weight vector with equal weights (1/window_size) for averaging
weights = np.ones(window_size) / window_size
# Calculate moving average using NumPy's convolution
# 'valid' keeps only the positions where the window lies fully inside the data
smoothed_data = np.convolve(data, weights, mode='valid')
# Print the original data and smoothed data
print("Original data:", data)
print("Smoothed data:", smoothed_data)
This code outputs:
Original data: [ 1  2  3  4  5  6  7  8  9 10]
Smoothed data: [2. 3. 4. 5. 6. 7. 8. 9.]
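The smoothed output has eight values rather than ten because 'valid' keeps only the windows that fit entirely inside the data, giving len(data) - window_size + 1 results. As a minimal sketch, here is how the output length changes across the three modes that np.convolve accepts (the lengths dictionary is just an illustrative helper):

```python
import numpy as np

data = np.arange(1, 11)  # same values as the sample data: 1..10
window_size = 3
weights = np.ones(window_size) / window_size

# Record the output length for each mode np.convolve accepts
lengths = {}
for mode in ("valid", "same", "full"):
    result = np.convolve(data, weights, mode=mode)
    lengths[mode] = len(result)
    print(f"{mode:>5}: {lengths[mode]} values")
```

Note that 'same' and 'full' treat values beyond the ends of the data as zero, so the averages near the edges are pulled toward zero. That is usually why 'valid' is preferred for a plain moving average.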
SciPy's ndimage.uniform_filter1d function:
This SciPy function offers another way to compute the moving average. It applies a uniform filter of a defined size to the data.
import numpy as np
from scipy.ndimage import uniform_filter1d
# Sample data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Window size for moving average
window_size = 3
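# Calculate moving average using uniform_filter1d
# Convert to float first: the output dtype follows the input, so integer input would give integer averages
smoothed_data = uniform_filter1d(data.astype(float), size=window_size)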
# Print the original data and smoothed data
print("Original data:", data)
print("Smoothed data:", smoothed_data)
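The interior values match the previous example, but the output is not identical. uniform_filter1d returns an array of the same length as the input, with a centered window, and by default (mode='reflect') it pads the ends by reflecting the data. The output dtype also follows the input, so an integer array yields integer averages unless it is converted to float first.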
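To see how the two approaches relate, the sketch below runs both on float data and compares them. It assumes an odd window size, so that the centered window of uniform_filter1d lines up with the 'valid' convolution after trimming window_size // 2 values from each end.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

data = np.arange(1, 11, dtype=float)  # float input avoids integer truncation
window_size = 3

# Same length as the input; edges are padded by reflection (default mode='reflect')
filtered = uniform_filter1d(data, size=window_size)

# Only the windows that fit fully inside the data
valid = np.convolve(data, np.ones(window_size) / window_size, mode="valid")

# Trim the edge values that depend on padding, then compare
half = window_size // 2
interior = filtered[half:len(data) - half]
print("Same length as input:", filtered.shape == data.shape)
print("Interior matches 'valid' convolution:", np.allclose(interior, valid))
```

The first and last values of filtered come from reflected padding (for example, the first window is [1, 1, 2]), so they have no counterpart in the 'valid' result.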
Key Points:
- Both methods compute the same moving average wherever the window lies fully inside the data; they differ in output length and in how they treat the edges.
- convolve offers more flexibility for defining custom weights for the moving average.
- uniform_filter1d might be simpler to use for basic moving averages.
- Choose the method that best suits your specific needs and preferences.
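As an illustration of the custom-weights point, here is a minimal sketch of a weighted moving average with np.convolve. The weight values are hypothetical, chosen only so that the newest sample in each window counts the most. Because convolution flips its second argument, the weights are reversed before the call.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical weights, listed oldest sample first, newest last; they sum to 1
weights = np.array([0.2, 0.3, 0.5])

# np.convolve flips the kernel, so reverse it to keep weights[-1] on the newest sample
weighted = np.convolve(data, weights[::-1], mode="valid")
print(weighted)  # approximately [2.3 3.3 4.3]
```

For the first window [1, 2, 3] this gives 0.2*1 + 0.3*2 + 0.5*3 = 2.3. With equal weights the reversal makes no difference, which is why the simple moving average examples can skip it.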
import numpy as np
from scipy.signal import convolve
# Sample data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
def calculate_moving_average_convolve(data, window_size):
    """
    This function calculates the moving average of a data series using convolution.
    Args:
        data: The data series as a NumPy array.
        window_size: The size of the moving average window.
    Returns:
        The moving average of the data series as a NumPy array.
    """
    # Create a weight vector with equal weights for averaging
    weights = np.ones(window_size) / window_size
    # Calculate moving average using convolution (keeps only windows fully inside the data)
    smoothed_data = convolve(data, weights, mode='valid')
    return smoothed_data
# Example usage
window_size = 3
smoothed_data = calculate_moving_average_convolve(data, window_size) # The function does not modify its input
# Print the original data and smoothed data
print("Original data:", data)
print("Smoothed data:", smoothed_data)
This code defines a function, calculate_moving_average_convolve, that takes the data and window size as arguments, making it reusable for different data and window sizes.
import numpy as np
from scipy.ndimage import uniform_filter1d
# Sample data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
def calculate_moving_average_uniform(data, window_size):
    """
    This function calculates the moving average of a data series using a uniform filter.
    Args:
        data: The data series as a NumPy array.
        window_size: The size of the moving average window.
    Returns:
        The moving average of the data series as a NumPy array (same length as the input).
    """
    # Convert to float so integer input does not produce truncated integer averages
    values = np.asarray(data, dtype=float)
    # Calculate moving average using uniform_filter1d (edges are padded by reflection)
    smoothed_data = uniform_filter1d(values, size=window_size)
    return smoothed_data
# Example usage
window_size = 3
smoothed_data = calculate_moving_average_uniform(data, window_size) # The function does not modify its input
# Print the original data and smoothed data
print("Original data:", data)
print("Smoothed data:", smoothed_data)
This code defines a similar function, calculate_moving_average_uniform, for calculating the moving average using the uniform filter. Both functions now allow for easy application to different datasets.
Plain Python loop (for small datasets):
This method is suitable for small datasets and offers more control over the calculation.
import numpy as np  # needed for np.nan

def calculate_moving_average_loop(data, window_size):
    """
    This function calculates the moving average using a loop.
    Args:
        data: The data series as a list.
        window_size: The size of the moving average window.
    Returns:
        A list containing the moving averages for each data point.
    """
    smoothed_data = []
    for i in range(len(data)):
        # Check if enough data is available for the window
        if i < window_size - 1:
            smoothed_data.append(np.nan)  # Insert NaN where the window is incomplete
        else:
            # Calculate average for the window ending at index i
            window_data = data[i - window_size + 1: i + 1]
            smoothed_data.append(sum(window_data) / window_size)
    return smoothed_data
# Example usage with sample data as a list
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window_size = 3
smoothed_data = calculate_moving_average_loop(data, window_size) # The function does not modify its input
# Print the original data and smoothed data
print("Original data:", data)
print("Smoothed data:", smoothed_data)
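One example of the extra control a loop gives is the treatment of the first few points. The sketch below is a hypothetical variant (moving_average_partial is not part of the code above) that averages whatever values are available at the start instead of inserting NaN.

```python
def moving_average_partial(data, window_size):
    """Trailing moving average that uses shorter windows at the start."""
    smoothed = []
    for i in range(len(data)):
        # Clamp the window start at 0 so early points use the values seen so far
        start = max(0, i - window_size + 1)
        window = data[start:i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

print(moving_average_partial([1, 2, 3, 4, 5], 3))
# [1.0, 1.5, 2.0, 3.0, 4.0]
```

Whether partial windows are appropriate depends on the application: the first values are averaged over fewer points and are therefore less smoothed than the rest.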
Pandas rolling window (for time series data):
If you're working with time series data stored in a Pandas Series or DataFrame, the rolling window functionality is a convenient option.
import pandas as pd
# Sample data as Pandas Series
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Window size for moving average
window_size = 3
# Calculate moving average using Pandas rolling window
smoothed_data = data.rolling(window=window_size).mean()
# Print the original data and smoothed data
print("Original data:")
print(data)
print("\nSmoothed data:")
print(smoothed_data)
This method is efficient for time series analysis and leverages the capabilities of Pandas for data manipulation.
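By default, rolling uses a trailing window and returns NaN until a full window is available, so the first window_size - 1 values are NaN. The sketch below shows two parameters of Series.rolling that change this behavior, min_periods and center; the variable names are illustrative.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
window_size = 3

# Default: trailing window, NaN until the window is full
trailing = data.rolling(window=window_size).mean()

# min_periods=1: average whatever values are available at the start
partial = data.rolling(window=window_size, min_periods=1).mean()

# center=True: the result is aligned with the middle of the window
centered = data.rolling(window=window_size, center=True).mean()

print(pd.DataFrame({"trailing": trailing, "partial": partial, "centered": centered}))
```

With center=True, the NaN values move to one at each end, which matches the alignment of a centered filter such as uniform_filter1d apart from the edge handling.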
Remember:
- Choose the method that best suits your data size, application, and desired level of control.
- The loop-based method offers flexibility but might be less efficient for large datasets.
- Pandas rolling is ideal for time series data analysis within the Pandas ecosystem.
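If the loop turns out to be too slow on a large dataset, one vectorized alternative is to take differences of a cumulative sum. This is a sketch of a different technique from the ones above, and the function name moving_average_cumsum is hypothetical. It returns the same values as the 'valid' convolution.

```python
import numpy as np

def moving_average_cumsum(data, window_size):
    """Moving average over full windows, computed from a cumulative sum."""
    values = np.asarray(data, dtype=float)
    # Prepend 0 so that csum[k] is the sum of the first k values
    csum = np.cumsum(np.insert(values, 0, 0.0))
    # The sum of each window is the difference of two cumulative sums
    return (csum[window_size:] - csum[:-window_size]) / window_size

print(moving_average_cumsum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
```

The cost no longer depends on the window size, but for very long series or values of very different magnitude, the cumulative sum can accumulate floating-point error, so it is worth comparing against np.convolve on a sample of your data.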