Sample Like a Pro: Mastering Normal Distribution Generation with PyTorch

2024-07-27

The normal (Gaussian) distribution is a bell-shaped probability distribution in which values cluster around a central value (the mean), with a spread set by the standard deviation.
It is commonly used in machine learning and statistics to model continuous data.
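For reference, the density of a normal distribution with mean μ and standard deviation σ, written N(μ, σ²), is given below. The standard normal distribution is the special case μ = 0, σ = 1.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
```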

Creating a Normal Distribution in PyTorch:

PyTorch offers two primary methods to generate random samples from a normal distribution:

torch.normal() function:

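Takes two main arguments:
- mean: The central value of the distribution (tensor or scalar).
- std: The standard deviation (tensor or scalar).
Returns a new tensor of random samples drawn from the specified normal distribution. If mean or std is a tensor, the result has that tensor's shape and each element is drawn with its own mean and standard deviation; if both are plain Python floats, you also pass a size argument to set the output shape.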

import torch

mean = torch.tensor(3.0)
std = torch.tensor(1.5)

samples = torch.normal(mean, std)
print(samples)

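Because mean and std are 0-dimensional tensors here, this code draws a single random number (returned as a 0-dimensional tensor) from a normal distribution centered at 3.0 with a standard deviation of 1.5.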
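To get more than one sample, the shape has to come from somewhere. The sketch below shows three common ways of calling torch.normal(): element-wise tensor arguments, a float mean with a tensor std, and two floats plus an explicit size. The variable names are only illustrative.

```python
import torch

# 1) Tensor mean and tensor std: one sample per element, drawn element-wise.
means = torch.tensor([0.0, 10.0, 100.0])
stds = torch.tensor([1.0, 1.0, 5.0])
per_element = torch.normal(means, stds)  # shape (3,)

# 2) Float mean with a tensor std: the output takes the shape of std.
shared_mean = torch.normal(mean=3.0, std=torch.full((4,), 1.5))  # shape (4,)

# 3) Float mean and float std: the shape is given explicitly with size.
many = torch.normal(3.0, 1.5, size=(2, 5))  # shape (2, 5)

print(per_element.shape, shared_mean.shape, many.shape)
```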

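torch.randn() function:

A simpler approach for generating samples from the standard normal distribution (mean of 0 and standard deviation of 1).
Takes the desired shape (as separate integers or a tuple) and returns a new tensor of that shape filled with random samples; the dtype and device (CPU or GPU) can be set with keyword arguments. To match the size, dtype, and device of an existing tensor instead, use torch.randn_like(input).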

import torch

samples = torch.randn(10, 2)  # Creates a 10x2 tensor with standard normal samples
print(samples)

This code generates a 10x2 tensor populated with random values from the standard normal distribution.
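A few torch.randn() variations, as a sketch assuming a reasonably recent PyTorch: passing the shape as a tuple, choosing dtype and device, copying another tensor's layout with torch.randn_like(), and turning standard normal draws into N(mean, std²) samples by scaling and shifting. The GPU branch is only taken when CUDA is available.

```python
import torch

# The shape can be given as separate integers or as one tuple.
a = torch.randn(10, 2)
b = torch.randn((10, 2), dtype=torch.float64)

# randn_like copies the shape, dtype and device of an existing tensor.
template = torch.zeros(3, 4)
c = torch.randn_like(template)

# If Z ~ N(0, 1), then mean + std * Z ~ N(mean, std**2).
mean, std = 3.0, 1.5
d = mean + std * torch.randn(10_000)

# Pick the GPU when one is available, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
e = torch.randn(5, device=device)

print(a.shape, b.dtype, c.shape, e.device)
print(f"empirical mean={d.mean().item():.3f}, std={d.std().item():.3f}")
```

The scale-and-shift line is the same relationship torch.normal() relies on conceptually, so either route gives samples with the requested mean and standard deviation.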

Key Points:

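Both methods produce random numbers, so the specific values differ on each run unless you fix the seed with torch.manual_seed().
You can control the shape (size) of the generated tensor: torch.randn() takes the dimensions directly, while torch.normal() uses the shape of its mean/std tensors, or an explicit size argument when both are floats.
For more advanced functionality, such as evaluating probability densities, consider the torch.distributions module (for example torch.distributions.Normal).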
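A minimal sketch of both ideas: seeding for reproducible draws (globally with torch.manual_seed(), or locally with a torch.Generator passed to torch.normal()), and torch.distributions.Normal, which pairs sampling with log-density evaluation. The seeds 0 and 42 are arbitrary.

```python
import torch
from torch.distributions import Normal

# Seeding the global generator makes the draws repeatable.
torch.manual_seed(0)
first = torch.randn(3)
torch.manual_seed(0)
second = torch.randn(3)

# A local Generator gives reproducible draws without touching global state.
gen = torch.Generator().manual_seed(42)
local = torch.normal(0.0, 1.0, size=(3,), generator=gen)

# torch.distributions.Normal bundles sampling with density evaluation.
dist = Normal(loc=torch.tensor(3.0), scale=torch.tensor(1.5))
draws = dist.sample((5,))           # shape (5,)
log_density = dist.log_prob(draws)  # log of the pdf at each draw
at_mean = dist.log_prob(torch.tensor(3.0))

print(torch.equal(first, second), draws.shape, log_density.shape)
```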

Incorporating NumPy (Optional):

If you already generate samples with NumPy, you can convert the resulting array to a PyTorch tensor with the torch.from_numpy() function. The tensor shares memory with the array and keeps its dtype (float64 for np.random.normal output):

import torch
import numpy as np

# Create a NumPy array with desired mean and standard deviation
data = np.random.normal(loc=5.0, scale=2.0, size=(100, 50))

# Convert NumPy array to PyTorch tensor
tensor_data = torch.from_numpy(data)

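# Convert to float32, PyTorch's default floating-point dtype, for use with most models
tensor_data = tensor_data.float()
print(tensor_data.dtype, tensor_data.shape)
print(tensor_data.mean().item(), tensor_data.std().item())  # close to 5.0 and 2.0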

Generating Samples with a Specific Mean and Standard Deviation:

import torch

# Define mean and standard deviation as Python floats
# (the size= form of torch.normal() is documented for float mean and std)
mean = 7.0
std = 2.5

# Generate 100 samples with the specified mean and standard deviation
samples = torch.normal(mean, std, size=(100,))  # Shape (100,) for a 1D tensor

print(samples)

Generating Samples from the Standard Normal Distribution:

import torch

# Generate a 3x4 tensor with standard normal samples (mean 0, std 1)
samples = torch.randn(3, 4)

print(samples)

Generating Samples from a NumPy Array (Optional):

import torch
import numpy as np

# Create a NumPy array with desired mean and standard deviation
data = np.random.normal(loc=3.0, scale=1.0, size=(50, 20))  # Shape (50, 20)

# Convert NumPy array to PyTorch tensor
tensor_data = torch.from_numpy(data)

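# Estimate the mean and standard deviation from the converted tensor,
# then draw fresh samples from a normal distribution with those parameters
mu = tensor_data.mean().item()
sigma = tensor_data.std().item()
new_samples = torch.normal(mu, sigma, size=(50, 20))
print(mu, sigma, new_samples.shape)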

This code converts a NumPy array of normal samples (mean 3.0, standard deviation 1.0) to a PyTorch tensor. Since tensor_data already contains samples, the PyTorch side estimates their mean and standard deviation and passes them to torch.normal() to generate new samples from the fitted distribution.

Utilizing the Inverse Transform Method:

This method transforms uniform samples (values between 0 and 1) into normal samples by applying the inverse cumulative distribution function (CDF) of the normal distribution. It is generally less efficient than torch.normal() and may not be suitable for large-scale applications in PyTorch.
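The idea can be written out as follows, with Φ the standard normal CDF and erf⁻¹ the inverse error function (available in PyTorch as torch.erfinv). The first line shows why the transformed variable has the right distribution; the last line rescales it to an arbitrary mean and standard deviation.

```latex
\begin{aligned}
P\bigl(\Phi^{-1}(U) \le z\bigr) &= P\bigl(U \le \Phi(z)\bigr) = \Phi(z), \qquad U \sim \mathcal{U}(0, 1) \\
\Phi(z) &= \tfrac{1}{2}\bigl[1 + \operatorname{erf}(z / \sqrt{2})\bigr]
  \;\Longrightarrow\; \Phi^{-1}(u) = \sqrt{2}\,\operatorname{erf}^{-1}(2u - 1) \\
X &= \mu + \sigma\,\Phi^{-1}(U) \sim \mathcal{N}(\mu, \sigma^2)
\end{aligned}
```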

Here's a basic illustration (without error handling):

import math
import torch

def inverse_transform_normal(u):
  """
  Map uniform samples u in (0, 1) to standard normal samples via the
  inverse CDF: Phi^{-1}(u) = sqrt(2) * erfinv(2u - 1).
  For illustration only: u exactly 0 or 1 maps to -inf or +inf.
  """
  z = math.sqrt(2.0) * torch.erfinv(2.0 * u - 1.0)
  return z

# Generate uniform samples (torch.rand draws from [0, 1))
u = torch.rand(10)

# Apply inverse transform
samples = inverse_transform_normal(u)

print(samples)

Utilizing the Box-Muller Transform:

This method uses two independent uniform random variables to generate two independent samples from a standard normal distribution. It needs only a logarithm, a square root, and a sine/cosine rather than an inverse CDF, but it is still less practical than torch.normal() for most PyTorch use cases.
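Written out, the transform uses U₁ for the radius and U₂ for the angle. It works because R² = −2 ln U₁ is exponentially distributed with mean 2 (a chi-squared variable with 2 degrees of freedom) and Θ is uniform on [0, 2π), which is exactly the polar-coordinate description of a pair of independent standard normal variables.

```latex
\begin{aligned}
R &= \sqrt{-2 \ln U_1}, \qquad \Theta = 2\pi U_2, \qquad U_1 \sim \mathcal{U}(0, 1],\; U_2 \sim \mathcal{U}(0, 1) \text{ independent} \\
Z_1 &= R \cos\Theta, \qquad Z_2 = R \sin\Theta, \qquad Z_1, Z_2 \overset{\text{iid}}{\sim} \mathcal{N}(0, 1)
\end{aligned}
```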

Here's a simplified example (ignoring potential numerical issues):

import math
import torch

def box_muller_normal(u1, u2):
  """
  This function is for illustration purposes and might not be robust.
  """
  radius = torch.sqrt(-2.0 * torch.log(u1))  # requires u1 in (0, 1]
  z1 = radius * torch.cos(2.0 * math.pi * u2)
  z2 = radius * torch.sin(2.0 * math.pi * u2)
  return z1, z2

# Generate uniform samples; 1 - torch.rand() lies in (0, 1], so log(u1) stays finite
u1 = 1.0 - torch.rand(10)
u2 = torch.rand(10)

# Apply Box-Muller transform
samples1, samples2 = box_muller_normal(u1, u2)

print(samples1)
print(samples2)

Important Considerations:

These alternative methods are provided for educational purposes and might not be the most efficient or numerically stable solutions for real-world PyTorch applications.
For most scenarios, torch.normal() (or torch.randn() for the standard normal case) remains the recommended and optimized approach for generating samples from a normal distribution in PyTorch.
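As a rough sanity check, the sketch below draws 100,000 values with each approach and prints their empirical mean and standard deviation, which should all land near 0 and 1. The helper functions are self-contained versions of the inverse transform (via torch.erfinv) and Box-Muller ideas described above, and the clamp bound eps is an arbitrary choice, not a PyTorch setting. Matching the first two moments is a quick smoke test, not proof that the samples are correctly distributed.

```python
import math
import torch

def inverse_transform_normal(u):
    # Phi^{-1}(u) = sqrt(2) * erfinv(2u - 1)
    return math.sqrt(2.0) * torch.erfinv(2.0 * u - 1.0)

def box_muller_normal(u1, u2):
    # Radius from u1, angle from u2; returns two independent N(0, 1) tensors.
    radius = torch.sqrt(-2.0 * torch.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * torch.cos(angle), radius * torch.sin(angle)

torch.manual_seed(0)
n = 100_000
eps = 1e-6  # assumed clamp bound keeping u away from exactly 0 or 1

samples = {
    "torch.normal": torch.normal(0.0, 1.0, size=(n,)),
    "torch.randn": torch.randn(n),
    "inverse transform": inverse_transform_normal(torch.rand(n).clamp(eps, 1.0 - eps)),
    # 1 - rand() lies in (0, 1], so log(u1) is always finite.
    "Box-Muller": torch.cat(box_muller_normal(1.0 - torch.rand(n // 2), torch.rand(n // 2))),
}

for name, x in samples.items():
    print(f"{name:>17}: mean={x.mean().item():+.4f} std={x.std().item():.4f}")
```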
