Working with NumPy Arrays: Saving and Loading Made Easy
Saving NumPy Arrays:
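np.save(file, arr, allow_pickle=True): This is the recommended approach for most cases. It saves a single array to a compact binary .npy file that records the array's dtype and shape along with its data. The parameters are:
- file: The filename (string or path) or a file-like object where the array will be stored. If a filename does not already end in .npy, that extension is appended.
- arr: The NumPy array you want to save.
- allow_pickle (optional, default True): Controls whether object arrays (arrays whose elements are arbitrary Python objects) may be saved using Python's pickle mechanism. With allow_pickle=False, np.save raises a ValueError for object arrays instead, which keeps pickled data out of the file; pickled data is less portable and unsafe to load from untrusted sources.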
Example:
import numpy as np

data = np.array([1, 2, 3, 4, 5])
np.save('my_data.npy', data)
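The allow_pickle setting is easy to misremember, so the following small sketch shows what happens to an object array on both the saving and the loading side. The file names mixed.npy and mixed_nopickle.npy are illustrative. The exception type (ValueError) is what NumPy raises in the versions we are aware of, and the message text may differ.

```python
import numpy as np

# An object array: each element is an arbitrary Python object (here, dicts)
mixed = np.array([{'a': 1}, {'b': 2}], dtype=object)

# np.save pickles object arrays by default (allow_pickle=True)
np.save('mixed.npy', mixed)

# With allow_pickle=False, np.save refuses object arrays
try:
    np.save('mixed_nopickle.npy', mixed, allow_pickle=False)
    refused = False
except ValueError:
    refused = True
print('np.save refused the object array:', refused)

# np.load defaults to allow_pickle=False, so loading object arrays must be opted into
try:
    np.load('mixed.npy')
    load_refused = False
except ValueError:
    load_refused = True
print('np.load refused the object array:', load_refused)

# Opt in explicitly, and only for files from a trusted source
restored = np.load('mixed.npy', allow_pickle=True)
print(restored)
```

Note that the refusal is asymmetric: saving object arrays is allowed by default, but loading them is not.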
np.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', comments='# ', encoding=None): Use this if you need a human-readable text file (such as CSV). It writes a 1-D or 2-D array as plain text. Each row of a 2-D array becomes one line, and a 1-D array is written one element per line.
- fname: The filename (string) or file-like object for the output.
- X: The 1-D or 2-D NumPy array to save.
- fmt (optional): Format string for the elements (e.g., '%d' for integers or '%.2f' for floats with two decimals). The default, '%.18e', writes every value in scientific notation.
- delimiter (optional): String separating elements in each row (default: a single space, so pass delimiter=',' for CSV).
- newline (optional): String written at the end of each line (default: '\n').
- header (optional): Text written at the top of the file, prefixed with comments (default: '', meaning no header).
- comments (optional): String prepended to the header lines (default: '# ').
- encoding (optional): Encoding used for the output file (default: None).
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])
np.savetxt('data.csv', data, delimiter=',')
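The default fmt writes every value in scientific notation, so it helps to look at the actual text that np.savetxt produces. This sketch writes to in-memory io.StringIO buffers so the output can be inspected without opening files. It compares the default format with fmt='%d' plus a header, and it shows that a 1-D array is written one value per line.

```python
import io

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

# Default fmt='%.18e': integers come out in scientific notation
default_buf = io.StringIO()
np.savetxt(default_buf, data, delimiter=',')
default_text = default_buf.getvalue()
print(default_text)

# Integer format plus a header line (the header is prefixed with comments='# ')
custom_buf = io.StringIO()
np.savetxt(custom_buf, data, fmt='%d', delimiter=',', header='a,b,c')
custom_text = custom_buf.getvalue()
print(custom_text)

# A 1-D array is written as a single column
one_d_buf = io.StringIO()
np.savetxt(one_d_buf, np.array([1, 2, 3]), fmt='%d')
one_d_text = one_d_buf.getvalue()
print(one_d_text)
```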
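Loading NumPy Arrays:
np.load(file, mmap_mode=None, allow_pickle=False): This function reads an array back from a .npy file (it can also open .npz archives and pickled files).
- file: The filename (string or path) or a file-like object containing the saved data.
- mmap_mode (optional): If set (e.g., 'r'), the array is memory-mapped from disk instead of being read fully into memory, which helps with large files.
- allow_pickle (optional, default False): Must be set to True to load object arrays. Only do this for files from trusted sources, because unpickling can execute arbitrary code.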
import numpy as np

loaded_data = np.load('my_data.npy')
print(loaded_data)  # Output: [1 2 3 4 5]
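A round trip through an in-memory io.BytesIO buffer is a quick way to confirm that .npy preserves more than the values. This sketch checks the dtype and shape after loading. It then reopens a saved file with mmap_mode='r' so that only the accessed elements are read from disk. The file name grid.npy is illustrative.

```python
import io

import numpy as np

original = np.arange(6, dtype=np.float32).reshape(2, 3)

# Save to and load from an in-memory binary buffer
buffer = io.BytesIO()
np.save(buffer, original)
buffer.seek(0)  # rewind before reading
restored = np.load(buffer)
print(restored.dtype, restored.shape)  # float32 (2, 3)

# Memory-map a file on disk instead of loading it all at once
np.save('grid.npy', original)
view = np.load('grid.npy', mmap_mode='r')
print(view[1, 2])
```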
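np.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, ndmin=0, encoding=...): Use this to read text files (such as CSV) back into NumPy arrays.
- fname: The filename (string) or file-like object to read.
- dtype (optional): The data type of the resulting array (default: float).
- comments (optional): Character(s) marking the start of a comment. Text from that point to the end of the line is ignored (default: '#').
- delimiter (optional): String separating values in each row (default: None, meaning any whitespace).
- converters (optional): A dictionary mapping column indices to functions that convert the text of that column (e.g., {0: lambda x: int(x)}).
- skiprows (optional): Number of lines to skip at the start of the file, such as a header row (default: 0).
- usecols (optional): Indices of the columns to read (e.g., [0, 2]). By default all columns are read.
- ndmin (optional): Minimum number of dimensions of the returned array (default: 0). Use ndmin=2 to always get a 2-D result, even for a single-row file.
- encoding (optional): Encoding used to decode the file. The default differs between NumPy versions, so pass it explicitly (e.g., 'utf-8') if it matters.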
import numpy as np

# Assumes data.csv was written by the np.savetxt example above
loaded_data = np.loadtxt('data.csv', delimiter=',')
print(loaded_data)
# Output:
# [[1. 2. 3.]
#  [4. 5. 6.]]
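The optional parameters combine naturally. This sketch parses a small CSV held in a string, wrapped in io.StringIO so the example needs no file on disk. It skips the header row, keeps two of the columns as integers, and shows how ndmin changes the shape of a single-row input. The column names and values are made up for illustration.

```python
import io

import numpy as np

text = "id,temp,humidity\n1,20.5,30\n2,21.0,35\n"

# Skip the header, read only columns 0 and 2, and parse them as integers
table = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1,
                   usecols=[0, 2], dtype=int)
print(table)

# A single-row file is returned as 1-D unless ndmin asks for more dimensions
single = np.loadtxt(io.StringIO("1,2,3\n"), delimiter=',')
single_2d = np.loadtxt(io.StringIO("1,2,3\n"), delimiter=',', ndmin=2)
print(single.shape, single_2d.shape)  # (3,) (1, 3)
```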
Key Considerations:
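- .npy files are generally preferred: they are compact, fast to read and write, and preserve the array's exact dtype and shape.
- Object arrays can only be stored by pickling them (np.save does this by default), and loading them requires np.load(..., allow_pickle=True). Enable that only for files from trusted sources, because unpickling can execute arbitrary code and pickled data may not load across Python or NumPy versions.
- np.savetxt is suitable for human-readable output, but text files are larger and slower to read and write, only handle 1-D and 2-D arrays, and do not record the dtype (np.loadtxt returns floats unless you pass dtype).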
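Two of these trade-offs are easy to see in code. The sketch below sends the same integer array through a text round trip and a .npy round trip using in-memory buffers, then tries to pass a 3-D array to np.savetxt. The explicit int64 dtype is chosen so the printed results do not depend on the platform's default integer type.

```python
import io

import numpy as np

ints = np.array([[1, 2], [3, 4]], dtype=np.int64)

# Text round trip with default settings: values survive, dtype becomes float64
text_buf = io.StringIO()
np.savetxt(text_buf, ints, delimiter=',')
text_buf.seek(0)
from_text = np.loadtxt(text_buf, delimiter=',')
print(from_text.dtype)  # float64

# Binary round trip keeps the original dtype
bin_buf = io.BytesIO()
np.save(bin_buf, ints)
bin_buf.seek(0)
from_npy = np.load(bin_buf)
print(from_npy.dtype)  # int64

# np.savetxt only accepts 1-D and 2-D arrays
try:
    np.savetxt(io.StringIO(), np.zeros((2, 2, 2)))
    rejected_3d = False
except ValueError:
    rejected_3d = True
print('3-D array rejected:', rejected_3d)
```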
Complete Examples:
Using np.save for binary .npy files (recommended):
import numpy as np
# Create a sample NumPy array
data = np.array([1, 2, 3, 4, 5])
# Save the array to a .npy file
np.save('my_data.npy', data)
print("Array saved successfully!")
Using np.savetxt for human-readable text files (CSV or custom format):
import numpy as np
# Create a sample 2D NumPy array
data = np.array([[1, 2, 3], [4, 5, 6]])
# Save the array to a CSV file with comma delimiter
np.savetxt('data.csv', data, delimiter=',')
# Save the array to a custom text file with tab delimiter and header
np.savetxt('data_custom.txt', data, delimiter='\t', header='My Data', comments='# ')
print("Arrays saved in text formats!")
Using np.load for binary .npy files:
import numpy as np
# Load the previously saved array
loaded_data = np.load('my_data.npy')
print("Loaded array:", loaded_data)
Using np.loadtxt for text files:
import numpy as np
# Load the previously saved CSV file
loaded_data_csv = np.loadtxt('data.csv', delimiter=',')
# Load the previously saved custom text file with tab delimiter and skip header
loaded_data_custom = np.loadtxt('data_custom.txt', delimiter='\t', skiprows=1) # Skip header
print("Loaded data from CSV:", loaded_data_csv)
print("Loaded data from custom text file:", loaded_data_custom)
Using pickle:
Purpose: For saving and loading more complex data structures containing custom objects alongside NumPy arrays.
Caution:
- Security risks exist if the loaded data comes from an untrusted source.
- Compatibility issues might arise when loading pickled data across different Python versions or environments.
Example:
import pickle
import numpy as np

# Create a custom data structure
data = {'array': np.array([1, 2, 3]), 'name': 'My Data'}

# Save the data using pickle
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)

# Load the data using pickle (only unpickle files from trusted sources)
with open('data.pkl', 'rb') as f:
    loaded_data = pickle.load(f)

print(loaded_data)  # Output: {'array': array([1, 2, 3]), 'name': 'My Data'}
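Pickle's main advantage over the NumPy formats is that it handles your own classes. The minimal sketch below uses a hypothetical Measurement dataclass that bundles a label with an array. It round-trips the object in memory with pickle.dumps and pickle.loads instead of going through a file. The class must be importable under the same name when the data is loaded, which is one source of the compatibility issues listed above.

```python
import pickle
from dataclasses import dataclass

import numpy as np


@dataclass
class Measurement:
    """Hypothetical custom class holding an array plus metadata."""
    label: str
    values: np.ndarray


original = Measurement(label='sensor-1', values=np.linspace(0.0, 1.0, 5))

# Serialize to bytes and back; the array inside keeps its dtype and shape
payload = pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

print(restored.label, restored.values)
```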
Using HDF5 (Hierarchical Data Format 5):
Purpose: For storing large, complex datasets that might have hierarchical structures or require efficient compression.
Library: Requires the h5py library (pip install h5py).
Example:
import h5py
import numpy as np

# Create a NumPy array
data = np.array([1, 2, 3])

# Save the array to an HDF5 file
with h5py.File('data.hdf5', 'w') as f:
    f.create_dataset('my_array', data=data)

# Load the array from the HDF5 file
with h5py.File('data.hdf5', 'r') as f:
    loaded_data = f['my_array'][:]  # Slicing reads the data into a NumPy array

print(loaded_data)  # Output: [1 2 3]
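A single flat array does not show the hierarchy or compression that motivate HDF5. The sketch below assumes h5py is installed. It places a dataset inside a group, enables gzip compression, attaches an attribute, and then reads back only a slice. The file name experiment.h5, the group run1, and the units attribute are illustrative choices.

```python
import h5py
import numpy as np

# Sample data: 100 rows x 10 columns of float64 values
data = np.arange(1000, dtype=np.float64).reshape(100, 10)

with h5py.File('experiment.h5', 'w') as f:
    # Groups work like folders inside the file
    run = f.create_group('run1')
    # gzip compression is applied transparently as the dataset is written
    dset = run.create_dataset('measurements', data=data,
                              compression='gzip', compression_opts=4)
    # Attributes attach small pieces of metadata to a dataset or group
    dset.attrs['units'] = 'mV'

with h5py.File('experiment.h5', 'r') as f:
    dset = f['run1/measurements']  # path-style access into the hierarchy
    compression = dset.compression
    print(dset.shape, dset.dtype, compression)
    first_rows = dset[:5]  # reads only these rows from disk
```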
Using JSON (JavaScript Object Notation):
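Purpose: For saving and loading NumPy arrays that represent simple data structures suitable for JSON format. JSON has no native representation for NumPy arrays or NumPy scalar types, so json.dump raises a TypeError on them unless they are first converted to plain Python lists and numbers.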
Library: Uses the json module from Python's standard library (no installation needed).
Example (converting the NumPy array to a list before saving):
import json
import numpy as np

# Create a NumPy array of integers
data = np.array([1, 2, 3])

# Convert the array to a list for JSON serialization
data_list = data.tolist()

# Save the list to a JSON file
with open('data.json', 'w') as f:
    json.dump(data_list, f)

# Load the data from the JSON file
with open('data.json', 'r') as f:
    loaded_data_list = json.load(f)

# Convert the list back to a NumPy array
loaded_data = np.array(loaded_data_list)
print(loaded_data)  # Output: [1 2 3]
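Converting by hand works for a single array, but json.dumps still fails when a NumPy array or scalar is nested inside another structure. One common approach, sketched here, is a json.JSONEncoder subclass whose default method converts NumPy objects. The class name NumpyEncoder is hypothetical. JSON stores no NumPy type information, so the dtype has to be reapplied when loading.

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    """Hypothetical helper: turns NumPy objects into JSON-compatible types."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()  # nested Python lists
        if isinstance(obj, np.generic):
            return obj.item()  # NumPy scalar -> Python int/float/bool
        return super().default(obj)  # let json raise TypeError for anything else


record = {'values': np.array([[1.5, 2.0], [3.25, 4.0]], dtype=np.float32),
          'count': np.int64(2)}

# Plain json cannot serialize the ndarray
try:
    json.dumps(record)
    plain_failed = False
except TypeError:
    plain_failed = True

text = json.dumps(record, cls=NumpyEncoder)
print(text)

# Restore the array, reapplying the dtype that JSON did not keep
loaded = json.loads(text)
restored = np.array(loaded['values'], dtype=np.float32)
```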