Extracting Data from CSV Files for Storage in SQLite3 Databases with Python

2024-05-08

This article explains how to import a CSV file into a SQLite3 database table using Python.

Importing Necessary Modules:

sqlite3: This built-in Python module allows you to interact with SQLite3 databases. It provides functions for connecting, creating/dropping tables, executing queries, and fetching results.
csv: This module, also included in the standard library, helps you work with CSV (Comma-Separated Values) files. It offers methods for reading, writing, and manipulating CSV data.

Steps Involved:

Establish Database Connection:
- Use sqlite3.connect() to create a connection object. This object represents the link between your Python program and the database file. You can specify the database filename or use :memory: to create an in-memory database (temporary).
Create Cursor Object:
- Obtain a cursor object from the connection using conn.cursor(). The cursor acts as your interface for executing SQL statements and interacting with the database.
(Optional) Create Table:
- If the table you want to import data into doesn't exist, prepare a SQL CREATE TABLE statement that defines the table structure (column names and data types). Execute this statement using the cursor's execute() method.
Read CSV File:
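- Open the CSV file in read mode ('r'). The csv module's documentation recommends also passing newline='' to open() so that line breaks inside quoted fields are read correctly.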
- Use the csv.reader() function to create a reader object. This object helps you iterate through the rows of the CSV file in an organized way.
Process CSV Data (Optional):
- You might want to perform some pre-processing on the data before inserting it into the database. For example, you could convert data types, handle missing values, or filter specific rows.
Insert Data into Database:
- Construct an INSERT INTO SQL statement with placeholders for the column values. Use the executemany() method of the cursor to execute the statement in bulk for each row in the CSV file. This is more efficient than executing individual INSERT statements.
Commit Changes:
- Use conn.commit() to save the changes you've made to the database.
Close Connections:
- Close the database connection with conn.close() to release its resources. Cursors that belong to a closed connection can no longer be used.

Here's example code that incorporates these steps:

import sqlite3
import csv

# Connect to the database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Create table (if it doesn't exist)
cursor.execute('''CREATE TABLE IF NOT EXISTS my_table (
                    column1 TEXT,
                    column2 INTEGER,
                    column3 REAL
                )''')

# Open the CSV file (newline='' is recommended by the csv module)
with open('my_data.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)

    # Skip header row (if present)
    next(reader, None)  # Assuming the first row contains column names

    # Prepare INSERT statement
    insert_stmt = "INSERT INTO my_table VALUES (?, ?, ?)"

    # Insert data in bulk: executemany() runs the statement once per remaining row
    cursor.executemany(insert_stmt, reader)

# Commit changes
conn.commit()

# Close connections
conn.close()

print("CSV data imported successfully!")

Explanation of the Code:

The code imports sqlite3 and csv modules.
It connects to the database file my_database.db.
The CREATE TABLE statement defines the table structure.
The CSV file my_data.csv is opened in read mode.
The next() function skips the header row if present.
The INSERT statement is prepared with placeholders.
The executemany() method efficiently inserts multiple rows at once.
The connection is committed, and resources are closed.

Remember to replace my_database.db, my_data.csv, and the table structure (my_table) with your specific file names and table definition.
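
To experiment with the same flow without creating any files, here is a minimal sketch that uses the :memory: database mentioned earlier. The table name items, its columns, and the sample CSV text are made up for illustration, and io.StringIO stands in for an opened CSV file. It also shows that SQLite's column type affinity stores numeric-looking text in INTEGER and REAL columns as numbers.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; io.StringIO stands in for an opened file.
csv_text = "name,quantity,price\nwidget,3,2.5\ngadget,5,19.25\n"

# ":memory:" creates a temporary database that disappears on close.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE IF NOT EXISTS items (name TEXT, quantity INTEGER, price REAL)"
)

reader = csv.reader(io.StringIO(csv_text))
next(reader, None)  # skip the header row

# Every CSV field arrives as a string; the INTEGER and REAL columns
# convert numeric-looking text when the row is stored.
cursor.executemany("INSERT INTO items VALUES (?, ?, ?)", reader)
conn.commit()

rows = cursor.execute(
    "SELECT name, quantity, price FROM items ORDER BY name"
).fetchall()
print(rows)

conn.close()
```

Reading the rows back with a SELECT like this is also a quick way to check that an import produced what you expected.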

Here's example code that incorporates different data types and error handling:

import sqlite3
import csv

# Connect to the database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Create table (if it doesn't exist)
cursor.execute('''CREATE TABLE IF NOT EXISTS my_table (
                    id INTEGER PRIMARY KEY,
                    name TEXT,
                    age INTEGER,
                    price REAL
                )''')

# Open the CSV file (newline='' is recommended by the csv module)
try:
    with open('my_data.csv', 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)

        # Skip header row (if present)
        next(reader, None)  # Assuming the first row contains column names

        # Prepare INSERT statement
        insert_stmt = "INSERT INTO my_table VALUES (?, ?, ?, ?)"

        # Insert rows one at a time so that a bad row can be skipped
        for row in reader:
            try:
                # Convert data types if necessary (e.g., convert string to integer)
                row[2] = int(row[2])  # Assuming age is in the third column (index 2)
                row[3] = float(row[3])  # Assuming price is in the fourth column (index 3)
                cursor.execute(insert_stmt, row)
            except (ValueError, IndexError):  # Bad number, or a row with too few columns
                print(f"Error converting data in row: {row}")

# Handle file opening errors
except FileNotFoundError:
    print("Error: CSV file not found!")

# Commit changes
conn.commit()

# Close connections
conn.close()

print("CSV data imported (with potential errors handled).")

Explanation of Enhancements:

Error Handling: The code includes a try-except block to catch potential ValueError exceptions during data type conversion (e.g., converting a non-numeric string to an integer). This prevents the entire import process from failing due to a single invalid row.
Data Type Conversion: The code demonstrates basic data type conversion using int() and float() for the age and price columns, assuming they are originally strings in the CSV file.
Informative Messages: The code provides informative messages for both successful import and potential errors encountered during the import process.
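
Printing a message is fine for a quick script, but it can help to keep a record of which rows were rejected, and to decide what an empty field should mean. The sketch below is one possible approach, not the only one: it assumes empty fields should be stored as NULL, and the helper convert_row, the rejected list, and the sample data are all hypothetical names introduced for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical data: the second row has a missing age, the third a bad price.
csv_text = "id,name,age,price\n1,Ann,34,9.5\n2,Bob,,4.25\n3,Cy,41,n/a\n"


def convert_row(row):
    """Return (id, name, age, price) with numeric types; empty fields become None."""
    def to_number(text, cast):
        return cast(text) if text.strip() else None

    return (int(row[0]), row[1], to_number(row[2], int), to_number(row[3], float))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, price REAL)"
)

rejected = []  # (CSV line number, raw row, error message)
reader = csv.reader(io.StringIO(csv_text))
next(reader, None)  # skip the header row (line 1)

for line_number, row in enumerate(reader, start=2):
    try:
        conn.execute("INSERT INTO my_table VALUES (?, ?, ?, ?)", convert_row(row))
    except (ValueError, IndexError) as exc:
        rejected.append((line_number, row, str(exc)))

conn.commit()
stored = conn.execute(
    "SELECT id, name, age, price FROM my_table ORDER BY id"
).fetchall()
conn.close()

print(f"{len(stored)} rows stored, {len(rejected)} rejected")
```

Python's None is stored as NULL, so the row with the missing age is kept while the row with the unparseable price ends up in rejected together with its line number.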

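Using Pandas:

If the third-party pandas library is installed, it can handle both reading the CSV file and inserting the rows:

import pandas as pd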
import sqlite3

# Connect to the database
conn = sqlite3.connect('my_database.db')

# Read CSV data into a DataFrame
df = pd.read_csv('my_data.csv')

# Write DataFrame to an existing table (or create a new one)
df.to_sql('my_table', conn, if_exists='append', index=False)

# Close connections
conn.close()

print("CSV data imported using Pandas.")

This approach simplifies the process:

pd.read_csv() reads the CSV file into a DataFrame, a structured data container.
DataFrame.to_sql() writes the DataFrame directly to the database table. You can specify if_exists='append' to add data to an existing table or 'replace' to overwrite it.

Using executemany with Type Conversion:

You can leverage the existing executemany() method in the sqlite3 module but incorporate data type conversion during insertion:

import sqlite3
import csv

# Connect to the database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Create table (if it doesn't exist)
# ... (same as previous examples)

# Open the CSV file (newline='' is recommended by the csv module)
with open('my_data.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)

    # Skip header row (if present)
    next(reader, None)  # Assuming the first row contains column names

    # Prepare INSERT statement
    insert_stmt = "INSERT INTO my_table VALUES (?, ?, ?, ?)"

    # Convert types lazily; columns are assumed to be id, name, age, price
    converted_rows = (
        (int(row[0]), row[1], int(row[2]), float(row[3]))
        for row in reader
    )

    # Insert all converted rows with a single executemany() call
    cursor.executemany(insert_stmt, converted_rows)

# Commit changes, close connections
# ... (same as previous examples)

print("CSV data imported with type conversion.")

This approach offers more control over data type conversion compared to Pandas (which might infer types automatically).
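
One way to keep that control in a single place is a tuple of converter functions, one per column. This is only a sketch under assumptions: the converters tuple, the typed_rows generator, and the sample data are hypothetical, and the column order is assumed to be id, name, age, price.

```python
import csv
import io
import sqlite3

# Hypothetical data with the assumed column order id, name, age, price.
csv_text = "id,name,age,price\n1,Ann,34,9.5\n2,Bob,29,4.25\n"

# One converter per column, in column order.
converters = (int, str, int, float)


def typed_rows(reader):
    """Yield each CSV row as a tuple with every field converted."""
    for row in reader:
        yield tuple(cast(value) for cast, value in zip(converters, row))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, price REAL)"
)

reader = csv.reader(io.StringIO(csv_text))
next(reader, None)  # skip the header row

# executemany() pulls converted rows from the generator one at a time.
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", typed_rows(reader))
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
types = conn.execute(
    "SELECT typeof(age), typeof(price) FROM my_table LIMIT 1"
).fetchone()
conn.close()

print(row_count, types)
```

Note that a conversion error here stops the whole executemany() call, so this variant suits data you expect to be clean.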

Using a Context Manager:

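You can use the connection object as a context manager to manage the transaction: it is committed if the with block succeeds and rolled back if the block raises an exception. This does not close the connection, so wrap it in contextlib.closing() if you also want it closed automatically:

import sqlite3
import csv
from contextlib import closing

# closing() closes the connection when the outer block exits
with closing(sqlite3.connect('my_database.db')) as conn:
    # "with conn" commits on success and rolls back if an exception is raised
    with conn:
        cursor = conn.cursor()

        # ... (rest of the code using cursor object)

print("CSV data imported using context manager.")

This approach promotes cleaner code: the commit or rollback happens automatically, and the connection is closed when the outer with block exits.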
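
It is worth checking what the connection's context manager actually does. In the standard library's sqlite3 module, using the connection in a with statement commits or rolls back the open transaction, while closing the connection is a separate step. The sketch below demonstrates both behaviors with an in-memory database; the table and the inserted values are made up for illustration.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
    conn.commit()

    # Successful block: the insert is committed when the block exits.
    with conn:
        conn.execute("INSERT INTO my_table VALUES (1, 'kept')")

    # Failing block: the duplicate id raises IntegrityError, and the
    # whole transaction (including id 2) is rolled back.
    try:
        with conn:
            conn.execute("INSERT INTO my_table VALUES (2, 'discarded')")
            conn.execute("INSERT INTO my_table VALUES (1, 'duplicate id')")
    except sqlite3.IntegrityError:
        pass

    names = [name for (name,) in conn.execute("SELECT name FROM my_table ORDER BY id")]

# closing() has closed the connection by now, so using it raises an error.
try:
    conn.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print(names, connection_closed)
```

For a CSV import this means a failure partway through leaves the table as it was before the block started, which is often preferable to a half-imported file.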
