2024-05-08

Extracting Data from CSV Files for Storage in SQLite3 Databases with Python

python database csv

This article explains how to import a CSV file into a SQLite3 database table using Python.

Importing Necessary Modules:

  • sqlite3: This built-in Python module lets you work with SQLite3 databases. You open a connection, execute SQL statements (such as CREATE TABLE, INSERT, and SELECT) through a cursor, and fetch the results.
  • csv: This module, also included in the standard library, reads and writes CSV (Comma-Separated Values) files and handles details such as quoted fields that contain commas.
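Before writing the full import, it can help to see what each module hands you. The sketch below uses io.StringIO as a stand-in for a real file and ':memory:' as a throwaway database; the sample data is made up for illustration. Note that csv.reader returns every field, including numbers, as a string.

```python
import csv
import io
import sqlite3

# A small made-up CSV sample held in memory, so no file is needed
sample = io.StringIO("name,age\nAlice,30\nBob,25\n")

# csv.reader yields each row as a list of strings, header row included
rows = list(csv.reader(sample))
print(rows)  # [['name', 'age'], ['Alice', '30'], ['Bob', '25']]

# ':memory:' opens a temporary database that disappears when the connection closes
conn = sqlite3.connect(":memory:")
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("SQLite version:", version)
conn.close()
```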

Steps Involved:

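  1. Establish Database Connection:

    • Call sqlite3.connect() with the database file name. The file is created if it does not exist.
  2. Create Cursor Object:

    • Call conn.cursor() to get a cursor for executing SQL statements.
  3. (Optional) Create Table:

    • Run a CREATE TABLE IF NOT EXISTS statement if the target table may not exist yet.
  4. Read CSV File:

    • Open the CSV file in read mode ('r'), passing newline='' as the csv module documentation recommends.
    • Use the csv.reader() function to create a reader object, which yields each row of the file as a list of strings.
  5. Process CSV Data (Optional):

    • Skip the header row, convert values to the right types, or clean up fields as needed.
  6. Insert Data into Database:

    • Execute a parameterized INSERT statement for each row, or pass all rows to cursor.executemany().
  7. Commit Changes:

    • Use conn.commit() to save the changes you've made to the database.
  8. Close Connections:

    • Call conn.close() to release the database connection.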
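Step 5 is where any cleanup happens. As one possible sketch, the hypothetical helper clean_row below strips stray whitespace and turns empty fields into None, which SQLite stores as NULL. Which cleanup rules you need depends on your data; these are only an assumption for illustration, and the sample CSV is made up.

```python
import csv
import io

def clean_row(row):
    """Hypothetical helper: strip whitespace and turn empty fields into None."""
    cleaned = []
    for field in row:
        field = field.strip()
        # None is stored as NULL by sqlite3, usually clearer than an empty string
        cleaned.append(field if field else None)
    return cleaned

# Made-up sample with padded values and a missing age
sample = io.StringIO("name,age\n  Alice , 30\nBob,\n")
reader = csv.reader(sample)
next(reader, None)  # skip the header row
processed = [clean_row(row) for row in reader]
print(processed)  # [['Alice', '30'], ['Bob', None]]
```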

Here's example code that follows these steps:

import sqlite3
import csv

# Connect to the database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Create table (if it doesn't exist)
cursor.execute('''CREATE TABLE IF NOT EXISTS my_table (
                    column1 TEXT,
                    column2 INTEGER,
                    column3 REAL
                )''')

# Open the CSV file (newline='' is recommended by the csv module docs)
with open('my_data.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)

    # Skip the first row, assuming it contains column names
    next(reader, None)

    # Prepare INSERT statement with one ? placeholder per column
    insert_stmt = "INSERT INTO my_table VALUES (?, ?, ?)"

    # Insert all remaining rows in bulk with a single executemany() call
    cursor.executemany(insert_stmt, reader)

# Commit changes
conn.commit()

# Close connections
conn.close()

print("CSV data imported successfully!")

Explanation of the Code:

  • The code imports the sqlite3 and csv modules.
  • It connects to the database file my_database.db.
  • The CREATE TABLE statement defines the table structure.
  • The CSV file my_data.csv is opened in read mode.
  • next(reader, None) skips the first row, assuming it is a header. The None default prevents a StopIteration error if the file is empty.
  • The INSERT statement is prepared with ? placeholders, one per column.
  • The executemany() method inserts all remaining rows in a single call.
  • The changes are committed and the connection is closed.

Remember to replace my_database.db, my_data.csv, and the table structure (my_table) with your specific file names and table definition.
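After the import, a quick query confirms that the rows arrived. The sketch below reruns the same import against an in-memory database, with a made-up CSV sample in place of my_data.csv. It then counts the rows and checks the stored types with SQLite's typeof() function.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for my_data.csv
sample = io.StringIO("column1,column2,column3\nA,1,1.5\nB,2,2.5\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 TEXT, column2 INTEGER, column3 REAL)")

reader = csv.reader(sample)
next(reader, None)  # skip the header row
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", reader)
conn.commit()

# Count the rows and inspect how the values were stored
count = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
types = conn.execute(
    "SELECT typeof(column2), typeof(column3) FROM my_table LIMIT 1"
).fetchone()
print(count, types)  # 2 ('integer', 'real')
conn.close()
```

SQLite applies type affinity, so numeric-looking strings from csv.reader are stored as integer and real values in INTEGER and REAL columns. Converting types explicitly in Python is still useful for rejecting bad values early.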



Here's an example that adds data type conversion and error handling:

import sqlite3
import csv

# Connect to the database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Create table (if it doesn't exist)
cursor.execute('''CREATE TABLE IF NOT EXISTS my_table (
                    id INTEGER PRIMARY KEY,
                    name TEXT,
                    age INTEGER,
                    price REAL
                )''')

# Open the CSV file
try:
    with open('my_data.csv', 'r', newline='') as csvfile:
        reader = csv.reader(csvfile)

        # Skip the first row, assuming it contains column names
        next(reader, None)

        # Prepare INSERT statement
        insert_stmt = "INSERT INTO my_table VALUES (?, ?, ?, ?)"

        # Insert rows one at a time so that a bad row can be reported and skipped
        for row in reader:
            try:
                # csv.reader returns strings, so convert the numeric columns
                row[2] = int(row[2])  # Assuming age is in the third column (index 2)
                row[3] = float(row[3])  # Assuming price is in the fourth column (index 3)
                cursor.execute(insert_stmt, row)
            except (ValueError, IndexError):  # Bad number or too few columns
                print(f"Error converting data in row: {row}")

# Handle file opening errors
except FileNotFoundError:
    print("Error: CSV file not found!")

# Commit changes
conn.commit()

# Close connections
conn.close()

print("CSV data imported (with potential errors handled).")

Explanation of Enhancements:

  • Error Handling: A try-except block inside the loop catches ValueError exceptions raised during data type conversion (e.g., converting a non-numeric string to an integer). The offending row is reported and skipped, so a single invalid row does not stop the whole import. An outer try-except reports a missing CSV file instead of crashing.
  • Data Type Conversion: The code demonstrates basic data type conversion using int() and float() for the age and price columns, assuming they are originally strings in the CSV file.
  • Informative Messages: The code prints a message for each row it skips, for a missing CSV file, and when the import finishes.

Remember to adjust the data type conversions and error handling based on the actual data types in your CSV file.
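If you would rather keep a record of rejected rows than print them, one option is to collect them with their line numbers. The sketch below assumes the same four-column my_table. import_rows is a hypothetical helper name, and the sample data is made up. Unlike the example above, it also catches IndexError (rows with too few columns) and sqlite3.IntegrityError (for example, a duplicate id).

```python
import csv
import io
import sqlite3

def import_rows(conn, csvfile):
    """Hypothetical helper: insert valid rows, return (inserted, rejected)."""
    inserted = 0
    rejected = []
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row
    for row in reader:
        try:
            record = (int(row[0]), row[1], int(row[2]), float(row[3]))
            conn.execute("INSERT INTO my_table VALUES (?, ?, ?, ?)", record)
            inserted += 1
        except (ValueError, IndexError, sqlite3.IntegrityError) as exc:
            # reader.line_num is the number of source lines read so far
            rejected.append((reader.line_num, str(exc)))
    return inserted, rejected

# Made-up sample: one good row, a bad age, a short row, a duplicate id
sample = io.StringIO(
    "id,name,age,price\n"
    "1,Alice,30,9.99\n"
    "2,Bob,not-a-number,5.00\n"
    "3,Carol\n"
    "1,Dave,40,1.00\n"
)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, price REAL)"
)
with conn:  # commits on success, rolls back if an unexpected error escapes
    inserted, rejected = import_rows(conn, sample)
conn.close()

print(inserted)  # 1
print([line for line, _ in rejected])  # [3, 4, 5]
```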



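Here are some alternative methods for importing a CSV file into a SQLite3 database table using Python.

Using Pandas:

  • If the third-party pandas library is installed, pd.read_csv() and DataFrame.to_sql() can handle the whole import: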

import pandas as pd
import sqlite3

# Connect to the database
conn = sqlite3.connect('my_database.db')

# Read CSV data into a DataFrame
df = pd.read_csv('my_data.csv')

# Write DataFrame to an existing table (or create a new one)
df.to_sql('my_table', conn, if_exists='append', index=False)

# Close connections
conn.close()

print("CSV data imported using Pandas.")

This approach simplifies the process:

  • pd.read_csv() reads the CSV file into a DataFrame, a structured data container.
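  • DataFrame.to_sql() writes the DataFrame directly to the database table. Pass if_exists='append' to add rows to an existing table (the table is created if it does not exist yet) or 'replace' to drop and recreate it. The default, 'fail', raises an error if the table already exists.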

Using executemany with Type Conversion:

  • You can still use the sqlite3 module's executemany() method and convert data types as the rows are passed in:
import sqlite3
import csv

# Connect to the database
conn = sqlite3.connect('my_database.db')
cursor = conn.cursor()

# Create table (if it doesn't exist), same structure as the previous example
cursor.execute('''CREATE TABLE IF NOT EXISTS my_table (
                    id INTEGER PRIMARY KEY,
                    name TEXT,
                    age INTEGER,
                    price REAL
                )''')

def convert(row):
    # Convert data types based on the column definitions:
    # age (index 2) to int, price (index 3) to float
    return (row[0], row[1], int(row[2]), float(row[3]))

# Open the CSV file
with open('my_data.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)

    # Skip the first row, assuming it contains column names
    next(reader, None)

    # Prepare INSERT statement
    insert_stmt = "INSERT INTO my_table VALUES (?, ?, ?, ?)"

    # executemany() consumes the generator, converting each row as it goes
    cursor.executemany(insert_stmt, (convert(row) for row in reader))

# Commit changes, close connections
conn.commit()
conn.close()

print("CSV data imported with type conversion.")

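This approach gives you explicit control over each column's type, whereas pandas infers column types automatically unless you pass a dtype argument to pd.read_csv(). Note that a row that fails conversion raises an exception and stops the import.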
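A variation on explicit conversion uses csv.DictReader, which reads the header row for you and returns each row as a dictionary. The sketch below pairs it with a per-column converter table and named placeholders. CONVERTERS and convert are hypothetical names, and the in-memory sample stands in for my_data.csv.

```python
import csv
import io
import sqlite3

# Hypothetical per-column converters; columns not listed stay as strings
CONVERTERS = {"id": int, "age": int, "price": float}

def convert(record):
    """Apply the converter for each column, if one is defined."""
    return {key: CONVERTERS.get(key, str)(value) for key, value in record.items()}

sample = io.StringIO("id,name,age,price\n1,Alice,30,9.99\n2,Bob,25,5.00\n")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, price REAL)"
)

# DictReader uses the header row as keys, so no manual next() is needed,
# and the named placeholders (:id, :name, ...) match those keys
reader = csv.DictReader(sample)
conn.executemany(
    "INSERT INTO my_table (id, name, age, price) VALUES (:id, :name, :age, :price)",
    (convert(record) for record in reader),
)
conn.commit()

rows = conn.execute("SELECT * FROM my_table ORDER BY id").fetchall()
print(rows)  # [(1, 'Alice', 30, 9.99), (2, 'Bob', 25, 5.0)]
conn.close()
```

Keying conversions by column name rather than by index keeps the code correct if the CSV columns are reordered, as long as the header names stay the same.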

Using a Context Manager:

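  • A connection object also works as a context manager, but it manages transactions rather than closing: the with block commits if it finishes normally and rolls back if an exception is raised. To close the connection automatically as well, wrap it in contextlib.closing():
import sqlite3
import csv
from contextlib import closing

# closing() calls conn.close() when the outer block exits
with closing(sqlite3.connect('my_database.db')) as conn:
    # "with conn" commits on success or rolls back on an exception
    with conn:
        cursor = conn.cursor()

        # ... (rest of the code using cursor object)

print("CSV data imported using context manager.")

This approach promotes cleaner code. There is no need to call conn.commit() or conn.close() by hand: the transaction is committed (or rolled back on an error) and the connection is closed when the with blocks exit, even if an exception interrupts the import.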
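The difference between a connection's own context manager and contextlib.closing() can be checked directly. In this small sketch using in-memory databases, the connection is still usable after a plain "with conn" block, but raises sqlite3.ProgrammingError once closing() has closed it.

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
# The block committed the insert, but the connection is still open
still_open = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()

# closing() is what actually calls close() when the block exits
with closing(sqlite3.connect(":memory:")) as conn2:
    pass
try:
    conn2.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True

print(still_open, closed)  # 1 True
```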

Choose the method that best suits your project's requirements and coding style. Remember to adapt the code examples to your specific table structure and data types.

