Posted 05/28/2025

Bulk insert optimization

Optimizing bulk inserts of JSON data using a temporary table.
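
For reference, the examples below assume a user table shaped roughly like this. The original post does not show the table definition, so this is a sketch inferred from the casts inside the function:

-- Sketch of the assumed target table (not shown in the original post).
-- "user" is a reserved word in PostgreSQL, so it must be quoted.
CREATE TABLE "user" (
  id SERIAL PRIMARY KEY,
  name VARCHAR(150),
  street VARCHAR(100),
  postal_code VARCHAR(10),
  city VARCHAR(100),
  created TIMESTAMP,
  country SMALLINT
);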

Before Optimization

CREATE OR REPLACE FUNCTION bulk_insert(
p_data json
)
RETURNS INT[] 
AS $$
DECLARE
  ids INT[];
BEGIN
  WITH inserted_users AS (
    -- "user" is a reserved word in PostgreSQL, so it must be quoted
    INSERT INTO "user" (
      name,
      street,
      postal_code,
      city,
      created,
      country
    )
    SELECT 
      (j.elem->>'Name')::VARCHAR(150),
      (j.elem->>'Street')::VARCHAR(100),
      (j.elem->>'PostalCode')::VARCHAR(10),
      (j.elem->>'City')::VARCHAR(100),
      (j.elem->>'Created')::TIMESTAMP,
      (j.elem->>'Country')::SMALLINT
    FROM json_array_elements(p_data) AS j(elem)
    RETURNING id
  )
  SELECT array_agg(id) INTO ids FROM inserted_users;
  
  RETURN ids;
END;
$$ LANGUAGE plpgsql;

After Optimization

CREATE OR REPLACE FUNCTION bulk_insert(
  p_data json
)
RETURNS INT[]
AS $$
DECLARE
  ids INT[];
BEGIN
  -- Step 1: Create a temporary table (ON COMMIT DROP cleans it up even if
  -- the explicit DROP below is never reached)
  CREATE TEMP TABLE temp_user (
    name VARCHAR(150),
    street VARCHAR(100),
    postal_code VARCHAR(10),
    city VARCHAR(100),
    created TIMESTAMP,
    country SMALLINT
  ) ON COMMIT DROP;

  -- Step 2: Insert data into the temporary table
  INSERT INTO temp_user (
    name,
    street,
    postal_code,
    city,
    created,
    country
  )
  SELECT 
    (j.elem->>'Name')::VARCHAR(150),
    (j.elem->>'Street')::VARCHAR(100),
    (j.elem->>'PostalCode')::VARCHAR(10),
    (j.elem->>'City')::VARCHAR(100),
    (j.elem->>'Created')::TIMESTAMP,
    (j.elem->>'Country')::SMALLINT
  FROM json_array_elements(p_data) AS j(elem);

  -- Step 3: Bulk insert into the main table
  WITH inserted_users AS (
    -- "user" is a reserved word in PostgreSQL, so it must be quoted
    INSERT INTO "user" (
      name,
      street,
      postal_code,
      city,
      created,
      country
    )
    SELECT name, street, postal_code, city, created, country FROM temp_user
    RETURNING id
  )
  SELECT array_agg(id) INTO ids FROM inserted_users;

  -- Step 4: Drop the temporary table
  DROP TABLE temp_user;

  RETURN ids;
END;
$$ LANGUAGE plpgsql;
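
Calling either version looks the same. Here is a minimal example with two rows (the key names match the ->> lookups inside the function):

-- Minimal call: returns the array of generated ids, e.g. {1,2} on an empty table.
SELECT bulk_insert('[
  {"Name": "Alice Example", "Street": "Main St 1", "PostalCode": "12345",
   "City": "Springfield", "Created": "2025-05-28T10:00:00", "Country": 1},
  {"Name": "Bob Example", "Street": "Oak Ave 2", "PostalCode": "67890",
   "City": "Shelbyville", "Created": "2025-05-28T10:05:00", "Country": 2}
]'::json);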

Why Temporary Tables Improve Performance

  1. What Happens in the Unoptimized Function

    In the unoptimized function, you insert rows directly into the user table using a WITH clause and INSERT INTO … SELECT. This works, but PostgreSQL has to do extra work for each row being inserted:

    • Maintain indexes: Every row inserted into the user table updates the indexes on that table.
    • Check constraints: PostgreSQL checks constraints (e.g., foreign keys, unique constraints) for every row.
    • Write-Ahead Logging (WAL): PostgreSQL logs every change to the user table to ensure durability (so it can recover if something goes wrong).

    This overhead happens for every row, which can slow things down when inserting a large number of rows.
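
    To see this overhead for yourself, EXPLAIN (ANALYZE, BUFFERS) on a direct insert reports the time spent and the buffer activity, including index updates. A diagnostic sketch, assuming the "user" table from above:

    -- ANALYZE really executes the insert, so wrap it in a rolled-back transaction.
    -- On PostgreSQL 13+, adding WAL to the option list also reports WAL volume.
    BEGIN;
    EXPLAIN (ANALYZE, BUFFERS)
    INSERT INTO "user" (name, street, postal_code, city, created, country)
    SELECT 'User ' || g, 'Street ' || g, '12345', 'City', now(), 1::SMALLINT
    FROM generate_series(1, 100000) AS g;
    ROLLBACK;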

  2. What Happens with a Temporary Table

    When you use a temporary table, the initial insert goes into the temporary table instead of the user table. Temporary tables are special because:

    • No Index Maintenance: Temporary tables have no indexes unless you explicitly create them, so PostgreSQL skips index updates during the initial insert, making it faster.
    • No Constraints: A freshly created temporary table has no constraints unless you define them, so PostgreSQL skips foreign-key and uniqueness checks during the initial insert.
    • No WAL Logging: Temporary tables are not logged in the Write-Ahead Log (WAL). This reduces disk I/O and makes the insert faster.

    After all the rows are staged in the temporary table, a single set-based INSERT … SELECT copies them into the user table. The index, constraint, and WAL overhead on the main table is then paid in one batch instead of being interleaved with JSON parsing and casting, as the benchmark sketch below illustrates.
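
    A quick way to compare the two versions is to build a large JSON array with generate_series and time the call in psql (a benchmark sketch; \timing is a psql meta-command):

    -- Benchmark sketch: build 100,000 JSON rows and time bulk_insert.
    \timing on
    SELECT array_length(
      bulk_insert(
        (SELECT json_agg(json_build_object(
           'Name',       'User ' || g,
           'Street',     'Street ' || g,
           'PostalCode', '12345',
           'City',       'City',
           'Created',    now()::text,
           'Country',    1
         ))
         FROM generate_series(1, 100000) AS g)
      ),
      1
    ) AS inserted_count;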

Why Bulk Insert from a Temporary Table Is Faster

Key Benefits of Temporary Tables

When to Use Temporary Tables

Temporary tables are especially useful when: