+9

Implementing Bulk Inserts with UNIQUE Constraints Using Node.js and PostgreSQL

I had the opportunity to research how to do a bulk insert of multiple records in a Node.js + PostgreSQL environment, even when there is a UNIQUE constraint on the target table, so I'm going to summarize it in a blog post.

Using PostgreSQL with Node.js

First, if you want to implement PostgreSQL client functionality in Node.js, I think the most commonly used npm library is pg (node-postgres):

image.png

Like this, create a connection pooling.

var PG = require( 'pg' );
var pg = new PG.pool( 'postgres://localhost:5432/mydb' );

This is an example of executing an SQL "INSERT" process and getting the result.

if( pg ){
  var conn = await pg.connect();                                    // Connection assignment
  if( conn ){
    try{
      var sql = "insert into table1( name, num ) values ( $1, $2 )";  // SQL
      var query = { text: sql, values: [ "yamaha", 100 ] );             // SQL parameters
      conn.query( query, function( err, result ){
        if( err ){
          console.log( err );                                           // SQL execution error
        }else{
          console.log( result );                                        // The result of executing an SQL query
        }
      });
    }catch( e ){
      console.log( e );                                               // An exception has occurred
    }finally{
      if( conn ){
        conn.release();                                               // Releasing the connection
      }
    }
  }
}

This time, I will introduce it using this pg.

Implement bulk insert with Node.js and PostgreSQL

In the example above, we showed an example of a process to insert one record with one SQL execution. You can also execute this multiple times to insert multiple records, but it is convenient if you can efficiently insert them when the number of records is large. This process of "creating multiple records at once" is called "bulk insert".

There are several ways to implement bulk insert with Node.js and PostgreSQL, but this time we will introduce an example using the npm library node-pg-format.

image.png

When using it, prepare bulk insert data in an array and implement it as follows:

var format = require( 'pg-format' );

var records = [        
  [ "yamaha", 100 ],
  [ "suzuki", 101 ],
  [ "yamaha", 102 ]
];

if( pg ){
  var conn = await pg.connect();                                    // Connection assignment
  if( conn ){
    try{
      var sql = format( "insert into table1( name, num ) values %L", records );  // SQL
      var query = { text: sql, values: [] );                          // SQL parameters
      conn.query( query, function( err, result ){
        if( err ){
          console.log( err );                                           // SQL SQL execution error
        }else{
          console.log( result );                                        // The result of executing an SQL query
        }
      });
    }catch( e ){
      console.log( e );                                               // An exception has occurred
    }finally{
      if( conn ){
        conn.release();                                                // Releasing the connection
      }
    }
  }
}

We are instantiating the entire SQL statement in pg-format, and attaching array data as multiple records to be inserted (the parameter for conn.query() execution is empty). This allows us to insert multiple records specified in the records array variable with one SQL execution.

Bulk inserting into Node.js and PostgreSQL with a UNIQUE constraint

Now we will move on to the main topic of this blog entry. We have already introduced the method of bulk insert. The problem is that this does not always work. Specifically, when the target table has a UNIQUE constraint and some of the records in the bulk insert are affected by this constraint.

I will give a specific example. For example, I am introducing an example of bulk inserting multiple records into a table called table1, but it is not possible to exclude the possibility that the following specifications were given when creating the table1 table:

create table table1( name varchar(50) default '', num int default 0 );
alter table table1 add unique( name, num );

The first line "create table..." is fine, but the problem is the second line. By giving this constraint, it has been specified that the combination of name and num must be unique. It's okay for name to be repeated and num to be repeated, but the combination of name and num must be unique (data that violates this condition cannot be inserted).

Under the assumption that there is a UNIQUE constraint, let's look at the example again. The records that are trying to be bulk inserted in this example,

var records = [         // Prepare an array of records to insert.
  [ "yamaha", 100 ],
  [ "suzuki", 101 ],
  [ "yamaha", 102 ]
];

The content was (the first element of the array is name and the second is num and it will be bulk inserted). Looking at this example alone (since the combination of name and num is all different), it seems that bulk insertion can be done without any problems. ... But what if the data {name: "yamaha", num: 100} was already recorded before the bulk insertion was executed? The first record of the bulk insertion will be subject to the UNIQUE constraint, which means it will result in an execution error.

In addition, there is a complication that makes this problem more complicated. That is the question of whether or not records that did not fail the UNIQUE constraint should be inserted when bulk insert fails due to UNIQUE constraint error. The answer to this is probably case by case, depending on the data and application. If the records that must be inserted together have no meaning if not inserted together, then it is correct for the bulk insert to fail and none of the records to be inserted. However, if some of the records failed the UNIQUE constraint but the rest should be inserted, then only part of the bulk insert should fail and the rest should be inserted.

In the case of the bulk insert implementation introduced in the above example, it behaves like the former. In other words, "the bulk insert will only succeed if none of the records violate the constraints, and if one of the records violates the constraints, none of the records will be inserted and an error will occur" (strictly speaking, "bulk insert" means such behavior and processing).

Bulk inserting into Node.js and PostgreSQL with a UNIQUE constraint

After a long introduction, now that we understand the background, how can we implement a bulk insert so that records that violate the UNIQUE constraint are not inserted, but records that do not violate the constraint are inserted correctly? This is the key part of this blog entry.

If you are using PostgreSQL 9.5 or higher and Node.js + node-postgres, the following method can be used to achieve this, although it may vary depending on the RDB, library, and programming language you are actually using.

var records = [         // Prepare an array of records to insert.
  [ "yamaha", 100 ],
  [ "suzuki", 101 ],
  [ "yamaha", 102 ]
];

if( pg ){
  var conn = await pg.connect();                                      
  if( conn ){
    try{
      // Create an SQL that ignores data that violates the UNIQUE constraint.
      var sql = "insert into table1( name, num ) ";
      var selects = [];
      for( var i = 0; i < records.length; i ++ ){
        selects.push( "select '" + records[i][0] + "', " + records[i][1] );
      }
      sql += selects.join( " union all" ) + " on conflict ( name, num ) do nothing";

      var query = { text: sql, values: [] );                         
      conn.query( query, function( err, result ){
        if( err ){
          console.log( err );                                        
        }else{
          console.log( result );                                     
        }
      });
    }catch( e ){
      console.log( e );                                             
    }finally{
      if( conn ){
        conn.release();                                              
      }
    }
  }
}

In this example, we are using programming to generate and execute the following SQL statement:

insert into table1( name, num )
  select 'yamaha', 100 union all
  select 'suzuki', 101 union all
  select 'yamaha', 102
  on conflict ( name, num ) do nothing

I'm not doing a bulk insert exactly, but I'm specifying that in one SQL execution, multiple records can be inserted and if a "conflict" (UNIQUE constraint violation) occurs, it should be "ignored". As a result, only records without UNIQUE constraint violations are inserted together.

In my own tests, bulk insert worked as expected even when the number of records to be inserted at once became very large (in cases where the SQL statement itself becomes huge and complex). I was researching this because I was making an app that needs to do this kind of processing, so I hope this helps someone else.

Mình hy vọng bạn thích bài viết này và học thêm được điều gì đó mới.

Donate mình một ly cafe hoặc 1 cây bút bi để mình có thêm động lực cho ra nhiều bài viết hay và chất lượng hơn trong tương lai nhé. À mà nếu bạn có bất kỳ câu hỏi nào thì đừng ngại comment hoặc liên hệ mình qua: Zalo - 0374226770 hoặc Facebook. Mình xin cảm ơn.

Momo: NGUYỄN ANH TUẤN - 0374226770

TPBank: NGUYỄN ANH TUẤN - 0374226770 (hoặc 01681423001)

image.png


All Rights Reserved

Viblo
Let's register a Viblo Account to get more interesting posts.