Postgres:30亿行数据的大表自增主键用完了怎么办?- Option 3: 添加一个新的bigint类型的字段(英文版)

151 阅读1分钟

Option 3: Add a new column of type bigint

Overview of the process

Add a new column to the table with type bigint. Copy over id values to id_new and then rename the new column to id (by renaming the existing id to id_old).

Steps:

  • Add a new nullable column id_new.
  • Script that runs an update to id_new with values from id in chunks of 1,000,000.
  • Once in maintenance mode, drop primary key constraint, foreign keys constraints and sequence on id.
  • Rename id and id_new.
  • Add back primary key constraint.
  • Add back foreign key constraint.

Details of the process

Environment

Operating SystemDisk typeAmount of dataBackground timeMaintenance time
MAC Big SurSSD0.5 billion3h30m
  1. Record min id value and max id value. Example: latest negative max id value is -480383918

    select max(id) as latest_negative_max_id 
    from trade_logs
    where id < 0; --latest negative max id value
    
  2. Add new nullable bigint column without a default specified for id column.

    -- 1s
    ALTER TABLE trade_logs ADD COLUMN id_new BIGINT NULL;
    
  3. Add new nullable bigint column without a default specified for allocation_log_id column.

    -- 1s
    ALTER TABLE trade_logs ADD COLUMN allocation_log_id_new BIGINT NULL;
    
  4. Perform VACUUM command to reclaim dead tuples before coping data from id to id_new

    VACUUM (VERBOSE, ANALYZE) trade_logs;
    
  5. Copy data from id to id_new in background mode. The reason we chose increments of 1,000,000 is mainly because it performed better in terms of incremental speed, disk utilization and cleanup between transactions(Need to be verified in AWS env again). Record the max id value printed in the log

    -- 1s
    DROP PROCEDURE if exists syn_data(int8);
    -- 1s
    create or replace PROCEDURE syn_data(in_latest_negative_max_id int8)
    as
    $$
    DECLARE
        counter int8;
        min_id int8;
        max_id int8;
        step int8 := 100000;
    begin
        select min(id) into min_id from trade_logs;
        select max(id) into max_id from trade_logs;
        RAISE  NOTICE 'begin time is: %',CURRENT_TIMESTAMP;
        RAISE NOTICE 'latest negative max id value is: %',in_latest_negative_max_id;
        RAISE NOTICE 'min id value is: %',min_id;
        RAISE NOTICE 'max id value is: %',max_id;
    
        counter:=min_id;
        while counter <= in_latest_negative_max_id
            loop
                begin
                    update trade_logs
                    set id_new=id,
                        allocation_log_id_new=allocation_log_id
                    where id < 0
                      and id >= counter
                      and id < counter + step;
                    commit;
                end;
                counter := counter + step;
                RAISE NOTICE 'current datetime: % counter: %',CURRENT_TIMESTAMP,counter;
            end loop;
    
        counter:=0;
        while counter <= max_id
            loop
                begin
                    update trade_logs
                    set id_new=id,
                        allocation_log_id_new=allocation_log_id
                    where id >= 0
                      and id >= counter
                      and id < counter + step;
                    commit;
                end;
                counter := counter + step;
                RAISE NOTICE 'current datetime: % counter: %',CURRENT_TIMESTAMP,counter;
            end loop;
         RAISE NOTICE 'end time is: %',CURRENT_TIMESTAMP;
    END
    $$
        language plpgsql;
    -- 3h 3m 8s 234ms
    call syn_data(-480383918);
    
  6. Once the background migration completed, we can start maintenance mode.

  • Make sure there are no queries hitting the DB.

    SELECT * FROM pg_stat_activity 
    WHERE datname = '<db>' AND state = 'active' LIMIT 10;
    
  • Run updates that id value less than latest min id value.

    -- 1s
    UPDATE trade_logs
    SET id_new = id, allocation_log_id_new = allocation_log_id
    WHERE id >= -2047483648 and id<0;
    
  • Checking for not null values on the column id_new before dropping constraints on id, in order to avoid some data may not be copied successfully

    -- 1s
    DROP PROCEDURE if exists check_null_and_fixed();
    create or replace PROCEDURE check_null_and_fixed()
    as
    $$
    DECLARE
        i RECORD;
    begin
        FOR i IN select * from trade_logs tl where tl.id_new is null
            loop
                begin
                    RAISE NOTICE '% id is for null id_new', i.id;
                    update trade_logs
                    set id_new=i.id,
                        allocation_log_id_new=i.allocation_log_id
                    where id = i.id;
                    commit;
                end;
            end loop;
    END
    $$
        language plpgsql;
    
    -- 4 m 59 s 118 ms
    call check_null_and_fixed();
    
  • Select and remove Foreign Keys from the id column

    -- 1s
    select tc.constraint_name,
           tc.table_name,
           kcu.column_name,
           ccu.table_name  as foreign_table_name,
           ccu.column_name as foreign_column_name
    from information_schema.table_constraints as tc
             join information_schema.key_column_usage as kcu
                  on
                      tc.constraint_name = kcu.constraint_name
             join information_schema.constraint_column_usage as ccu
                  on
                      ccu.constraint_name = tc.constraint_name
    where constraint_type = 'FOREIGN KEY'
      and ccu.table_name = 'trade_logs';
    
    -- 1s
    alter table trade_logs  drop constraint <constraint_name>;
    
  • Remove Primary Key constraint on id.

    -- 1s
    ALTER TABLE trade_logs DROP CONSTRAINT trade_logs_pkey;
    
  • Rename column id to id_old.

    -- 1s
    ALTER TABLE trade_logs RENAME COLUMN id TO id_old;
    
  • Rename column id_new to id.

    -- 1s
    ALTER TABLE trade_logs RENAME COLUMN id_new TO id;
    
  • Rename column allocation_log_id to allocation_log_id_old.

    -- 1s
    ALTER TABLE trade_logs RENAME COLUMN allocation_log_id TO allocation_log_id_old;
    
  • Rename column allocation_line_new to allocation_log_id.

    -- 1s
    ALTER TABLE trade_logs RENAME COLUMN allocation_log_id_new TO allocation_log_id;
    
  • Add Primary Key constraint to id .

    -- 10m 15s 861ms
    ALTER TABLE trade_logs ADD PRIMARY KEY (id);
    
  • Add Foreign Key constraints back to id .

    -- 7m 42s 986ms
    ALTER TABLE trade_logs ADD CONSTRAINT <constraint_name> FOREIGN KEY (allocation_log_id) REFERENCES trade_logs (id);
    
  • Add back the sequence to id.

    -- 1s
    ALTER TABLE trade_logs ALTER COLUMN id_old drop default;
    -- 1s
    ALTER TABLE trade_logs ALTER COLUMN id SET DEFAULT NEXTVAL('trade_logs_id_seq'::regclass);
    -- 1s
    alter sequence trade_logs_id_seq owned by trade_logs.id;
    
  • Drop not null constraint on id_old

    -- 1s
    ALTER TABLE trade_logs ALTER COLUMN id_old DROP NOT NULL;
    
  • Validate the data looks good by running automated tests with fixed input and expected output.

  • Drop column id_old

    -- 1s
    ALTER TABLE trade_logs DROP COLUMN id_old;
    
  • Drop column allocation_log_id_old

    -- 1s
    ALTER TABLE trade_logs  DROP COLUMN allocation_log_id_old;
    
  • Drop procedures that are no longer used

    -- 1s
    DROP PROCEDURE if exists syn_data(int8, int8);
    -- 1s
    DROP procedure if exists check_null_and_fixed();
    

Note

  1. Back up the schema for the table before performing the changes.