All Options
Context
The issue is that the trade_logs.id column datatype (integer) is too small to allow inserting more rows. The range of integer (signed int32) is -2,147,483,648 to +2,147,483,647; the positive numbers have run out, and ids are now being generated as negative numbers. We therefore have to change the datatype from integer to bigint. Note that the trade_logs.allocation_log_id column is a self-reference back to trade_logs.id and will also need to be updated.
Scope
The solutions here operate at the database layer, not the application layer. Because we noted that Rails does not use all of the sequence numbers it requests, there may be application-layer options to remediate this issue as well, but those are out of scope.
The trade_logs table schema:
```sql
create table trade_logs
(
    id                integer default nextval('trade_logs_id_seq'::regclass) not null
        constraint trade_logs_pkey
            primary key,
    trade_id          integer
        constraint fk_rails_7a0ba7830b_trade_logs
            references trades,
    account_id        integer
        constraint fk_rails_f946cfeeba_trade_logs
            references accounts,
    amount            bigint,
    is_credit         boolean,
    tax_rate          varchar,
    description       varchar(1000),
    created_at        timestamp,
    updated_at        timestamp,
    type              varchar not null,
    allocation_log_id integer
        constraint trade_logs_allocation_log_id_fk
            references trade_logs
)
    with (autovacuum_analyze_scale_factor = 0.0, autovacuum_analyze_threshold = 2000000);

create index index_trade_logs_on_account_id
    on trade_logs (account_id);

create index index_trade_logs_on_account_id_amount
    on trade_logs (account_id, amount);

create index index_trade_logs_on_account_id_trade_id
    on trade_logs (account_id, trade_id);

create index index_trade_logs_on_allocation_log_id
    on trade_logs (allocation_log_id);

create index index_trade_logs_on_trade_id
    on trade_logs (trade_id);

alter sequence trade_logs_id_seq owned by trade_logs.id;
```
Option 1: (Change the datatype of column directly)
Change the datatype by using ALTER COLUMN.
Risks:
- Acquires an access exclusive lock which can be problematic for big tables.
- Causes table rewrite where Postgres rewrites every single row of the table, which is a very expensive operation (both in terms of disk I/O and wall clock time).
Steps:
- Run ALTER TABLE to change the datatype from integer to bigint.
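Concretely, the direct change is a single statement, using the column names from the schema above:

```sql
-- Option 1: change both the id column and the self-referencing
-- allocation_log_id column in one statement. This takes an ACCESS
-- EXCLUSIVE lock and rewrites every row, blocking all reads and
-- writes on trade_logs until it finishes.
ALTER TABLE trade_logs
    ALTER COLUMN id TYPE bigint,
    ALTER COLUMN allocation_log_id TYPE bigint;
```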
Option 2: (Create a new table like old table schema)
Create a new table with the same schema but with id of type bigint, and copy the data across with INSERT INTO.
Risks:
- Bloats disk usage on the database server.
- Bloats CPU and memory usage on the database server while the data is being copied.
Steps:
- Create a table with the same schema as the source table, except that id and allocation_log_id are bigint, and drop all foreign key constraints that reference id in the new table.
- Create a trigger that synchronizes every modification of the source table to the new table in real time.
- Copy data from the source table to the new table.
- Once the background migration has completed, start maintenance mode.
- Drop the trigger.
- Rename the source table out of the way, then rename the new table to the source table's name.
- Add back the foreign key constraints that reference id.
- Add back the sequence on id.
- Drop the trigger function.
- Drop the old table.
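The steps above might look like this in SQL. The new-table and trigger names and the batch bounds (:lo, :hi) are illustrative; note that `CREATE TABLE ... (LIKE ...)` copies columns, defaults, and indexes but not foreign keys, which matches the first step:

```sql
-- Option 2 sketch: build the new table with widened key columns and keep
-- it in sync with a trigger while the data is copied in the background.
CREATE TABLE trade_logs_new
    (LIKE trade_logs INCLUDING DEFAULTS INCLUDING INDEXES);
ALTER TABLE trade_logs_new
    ALTER COLUMN id TYPE bigint,
    ALTER COLUMN allocation_log_id TYPE bigint;

-- Mirror every write on the source table into the new table.
CREATE FUNCTION sync_trade_logs_new() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        DELETE FROM trade_logs_new WHERE id = OLD.id;
    ELSE
        -- Delete-then-insert acts as an upsert for UPDATEs.
        DELETE FROM trade_logs_new WHERE id = NEW.id;
        INSERT INTO trade_logs_new VALUES (NEW.*);
    END IF;
    RETURN NULL;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER sync_trade_logs_new_trg
    AFTER INSERT OR UPDATE OR DELETE ON trade_logs
    FOR EACH ROW EXECUTE FUNCTION sync_trade_logs_new();

-- Copy existing rows in batches; repeat with moving :lo/:hi bounds.
INSERT INTO trade_logs_new
    SELECT * FROM trade_logs WHERE id > :lo AND id <= :hi
    ON CONFLICT (id) DO NOTHING;

-- In maintenance mode: drop the trigger and swap the tables.
DROP TRIGGER sync_trade_logs_new_trg ON trade_logs;
ALTER TABLE trade_logs RENAME TO trade_logs_old;
ALTER TABLE trade_logs_new RENAME TO trade_logs;
DROP FUNCTION sync_trade_logs_new();
```

After the swap, the foreign key constraints referencing id and the sequence ownership would be re-added as described above.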
Option 3: (Add a new column of type bigint)
Add a new column id_new of type bigint. Copy the id values over to id_new, then rename id to id_old and id_new to id.
Risks:
- Requires more than 3 hours of downtime for the maintenance operations:
  - ~30 min for 0.5 billion rows (local, with SSD)
  - &gt; 3 hrs for 3 billion rows (estimate for prod)
  - Note: the timing has to be verified again in an AWS environment with production-cloned data.
Steps:
- Add a new nullable column id_new.
- Run a script that updates id_new with the values from id in chunks of 1,000,000 rows.
- Once in maintenance mode, drop the primary key constraint, the foreign key constraints, and the sequence default on id.
- Rename id to id_old and id_new to id.
- Add back the primary key constraint.
- Add back the foreign key constraints.
- Drop the NOT NULL constraint on id_old (the previous primary key).
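A sketch of these steps in SQL. The constraint names come from the schema above; the batch bounds (:lo, :hi) are placeholders supplied by the driving script, and the parallel handling of allocation_log_id is omitted for brevity:

```sql
-- Option 3 sketch: parallel bigint column, batched backfill, then a swap
-- during the maintenance window.
ALTER TABLE trade_logs ADD COLUMN id_new bigint;

-- Background backfill, repeated over moving 1,000,000-row windows:
UPDATE trade_logs SET id_new = id
WHERE id > :lo AND id <= :hi;

-- During the maintenance window:
BEGIN;
ALTER TABLE trade_logs DROP CONSTRAINT trade_logs_pkey;
ALTER TABLE trade_logs DROP CONSTRAINT trade_logs_allocation_log_id_fk;
ALTER TABLE trade_logs ALTER COLUMN id DROP DEFAULT;
ALTER TABLE trade_logs RENAME COLUMN id TO id_old;
ALTER TABLE trade_logs RENAME COLUMN id_new TO id;
ALTER TABLE trade_logs ALTER COLUMN id SET NOT NULL;
ALTER TABLE trade_logs ADD CONSTRAINT trade_logs_pkey PRIMARY KEY (id);
ALTER TABLE trade_logs ALTER COLUMN id
    SET DEFAULT nextval('trade_logs_id_seq'::regclass);
ALTER TABLE trade_logs ALTER COLUMN id_old DROP NOT NULL;
COMMIT;
```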
Option 4: (Add a new column of type bigint with zero downtime)
This option is similar to Option 3, but it can be performed with zero downtime.
- Add the new column
- Copy the data
- Update the constraints
- Rename the columns
- Drop the old column.
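One way to sketch this (PostgreSQL 11+), under the assumption that a trigger keeps id_new current for new writes and CREATE INDEX CONCURRENTLY avoids long locks during the backfill. Names other than trade_logs_pkey are illustrative, and allocation_log_id needs the same treatment (omitted for brevity):

```sql
-- Option 4 sketch: a trigger fills id_new for new writes while old rows
-- are backfilled in batches, so no maintenance window is needed.
ALTER TABLE trade_logs ADD COLUMN id_new bigint;

CREATE FUNCTION trade_logs_copy_id() RETURNS trigger AS $$
BEGIN
    NEW.id_new := NEW.id;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER trade_logs_copy_id_trg
    BEFORE INSERT OR UPDATE ON trade_logs
    FOR EACH ROW EXECUTE FUNCTION trade_logs_copy_id();

-- Backfill existing rows in small batches (as in Option 3), then build
-- the replacement unique index without blocking writes:
CREATE UNIQUE INDEX CONCURRENTLY trade_logs_id_new_key
    ON trade_logs (id_new);

-- The swap itself is a brief transaction: promote the index to the
-- primary key and rename the columns.
BEGIN;
DROP TRIGGER trade_logs_copy_id_trg ON trade_logs;
ALTER TABLE trade_logs DROP CONSTRAINT trade_logs_pkey;
ALTER TABLE trade_logs ALTER COLUMN id DROP DEFAULT;
ALTER TABLE trade_logs RENAME COLUMN id TO id_old;
ALTER TABLE trade_logs RENAME COLUMN id_new TO id;
ALTER TABLE trade_logs ADD CONSTRAINT trade_logs_pkey
    PRIMARY KEY USING INDEX trade_logs_id_new_key;
ALTER TABLE trade_logs ALTER COLUMN id
    SET DEFAULT nextval('trade_logs_id_seq'::regclass);
COMMIT;

ALTER TABLE trade_logs DROP COLUMN id_old;
DROP FUNCTION trade_logs_copy_id();
```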
Recommendation
Let's compare the four options in terms of time cost, space cost, risk, and complexity:
| Options | Background Time Cost | Maintenance Time Cost | Space Cost | Risk | Complexity |
|---|---|---|---|---|---|
| Option 1 | Low | High | Low | High | Low |
| Option 2 | High | Low | High | Low | High |
| Option 3 | Medium | Medium | Low | Low | Medium |
| Option 4 | High | Low | Low | Low | High |
Option 4 is preferred because it has almost no downtime cost, a lower disk space cost, and lower risk. We have completed validation of Option 4 in a local environment with about 0.2 billion rows of data. The details of Option 4 are as follows: