透過 pg_upgrade 快速升級 Postgres

最近在研究如何快速對 Postgres 進行升版,之前有聽過透過 Logical Replication 的方式來進行升版,但後來發現 Postgres 官方有推出 pg_upgrade 工具可以 in place 的快速升級 Postgres,並且保證盡可能向下相容,且支援跨主版本的升級。

因此這篇文章就來講講 pg_upgrade 這個工具怎麼使用。

pg_upgrade 介紹

因為在 Postgres 的版本中,通常會加一些新功能而會使得 system tables 的格式有改變,但通常底層的 data storage format 不太會改變,因此 pg_upgrade 才可以透過 create 新的 system tables,並且復用舊的 pg data 來達到快速升級 Postgres 版本的效果。

如同官方文件這段話:

Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files. If a future major release ever changes the data storage format in a way that makes the old data format unreadable, pg_upgrade will not be usable for such upgrades. (The community will attempt to avoid such situations.)

自己是有實驗從 9.6 升版到 11 沒遇到什麼錯誤,所以我想 pg_upgrade 是可以嘗試看看的。

pg_upgrade demo

首先可以先灌多一點假資料:

1
2
3
CREATE TABLE t1 AS SELECT id AS a
FROM generate_series(1, 50000000) AS id;
CREATE TABLE t2 AS SELECT * FROM t1;

接著檢查目前 database 的 size 多少

1
SELECT pg_size_pretty(pg_database_size('postgres'));
  • postgres 代表 database name

接著透過 initdb 指令將新版本的 PG 先 init 新的 data 目錄出來。

1
initdb -D /Users/kenny/Library/ApplicationSupport/Postgres/var-11/

上面的 path 是如果你是在 mac 上且透過 Postgres.app 來安裝 Postgres 的話,預設路徑會長這樣。

接著建議實際跑 pg_upgrade 的時候先透過 --check 的指令檢查是否可以沒有錯誤的 upgrade 到新版本。

另外如果你想在 mac 上做實驗,同時你是透過 Postgres.app 來安裝 Postgres 的話,因為你要嘗試不同版本的 PG,那麼你就要先在你的機器上先安裝不同版本的 PG,可以透過 https://postgresapp.com/downloads.html

就可以讓 Postgres.app 提供多版本的 PG。

因此如果你要 run pg_upgrade 指令會變成如下:

1
Applications/Postgres.app/Contents/Versions/11/bin/pg_upgrade -d /Users/kenny/Library/ApplicationSupport/Postgres/var-9.6/ -D /Users/kenny/Library/ApplicationSupport/Postgres/var-11/ -b /Applications/Postgres.app/Contents/Versions/9.6/bin/ -B /Applications/Postgres.app/Contents/Versions/11/bin/ --check -U postgres
  1. 拿新版本的 pg_upgrade 去跑

  2. -d 選項指的是舊版本 PG 的 data 目錄位置

  3. -D 選項指的是新版本 PG 的 data 目錄位置

  4. -b 選項指的是舊版本的 PG 的執行目錄位置

  5. -B 選項指的是新版本的 PG 的執行目錄位置

  6. –check 是事先檢查是否可以 upgrade 過去而沒有錯誤,等於是 dry run 的概念。

  7. -U 選項指的是 PG 當初 initdb 的 username 是誰,另外如果你在 mac 上是用 Postgres.app 來安裝 PG,並且是借助他來幫你 initdb 的話,那通常會噴這個錯誤

    1
    database user "postgres" is not the install user

    因為 pg_upgrade 只允許 install user 存在 new cluster 上,因此 new cluster 不能有其他 defined user,因為怕會跟要將 migrate 的 old cluster user 有所衝突。

    而 Postgres.app 幫你 init db 的時候會特地幫你 create $USER role 及 $USER database 出來,所以解決方案有兩種:

    1. 自己跑 init db
    2. DROP DATABASE $USER; DROP ROLE $USER 將 database 跟 role 都砍掉就可以順利跑成功了。

最後要注意的如果是跑 --check 的話,要先把新版本的 PG 先 shutdown 才可以。

如果你是在 debian linux 上去跑的話,指令通常會變成這樣:

1
/usr/lib/postgresql/11/bin/pg_upgrade -d /var/lib/postgresql/9.6/main -D /var/lib/postgresql/11/main -b /usr/lib/postgresql/9.6/bin -B /usr/lib/postgresql/11/bin --check -U postgres

檢查的結果通常會長這樣:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for invalid "unknown" user columns ok
Checking for hash indexes ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok

*Clusters are compatible*

就是檢查新舊版本的是不是相容,才能安心的 upgrade 過去。

如果 check 都沒問題那就正式來跑 upgrade:

  1. 需要將舊版本與新版本的 PG 兩邊都 shutdown 才可以跑,沒錯你可能會說會有 downtime,但說實在如果要 zero downtime 真的要花費不少 cost 才能達到,但 pg_upgrade 就是確保可以快速的達到,才能快速的 downtime 的幾秒鐘就可以回復了。

  2. 另外可以採用 --link 的選項,這個作用是什麼呢?因為如果不採用 --link 的方式,pg_upgrade 會透過 copy binary data 的方式將舊資料搬到新資料夾下,但是這樣速度就慢了,downtime 也會變長。而 --link 是透過建立 hardlink 的方式去指向同一份資料,所以你可以想像 old cluster 跟 new cluster 都指向同一份資料,而不是複製一份新的,所以單純建立 hardlink 這個速度可以說是飛快的。

    但 hardlink 的缺點就是不能跨 disk 跟 file system。

  3. 如果不採用 --link 的話速度就會慢很多。

那就開始跑吧!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
time /usr/lib/postgresql/11/bin/pg_upgrade -d /var/lib/postgresql/9.6/main -D /var/lib/postgresql/11/main -b /usr/lib/postgresql/9.6/bin -B /usr/lib/postgresql/11/bin --link -U postgres
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for invalid "unknown" user columns ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_clog to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
ok
Adding ".old" suffix to old global/pg_control ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /var/lib/postgresql/9.6/main/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.

Linking user relation files
ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to analyze new cluster ok
Creating script to delete old cluster ok
Checking for hash indexes ok
Checking for extension updates ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
./delete_old_cluster.sh

real 0m5.443s
user 0m0.193s
sys 0m0.346s

用 pg_upgrade 跑的好處還有就是如果中途失敗,那不會對舊的資料有所影響,你只需要將新版本的 PG 重新 initdb 把資料清乾淨就可以重跑了。

如果你跑完還想要重啟 old PG 的話要先將 pg_control.old 的 old 副檔名去掉才可以

If you want to start the old cluster, you will need to remove
the “.old” suffix from /var/lib/postgresql/9.6/main/global/pg_control.old.
Because “link” mode was used, the old cluster cannot be safely
started once the new cluster has been started.

同時我用 time 指令來測試整個 pg_upgrade 的時間花了多少,這個資料量我用了 1 TB 資料量來測試,在 --link 模式下只需要花費 5 秒鐘這個速度是非常快的。

最後 pg_upgrade 會建立 ./analyze_new_cluster.sh 跟 ./delete_old_cluster.sh,當要啟動新的 PG 建立先跑這兩個 script,來看看裏面的內容

  1. analyze_new_cluster.sh

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    #!/bin/sh

    echo 'This script will generate minimal optimizer statistics rapidly'
    echo 'so your system is usable, and then gather statistics twice more'
    echo 'with increasing accuracy. When it is done, your system will'
    echo 'have the default level of optimizer statistics.'
    echo

    echo 'If you have used ALTER TABLE to modify the statistics target for'
    echo 'any tables, you might want to remove them and restore them after'
    echo 'running this script because they will delay fast statistics generation.'
    echo

    echo 'If you would like default statistics as quickly as possible, cancel'
    echo 'this script and run:'
    echo ' "/Applications/Postgres.app/Contents/Versions/11/bin/vacuumdb" -U postgres --all --analyze-only'
    echo

    "/Applications/Postgres.app/Contents/Versions/11/bin/vacuumdb" -U postgres --all --analyze-in-stages
    echo

    echo 'Done'

​ 簡單來說 statistics 的資料因為沒辦法也更新過來,所以會建議重新跑一次 vacuum db 的來重新產生 Statistics

Because optimizer statistics are not transferred by pg_upgrade, you will be instructed to run a command to regenerate that information at the end of the upgrade. You might need to set connection parameters to match your new cluster.

  1. delete_old_cluster.sh

    1
    2
    3
    #!/bin/sh

    rm -rf '/Users/kenny/Library/ApplicationSupport/Postgres/var-9.6'

    這個很單純就是將 old cluster data 刪掉,但是建議還是事先將舊資料備份一份,以防萬一會比較好。

    Once you are satisfied with the upgrade, you can delete the old cluster’s data directories by running the script mentioned when pg_upgrade completes. (Automatic deletion is not possible if you have user-defined tablespaces inside the old data directory.) You can also delete the old installation directories (e.g., bin, share).

另外要注意的是如果是採用 --link 方式的話,因為 old & new cluster 是共用同一份資料,如果將 old & new cluster 同時開啟是不安全的。

最後就是如果你不希望 pg_upgrade 後的新 cluster 去修改到 old cluster 的資料,那建議乾脆就是將 old cluster snapshot 一份,用 snapshot 的那份來做 pg_upgrade,因此原本的 cluster 還是完整不受污染的。

總結

今天算是筆記性質,紀錄了一下 pg_upgrade 的注意事項,以速度來說如果採用 --link 在 in place 下的 upgrade 是很方便的,如果你的 PG 並不是 primary-replica 的架構下,只有單台的話,那麼幾秒鐘的 downtime 就能迅速幫你 upgrade 到跨版本的 PG 是很方便的,建立搭配 snapshot 的方式去保留舊的 PG 是最安全的。

因此如果是 primary-replica 架構下要怎麼使用 pg_upgrade 呢?請等我下集待續。