DBZ-1503 Update PostgreSQL connector documentation

* Use current versus 9.6 (version specific) documentation links
* Rephrased a section to have better readability
Chris Cranford 2019-09-24 18:17:28 -04:00 committed by Jiri Pechanec
parent dc3f07d9d2
commit 9be1ec3d6b


@@ -17,7 +17,7 @@ The connector works with Postgres 9.6, 10 and 11 (support for the latter has bee
[[overview]]
== Overview
-PostgreSQL's https://www.postgresql.org/docs/9.6/static/logicaldecoding-explanation.html[_logical decoding_] feature was first introduced in version 9.4 and is a mechanism which allows the extraction of the changes which were committed to the transaction log and the processing of these changes in a user-friendly manner via the help of an https://www.postgresql.org/docs/9.6/static/logicaldecoding-output-plugin.html[_output plugin_]. This output plugin must be installed prior to running the PostgreSQL server and enabled together with a replication slot in order for clients to be able to consume the changes.
+PostgreSQL's https://www.postgresql.org/docs/current/static/logicaldecoding-explanation.html[_logical decoding_] feature was first introduced in version 9.4 and is a mechanism which allows the extraction of the changes which were committed to the transaction log and the processing of these changes in a user-friendly manner with the help of an https://www.postgresql.org/docs/current/static/logicaldecoding-output-plugin.html[_output plugin_]. This output plugin must be installed prior to running the PostgreSQL server and enabled together with a replication slot in order for clients to be able to consume the changes.
Debezium's PostgreSQL connector consists of two parts that work together to read and process database changes:
@@ -27,7 +27,7 @@ Debezium's PostgreSQL connector contains two different parts which work together
** pgoutput, the standard logical decoding plug-in in PostgreSQL 10+ (maintained by the Postgres community, used by Postgres itself for https://www.postgresql.org/docs/current/logical-replication-architecture.html[logical replication]);
this plug-in is always present, meaning that no additional libraries must be installed,
and the Debezium connector will interpret the raw replication event stream into change events directly.
-* Java code (the actual Kafka Connect connector) which reads the changes produced by the chosen plugin, using PostgreSQL's https://www.postgresql.org/docs/9.6/static/logicaldecoding-walsender.html[_streaming replication protocol_], via the PostgreSQL https://github.com/pgjdbc/pgjdbc[_JDBC driver_]
+* Java code (the actual Kafka Connect connector) which reads the changes produced by the chosen plugin, using PostgreSQL's https://www.postgresql.org/docs/current/static/logicaldecoding-walsender.html[_streaming replication protocol_], via the PostgreSQL https://github.com/pgjdbc/pgjdbc[_JDBC driver_]
The connector then produces a _change event_ for every row-level insert, update, and delete operation that was received, recording all the change events for each table in a separate Kafka topic. Your client applications read the Kafka topics that correspond to the database tables they're interested in following, and react to every row-level event they see in those topics.
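Once the connector is running, such a per-table topic can be inspected with Kafka's console consumer. A sketch, where the broker address and the topic name are illustrative placeholders:

[source,shell]
----
# Broker address and topic name are illustrative placeholders
bin/kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 \
    --topic fulfillment.inventory.customers \
    --from-beginning
----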
@@ -105,7 +105,7 @@ As of Debezium 0.10, the connector supports PostgreSQL 10+ logical replication s
This means that a logical decoding output plug-in is no longer necessary and changes can be emitted directly from the replication stream by the connector.
====
-As of PostgreSQL 9.4, the only way to read changes to the write-ahead-log is to first install a logical decoding output plugin. Plugins are written in C, compiled, and installed on the machine which runs the PostgreSQL server. Plugins use a number of PostgreSQL specific APIs, as described by the https://www.postgresql.org/docs/9.6/static/logicaldecoding-output-plugin.html[_PostgreSQL documentation_].
+As of PostgreSQL 9.4, the only way to read changes to the write-ahead log is to first install a logical decoding output plugin. Plugins are written in C, compiled, and installed on the machine which runs the PostgreSQL server. Plugins use a number of PostgreSQL-specific APIs, as described by the https://www.postgresql.org/docs/current/static/logicaldecoding-output-plugin.html[_PostgreSQL documentation_].
Debezium's PostgreSQL connector works with one of Debezium's supported logical decoding plugins to encode the changes in either https://github.com/google/protobuf[_Protobuf_] or http://www.json.org/[_JSON_] format.
See the documentation of your chosen plugin (https://github.com/debezium/postgres-decoderbufs/blob/master/README.md[_protobuf_], https://github.com/eulerto/wal2json/blob/master/README.md[_wal2json_]) to learn more about the plugin's requirements, limitations, and how to compile it.
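As a sketch of the slot-and-plugin mechanics, the `test_decoding` example plugin that ships with PostgreSQL can be exercised directly from SQL (the slot name here is illustrative; Debezium manages its own slot and plugin):

[source,sql]
----
-- Create a logical replication slot bound to an output plugin
-- (slot name is illustrative; test_decoding ships with PostgreSQL)
SELECT * FROM pg_create_logical_replication_slot('example_slot', 'test_decoding');

-- Peek at decoded changes without consuming them from the slot
SELECT * FROM pg_logical_slot_peek_changes('example_slot', NULL, NULL);

-- Drop the slot when finished so it no longer retains WAL
SELECT pg_drop_replication_slot('example_slot');
----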
@@ -173,7 +173,7 @@ We also recommend to set parameter `wal_keep_segments = 0`. Please follow Postgr
[TIP]
====
-We strongly recommend reading and understanding https://www.postgresql.org/docs/9.6/static/wal-configuration.html[the official documentation] regarding the mechanics and configuration of the PostgreSQL write-ahead log
+We strongly recommend reading and understanding https://www.postgresql.org/docs/current/static/wal-configuration.html[the official documentation] regarding the mechanics and configuration of the PostgreSQL write-ahead log.
====
[[PostgreSQL-permissions]]
@@ -208,7 +208,7 @@ host replication <youruser> ::1/128 trust //<3>
[TIP]
====
-See https://www.postgresql.org/docs/9.6/static/datatype-net-types.html[_the PostgreSQL documentation_] for more information on network masks.
+See https://www.postgresql.org/docs/current/static/datatype-net-types.html[_the PostgreSQL documentation_] for more information on network masks.
====
[[supported-PostgreSQL-topologies]]
@@ -226,7 +226,7 @@ As mentioned link:#limitations[in the beginning], PostgreSQL 9.6 only supports l
Most PostgreSQL servers are configured not to retain the complete history of the database in the WAL segments, so the PostgreSQL connector cannot see the entire history of the database simply by reading the WAL. Therefore, by default, the connector performs an initial _consistent snapshot_ of the database upon first startup. Each snapshot consists of the following steps (when using the built-in snapshot modes; *custom* snapshot modes may override this):
-1. Start a transaction with a https://www.postgresql.org/docs/9.6/static/sql-set-transaction.html[SERIALIZABLE, READ ONLY, DEFERRABLE] isolation level to ensure that all subsequent reads within this transaction are done against a single consistent version of the data. Any changes to the data due to subsequent `INSERT`, `UPDATE`, and `DELETE` operations by other clients will not be visible to this transaction.
+1. Start a transaction with a https://www.postgresql.org/docs/current/static/sql-set-transaction.html[SERIALIZABLE, READ ONLY, DEFERRABLE] isolation level to ensure that all subsequent reads within this transaction are done against a single consistent version of the data. Any changes to the data due to subsequent `INSERT`, `UPDATE`, and `DELETE` operations by other clients will not be visible to this transaction.
2. Obtain a `SHARE UPDATE EXCLUSIVE MODE` lock on each of the monitored tables to ensure that no structural changes can occur to any of the tables while the snapshot is taking place. Note that these locks do not prevent table `INSERT`, `UPDATE`, and `DELETE` operations from taking place during the snapshot. _This step is omitted when using the exported snapshot mode to allow for lock-free snapshots_.
3. Read the current position in the server's transaction log.
4. Scan all of the database tables and schemas, and generate a `READ` event for each row and write that event to the appropriate table-specific Kafka topic.
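The first three steps roughly correspond to the following SQL. This is a schematic sketch, not the literal statements the connector issues: the table name is hypothetical, and `pg_current_wal_lsn()` is the PostgreSQL 10+ spelling of 9.6's `pg_current_xlog_location()`.

[source,sql]
----
BEGIN;
-- Step 1: all reads in this transaction see one consistent data version
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE;

-- Step 2: block schema changes (but not INSERT/UPDATE/DELETE) per table;
-- the table name is hypothetical
LOCK TABLE inventory.customers IN SHARE UPDATE EXCLUSIVE MODE;

-- Step 3: record the current position in the transaction log
SELECT pg_current_wal_lsn();
----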
@@ -337,7 +337,7 @@ All of the builtin snapshot modes are implemented in terms of this interface as
[[streaming-changes]]
=== Streaming Changes
-The PostgreSQL connector will typically spend the vast majority of its time streaming changes from the PostgreSQL server to which it is connected. This mechanism relies on https://www.postgresql.org/docs/9.6/static/protocol-replication.html[_PostgreSQL's replication protocol_] where the client can receive changes from the server as they are committed in the server's transaction log at certain positions (also known as `Log Sequence Numbers` or in short LSNs).
+The PostgreSQL connector will typically spend the vast majority of its time streaming changes from the PostgreSQL server to which it is connected. This mechanism relies on https://www.postgresql.org/docs/current/static/protocol-replication.html[_PostgreSQL's replication protocol_] where the client can receive changes from the server as they are committed in the server's transaction log at certain positions (also known as `Log Sequence Numbers`, or LSNs for short).
Whenever the server commits a transaction, a separate server process invokes a callback function from the link:#output-plugin[logical decoding plugin]. This function processes the changes from the transaction, converts them to a specific format (Protobuf or JSON in the case of the Debezium plugin) and writes them to an output stream which can then be consumed by clients.
@@ -375,7 +375,7 @@ See link:#setting-up-PostgreSQL[Setting up PostgreSQL] for more details.
[[topic-names]]
=== Topic Names
-The PostgreSQL connector writes events for all insert, update, and delete operations on a single table to a single Kafka topic. The name of the Kafka topics takes by default the form _serverName_._schemaName_._tableName_, where _serverName_ is the logical name of the connector as specified with the `database.server.name` configuration property, _schemaName_ is the name of the database schema where the operation occurred, and _tableName_ is the name of the database table on which the operation occurred.
+The PostgreSQL connector writes events for all insert, update, and delete operations on a single table to a single Kafka topic. By default, the Kafka topic name is _serverName_._schemaName_._tableName_ where _serverName_ is the logical name of the connector as specified with the `database.server.name` configuration property, _schemaName_ is the name of the database schema where the operation occurred, and _tableName_ is the name of the database table on which the operation occurred.
For example, consider a PostgreSQL installation with a `postgres` database and an `inventory` schema that contains four tables: `products`, `products_on_hand`, `customers`, and `orders`. If the connector monitoring this database were given a logical server name of `fulfillment`, then the connector would produce events on these four Kafka topics:
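The naming scheme can be sketched as a tiny function that enumerates these topics (illustrative Python, using the example names above):

```python
def topic_name(server_name: str, schema: str, table: str) -> str:
    """Default Debezium topic naming: serverName.schemaName.tableName."""
    return f"{server_name}.{schema}.{table}"

# The four example tables under the fulfillment logical server name
for table in ["products", "products_on_hand", "customers", "orders"]:
    print(topic_name("fulfillment", "inventory", table))
```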
@@ -413,7 +413,7 @@ The PostgreSQL connector uses only 1 Kafka Connect _partition_ and it places the
The `sourceOffset` portion of the message contains information about the location of the server where the event occurred:
-* `lsn` represents the PostgreSQL https://www.postgresql.org/docs/9.6/static/datatype-pg-lsn.html[_log sequence number_] or `offset` in the transaction log
+* `lsn` represents the PostgreSQL https://www.postgresql.org/docs/current/static/datatype-pg-lsn.html[_log sequence number_] or `offset` in the transaction log
* `txId` represents the identifier of the server transaction which caused the event
* `ts_usec` represents the number of microseconds since Unix Epoch as the server time at which the transaction was committed
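Assembled, the `sourceOffset` might look as follows; all field values here are fabricated for illustration:

[source,json]
----
{
  "lsn": 24023128,
  "txId": 555,
  "ts_usec": 1559033904863000
}
----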
@@ -516,7 +516,7 @@ And of course, the _schema_ portion of the event message's value contains a sche
[[replica-identity]]
===== Replica Identity
-https://www.postgresql.org/docs/9.6/static/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY[REPLICA IDENTITY] is a PostgreSQL specific table-level setting which determines the amount of information that is available to `logical decoding` in case of `UPDATE` and `DELETE` events. More specifically, this controls what (if any) information is available regarding the previous values of the table columns involved, whenever one of the aforementioned events occur.
+https://www.postgresql.org/docs/current/static/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY[REPLICA IDENTITY] is a PostgreSQL-specific table-level setting which determines the amount of information that is available to `logical decoding` in the case of `UPDATE` and `DELETE` events. More specifically, this controls what (if any) information is available regarding the previous values of the table columns involved, whenever one of the aforementioned events occurs.
There are 4 possible values for `REPLICA IDENTITY`:
@@ -1506,27 +1506,27 @@ Debezium will instead use the publication as defined.
|`database.sslmode`
|`disable`
-|Whether to use an encrypted connection to the PostgreSQL server. Options include: *disable* (the default) to use an unencrypted connection ; *require* to use a secure (encrypted) connection, and fail if one cannot be established; *verify-ca* like `require` but additionally verify the server TLS certificate against the configured Certificate Authority (CA) certificates, or fail if no valid matching CA certificates are found; *verify-full* like `verify-ca` but additionally verify that the server certificate matches the host to which the connection is attempted. See https://www.postgresql.org/docs/9.6/static/libpq-connect.html[the PostgreSQL documentation] for more information.
+|Whether to use an encrypted connection to the PostgreSQL server. Options include: *disable* (the default) to use an unencrypted connection; *require* to use a secure (encrypted) connection, and fail if one cannot be established; *verify-ca* like `require` but additionally verify the server TLS certificate against the configured Certificate Authority (CA) certificates, or fail if no valid matching CA certificates are found; *verify-full* like `verify-ca` but additionally verify that the server certificate matches the host to which the connection is attempted. See https://www.postgresql.org/docs/current/static/libpq-connect.html[the PostgreSQL documentation] for more information.
|`database.sslcert`
|
-|The path to the file containing the SSL Certificate for the client. See https://www.postgresql.org/docs/9.6/static/libpq-connect.html[the PostgreSQL documentation] for more information.
+|The path to the file containing the SSL Certificate for the client. See https://www.postgresql.org/docs/current/static/libpq-connect.html[the PostgreSQL documentation] for more information.
|`database.sslkey`
|
-|The path to the file containing the SSL private key of the client. See https://www.postgresql.org/docs/9.6/static/libpq-connect.html[the PostgreSQL documentation] for more information.
+|The path to the file containing the SSL private key of the client. See https://www.postgresql.org/docs/current/static/libpq-connect.html[the PostgreSQL documentation] for more information.
|`database.sslpassword`
|
-|The password to access the client private key from the file specified by `database.sslkey`. See https://www.postgresql.org/docs/9.6/static/libpq-connect.html[the PostgreSQL documentation] for more information.
+|The password to access the client private key from the file specified by `database.sslkey`. See https://www.postgresql.org/docs/current/static/libpq-connect.html[the PostgreSQL documentation] for more information.
|`database.sslrootcert`
|
-|The path to the file containing the root certificate(s) against which the server is validated. See https://www.postgresql.org/docs/9.6/static/libpq-connect.html[the PostgreSQL documentation] for more information.
+|The path to the file containing the root certificate(s) against which the server is validated. See https://www.postgresql.org/docs/current/static/libpq-connect.html[the PostgreSQL documentation] for more information.
|`database.tcpKeepAlive`
|
-|Enable TCP keep-alive probe to verify that database connection is still alive. (enabled by default). See https://www.postgresql.org/docs/9.6/static/libpq-connect.html[the PostgreSQL documentation] for more information.
+|Enable TCP keep-alive probes to verify that the database connection is still alive (enabled by default). See https://www.postgresql.org/docs/current/static/libpq-connect.html[the PostgreSQL documentation] for more information.
|`tombstones.on.delete`
|`true`