= Logical Decoding Output Plug-in Installation for PostgreSQL
include::_attributes.adoc[]
:toc:
:toc-placement: macro
:linkattrs:
:icons: font
:source-highlighter: highlight.js
toc::[]
This document describes the database setup required for streaming data changes out of https://www.postgresql.org/[PostgreSQL].
This comprises configuration applying to the database itself as well as the installation of the https://github.com/eulerto/wal2json[wal2json] logical decoding output plug-in.
The installation and the tests were performed in the following environment/configuration:
Similar steps apply to other PostgreSQL and OS versions, and to the Decoderbufs logical decoding plug-in, which is also supported by Debezium.
[NOTE]
====
As of Debezium 0.10, the connector supports PostgreSQL 10+ logical replication streaming using _pgoutput_.
This means that no separate logical decoding output plug-in needs to be installed, because _pgoutput_ ships with PostgreSQL, and the connector can emit changes directly from the replication stream.
====
[[logical-decoding-plugin-setup]]
== Logical Decoding Plug-ins
Logical decoding is the process of extracting all persistent changes to a database's tables into a coherent, easy-to-understand format
which can be interpreted without detailed knowledge of the database's internal state.
As of PostgreSQL 9.4, logical decoding is implemented by decoding the contents of the write-ahead log, which describe changes
on a storage level, into an application-specific form such as a stream of tuples or SQL statements.
In the context of logical replication, a slot represents a stream of changes that can be replayed to a client in the order
they were made on the origin server. Each slot streams a sequence of changes from a single database.
The output plug-ins transform the data from the write-ahead log's internal representation into the format the consumer
of a replication slot desires. Plug-ins are written in C, compiled, and installed on the machine which runs the PostgreSQL server,
and they use a number of PostgreSQL-specific APIs, as described in the PostgreSQL documentation on logical decoding output plug-ins.
The Debezium PostgreSQL connector relies on one of these plug-ins to encode the changes in either https://github.com/google/protobuf[Protobuf] format or http://www.json.org/[JSON] format.
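To see what an output plug-in actually produces, its output can also be read through PostgreSQL's SQL interface for logical decoding. The sketch below assumes that a slot created with the textual `wal2json` plug-in already exists; `peek_changes` is a hypothetical helper name. `pg_logical_slot_peek_changes` returns pending changes without consuming them, whereas `pg_logical_slot_get_changes` would advance the slot; plug-ins that declare binary output need the `pg_logical_slot_peek_binary_changes` variant instead.
[source,bash]
----
# Hypothetical helper: prints the pending changes of a logical replication slot
# without consuming them, using PostgreSQL's SQL interface for logical decoding.
# Assumes the slot was created with a textual output plug-in such as wal2json.
# The slot name is interpolated into the SQL string, so pass only trusted names.
peek_changes() {
  local db="$1" slot="$2"
  psql -d "$db" -At -c "SELECT data FROM pg_logical_slot_peek_changes('${slot}', NULL, NULL);"
}
----
For example, `peek_changes test test_slot` would print the documents that wal2json has produced for a slot `test_slot` in a database `test`, leaving them available for the next consumer.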
[TIP]
====
For simplicity, Debezium also provides a Docker image based on a vanilla https://github.com/debezium/docker-images/tree/master/postgres/9.6[PostgreSQL server image]
on top of which it compiles and installs the plug-ins.
====
[WARNING]
====
The Debezium logical decoding plug-ins have only been installed and tested on _Linux_ machines. Windows and other platforms may
require different installation steps.
====
[discrete]
==== Differences between Plug-ins
The plug-ins do not behave identically in all cases. So far, the following differences have been identified:
* the wal2json plug-in cannot process quoted identifiers (https://github.com/eulerto/wal2json/issues/35[issue])
* the wal2json plug-in does not emit events for tables without primary keys
* the wal2json plug-in does not support special values (`NaN` or `infinity`) for floating point types
All up-to-date differences are tracked in a test suite.
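The callouts that follow refer to settings in the server's `postgresql.conf` file. Below is a minimal sketch of a helper that appends them, assuming a PostgreSQL 9.6-style configuration and the `wal2json` plug-in by default; the function name `append_decoding_settings` is hypothetical. Because a later occurrence of a parameter in `postgresql.conf` overrides an earlier one, merge the plug-in into any existing `shared_preload_libraries` value rather than appending blindly, and restart the server afterwards, since these parameters only take effect at server start.
[source,bash]
----
# Hypothetical helper: appends the logical decoding settings to a postgresql.conf file.
# Usage: append_decoding_settings <path-to-postgresql.conf> [plugin]
# Assumption: the plug-in is wal2json unless another name (e.g. decoderbufs) is passed.
append_decoding_settings() {
  local conf_file="$1" plugin="${2:-wal2json}"
  cat >> "$conf_file" <<EOF
shared_preload_libraries = '${plugin}'  # <1>
wal_level = logical                     # <2>
max_wal_senders = 4                     # <3>
max_replication_slots = 4               # <4>
EOF
}
----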
<1> tells the server to load the `wal2json` logical decoding plug-in at startup (use `decoderbufs` for https://github.com/google/protobuf[Protobuf]);
the plug-in names are set in the https://github.com/debezium/postgres-decoderbufs/blob/v{debezium-version}/Makefile[protobuf]
and https://github.com/eulerto/wal2json/blob/master/Makefile[wal2json] Makefiles
<2> sets the WAL level to `logical`, so that the write-ahead log contains the information required for logical decoding
<3> tells the server to allow a maximum of `4` concurrent WAL sender processes, which stream WAL changes to clients
<4> tells the server to allow a maximum of `4` replication slots to be created for streaming WAL changes
Debezium needs PostgreSQL to retain its WAL during Debezium outages.
If your WAL retention is too small and the outages too long, Debezium will not be able to recover after a restart, because it will have missed some of the data changes.
The usual indicator is an error similar to the following, thrown during startup: `ERROR: requested WAL segment 000000010000000000000001 has already been removed`.
When this happens, it is necessary to re-execute the snapshot of the database.
We also recommend setting the parameter `wal_keep_segments = 0`. Refer to the official PostgreSQL documentation for fine-tuning WAL retention.
[TIP]
====
We strongly recommend reading and understanding https://www.postgresql.org/docs/9.6/static/wal-configuration.html[the official documentation] regarding the mechanics and configuration of the PostgreSQL write-ahead log.
====
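Because each replication slot pins WAL from its `restart_lsn` (the oldest WAL position its consumer may still need) onwards, it can help to watch how much WAL each slot is holding back. The following is a sketch under the assumption of PostgreSQL 10 or later; `slot_wal_retention` is a hypothetical helper name, and on 9.6 the functions are called `pg_current_xlog_location()` and `pg_xlog_location_diff()` instead.
[source,bash]
----
# Hypothetical helper: prints, per replication slot, whether it is active and how
# much WAL the server retains for it (distance from restart_lsn to the current WAL position).
# Assumes PostgreSQL 10+; on 9.6 replace pg_current_wal_lsn() with pg_current_xlog_location()
# and pg_wal_lsn_diff() with pg_xlog_location_diff().
slot_wal_retention() {
  local db="${1:-postgres}"
  psql -d "$db" -At -F ' ' -c "SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) FROM pg_replication_slots;"
}
----
A slot whose retained size keeps growing while `active` is `f` usually belongs to a consumer that is down, which is exactly the outage scenario described above.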
[discrete]
[[setting_replication_permissions]]
=== _Setting up replication permissions_
Replication can only be performed by a database user that has appropriate permissions and only for a configured number of hosts.
In order to give a user replication permissions, define a PostgreSQL role that has _at least_ the `REPLICATION` and `LOGIN` permissions.
For example:
[source,sql]
----
CREATE ROLE name REPLICATION LOGIN;
----
[TIP]
====
By default, superusers have both of the above attributes.
====
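To confirm that a role actually has both attributes, the `pg_roles` system view exposes them as the boolean columns `rolreplication` and `rolcanlogin`. This is a sketch with a hypothetical helper name, `check_replication_role`, that assumes it can connect to the `postgres` database; the role name is interpolated into the query, so only pass trusted names.
[source,bash]
----
# Hypothetical helper: shows whether a role has the REPLICATION and LOGIN attributes.
# rolreplication and rolcanlogin are boolean columns of the pg_roles system view.
# Assumes a connection to the `postgres` database; pass only trusted role names.
check_replication_role() {
  local role="$1"
  psql -d postgres -At -c "SELECT rolname, rolreplication, rolcanlogin FROM pg_roles WHERE rolname = '${role}';"
}
----
For the role created above, `check_replication_role name` should print a line such as `name|t|t`, where `t` means the attribute is set.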
Add the following lines at the end of the `pg_hba.conf` PostgreSQL configuration file, so as to configure the
https://www.postgresql.org/docs/9.6/static/auth-pg-hba-conf.html[client authentication] for the database replication.
The PostgreSQL server should allow replication to take place between the server machine and the host on which the
Debezium PostgreSQL connector is running.
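A minimal sketch of such entries follows, assuming that the connector runs on the database host itself (loopback addresses only) and that the replication user is `postgres`, as in the test environment. The helper name `append_replication_hba` is hypothetical, and the `trust` method is only acceptable for a local test setup; elsewhere, use a password-based method such as `md5`.
[source,bash]
----
# Hypothetical helper: appends replication entries for one role to a pg_hba.conf file.
# Assumptions: the connector connects via the local socket or loopback addresses,
# and `trust` authentication is acceptable (test environments only).
# Columns: TYPE  DATABASE  USER  [ADDRESS]  METHOD
append_replication_hba() {
  local hba_file="$1" role="${2:-postgres}"
  cat >> "$hba_file" <<EOF
local   replication     ${role}                            trust
host    replication     ${role}    127.0.0.1/32            trust
host    replication     ${role}    ::1/128                 trust
EOF
}
----
Changes to `pg_hba.conf` are picked up when the server reloads its configuration, for example via `pg_ctl reload -D "$PGDATA"` or `SELECT pg_reload_conf();`.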
Note that the authentication refers to the database superuser `postgres`. Change this accordingly
if another user with `REPLICATION` and `LOGIN` permissions has been created.
The `pg_recvlogical` utility, a PostgreSQL application that controls PostgreSQL logical decoding streams, can be used to test the setup.
Before starting, make sure that you are logged in as a user with database replication permissions, as configured in a link:#setting_replication_permissions[previous step].
Otherwise, slot creation and streaming fail with the following error message:
[source,bash]
----
pg_recvlogical: could not connect to server: FATAL: no pg_hba.conf entry for replication connection from host "[local]", user "root", SSL off
----
In the test environment, the user with replication permission is `postgres`.
Also, make sure that the `PATH` environment variable is set so that `pg_recvlogical` can be found.
If not, update the `PATH` environment variable appropriately. For example, in the test environment:
[source,bash]
----
export PATH="$PATH:/usr/pgsql-9.6/bin"
----
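If the same shell setup is reused on machines where PostgreSQL may already be on the `PATH`, the update can be made conditional. This is a sketch with a hypothetical helper name, `ensure_pg_recvlogical`, that defaults to the test environment's `/usr/pgsql-9.6/bin` directory; it must be called directly in the current shell (not inside `$(...)`) for the `PATH` change to persist.
[source,bash]
----
# Hypothetical helper: appends a PostgreSQL bin directory to PATH only if
# pg_recvlogical cannot already be found, then prints where it was found.
# The default directory matches the test environment; adjust it as needed.
ensure_pg_recvlogical() {
  local bin_dir="${1:-/usr/pgsql-9.6/bin}"
  if ! command -v pg_recvlogical >/dev/null 2>&1; then
    export PATH="$PATH:$bin_dir"
  fi
  # Exits non-zero if pg_recvlogical is still not on the PATH.
  command -v pg_recvlogical
}
----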
* *Create a slot* named `test_slot` for the database named `test`, using the logical output plug-in `wal2json`
[source,bash]
----
$ pg_recvlogical -d test --slot test_slot --create-slot -P wal2json
----
* *Begin streaming changes* from the logical replication slot `test_slot` for the database `test`
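The streaming step and the clean-up afterwards can be sketched as two small wrappers around `pg_recvlogical`, assuming the database `test` and the slot `test_slot` created above. The function names are hypothetical; `--start`, `--drop-slot`, `-o` (plug-in option) and `-f -` (write to standard output) are regular `pg_recvlogical` options, and `pretty-print` is a wal2json option.
[source,bash]
----
# Hypothetical wrappers around pg_recvlogical for the test database and slot.
# --start streams decoded changes; `-f -` writes them to standard output;
# `-o pretty-print=1` passes the pretty-print option to the wal2json plug-in.
stream_changes() {
  local db="${1:-test}" slot="${2:-test_slot}"
  pg_recvlogical -d "$db" --slot "$slot" --start -o pretty-print=1 -f -
}

# An unused slot keeps retaining WAL on the server, so drop it once testing is done.
drop_test_slot() {
  local db="${1:-test}" slot="${2:-test_slot}"
  pg_recvlogical -d "$db" --slot "$slot" --drop-slot
}
----
Run `stream_changes` in one terminal and execute some `INSERT`, `UPDATE` or `DELETE` statements against the `test` database from another; the changes decoded by wal2json should then appear on standard output. Stop streaming with Ctrl+C before calling `drop_test_slot`, since a slot that is still in use cannot be dropped.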
`REPLICA IDENTITY` is a PostgreSQL-specific table-level setting that determines the amount of information that is available
to logical decoding in case of `UPDATE` and `DELETE` events.
There are 4 possible values for `REPLICA IDENTITY`:
* *DEFAULT* - `UPDATE` and `DELETE` events will only contain the previous values for the primary key columns of a table
* *NOTHING* - `UPDATE` and `DELETE` events will not contain any information about the previous value on any of the table columns
* *FULL* - `UPDATE` and `DELETE` events will contain the previous values of all the table's columns
* *USING INDEX* `index_name` - `UPDATE` and `DELETE` events will contain the previous values of the columns contained in the index named `index_name`
You can modify and check the `REPLICA IDENTITY` of a table with the following commands:
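As a sketch, assuming `psql` access to the database, the setting is changed with `ALTER TABLE ... REPLICA IDENTITY` and can be read back from the `relreplident` column of `pg_class`, which stores it as a single letter. The helper names below are hypothetical, and the table name is interpolated into SQL unquoted, so only pass trusted, unquoted identifiers.
[source,bash]
----
# Hypothetical helpers; database and table names are placeholders and are
# interpolated into SQL unquoted, so only pass trusted identifiers.
set_replica_identity() {
  local db="$1" table="$2" identity="${3:-FULL}"
  psql -d "$db" -c "ALTER TABLE ${table} REPLICA IDENTITY ${identity};"
}

show_replica_identity() {
  local db="$1" table="$2"
  psql -d "$db" -At -c "SELECT relreplident FROM pg_class WHERE oid = '${table}'::regclass;"
}

# Maps the one-letter pg_class.relreplident code to the setting name.
describe_replident() {
  case "$1" in
    d) echo DEFAULT ;;
    n) echo NOTHING ;;
    f) echo FULL ;;
    i) echo INDEX ;;
    *) echo UNKNOWN; return 1 ;;
  esac
}
----
For example, `set_replica_identity test customers FULL` followed by `describe_replident "$(show_replica_identity test customers)"` should print `FULL` for a hypothetical table `customers`; an index-based identity is set by passing `"USING INDEX index_name"` as the third argument.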