tet123/documentation/modules/ROOT/pages/post-processors/reselect-columns.adoc
Chris Cranford c1b7e68319 DBZ-7358 Use relational table primary key by default
This fix uses the relational table primary key by default; however, as a
user can define `message.key.columns` to even override the primary key
configuration or to handle keyless tables, the user can override the
`reselect.use.event.key` option with `true` so use the event key fields
instead for the re-select so that keyless tables can also participate
with the column reselection process.
2024-01-18 05:53:37 +01:00

= Re-select columns
:toc:
:toc-placement: macro
:linkattrs:
:icons: font
:source-highlighter: highlight.js
toc::[]
[NOTE]
====
This post-processor is supported only for the SQL database connectors.
====
== Overview
In some cases, because of the way that certain source databases function, when a {prodname} connector emits a change event, the event might exclude values for specific column types.
For example, values for TOAST columns in PostgreSQL, LOB columns in Oracle, or Extended String columns in Oracle Exadata, might all be excluded.
The `ReselectColumnsPostProcessor` provides a way to re-select one or more columns from a database table and fetch the current state.
You can configure the post processor to re-select the following column types:
* `null` columns.
* columns that contain the `unavailable.value.placeholder` sentinel value.
Configuring a `PostProcessor` is similar to configuring a `CustomConverter` or `Transformation`.
== Keyless tables
The `ReselectColumnsPostProcessor` requires that the table have some unique combination of columns that can be used to generate a re-select query that returns a single row.
By default, the `PostProcessor` uses the relational table model to construct a `WHERE` clause based on the table's primary key columns or on a unique index that is defined on the table.
However, if a table has no primary key or unique index, and is therefore effectively keyless, you can use the `message.key.columns` configuration property to define a combination of columns that uniquely identifies a single row.
When you use `message.key.columns` for keyless tables, set the `reselect.use.event.key` configuration property to `true` so that the event's key fields are used as the basis for the selection criteria, because the relational table model has no primary key columns for such tables.
[NOTE]
====
The `ReselectColumnsPostProcessor` tolerates a re-select query that returns more than one row.
In that case, only the first row is used, and which row the database returns first is effectively arbitrary.
If you set `reselect.use.event.key` to `true`, ensure that your connector configuration and data model guarantee that the columns in the event's key uniquely identify a single database row, so that the re-select is always deterministic.
====
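The keyless-table setup described in this section might look like the following sketch.
The table `inventory.audit_log` and its columns `event_id`, `created_at`, and `notes` are hypothetical names chosen for illustration, and the example assumes that `event_id` and `created_at` together identify exactly one row.
Check your connector's documentation for the exact fully-qualified table name format that `message.key.columns` expects.

[source,json]
----
"message.key.columns": "inventory.audit_log:event_id,created_at", // <1>
"post-processors": "reselector",
"reselector.type": "io.debezium.processors.reselect.ReselectColumnsPostProcessor",
"reselector.reselect.columns.include.list": "inventory.audit_log:notes", // <2>
"reselector.reselect.use.event.key": "true" // <3>
----
<1> Connector-level property that defines the event key columns for the hypothetical keyless table.
<2> Hypothetical column to re-select, specified in the `<schema>.<table>:<column>` format.
<3> Bases the selection criteria on the event's key fields rather than on the table's primary key columns.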
== Configuration example
Configure a `PostProcessor` in much the same way that you would configure a `CustomConverter` or `Transformation`.
To enable the connector to use the `ReselectColumnsPostProcessor`, add the following options to the connector configuration:
[source,json]
----
"post-processors": "reselector", // <1>
"reselector.type": "io.debezium.processors.reselect.ReselectColumnsPostProcessor", // <2>
"reselector.reselect.columns.include.list": "<schema_name>.<table_name>:<colA>,<schema_name>.<table_name>:<colB>", // <3>
"reselector.reselect.unavailable.values": "true", // <4>
"reselector.reselect.null.values": "true", // <5>
"reselector.reselect.use.event.key": "false" // <6>
----
<1> Comma-separated list of post-processor prefixes.
<2> The fully-qualified class type name for the post-processor.
<3> Comma-separated list of column names specified by using the following format: `<schema>.<table>:<column>`.
<4> Enables or disables the re-selection of columns that contain the `unavailable.value.placeholder` sentinel value.
<5> Enables or disables the re-selection of columns that are `null`.
<6> Enables or disables basing the re-selection on the event's key field names instead of the table's primary key columns.
== Configuration options
The following table lists the configuration options that you can set for the Reselect Columns post-processor.
.Reselect columns post processor configuration options
[cols="30%a,25%a,45%a"]
|===
|Property
|Default
|Description
|[[reselect-columns-post-processor-property-reselect-columns-include-list]]<<reselect-columns-post-processor-property-reselect-columns-include-list, `+reselect.columns.include.list+`>>
|No default
|Comma-separated list of column names to re-select from the source database.
Use the following format to specify column names: + `_<schema>_._<table>_:_<column>_` +
+
Do not set this property if you set the `reselect.columns.exclude.list` property.
|[[reselect-columns-post-processor-property-reselect-columns-exclude-list]]<<reselect-columns-post-processor-property-reselect-columns-exclude-list, `+reselect.columns.exclude.list+`>>
|No default
|Comma-separated list of column names in the source database to exclude from re-selection.
Use the following format to specify column names: + `_<schema>_._<table>_:_<column>_` +
+
Do not set this property if you set the `reselect.columns.include.list` property.
|[[reselect-columns-post-processor-property-reselect-unavailable-values]]<<reselect-columns-post-processor-property-reselect-unavailable-values, `+reselect.unavailable.values+`>>
|`true`
|Specifies whether the post processor reselects a column that matches the `reselect.columns.include.list` filter if the column value is provided by the connector's `unavailable.value.placeholder` property.
|[[reselect-columns-post-processor-property-reselect-null-values]]<<reselect-columns-post-processor-property-reselect-null-values, `+reselect.null.values+`>>
|`true`
|Specifies whether the post processor reselects a column that matches the `reselect.columns.include.list` filter if the column value is `null`.
|[[reselect-columns-post-processor-property-reselect-use-event-key]]<<reselect-columns-post-processor-property-reselect-use-event-key, `+reselect.use.event.key+`>>
|`false`
|Specifies whether the post processor builds the re-select query from the event's key field names or from the relational table's primary key column names. +
+
By default, the re-select is based on the relational table's primary key columns or unique key index.
Setting this property to `true` can be useful if the table has no primary key and the connector is configured to use `message.key.columns` to create events with a key.
In that case, the event's key field names are used in place of primary key columns in the SQL re-select query.
|===
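As a brief illustration of how the options in the preceding table combine, the following sketch uses `reselect.columns.exclude.list` in place of the include list and disables re-selection of `null` values.
The column `public.orders:internal_notes` is a hypothetical name used only for this example.

[source,json]
----
"post-processors": "reselector",
"reselector.type": "io.debezium.processors.reselect.ReselectColumnsPostProcessor",
"reselector.reselect.columns.exclude.list": "public.orders:internal_notes", // <1>
"reselector.reselect.unavailable.values": "true", // <2>
"reselector.reselect.null.values": "false" // <3>
----
<1> Hypothetical column to exclude from re-selection. Because the exclude list is set, the include list is omitted.
<2> Re-selects columns whose value is the `unavailable.value.placeholder` sentinel.
<3> Leaves `null` column values as they are instead of re-selecting them.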