4e279c17c1
Also adding Aaron to COPYRIGHT.txt
135 lines
5.8 KiB
Plaintext
135 lines
5.8 KiB
Plaintext
= Topic Routing
|
|
include::../_attributes.adoc[]
|
|
:toc:
|
|
:toc-placement: macro
|
|
:linkattrs:
|
|
:icons: font
|
|
:source-highlighter: highlight.js
|
|
|
|
toc::[]
|
|
|
|
Debezium enables you to re-route the emitted change before the message reaches the converter using a single message transformation,
|
|
or {link-kafka-docs}/#connect_transforms[SMT].
|
|
The SMT provided by Debezium enables you to rewrite the topic and the key according to a regular expression and a replacement pattern,
|
|
configurable per instance of Debezium.
|
|
|
|
The implementation does not care about the sanity of the change, this is in the responsibility of the user.
|
|
|
|
== Use-Cases
|
|
|
|
=== Logical Tables
|
|
|
|
A logical table consists of one or more physical tables with the same table structure.
|
|
A common use case is sharding, where for example two physical tables `db_shard1.my_table` and `db_shard2.my_table` together form one logical table.
|
|
|
|
Typically the physical tables share the same schema.
|
|
|
|
Normally, Debezium connectors send each change event to a topic that is named by the database and table.
|
|
But since the sharded tables have the same schema, we'd instead like to re-route each change event to a topic named by the _logical_ table name.
|
|
This way, all changes events for any of the shards all go to the same topic.
|
|
|
|
What happens if each physical table has a primary key that is only unique within that table?
|
|
In this case, a row in shard 1 can have the same primary key as a row in shard 2.
|
|
Since Debezium events are keyed by the columns that make up the primary key, the events for that row in shard 1 would have the same key as the row in shard 2,
|
|
even though globally they are different rows.
|
|
So, in addition to changing the topic name, we may also want to _modify the event key_ to add a field that makes the key globally unique.
|
|
|
|
This SMT lets you specify how you want to choose the new topic name and then specify how to modify the change event key to ensure it is globally unique.
|
|
|
|
=== Partitioned Postgres Tables
|
|
|
|
when capturing changes from a partioned Postgres table,
|
|
there will be one topic per partition table.
|
|
Using this routing SMT allows you to emit all change events to a single topic for the entire partioned table.
|
|
As key uniqueness across the different partition topics already is guaranteed in this case,
|
|
the `key.enforce.uniqueness` option can be set to `false` (see below).
|
|
|
|
== Topic Names
|
|
|
|
Below is an example for a configuration which replaces a part of the table in the topic with another string, allowing two tables to emit changes to the same topic:
|
|
|
|
[source]
|
|
----
|
|
transforms=Reroute
|
|
transforms.Reroute.type=io.debezium.transforms.ByLogicalTableRouter
|
|
transforms.Reroute.topic.regex=(.*)customers_shard(.*)
|
|
transforms.Reroute.topic.replacement=$1customers_all_shards
|
|
----
|
|
|
|
The configuration above will match topics such as `myserver.mydb.customers_shard1`, `myserver.mydb.customers_shard2` etc. and replace it with `myserver.mydb.customers_all_shards.`
|
|
|
|
== Key Fields
|
|
|
|
To address the concern of uniqueness across all the original tables discussed above,
|
|
one more field for identifying the original (physical) table may be inserted into the key structure of the change events.
|
|
By default, this field is named `\__dbz__physicalTableIdentifier` and has the original topic name as its value.
|
|
|
|
If your tables already contain globally unique keys and you do not need to change the key structure, you can set the `key.enforce.uniqueness` property to `false`:
|
|
|
|
[source]
|
|
----
|
|
...
|
|
transforms.Reroute.key.enforce.uniqueness=false
|
|
...
|
|
----
|
|
|
|
If needed, another _field name_ can be chosen by means of the `key.field.name` property
|
|
(obviously you'll want to choose a field name that doesn't clash with existing primary key fields).
|
|
For example the following configuration will use the name `shard_id` for the key field:
|
|
|
|
[source]
|
|
----
|
|
...
|
|
transforms.Reroute.key.field.name=shard_id
|
|
...
|
|
----
|
|
|
|
The _value_ of the field can be adjusted via the `key.field.regex` and `key.field.replacement` properties.
|
|
The former allows you to define a regular expression that will be applied to the original topic name to capture one or more groups of characters.
|
|
The latter lets you specify an expression that defines the value for the field in terms of those captured groups.
|
|
For example:
|
|
|
|
[source]
|
|
----
|
|
...
|
|
transforms.Reroute.key.field.regex=(.*)customers_shard(.*)
|
|
transforms.Reroute.key.field.replacement=$2
|
|
----
|
|
|
|
This will apply the given regular expression to original topic names and use the second capturing group as value for the key field.
|
|
Assuming the source topics are named `myserver.mydb.customers_shard1`, `myserver.mydb.customers_shard2` etc., the key field's values would be `1`, `2` etc.
|
|
|
|
[[configuration-options]]
|
|
== Configuration Options
|
|
[cols="35%a,10%a,55%a",options="header"]
|
|
|=======================
|
|
|Property
|
|
|Default
|
|
|Description
|
|
|
|
|`topic.regex`
|
|
|
|
|
|A regular expression for matching the topic(s) which should be re-routed.
|
|
|
|
|`topic.replacement`
|
|
|
|
|
|The name of the topic to which re-routed events will be sent. May refer to capturing groups from the regular expression given via `topic.regex` via `$1`, `$2` etc.
|
|
|
|
|`key.enforce.uniqueness`
|
|
|`true`
|
|
|Whether to add a field to the change event key that identifies the original table/topic name and thus helps to prevent collisions of change events for records with the same key originating from multiple source tables; either `true` or `false`
|
|
|
|
|`key.field.name`
|
|
|`__dbz__physicalTableIdentifier`
|
|
|Name of a field to be added to the change event key for identifying the original table/topic name; only takes affect if `key.enforce.uniqueness` is `true`
|
|
|
|
|`key.field.regex`
|
|
|
|
|
|A regular expression applied to the original topic name to capture one or more groups of characters; only takes affect if `key.enforce.uniqueness` is `true`
|
|
|
|
|`key.field.replacement`
|
|
|
|
|
|Expression that specifies the value for the added key field in terms of captured groups defined using `key.field.regex`; only takes affect if `key.enforce.uniqueness` is `true`
|
|
|
|
|=======================
|