tet123/documentation/modules/ROOT/pages/configuration/topic-routing.adoc

80 lines
3.9 KiB
Plaintext

= Topic Routing
include::../_attributes.adoc[]
:linkattrs:
:icons: font
:source-highlighter: highlight.js
Debezium enables you to re-route the emitted change before the message reaches the converter using a single message transformation,
or https://kafka.apache.org/documentation/#connect_transforms[SMT].
The SMT provided by Debezium enables you to rewrite the topic and the key according to a regular expression and a replacement pattern,
configurable per instance of Debezium.
The implementation does not care about the sanity of the change, this is in the responsibility of the user.
== Use-Cases
=== Logical Tables
A logical table consists of one or more physical tables with the same table structure.
A common use case is sharding, where for example two physical tables `db_shard1.my_table` and `db_shard2.my_table` together form one logical table.
Typically the physical tables share the same schema.
Normally, Debezium connectors send each change event to a topic that is named by the database and table.
But since the sharded tables have the same schema, we'd instead like to re-route each change event to a topic named by the _logical_ table name.
This way, all changes events for any of the shards all go to the same topic.
What happens if each physical table has a primary key that is only unique within that table?
In this case, a row in shard 1 can have the same primary key as a row in shard 2.
Since Debezium events are keyed by the columns that make up the primary key, the events for that row in shard 1 would have the same key as the row in shard 2,
even though globally they are different rows.
So, in addition to changing the topic name, we may also want to _modify the event key_ to add a field that makes the key globally unique.
This SMT lets you specify how you want to choose the new topic name and then specify how to modify the change event key to ensure it is globally unique.
== Topic Names
Below is an example for a configuration which replaces a part of the table in the topic with another string, allowing two tables to emit changes to the same topic:
[source]
----
transforms=Reroute
transforms.Reroute.type=io.debezium.transforms.ByLogicalTableRouter
transforms.Reroute.topic.regex=(.*)customers_shard(.*)
transforms.Reroute.topic.replacement=$1customers_all_shards
----
The configuration above will match topics such as `myserver.mydb.customers_shard1`, `myserver.mydb.customers_shard2` etc. and replace it with `myserver.mydb.customers_all_shards.`
== Key Fields
To address the concern of uniqueness across all the original tables discussed above,
one more field for identifying the original (physical) table is inserted into the key structure of the change events.
By default, this field is named `__dbz__physicalTableIdentifier` and has the original topic name as its value.
If needed, another _field name_ can be chosen by means of the `key.field.name` property
(obviously you'll want to choose a field name that doesn't clash with existing primary key fields).
For example the following configuration will use the name `shard_id` for the key field:
[source]
----
...
transforms.Reroute.key.field.name=shard_id
...
----
The _value_ of the field can be adjusted via the `key.field.regex` and `key.field.replacement` properties.
The former allows you to define a regular expression that will be applied to the original topic name to capture one or more groups of characters.
The latter lets you specify an expression that defines the value for the field in terms of those captured groups.
For example:
[source]
----
...
transforms.Reroute.key.field.regex=(.*)customers_shard(.*)
transforms.Reroute.key.field.replacement=$2
----
This will apply the given regular expression to original topic names and use the second capturing group as value for the key field.
Assuming the source topics are named `myserver.mydb.customers_shard1`, `myserver.mydb.customers_shard2` etc., the key field's values would be `1`, `2` etc.