tet123/documentation/modules/ROOT/pages/transformations/content-based-routing.adoc

262 lines
15 KiB
Plaintext
Raw Normal View History

2021-08-20 12:21:04 +02:00
:page-aliases: configuration/content-based-routing.adoc
// Category: debezium-using
// Type: assembly
// ModuleID: routing-change-event-records-to-topics-according-to-event-content
// Title: Routing change event records to topics according to event content
[id="content-based-routing"]
= Content-based routing
ifdef::community[]
:toc:
:toc-placement: macro
:linkattrs:
:icons: font
:source-highlighter: highlight.js
toc::[]
endif::community[]
By default, {prodname} streams all of the change events that it reads from a table to a single static topic.
However, there might be situations in which you might want to reroute selected events to other topics, based on the event content.
The process of routing messages based on their content is described in the https://www.enterpriseintegrationpatterns.com/patterns/messaging/ContentBasedRouter.html[Content-based routing] messaging pattern.
2020-09-17 10:49:05 +02:00
To apply this pattern in {prodname}, you use the content-based routing link:https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect[single message transform] (SMT) to write expressions that are evaluated for each event.
Depending how an event is evaluated, the SMT either routes the event message to the original destination topic, or reroutes it to the topic that you specify in the expression.
While it is possible to use Java to create a custom SMT to encode routing logic, using a custom-coded SMT has its drawbacks.
For example:
* It is necessary to compile the transformation up front and deploy it to Kafka Connect.
* Every change needs code recompilation and redeployment, leading to inflexible operations.
The content-based routing SMT supports scripting languages that integrate with https://jcp.org/en/jsr/detail?id=223[JSR 223] (Scripting for the Java(TM) Platform).
{prodname} does not come with any implementations of the JSR 223 API.
To use an expression language with {prodname}, you must download the JSR 223 script engine implementation for the language.
ifdef::community[]
For example, for Groovy 3, you can download its JSR 223 implementation from https://groovy-lang.org/.
The JSR223 implementation for GraalVM JavaScript is available at https://github.com/graalvm/graaljs.
After you obtain the script engine files, you add them to your {prodname} connector plug-in directories, along any other JAR files used by the language implementation.
endif::community[]
ifdef::product[]
Depending on the method that you use to deploy {prodname}, you can automatically download the required artifacts from Maven Central,
or you can manually download the artifacts, and then add them to your {prodname} connector plug-in directories, along any other JAR files used by the language implementation.
endif::product[]
// Type: procedure
// Title: Setting up the {prodname} content-based-routing SMT
// ModuleID: setting-up-the-debezium-content-based-routing-smt
2020-09-17 10:49:05 +02:00
[[set-up-content-based-routing]]
== Set up
2020-09-22 03:06:51 +02:00
For security reasons, the content-based routing SMT is not included with the {prodname} connector archives.
Instead, it is provided in a separate artifact, `debezium-scripting-{debezium-version}.tar.gz`.
2020-09-17 11:50:19 +02:00
ifdef::product[]
If you deploy the {prodname} connector by building a custom Kafka Connect container image from a Dockerfile, to use the filter SMT, you must explicitly add the SMT artifact to your Kafka Connect environment.
When you use {StreamsName} to deploy the connector, it can download the required artifacts automatically based on configuration parameters that you specify in the Kafka Connect custom resource.
endif::product[]
ifdef::community[]
To use the content-based routing SMT with a {prodname} connector plug-in, you must explicitly add the SMT artifact to your Kafka Connect environment.
endif::community[]
IMPORTANT: After the routing SMT is present in a Kafka Connect instance, any user who is allowed to add a connector to the instance can run scripting expressions.
To ensure that scripting expressions can be run only by authorized users, be sure to secure the Kafka Connect instance and its configuration interface before you add the routing SMT.
2020-09-17 10:49:05 +02:00
ifdef::community[]
With https://zookeeper.apache.org[Zookeeper], http://kafka.apache.org/[Kafka], {link-kafka-docs}.html#connect[Kafka Connect], and one or more {prodname} connectors installed, the remaining tasks to install the filter SMT are:
. Download the link:https://repo1.maven.org/maven2/io/debezium/debezium-scripting/{debezium-version}/debezium-scripting-{debezium-version}.tar.gz[scripting SMT archive]
. Extract the contents of the archive into the {prodname} plug-in directories of your Kafka Connect environment.
2020-09-17 11:50:19 +02:00
. Obtain a JSR-223 script engine implementation and add its contents to the {prodname} plug-in directories of your Kafka Connect environment.
2020-09-17 10:49:05 +02:00
. Restart your Kafka Connect process to pick up the new JAR files.
endif::community[]
ifdef::product[]
The following procedure applies if you build your Kafka Connect container image from a Dockerfile.
If you use {StreamsName} to create the Kafka Connect image, follow the instructions in the deployment topic for your connector.
.Procedure
. From a browser, open the link:{LinkDebeziumDownloads}[{NameDebeziumDownloads}], and download the {prodname} scripting SMT archive (`debezium-scripting-{debezium-version}.tar.gz`).
2020-09-17 10:49:05 +02:00
. Extract the contents of the archive into the {prodname} plug-in directories of your Kafka Connect environment.
2020-09-17 11:50:19 +02:00
. Obtain a JSR-223 script engine implementation and add its contents to the {prodname} plug-in directories of your Kafka Connect environment.
. Restart the Kafka Connect process to pick up the new JAR files.
2020-09-17 10:49:05 +02:00
endif::product[]
The Groovy language needs the following libraries on the classpath:
* `groovy`
* `groovy-json` (optional)
* `groovy-jsr223`
The JavaScript language needs the following libraries on the classpath:
* `graalvm.js`
* `graalvm.js.scriptengine`
// Type: concept
// ModuleID: example-debezium-basic-content-based-routing-configuration
// Title: Example: {prodname} basic content-based routing configuration
[[example-basic-content-based-routing-configuration]]
== Example: Basic configuration
To configure a {prodname} connector to route change event records based on the event content, you configure the `ContentBasedRouter` SMT in the Kafka Connect configuration for the connector.
Configuration of the content-based routing SMT requires you to specify a regular expression that defines the filtering criteria.
In the configuration, you create a regular expression that defines routing criteria.
The expression defines a pattern for evaluating event records.
It also specifies the name of a destination topic where events that match the pattern are routed.
The pattern that you specify might designate an event type, such as a table insert, update, or delete operation.
You might also define a pattern that matches a value in a specific column or row.
For example, to reroute all update (`u`) records to an `updates` topic, you might add the following configuration to your connector configuration:
[source]
----
...
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
...
----
The preceding example specifies the use of the `Groovy` expression language.
Records that do not match the pattern are routed to the default topic.
.Customizing the configuration
The preceding example shows a simple SMT configuration that is designed to process only DML events, which contain an `op` field.
Other types of messages that a connector might emit (heartbeat messages, tombstone messages, or metadata messages about transactions or schema changes) do not contain this field.
To avoid processing failures, you can define xref:options-for-applying-the-transformation-selectively[an SMT predicate statement that selectively applies the transformation] to specific events only.
// Type: concept
// ModuleID: variables-for-use-in-debezium-content-based-routing-expressions
//Title: Variables for use in {prodname} content-based routing expressions
== Variables for use in content-based routing expressions
{prodname} binds certain variables into the evaluation context for the SMT.
When you create expressions to specify conditions to control the routing destination,
the SMT can look up and interpret the values of these variables to evaluate conditions in an expression.
The following table lists the variables that {prodname} binds into the evaluation context for the content-based routing SMT:
.Content-based routing expression variables
[cols="25%a,35%a,40%a",subs="+attributes",options="header"]
|===
|Name |Description |Type
|`key` |A key of the message. |`org.apache.kafka.connect{zwsp}.data{zwsp}.Struct`
|`value` |A value of the message. |`org.apache.kafka.connect{zwsp}.data{zwsp}.Struct`
|`keySchema` |Schema of the message key.|`org.apache.kafka.connect{zwsp}.data{zwsp}.Schema`
|`valueSchema`|Schema of the message value.| `org.apache.kafka.connect{zwsp}.data{zwsp}.Schema`
|`topic`|Name of the target topic.| String
|`headers`
a|A Java map of message headers. The key field is the header name.
The `headers` variable exposes the following properties:
* `value` (of type `Object`)
* `schema` (of type `org.apache.kafka{zwsp}.connect{zwsp}.data{zwsp}.Schema`)
| `java.util.Map{zwsp}<String,{zwsp} io.debezium{zwsp}.transforms{zwsp}.scripting{zwsp}.RecordHeader>`
|===
An expression can invoke arbitrary methods on its variables.
Expressions should resolve to a Boolean value that determines how the SMT dispositions the message.
When the routing condition in an expression evaluates to `true`, the message is retained.
When the routing condition evaluates to `false`, the message is removed.
Expressions should not result in any side-effects. That is, they should not modify any variables that they pass.
// Type: concept
// Title: Options for applying the content-based routing transformation selectively
// ModuleID: options-for-applying-the-content-based-routing-transformation-selectively
[id="options-for-applying-the-transformation-selectively"]
== Options for applying the transformation selectively
In addition to the change event messages that a {prodname} connector emits when a database change occurs, the connector also emits other types of messages, including heartbeat messages, and metadata messages about schema changes and transactions.
Because the structure of these other messages differs from the structure of the change event messages that the SMT is designed to process, it's best to configure the connector to selectively apply the SMT, so that it processes only the intended data change messages.
You can use one of the following methods to configure the connector to apply the SMT selectively:
* {link-prefix}:{link-smt-predicates}#applying-transformations-selectively[Configure an SMT predicate for the transformation].
* Use the xref:content-based-router-topic-regex[topic.regex] configuration option for the SMT.
// Type: reference
// ModuleID: configuration-of-content-based-routing-conditions-for-other-scripting-languages
// Title: Configuration of content-based routing conditions for other scripting languages
== Language specifics
The way that you express content-based routing conditions depends on the scripting language that you use.
For example, as shown in the xref:example-basic-content-based-routing-configuration[basic configuration example], when you use `Groovy` as the expression language,
the following expression reroutes all update (`u`) records to the `updates` topic, while routing other records to the default topic:
[source,groovy]
----
value.op == 'u' ? 'updates' : null
----
Other languages use different methods to express the same condition.
[TIP]
====
The {prodname} MongoDB connector emits the `after` and `patch` fields as serialized JSON documents rather than as structures. +
To use the ContentBasedRouting SMT with the MongoDB connector, you must first unwind the array fields in the JSON into separate documents. +
ifdef::community[]
You can do this by applying the {link-prefix}:{link-mongodb-event-flattening}#new-record-state-extraction[MongoDB `ExtractNewDocumentState`] SMT.
You could also take the approach of using a JSON parser within an expression to generate separate output documents for each array item. +
endif::community[]
ifdef::product[]
You can use a JSON parser within an expression to generate separate output documents for each array item.
endif::product[]
For example, if you use Groovy as the expression language, add the `groovy-json` artifact to the classpath, and then add an expression such as `(new groovy.json.JsonSlurper()).parseText(value.after).last_name == 'Kretchmar'`.
====
.Javascript
When you use JavaScript as the expression language, you can call the `Struct#get()` method to specify the content-based routing condition, as in the following example:
[source,javascript]
----
value.get('op') == 'u' ? 'updates' : null
----
.Javascript with Graal.js
When you create content-based routing conditions by using JavaScript with Graal.js, you use an approach that is similar to the one use with Groovy.
For example:
[source,javascript]
----
value.op == 'u' ? 'updates' : null
----
// Type: reference
// ModuleID: options-for-configuring-the-content-based-routing-transformation
// Title: Options for configuring the content-based routing transformation
[[content-based-router-configuration-options]]
== Configuration options
[cols="30%a,25%a,45%a"]
|===
|Property
|Default
|Description
|[[content-based-router-topic-regex]]<<content-based-router-topic-regex, `topic.regex`>>
|
|An optional regular expression that evaluates the name of the destination topic for an event to determine whether to apply the condition logic.
If the name of the destination topic matches the value in `topic.regex`, the transformation applies the condition logic before it passes the event to the topic.
If the name of the topic does not match the value in `topic.regex`, the SMT passes the event to the topic unmodified.
|[[content-based-router-language]]<<content-based-router-language, `language`>>
|
|The language in which the expression is written. Must begin with `jsr223.`, for example, `jsr223.groovy`, or `jsr223.graal.js`. {prodname} supports bootstrapping through the https://jcp.org/en/jsr/detail?id=223[JSR 223 API ("Scripting for the Java (TM) Platform")] only.
|[[content-based-router-topic-expression]]<<content-based-router-topic-expression, `topic.expression`>>
|
|The expression to be evaluated for every message. Must evaluate to a `String` value where a result of non-null reroutes the message to a new topic, and a `null` value routes the message to the default topic.
|[[content-based-router-null-handling-mode]]<<content-based-router-null-handling-mode, `null.handling.mode`>>
|`keep`
a|Specifies how the transformation handles `null` (tombstone) messages. You can specify one of the following options:
`keep`:: (Default) Pass the messages through.
`drop`:: Remove the messages completely.
`evaluate`:: Apply the condition logic to the messages.
|===