= Content-Based Routing
include::../_attributes.adoc[]
:toc:
:toc-placement: macro
:linkattrs:
:icons: font
:source-highlighter: highlight.js
toc::[]
[NOTE]
====
This SMT is currently in an incubating state; that is, exact semantics, configuration options, and so on may change in future revisions, based on the feedback we receive. Please let us know if you encounter any problems while using this routing transformation.
====
With real-life applications, it is often necessary to route the events not only to a static topic but to a topic based on the message content.
This is expressed as the https://www.enterpriseintegrationpatterns.com/patterns/messaging/ContentBasedRouter.html[Content-Based Router] integration pattern.
Kafka Connect provides a generic mechanism for such routing in the form of link:https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect[Single Message Transforms] (SMTs).
An SMT is a Java class that encodes the routing logic.
This is a very powerful mechanism, but it has two drawbacks:

* The transformation must be compiled upfront and deployed to Kafka Connect.
* Every change requires recompiling and redeploying the code, which makes operations inflexible.

To solve this problem, Debezium comes with the Content-Based Router SMT.
This SMT allows the operator to write an expression that is evaluated for each event; depending on the result, the event is either kept in its original topic or re-routed to another one.
The current implementation supports any scripting language which integrates with https://jcp.org/en/jsr/detail?id=223[JSR 223] (Scripting for the Java(TM) Platform).
The Content-Based Router SMT can be used like so:
[source]
----
...
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
...
----
In this example, we use `Groovy` as the expression language and re-route all update records to the `updates` topic; all other records stay in their original topic.
[IMPORTANT]
====
Debezium does not come with the language implementations in its installation packages.
It is the user's responsibility to provide an implementation, such as link:https://groovy-lang.org/[Groovy 3] or link:https://github.com/graalvm/graaljs[GraalVM JavaScript], on the classpath.
Bootstrapping is done exclusively via the JSR 223 API currently, so the engine's support for this API must be provided as well.
====
Debezium binds six variables into the evaluation context:

* `key` - the key of the message
* `value` - the value of the message
* `keySchema` - the schema of the message key
* `valueSchema` - the schema of the message value
* `topic` - the name of the target topic
* `headers` - a map of message headers, keyed by header name; each entry exposes the header's `schema` and `value`

The `key` and `value` variables are of type `org.apache.kafka.connect.data.Struct`, and `keySchema` and `valueSchema` are of type `org.apache.kafka.connect.data.Schema`.
The expression can invoke arbitrary methods on the variables and should evaluate to a `String` value: a non-null result is the name of the topic the message is re-routed to, while `null` keeps the message in its original topic.
Expressions should be side-effect free, i.e. they should *not* modify the passed variables in any way.

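
As an illustration of how the bound variables can be combined, the following sketch derives the new topic name from the `topic` variable instead of hard-coding it.
It assumes Groovy as the expression language; the `.deletes` suffix is a hypothetical naming choice for this example, not something Debezium prescribes.

[source,properties]
----
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
transforms.route.language=jsr223.groovy
# Hypothetical naming scheme: delete events go to "<original topic>.deletes", all other events stay where they are
transforms.route.topic.expression=value.op == 'd' ? topic + '.deletes' : null
----

Because the expression only reads `value` and `topic` and returns either a `String` or `null`, it stays side-effect free.
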

== Language specifics

The same business logic, re-routing all update records, can be expressed as follows, depending on your preferred scripting language.

In the case of `Groovy`, the value fields can be accessed in a property-like way:
[source,groovy]
----
value.op == 'u' ? 'updates' : null
----
Other languages, such as JavaScript, typically require calling the `Struct#get()` method:
[source,javascript]
----
value.get('op') == 'u' ? 'updates' : null
----
When using JavaScript via Graal.js, simplified property references can be used, akin to the Groovy approach:
[source,javascript]
----
value.op == 'u' ? 'updates' : null
----
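
The expression language is selected through the `language` option, so the expression and the configured engine have to match.
A minimal sketch of the JavaScript variant, assuming the Graal.js engine and its JSR 223 support have been made available to Kafka Connect:

[source,properties]
----
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
# Assumes the Graal.js engine is on the classpath
transforms.route.language=jsr223.graal.js
# Struct#get() form of the expression: update events go to "updates", everything else stays put
transforms.route.topic.expression=value.get('op') == 'u' ? 'updates' : null
----
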
[[content-based-router-configuration-options]]
== Configuration options
[cols="35%a,10%a,55%a",options="header"]
|=======================
|Property
|Default
|Description
|[[content-based-router-topic-regex]]<<content-based-router-topic-regex, `topic.regex`>>
|
|An optional regular expression for specifying the topic(s) this transformation should be applied to. Records on any topic whose name does not match the given expression are passed on as-is.
|[[content-based-router-language]]<<content-based-router-language, `language`>>
|
|The language in which the expression is written. Must begin with `jsr223.`, for example `jsr223.groovy` or `jsr223.graal.js`. Currently, only bootstrapping via the https://jcp.org/en/jsr/detail?id=223[JSR 223 API] ("Scripting for the Java(TM) Platform") is supported.
|[[content-based-router-topic-expression]]<<content-based-router-topic-expression, `topic.expression`>>
|
|The expression evaluated for every message. Must evaluate to a `String` value: a non-null result re-routes the message to the topic with that name, and a `null` result keeps the message in its original topic.
|[[content-based-router-null-handling-mode]]<<content-based-router-null-handling-mode, `null.handling.mode`>>
|`keep`
|Prescribes how the transformation should handle `null` (tombstone) messages. The options are: `keep` (the default) to pass the message through, `drop` to remove the message completely, or `evaluate` to run the message through the topic expression.
|=======================
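
The options above can be combined in a single transformation definition.
The following is a sketch only: the `topic.regex` pattern and the `updates` topic name are hypothetical values chosen for illustration.

[source,properties]
----
transforms=route
transforms.route.type=io.debezium.transforms.ContentBasedRouter
# Hypothetical pattern: only records on topics whose name contains "customers" are evaluated
transforms.route.topic.regex=.*customers.*
transforms.route.language=jsr223.groovy
transforms.route.topic.expression=value.op == 'u' ? 'updates' : null
# Remove tombstone (null) messages instead of passing them through
transforms.route.null.handling.mode=drop
----

With `null.handling.mode` set to `evaluate` instead, the expression would also be run for tombstones, so it would need to handle a `null` value itself.
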