[id="debezium-architecture"]
= {prodname} Architecture

Most commonly, {prodname} is deployed via Apache {link-kafka-docs}/#connect[Kafka Connect].
Kafka Connect is a framework and runtime for implementing and operating:

* source connectors such as {prodname}, which ingest data into Kafka, and
* sink connectors, which propagate data from Kafka topics into other systems.

The following image shows the architecture of a change data capture (CDC) pipeline based on {prodname}:

image::debezium-architecture.png[{prodname} Architecture]

Kafka Connect is operated as a separate service alongside the Kafka broker.
The {prodname} connectors for MySQL and Postgres are deployed to capture changes from these two databases.
For this purpose, each connector establishes a connection to its source database,
using a client library for accessing the binlog in the case of MySQL and reading from a logical replication stream in the case of Postgres.
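
To make this concrete, the following is a minimal sketch of a connector registration request as it could be posted to Kafka Connect's REST API; all host names, credentials, and topic names are placeholders, and the exact set of required properties depends on the connector and the {prodname} version:

[source,json]
----
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "dbserver1",
    "database.include.list": "inventory",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
----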

By default, the changes from one captured table are written to a corresponding Kafka topic.
If needed, the topic name can be adjusted with the help of {prodname}'s {link-prefix}:{link-topic-routing}[topic routing SMT],
e.g. to use topic names that deviate from the names of the captured tables, or to stream changes from multiple tables into a single topic.
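
As an illustrative sketch, the following entries, added to the `config` object of a connector registration, would route changes from tables whose names match a `customers_shard<n>` pattern into a single `customers` topic; the regular expression and topic names are made up for this example:

[source,json]
----
"transforms": "route",
"transforms.route.type": "io.debezium.transforms.ByLogicalTableRouter",
"transforms.route.topic.regex": "(.*)customers_shard(.*)",
"transforms.route.topic.replacement": "$1customers"
----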

Once the change events are in Apache Kafka, different connectors from the Kafka Connect ecosystem can be used
to stream the changes to other systems and databases such as Elasticsearch, data warehouses and analytics systems, or caches such as Infinispan.
Depending on the chosen sink connector, it may be necessary to apply {prodname}'s {link-prefix}:{link-event-flattening}[new record state extraction] SMT,
which propagates only the "after" structure from {prodname}'s event envelope to the sink connector.
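
For example, a sink connector's configuration could apply the transformation like this; the transform alias `unwrap` is an arbitrary label chosen for this sketch:

[source,json]
----
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
----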
ifdef::community[]

== {prodname} Server

Another way to deploy {prodname} is using the xref:operations/debezium-server.adoc[{prodname} server].
The {prodname} server is a configurable, ready-to-use application that streams change events from a source database to a variety of messaging infrastructures.

The following image shows the architecture of a CDC pipeline using the {prodname} server:

image::debezium-server-architecture.png[{prodname} Server Architecture]

The {prodname} server is configured to use one of the {prodname} source connectors to capture changes from the source database.
Change events can be serialized to different formats, like JSON or Apache Avro, and are then sent to one of a variety of messaging infrastructures such as Amazon Kinesis, Google Cloud Pub/Sub, or Apache Pulsar.
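
As a rough sketch, a {prodname} server `application.properties` for a Postgres source and a Kinesis sink could look like the following; all connection values are placeholders, and the available options depend on the chosen source and sink:

[source,properties]
----
# Sink: where change events are sent (values are placeholders)
debezium.sink.type=kinesis
debezium.sink.kinesis.region=eu-central-1

# Source: which connector captures changes (values are placeholders)
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.source.database.hostname=localhost
debezium.source.database.port=5432
debezium.source.database.user=postgres
debezium.source.database.password=postgres
debezium.source.database.dbname=inventory
debezium.source.database.server.name=tutorial
----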
== Embedded Engine
Yet an alternative way for using the {prodname} connectors is the xref:operations/embedded.adoc[embedded engine].
In this case, {prodname} will not be run via Kafka Connect, but as a library embedded into your custom Java applications.
This can be useful for either consuming change events within your application itself,
without the needed for deploying complete Kafka and Kafka Connect clusters,
or for streaming changes to alternative messaging brokers such as Amazon Kinesis.
You can find https://github.com/debezium/debezium-examples/tree/master/kinesis[an example] for the latter in the examples repository.
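
To give an idea of the programming model, the following is a minimal sketch based on the `io.debezium.engine.DebeziumEngine` API; the connector properties are placeholders, and proper shutdown and error handling are omitted for brevity:

[source,java]
----
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class ChangeConsumerApp {

    public static void main(String[] args) throws Exception {
        // Connector configuration; all values below are placeholders
        Properties props = new Properties();
        props.setProperty("name", "engine");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
        props.setProperty("offset.flush.interval.ms", "60000");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "dbz");
        props.setProperty("database.server.id", "85744");
        props.setProperty("database.server.name", "my-app-connector");
        props.setProperty("database.history", "io.debezium.relational.history.FileDatabaseHistory");
        props.setProperty("database.history.file.filename", "/tmp/dbhistory.dat");

        // Create an engine that serializes change events to JSON and
        // invokes the given callback for each captured change
        try (DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> System.out.println(record.value()))
                .build()) {

            // The engine is a Runnable; run it on a dedicated thread
            ExecutorService executor = Executors.newSingleThreadExecutor();
            executor.execute(engine);

            // Keep capturing for a while; a real application would block
            // here until it is told to shut down
            TimeUnit.MINUTES.sleep(5);
            executor.shutdown();
        }
        // Closing the engine (via try-with-resources) stops the capture process
    }
}
----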
endif::community[]