From 70fc601c0fee27a8bde30b2ccd24b3a4386c65eb Mon Sep 17 00:00:00 2001 From: Randall Hauch Date: Wed, 3 Feb 2016 16:09:43 -0600 Subject: [PATCH] DBZ-8 Added documentation about embedded engines. --- README.md | 8 +++++++- debezium-embedded/README.md | 2 +- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 910bbee67..3233dfae8 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,13 @@ Debezium is an open source project that provides a near real-time data streaming Monitoring databases and being notified when data changes has always been complicated. Relational database triggers can be useful, but are specific to each database and often limited to updating state within the same database (not communicating with external processes). Some databases offer APIs or frameworks for monitoring changes, but there is no standard so each database's approach is different and requires a lot of knowledged and specialized code. It still is very challenging to ensure that all changes are seen and processed in the same order while minimally impacting the database. -Debezium provides modules that do this work for you. Some modules are generic and work with multiple database management systems, but are also a bit more limited in functionality and performance. Other modules are tailored for specific database management systems, so they are often far more capable and they leverage the specific features of the system. +Debezium provides modules that do this work for you. Some modules are generic and work with multiple database management systems, but are also a bit more limited in functionality and performance. Other modules are tailored for specific database management systems, so they are often far more capable and they leverage the specific features of the system. + +## Basic architecture + +Debezium is a change data capture (CDC) platform that achives its durability, reliability, and fault tolerance qualities by reusing Kafka and Kafka Connect. Each connector deployed to the Kafka Connect distributed, scalable, fault tolerant service monitors a single upstream database server, capturing all of the changes and recording them in one or more Kafka topics (tyipcally one topic per database table). Kafka ensures that all of these data change events are replicated and totally ordered, and allows many clients to independently consume these same data change events with little impact on the upstream system. Additionally, clients can stop consuming at any time, and when they restart they resume exactly where they left off. Each client can determine whether they want exactly-only or at-least-once delivery of all data change events, and all data change events for each database/table are delivered in the same order they occurred in the upsteram database. + +Applications that don't need or want this level of fault tolerance, performance, scalability, and reliabilty can instead use Debezium's *embedded connector engine* to run a connector directly within the application space. They still want the same data change events, but prefer to have the connectors send them directly to the application rather than persist them inside Kafka. ## Common use cases diff --git a/debezium-embedded/README.md b/debezium-embedded/README.md index f524de552..5a3348a44 100644 --- a/debezium-embedded/README.md +++ b/debezium-embedded/README.md @@ -2,7 +2,7 @@ Debezium connectors are normally operated by deploying them to a Kafka Connect service, and configuring one or more connectors to monitor upstream databases and produce data change events for all changes that they sees in the upstream databases. Those data change events are written to Kafka, where they can be independently consumed by many different applications. Kafka Connect provides excellent fault tolerance and scalability, since it runs as a distributed service and ensures that all registered and configured connectors are always running. For example, even if one of the Kafka Connect endpoints in a cluster goes down, the remaining Kafka Connect endpoints will restart any connectors that were previously running on the now-terminated endpoint, minimizing downtime and eliminating administrative activities. -Not every applications needs this level of fault tolerance and reliability, and they may not want to rely upon an external cluster of Kafka brokers and Kafka Connect services. Instead, some applications would prefer to *embed* Debezium connectors directly within the application space. They still want the same data change events, but prefer to have the connectors send them directly to the application rather than persiste them inside Kafka. +Not every application needs this level of fault tolerance and reliability, and they may not want to rely upon an external cluster of Kafka brokers and Kafka Connect services. Instead, some applications would prefer to *embed* Debezium connectors directly within the application space. They still want the same data change events, but prefer to have the connectors send them directly to the application rather than persiste them inside Kafka. This `debezium-embedded` module defines a small library that allows an application to easily configure and run Debezium connectors.