From 9dbea89f0a2b6239fa650e20cf18896aa1e08051 Mon Sep 17 00:00:00 2001
From: Jiri Pechanec
Date: Fri, 31 Jan 2020 14:05:56 +0100
Subject: [PATCH] DBZ-1052 Docs for Oracle and SQL Server

---
 .../modules/ROOT/pages/connectors/oracle.adoc | 83 ++++++++++++++++++
 .../ROOT/pages/connectors/sqlserver.adoc      | 85 ++++++++++++++++++-
 2 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/documentation/modules/ROOT/pages/connectors/oracle.adoc b/documentation/modules/ROOT/pages/connectors/oracle.adoc
index 03652438b..9b9cc060e 100644
--- a/documentation/modules/ROOT/pages/connectors/oracle.adoc
+++ b/documentation/modules/ROOT/pages/connectors/oracle.adoc
@@ -610,6 +610,83 @@ When a row is deleted, the _delete_ event value listed above still works with lo
 But only if the message value is `null` will Kafka know that it can remove _all messages_ with that same key.
 To make this possible, Debezium's Oracle connector always follows the _delete_ event with a special _tombstone_ event that has the same key but `null` value.
 
+[[transaction-metadata]]
+=== Transaction Metadata
+
+[NOTE]
+====
+This feature is currently under active development (incubating),
+so the structure of transaction events or other details may still change as development progresses.
+====
+
+Debezium can generate events that represent transaction boundaries and that enrich data messages with transaction metadata.
+
+==== Transaction boundaries
+Debezium generates events for every transaction start and end.
+Every event contains the following fields:
+
+* `status` - `BEGIN` or `END`
+* `id` - string representation of the unique transaction identifier
+* `event_count` (for `END` events) - total number of events emitted by the transaction
+* `data_collections` (for `END` events) - an array of pairs of `data_collection` and `event_count` that provides the number of events emitted by changes originating from the given data collection
+
+An example of such messages looks like this:
+[source,json,indent=0,subs="attributes"]
+----
+{
+  "status": "BEGIN",
+  "id": "5.6.641",
+  "event_count": null,
+  "data_collections": null
+}
+
+{
+  "status": "END",
+  "id": "5.6.641",
+  "event_count": "2",
+  "data_collections": [
+    {
+      "data_collection": "ORCLPDB1.DEBEZIUM.CUSTOMER",
+      "event_count": "1"
+    },
+    {
+      "data_collection": "ORCLPDB1.DEBEZIUM.ORDER",
+      "event_count": "1"
+    }
+  ]
+}
+----
+
+==== Data events enrichment
+When transaction metadata is enabled, the data message `Envelope` is enriched with a new `transaction` field.
+This field provides information about every event in the form of a composite of fields:
+
+* `id` - string representation of the unique transaction identifier
+* `total_order` - the absolute position of the event among all events generated by the transaction
+* `data_collection_order` - the per-data collection position of the event among all events emitted by the transaction
+
+An example of such a message looks like this:
+[source,json,indent=0,subs="attributes"]
+----
+{
+  "before": null,
+  "after": {
+    "pk": "2",
+    "aa": "1"
+  },
+  "source": {
+...
+  },
+  "op": "c",
+  "ts_ms": "1580390884335",
+  "transaction": {
+    "id": "5.6.641",
+    "total_order": "1",
+    "data_collection_order": "1"
+  }
+}
+----
+
 [[data-types]]
 === Data Types
 
@@ -1157,4 +1234,10 @@ The connector will read the table contents in multiple batches of this size. Def
 |Whether field names will be sanitized to adhere to Avro naming requirements. See xref:configuration/avro.adoc#names[Avro naming] for more details.
+|`provide.transaction.metadata` (Incubating)
+|`false`
+|When set to `true`, Debezium generates events with transaction boundaries and enriches data event envelopes with transaction metadata.
+
+See link:#transaction-metadata[Transaction Metadata] for additional details.
+
 |=======================
diff --git a/documentation/modules/ROOT/pages/connectors/sqlserver.adoc b/documentation/modules/ROOT/pages/connectors/sqlserver.adoc
index 8a5c19cee..569eaf12d 100644
--- a/documentation/modules/ROOT/pages/connectors/sqlserver.adoc
+++ b/documentation/modules/ROOT/pages/connectors/sqlserver.adoc
@@ -581,11 +581,88 @@ The SQL Server connector's events are designed to work with https://cwiki.apache
 which allows for the removal of some older messages as long as at least the most recent message for every key is kept.
 This allows Kafka to reclaim storage space while ensuring the topic contains a complete dataset and can be used for reloading key-based state.
 
-[[sqlserver-tombstone-events]]
+[[tombstone-events]]
 When a row is deleted, the _delete_ event value listed above still works with log compaction, since Kafka can still remove all earlier messages with that same key.
 But only if the message value is `null` will Kafka know that it can remove _all messages_ with that same key.
 To make this possible, the SQL Server connector always follows the _delete_ event with a special _tombstone_ event that has the same key but `null` value.
 
+[[transaction-metadata]]
+=== Transaction Metadata
+
+[NOTE]
+====
+This feature is currently under active development (incubating),
+so the structure of transaction events or other details may still change as development progresses.
+====
+
+Debezium can generate events that represent transaction boundaries and that enrich data messages with transaction metadata.
+
+==== Transaction boundaries
+Debezium generates events for every transaction start and end.
+Every event contains the following fields:
+
+* `status` - `BEGIN` or `END`
+* `id` - string representation of the unique transaction identifier
+* `event_count` (for `END` events) - total number of events emitted by the transaction
+* `data_collections` (for `END` events) - an array of pairs of `data_collection` and `event_count` that provides the number of events emitted by changes originating from the given data collection
+
+An example of such messages looks like this:
+[source,json,indent=0,subs="attributes"]
+----
+{
+  "status": "BEGIN",
+  "id": "00000025:00000d08:0025",
+  "event_count": null,
+  "data_collections": null
+}
+
+{
+  "status": "END",
+  "id": "00000025:00000d08:0025",
+  "event_count": "2",
+  "data_collections": [
+    {
+      "data_collection": "testDB.dbo.tablea",
+      "event_count": "1"
+    },
+    {
+      "data_collection": "testDB.dbo.tableb",
+      "event_count": "1"
+    }
+  ]
+}
+----
+
+==== Data events enrichment
+When transaction metadata is enabled, the data message `Envelope` is enriched with a new `transaction` field.
+This field provides information about every event in the form of a composite of fields:
+
+* `id` - string representation of the unique transaction identifier
+* `total_order` - the absolute position of the event among all events generated by the transaction
+* `data_collection_order` - the per-data collection position of the event among all events emitted by the transaction
+
+An example of such a message looks like this:
+[source,json,indent=0,subs="attributes"]
+----
+{
+  "before": null,
+  "after": {
+    "pk": "2",
+    "aa": "1"
+  },
+  "source": {
+...
+  },
+  "op": "c",
+  "ts_ms": "1580390884335",
+  "transaction": {
+    "id": "00000025:00000d08:0025",
+    "total_order": "1",
+    "data_collection_order": "1"
+  }
+}
+----
+
 [[schema-evolution]]
 === Database schema evolution
 
@@ -1436,6 +1513,12 @@ This is used to define the timezone of the transaction timestamp (ts_ms) retriev
 When unset, default behavior is to use the timezone of the VM running the Debezium connector. In this case, when running on on SQL Server 2014 or older and using different timezones on server and the connector, incorrect ts_ms values may be produced. +
 Possible values include "Z", "UTC", offset values like "+02:00", short zone ids like "CET", and long zone ids like "Europe/Paris".
+|`provide.transaction.metadata` (Incubating)
+|`false`
+|When set to `true`, Debezium generates events with transaction boundaries and enriches data event envelopes with transaction metadata.
+
+See link:#transaction-metadata[Transaction Metadata] for additional details.
+
 |=======================
 
 The connector also supports _pass-through_ configuration properties that are used when creating the Kafka producer and consumer. Specifically, all connector configuration properties that begin with the `database.history.producer.` prefix are used (without the prefix) when creating the Kafka producer that writes to the database history, and all those that begin with the prefix `database.history.consumer.` are used (without the prefix) when creating the Kafka consumer that reads the database history upon connector startup.
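The new table rows document `provide.transaction.metadata` but neither page shows it in the context of a connector registration. The sketch below is a minimal, illustrative SQL Server connector configuration with the option enabled; the connector name, connection settings, table whitelist, and history topic are hypothetical placeholders (JSON does not allow comments, so they are called out here rather than inline), and only the `provide.transaction.metadata` line comes from this patch.

[source,json,indent=0]
----
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver",
    "database.port": "1433",
    "database.user": "sa",
    "database.password": "Password!",
    "database.dbname": "testDB",
    "database.server.name": "server1",
    "table.whitelist": "dbo.customers",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "dbhistory.inventory",
    "provide.transaction.metadata": "true"
  }
}
----

With this flag set to `true`, the connector emits the `BEGIN`/`END` boundary events described in the new Transaction Metadata sections and adds the `transaction` block to each data change event's envelope; leaving it at the default `false` preserves the existing event format.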