DBZ-7698 DBZ-7773 Add review suggestions

This commit is contained in:
twthorn 2024-04-23 10:47:31 -05:00 committed by Jiri Pechanec
parent d452286b99
commit 879e02e40f


@@ -198,31 +198,36 @@ Following is an example of a message:
[[vitess-ordered-transaction-metadata]]
=== Ordered Transaction Metadata
You can configure {prodname} to include additional metadata in data change event records.
Such supplemental metadata can assist downstream consumers in processing messages in the correct order when repartitioning, or some other disruption, might otherwise lead to data being consumed out of order.
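The following configuration excerpt is a minimal sketch of how you might enable this metadata; it assumes that all other required connector properties are set elsewhere, and uses the standard `provide.transaction.metadata` option together with the factory class described below:
[source,json,indent=0]
----
{
  "provide.transaction.metadata": "true",
  "transaction.metadata.factory": "io.debezium.connector.vitess.pipeline.txmetadata.VitessOrderedTransactionMetadataFactory"
}
----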
.Change Data Enrichment
When the transaction metadata factory is set to `VitessOrderedTransactionMetadataFactory`, the data message `Envelope` is also enriched with a new `transaction` field.
This field provides information about every event in the form of a composite of fields.
With ordered transaction metadata enabled, the following two additional fields are included:
`transaction_epoch`:: A non-decreasing value that represents the epoch that the transaction rank belongs to.
`transaction_rank`:: A non-decreasing value within an epoch that represents the order of the transaction.
A third field is also relevant to event ordering:
`total_order`:: Represents the absolute position of an event among all events generated by a transaction.
This field is included by default in the standard transaction metadata.
The following example illustrates how to use these fields to establish event order.
Suppose {prodname} emits change event records for two events that occur in the same shard and share the same primary key.
If the Kafka topic where these events are sent is repartitioned, then the consumer order of the two events cannot be trusted.
If {prodname} is configured to provide enriched transaction metadata, applications that consume from the topic can apply the following logic to determine which of the two events to apply (the newer event) and which to discard:
1. If the values for `transaction_epoch` are not equal, return the event with the higher `transaction_epoch` value. Otherwise, continue.
2. If the values for `transaction_rank` are not equal, return the event with the higher `transaction_rank` value. Otherwise, continue.
3. Return the event with a greater `total_order` value.
If neither of the two events has a higher `transaction_epoch` or `transaction_rank` value, then the two events are part of the same transaction.
Because the `total_order` field represents the order of events within a transaction, the event with the greater value is the most recent event.
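The following Java example is a minimal sketch of this decision procedure; the `OrderedTxMetadata` record and the `newer()` helper are hypothetical names that are not part of the {prodname} API, and the numeric types are simplified for illustration:
[source,java,indent=0]
----
// A minimal sketch of the ordering rules above; names and types are
// hypothetical and simplified for illustration.
public class OrderedEventResolver {

    // Holds the three ordering fields read from the enriched
    // `transaction` block of a data change event.
    record OrderedTxMetadata(long transactionEpoch, long transactionRank, long totalOrder) {}

    // Returns the newer of two events that share a shard and primary key.
    static OrderedTxMetadata newer(OrderedTxMetadata a, OrderedTxMetadata b) {
        // 1. Prefer the event with the greater epoch.
        if (a.transactionEpoch() != b.transactionEpoch()) {
            return a.transactionEpoch() > b.transactionEpoch() ? a : b;
        }
        // 2. Same epoch: prefer the event with the greater transaction rank.
        if (a.transactionRank() != b.transactionRank()) {
            return a.transactionRank() > b.transactionRank() ? a : b;
        }
        // 3. Same transaction: the greater total_order value is newer.
        return a.totalOrder() > b.totalOrder() ? a : b;
    }

    public static void main(String[] args) {
        OrderedTxMetadata first = new OrderedTxMetadata(1, 42, 1);
        OrderedTxMetadata second = new OrderedTxMetadata(2, 7, 1);
        System.out.println(newer(first, second)); // second wins: higher epoch
    }
}
----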
The following example shows a data change event with ordered transaction metadata:
[source,json,indent=0,subs="+attributes"]
----
@@ -1057,22 +1062,33 @@ a|_n/a_
[id="vitess-temporal-types"] [id="vitess-temporal-types"]
=== Temporal types === Temporal types
Excluding the `TIMESTAMP` data type, Vitess temporal types depend on the value of the `time.precision.mode` connector configuration property. Excluding the `TIMESTAMP` data type, Vitess temporal types depend on the value of the xref:vitess-property-time-precision-mode[`time.precision.mode`] connector configuration property.
.Temporal values without time zones .Temporal values without time zones
The `DATETIME` type represents a local date and time such as "2018-01-13 09:48:27". As you can see, there is no time zone information. Such columns are converted into epoch milliseconds or microseconds based on the columns precision by using UTC. The `TIMESTAMP` type represents a timestamp without time zone information. It is converted by MySQL from the server (or sessions) current time zone into UTC when writing and from UTC into the server (or session's) current time zone when reading back the value. For example: The `DATETIME` type represents a local date and time such as "2018-01-13 09:48:27".
As you can see in the preceding example, this type does not include time zone information.
Columns of this type are converted into epoch milliseconds or microseconds, based on the column's precision, by using UTC.
The `TIMESTAMP` type represents a timestamp without time zone information.
When writing data, MySQL converts the `TIMESTAMP` type from the time zone of the server or session into UTC format.
When it reads values, the database converts from UTC format to the current time zone of the server or session.
For example:
* `DATETIME` with a value of `2018-06-20 06:37:03` becomes `1529476623000`.
* `TIMESTAMP` with a value of `2018-06-20 06:37:03` becomes `2018-06-20T13:37:03Z`.
Such columns are converted into an equivalent `io.debezium.time.ZonedTimestamp` in UTC, based on the time zone of the server or session.
By default, {prodname} queries the server for the time zone.
If this fails, you must specify the time zone explicitly by setting the `connectionTimeZone` option in the JDBC connection string.
For example, if the database's time zone (either globally, or as configured for the connector by means of the `connectionTimeZone` option) is "America/Los_Angeles", the `TIMESTAMP` value "2018-06-20 06:37:03" is represented by a `ZonedTimestamp` with the value "2018-06-20T13:37:03Z".
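The following Java sketch uses only the standard `java.time` API to reproduce both conversions from the preceding examples. It illustrates the semantics of the mapping rather than the connector's internal implementation; the "America/Los_Angeles" server time zone is an assumption carried over from the example above:
[source,java,indent=0]
----
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TemporalConversionSketch {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        LocalDateTime value = LocalDateTime.parse("2018-06-20 06:37:03", fmt);

        // DATETIME carries no time zone, so the value is interpreted in UTC
        // and converted to epoch milliseconds.
        long epochMillis = value.toInstant(ZoneOffset.UTC).toEpochMilli();
        System.out.println(epochMillis); // 1529476623000

        // TIMESTAMP is interpreted in the server (or session) time zone and
        // converted to UTC, matching the ZonedTimestamp representation.
        // Assumes the server time zone is America/Los_Angeles.
        String zoned = value.atZone(ZoneId.of("America/Los_Angeles"))
                .withZoneSameInstant(ZoneOffset.UTC)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        System.out.println(zoned); // 2018-06-20T13:37:03Z
    }
}
----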
The time zone of the JVM that runs Kafka Connect and {prodname} does not affect these conversions.
For more information about properties that affect temporal values, see the xref:vitess-connector-properties[connector configuration properties].
time.precision.mode=adaptive_time_microseconds (default)::
The Vitess connector determines the literal type and semantic type based on the column's data type definition so that events represent exactly the values in the database.
All time fields are in microseconds.
Only positive `TIME` field values in the range of `00:00:00.000000` to `23:59:59.999999` can be captured correctly.
+
.Mappings when `time.precision.mode=adaptive_time_microseconds`
[cols="25%a,20%a,55%a",options="header",subs="+attributes"]
@@ -1082,27 +1098,32 @@ The Vitess connector determines the literal type and semantic type based on the
|`DATE`
|`INT32`
a|`io.debezium.time.Date` +
Represents the number of days elapsed since the UNIX epoch.
|`TIME[(M)]`
|`INT64`
a|`io.debezium.time.MicroTime` +
Represents the time value in microseconds and does not include time zone information.
MySQL allows `M` to be in the range of `0-6`.
|`DATETIME, DATETIME(0), DATETIME(1), DATETIME(2), DATETIME(3)`
|`INT64`
a|`io.debezium.time.Timestamp` +
Represents the number of milliseconds elapsed since the UNIX epoch and does not include time zone information.
|`DATETIME(4), DATETIME(5), DATETIME(6)`
|`INT64`
a|`io.debezium.time.MicroTimestamp` +
Represents the number of microseconds elapsed since the UNIX epoch and does not include time zone information.
|===
time.precision.mode=connect::
The Vitess connector uses the built-in Kafka Connect logical types.
This approach is less precise than the default approach, and the events could be less precise if the database column has a _fractional second precision_ value that is greater than `3`.
The connector can process only values in the range `00:00:00.000` to `23:59:59.999`.
Set `time.precision.mode=connect` only if you are certain that the `TIME` values in your tables never exceed the supported ranges.
The `connect` setting is expected to be removed in a future version of {prodname}.
+
.Mappings when `time.precision.mode=connect`
[cols="25%a,20%a,55%a",options="header",subs="+attributes"]
@@ -1112,7 +1133,7 @@ The Vitess connector uses defined Kafka Connect logical types. This approach is
|`DATE`
|`INT32`
a|`org.apache.kafka.connect.data.Date` +
Represents the number of days elapsed since the UNIX epoch.
|`TIME[(M)]`
|`INT64`
@@ -1122,7 +1143,7 @@ Represents the time value in microseconds since midnight and does not include ti
|`DATETIME[(M)]`
|`INT64`
a|`org.apache.kafka.connect.data.Timestamp` +
Represents the number of milliseconds elapsed since the UNIX epoch, and does not include time zone information.
|===
@@ -1549,11 +1570,14 @@ When the connector starts, it skips the snapshot process and immediately begins
|[[vitess-property-time-precision-mode]]<<vitess-property-time-precision-mode, `+time.precision.mode+`>>
|`adaptive_time_microseconds`
|You can set the following options to determine how {prodname} represents the precision of time, date, and timestamp values: +
+
`adaptive_time_microseconds`::
(Default) Captures date, datetime, and timestamp values exactly as they exist in the database.
Values are represented with a precision in milliseconds, microseconds, or nanoseconds, depending on the database column type, with the exception of `TIME` type fields, which are always captured as microseconds.
`connect`::
Time and timestamp values are always represented in the default Kafka Connect formats for Time, Date, and Timestamp, which use millisecond precision regardless of the database columns' precision.
|[[vitess-property-bigint-unsigned-handling-mode]]<<vitess-property-bigint-unsigned-handling-mode,`+bigint.unsigned.handling.mode+`>>
|string
@@ -1651,8 +1675,9 @@ By default, truncate operations are skipped (not emitted by this connector).
|[[vitess-property-transaction-metadata-factory]]<<vitess-property-transaction-metadata-factory, `transaction.metadata.factory`>>
|`io.debezium.pipeline.txmetadata.DefaultTransactionMetadataFactory`
|Determines the class that the connector uses to track transaction context and build the data structures and schemas to represent transactions.
`io.debezium.connector.vitess.pipeline.txmetadata.VitessOrderedTransactionMetadataFactory` provides additional transaction metadata that can help consumers to interpret the correct order of two events, regardless of the order in which they are consumed.
For more information, see xref:vitess-ordered-transaction-metadata[Ordered transaction metadata].
|[[vitess-property-keepalive-interval-ms]]<<vitess-property-keepalive-interval-ms, `+vitess.keepalive.interval.ms+`>>
|`Long.MAX_VALUE`