NO-JIRA Update doc with feedback from SME/QE review

Ben Hardesty 2020-03-27 15:44:05 -04:00 committed by Gunnar Morling
parent 7f016e3766
commit 0725f928e5
16 changed files with 72 additions and 52 deletions

View File

@@ -18,6 +18,12 @@ asciidoc:
link-prefix: 'xref'
link-mysql-connector: 'connectors/mysql.adoc'
link-deploy-mysql-connector: 'assemblies/cdc-mysql-connector/as_deploy-the-mysql-connector.adoc'
+link-mongodb-connector: 'connectors/mongodb.adoc'
+link-postgresql-connector: 'connectors/postgresql.adoc'
+link-oracle-connector: 'connectors/oracle.adoc'
+link-sqlserver-connector: 'connectors/sqlserver.adoc'
+link-db2-connector: 'connectors/db2.adoc'
+link-cassandra-connector: 'connectors/cassandra.adoc'
link-mysql-plugin-snapshot: 'https://oss.sonatype.org/service/local/artifact/maven/redirect?r=snapshots&g=io.debezium&a=debezium-connector-mysql&v=LATEST&c=plugin&e=tar.gz'
link-postgres-plugin-snapshot: 'https://oss.sonatype.org/service/local/artifact/maven/redirect?r=snapshots&g=io.debezium&a=debezium-connector-postgres&v=LATEST&c=plugin&e=tar.gz'
link-mongodb-plugin-snapshot: 'https://oss.sonatype.org/service/local/artifact/maven/redirect?r=snapshots&g=io.debezium&a=debezium-connector-mongodb&v=LATEST&c=plugin&e=tar.gz'

View File

@@ -895,6 +895,7 @@ The following _advanced_ configuration properties have good defaults that will w
|`true`
|Boolean value that specifies whether the addresses in `mongodb.hosts` are seeds that should be used to discover all members of the cluster or replica set (`true`), or whether the address(es) in `mongodb.hosts` should be used as is (`false`). The default is `true` and should be used in all cases except where MongoDB is link:#mongodb-replicaset[fronted by a proxy].
+ifndef::cdc-product[]
|`source.struct.version`
|v2
|Schema version for the `source` block in CDC events. Debezium 0.10 introduced a few breaking +
@@ -902,6 +903,7 @@ changes to the structure of the `source` block in order to unify the exposed str
all the connectors. +
By setting this option to `v1` the structure used in earlier versions can be produced.
Note that this setting is not recommended and is planned for removal in a future {prodname} version.
+endif::cdc-product[]
|`heartbeat.interval.ms`
|`0`
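
Taken together, these options might appear in a registration payload like the following — a hypothetical sketch, with the connector name, replica set address, and logical name invented for illustration (`mongodb.members.auto.discover` is the property name for the discovery option described above in current {prodname} releases):

[source,json]
----
{
  "name": "inventory-mongodb-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "rs0/mongodb.example.com:27017",
    "mongodb.name": "fulfillment",
    "mongodb.members.auto.discover": "false",
    "heartbeat.interval.ms": "10000"
  }
}
----

Here discovery is disabled on the assumption that the address is fronted by a proxy, and a 10-second heartbeat keeps the recorded offset fresh on low-traffic collections.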

View File

@@ -1504,6 +1504,7 @@ When set to `0` the connector will fail immediately when it cannot obtain the lo
This property contains a comma-separated list of fully-qualified tables _(SCHEMA_NAME.TABLE_NAME)_. SELECT statements for the individual tables are specified in further configuration properties, one for each table, identified by the id `snapshot.select.statement.overrides.[SCHEMA_NAME].[TABLE_NAME]`. The value of those properties is the SELECT statement to use when retrieving data from the specific table during snapshotting. _A possible use case for large append-only tables is setting a specific point at which to start (resume) the snapshot, in case a previous snapshot was interrupted._ +
*Note*: This setting affects snapshots only. Events captured during log reading are not affected by it.
+ifndef::cdc-product[]
|`source.struct.version`
|v2
|Schema version for the `source` block in CDC events; Debezium 0.10 introduced a few breaking +
@@ -1511,6 +1512,7 @@ changes to the structure of the `source` block in order to unify the exposed str
all the connectors. +
By setting this option to `v1` the structure used in earlier versions can be produced.
Note that this setting is not recommended and is planned for removal in a future {prodname} version.
+endif::cdc-product[]
|`sanitize.field.names`
|`true` when connector configuration explicitly specifies the `key.converter` or `value.converter` parameters to use Avro, otherwise defaults to `false`.
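
As an illustration of the per-table override mechanism described above, a hedged config fragment — the schema, table, and predicate are invented for the example:

[source,json]
----
{
  "snapshot.select.statement.overrides": "INVENTORY.ORDERS",
  "snapshot.select.statement.overrides.INVENTORY.ORDERS": "SELECT * FROM INVENTORY.ORDERS WHERE ORDER_ID > 1000000"
}
----

During the snapshot the connector runs the overriding SELECT for that one table (resuming a large append-only table at a known point) while all other tables are read in full.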

View File

@@ -4,11 +4,11 @@
[id="how-the-mysql-connector-handles-schema-change-topics_{context}"]
= How the MySQL connector handles schema change topics
-You can configure the {prodname} *MySQL connector* to produce schema change events that include all DDL statements applied to databases in the MySQL server. The connector writes all of these events to a Kakfa topic named `<serverName>` where `serverName` is the name of the connector as specified in the `database.server.name` configuration property.
+You can configure the {prodname} *MySQL connector* to produce schema change events that include all DDL statements applied to databases in the MySQL server. The connector writes all of these events to a Kafka topic named `<serverName>` where `serverName` is the name of the connector as specified in the `database.server.name` configuration property.
IMPORTANT: If you choose to use _schema change events_, use the schema change topic and *do not* consume the database history topic.
-NOTE: Make sure that the `num.partitions` configuration for Kakfa is set to `1` to ensure schema changes are kept in the correct order.
+NOTE: Make sure that the `num.partitions` configuration for Kafka is set to `1` to ensure schema changes are kept in the correct order.
== Schema change topic structure
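
As a quick orientation before the structure is described in detail, a hypothetical value payload for one schema change event might look like this — the exact field set varies by {prodname} version, and the server and database names are invented:

[source,json]
----
{
  "source": {
    "server": "mysql-server-1"
  },
  "databaseName": "inventory",
  "ddl": "ALTER TABLE customers ADD COLUMN middle_name VARCHAR(255);"
}
----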

View File

@@ -8,8 +8,8 @@ The {prodname} MySQL connector represents changes to rows with events that are s
Columns that store strings are defined in MySQL with a character set and collation. The MySQL connector uses the column's character set when reading the binary representation of the column values in the binlog events. The following table shows how the connector maps the MySQL data types to both _literal_ and _semantic_ types.
-* *literal type* : how the value is represented using Kakfa Connect schema types
-* *semantic type* : how the Kakfa Connect schema captures the meaning of the field (schema name)
+* *literal type* : how the value is represented using Kafka Connect schema types
+* *semantic type* : how the Kafka Connect schema captures the meaning of the field (schema name)
[cols="2,2,6"]
|===
@@ -192,7 +192,7 @@ NOTE: Represents the number of microseconds past epoch and does not include time
+
time.precision.mode=connect::
-The MySQL connector uses the predefined Kakfa Connect logical types. This approach is less precise than the default approach and the events could be less precise if the database column has a _fractional second precision_ value of greater than `3`.
+The MySQL connector uses the predefined Kafka Connect logical types. This approach is less precise than the default approach and the events could be less precise if the database column has a _fractional second precision_ value of greater than `3`.
+
[cols="2,2,6"]
|===
@@ -228,7 +228,7 @@ TIP: See xref:assemblies/cdc-mysql-connector/as_deploy-the-mysql-connector.adoc#
decimal.handling.mode=precise::
+
[cols="2,2,6"]
[cols="3,2,5"]
|===
|MySQL type |Literal type |Semantic type
@@ -236,13 +236,13 @@ decimal.handling.mode=precise::
|`BYTES`
a| org.apache.kafka.connect.data.Decimal
-NOTE: The `scaled` schema parameter contains an integer that represents how many digits the decimal point shifted.
+NOTE: The `scale` schema parameter contains an integer that represents how many digits the decimal point shifted.
|`DECIMAL[(M[,D])]`
|`BYTES`
a| org.apache.kafka.connect.data.Decimal
-NOTE: The `scaled` schema parameter contains an integer that represents how many digits the decimal point shifted.
+NOTE: The `scale` schema parameter contains an integer that represents how many digits the decimal point shifted.
|===
+
@@ -250,7 +250,7 @@ NOTE: The `scaled` schema parameter contains an integer that represents how many
decimal.handling.mode=double::
+
[cols="2,2,6"]
[cols="3,2,5"]
|===
|MySQL type |Literal type |Semantic type
@@ -268,7 +268,7 @@ a| _n/a_
decimal.handling.mode=string::
+
[cols="2,2,6"]
[cols="3,2,5"]
|===
|MySQL type |Literal type |Semantic type
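
Pulling the mode settings in this section together, a minimal config fragment — only the two properties are shown; the rest of the registration payload is omitted:

[source,json]
----
{
  "time.precision.mode": "connect",
  "decimal.handling.mode": "double"
}
----

Both values shown here trade precision for simpler downstream types, as described in the tables above.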

View File

@@ -7,7 +7,7 @@
When your {prodname} MySQL connector is first started, it performs an initial _consistent snapshot_ of your database. The following flow describes how this snapshot is completed.
-NOTE: This is the default snapshot mode which is set as `inital` in the `snapshot.mode` property. For other snapshots modes, please check out the xref:assemblies/cdc-mysql-connector/as_deploy-the-mysql-connector.adoc#mysql-connector-configuration-properties_{context}[MySQL connector configuration properties].
+NOTE: This is the default snapshot mode, which is set as `initial` in the `snapshot.mode` property. For other snapshot modes, see the xref:assemblies/cdc-mysql-connector/as_deploy-the-mysql-connector.adoc#mysql-connector-configuration-properties_{context}[MySQL connector configuration properties].
ifeval::["{isImageReady}" == "true"]
image:debezium-architecture.png[Debezium Architecture]
@@ -43,7 +43,7 @@ a| Writes the DDL changes to the schema change topic, including all necessary `D
NOTE: This happens if applicable.
|``{counter:snapshotStep}``
-a| Scans the database tables and generates `CREATE` events on the relevant table-specific Kakfa topics for each row.
+a| Scans the database tables and generates `CREATE` events on the relevant table-specific Kafka topics for each row.
|``{counter:snapshotStep}``
a| Commits the transaction.
@@ -91,7 +91,7 @@ a| Writes the DDL changes to the schema change topic, including all necessary `D
NOTE: This happens if applicable.
|``{counter:snapshotStep-noLock}``
-a| Scans the database tables and generates `CREATE` events on the relevant table-specific Kakfa topics for each row.
+a| Scans the database tables and generates `CREATE` events on the relevant table-specific Kafka topics for each row.
|``{counter:snapshotStep-noLock}``
a| Commits the transaction.
@@ -103,4 +103,3 @@ a| Commits the transaction.
a| Records the completed snapshot in the connector offsets.
|===
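
The flow above is the default behavior; a sketch of making that choice explicit in the connector configuration (only the one property is shown):

[source,json]
----
{
  "snapshot.mode": "initial"
}
----

Other `snapshot.mode` values are listed in the configuration properties reference linked above.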

View File

@@ -5,7 +5,7 @@
All data change events produced by the {prodname} MySQL connector contain a key and a value. The change event key and the change event value each contain a _schema_ and a _payload_ where the schema describes the structure of the payload and the payload contains the data.
-WARNING: The MySQL connector ensures that all Kakfa Connect schema names adhere to the link:http://avro.apache.org/docs/current/spec.html#names[Avro schema name format]. This is important as any character that is not a latin letter or underscore is replaced by an underscore which can lead to unexpected conflicts in schema names when the logical server names, database names, and table names container other characters that are replaced with these underscores.
+WARNING: The MySQL connector ensures that all Kafka Connect schema names adhere to the link:http://avro.apache.org/docs/current/spec.html#names[Avro schema name format]. This is important because any character that is not a Latin letter or underscore is replaced by an underscore, which can lead to unexpected conflicts in schema names when the logical server names, database names, and table names contain other characters that are replaced with underscores.
== Change event key
@@ -79,7 +79,7 @@ a| A *mandatory* string that describes the type of operation.
* `c` = create
* `u` = update
* `d` = delete
-* `r` = read (non _initial snapshot_ only)
+* `r` = read (_initial snapshot_ only)
|3
|`before`
@@ -104,13 +104,13 @@ a| A *mandatory* field that describes the source metadata for the event includin
* the MySQL server ID (if available)
* timestamp
-NOTE: If the xref:assemblies/cdc-mysql-connector/as_setup-the-mysql-server.adoc#enable-query-log-events-for-cdc_{context}[binlog_rows_query_log_events] option is enabled and the connector has the `include.query` option enabled, a `query` field displays which contains the original SQL statement that generated the event.
+NOTE: If the xref:assemblies/cdc-mysql-connector/as_setup-the-mysql-server.adoc#enable-query-log-events-for-cdc_{context}[binlog_rows_query_log_events] option is enabled and the connector has the `include.query` option enabled, a `query` field is displayed that contains the original SQL statement that generated the event.
|6
|`ts_ms`
a| An optional field that displays the time at which the connector processed the event.
-NOTE: The time is based on the system clock in the JVM running the Kakfa Connect task.
+NOTE: The time is based on the system clock in the JVM running the Kafka Connect task.
|===
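
To tie the numbered fields together, here is a heavily abbreviated, hypothetical event value payload for an update to a `customers` row — all values are invented for illustration, and the `schema` portion is omitted:

[source,json]
----
{
  "payload": {
    "op": "u",
    "before": { "id": 1004, "first_name": "Anne" },
    "after": { "id": 1004, "first_name": "Anne Marie" },
    "source": { "name": "mysql-server-1", "server_id": 223344 },
    "ts_ms": 1585318503000
  }
}
----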

View File

@@ -4,9 +4,9 @@
[id="enable-query-log-events-for-cdc_{context}"]
= Enabling query log events for {prodname}
-You might want to see the original `SQL` statement for each binlog event. Enabling the `binlog_rows_query_log_events` options in the MySQL configuration file allows you to do this.
+You might want to see the original `SQL` statement for each binlog event. Enabling the `binlog_rows_query_log_events` option in the MySQL configuration file allows you to do this.
-NOTE: This options is only available from MySQL 5.6 and later.
+NOTE: This option is available in MySQL 5.6 and later.
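
On the connector side, the captured statement is only added to events when `include.query` is also enabled — a sketch of that fragment, with the rest of the connector configuration omitted:

[source,json]
----
{
  "include.query": "true"
}
----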
.Prerequisites

View File

@@ -5,7 +5,7 @@
= Enabling the MySQL binlog for {prodname}
// Start the title of a procedure module with a verb, such as Creating or Create. See also _Wording of headings_ in _The IBM Style Guide_.
-You must enable binary logging for MySQL replication. The binary logs record transaction updates for replication tools to propogate changes.
+You must enable binary logging for MySQL replication. The binary logs record transaction updates for replication tools to propagate changes.
.Prerequisites

View File

@@ -8,7 +8,7 @@ The configuration properties listed here are *required* to run the {prodname} My
TIP: The {prodname} MySQL connector supports _pass-through_ configuration when creating the Kafka producer and consumer. See link:http://kafka.apache.org/documentation.html[the Kafka documentation] for more details on _pass-through_ properties.
[cols="3,1,6"]
[cols="3,2,5"]
|===
|Property |Default |Description
@@ -190,7 +190,7 @@ Fully-qualified tables could be defined as `DB_NAME.TABLE_NAME` or `SCHEMA_NAME.
== Advanced MySQL connector properties
[[advanced-mysql-connector-properties]]
[cols="3,1,6"]
[cols="3,2,5"]
|===
|Property |Default |Description
@@ -307,6 +307,7 @@ This is usually done by database. +
Set to `true` (the default) when {prodname} should do the conversion. +
Set to `false` when conversion is fully delegated to the database.
+ifndef::cdc-product[]
|`source.struct.version`
|v2
|Schema version for the `source` block in {prodname} events; {prodname} 0.10 introduced a few breaking +
@@ -314,6 +315,7 @@ changes to the structure of the `source` block in order to unify the exposed str
all the connectors. +
By setting this option to `v1` the structure used in earlier versions can be produced.
Note that this setting is not recommended and is planned for removal in a future {prodname} version.
+endif::cdc-product[]
|`sanitize.field.names`
|`true` when connector configuration explicitly specifies the `key.converter` or `value.converter` parameters to use Avro, otherwise defaults to `false`.
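
For orientation, a hedged example of a registration payload that supplies the required properties from the first table — the hostname, credentials, and names are placeholders, not values from this document:

[source,json]
----
{
  "name": "inventory-mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.com",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "184054",
    "database.server.name": "mysql-server-1",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.inventory"
  }
}
----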

View File

@@ -15,7 +15,7 @@ The {prodname} MySQL connector has three metric types in addition to the built-i
The *MBean* is `debezium.mysql:type=connector-metrics,context=snapshot,server=<database.server.name>`.
[cols="3,1,6"]
[cols="3,2,5"]
|===
|Attribute |Type |Description
@@ -89,7 +89,7 @@ The *MBean* is `debezium.mysql:type=connector-metrics,context=binlog,server=<dat
NOTE: The transaction-related attributes are only available if binlog event buffering is enabled. See xref:assemblies/cdc-mysql-connector/as_deploy-the-mysql-connector.adoc#mysql-connector-configuration-properties_{context}[binlog.buffer.size] in the advanced connector configuration properties for more details.
[cols="3,1,6"]
[cols="3,2,5"]
|===
|Attribute |Type |Description
@@ -196,7 +196,7 @@ NOTE: The transaction-related attributes are only available if binlog event buff
The *MBean* is `debezium.mysql:type=connector-metrics,context=schema-history,server=<database.server.name>`.
[cols="3,1,6"]
[cols="3,2,5"]
|===
|Attribute |Type |Description

View File

@@ -4,6 +4,6 @@
[id="mysql-purges-binlog-files_{context}"]
= MySQL purges binlog files
-If the {prodname} MySQL connector stops for too long, the MySQL server purges older binlog files and the connector's last position may be lost. When the connector is restarted, the MySQL server no longer has the starting point and the connector performs another initial snapshot. If the snapshot mode is `disabled`, the connector fails with an error.
+If the {prodname} MySQL connector stops for too long, the MySQL server purges older binlog files and the connector's last position may be lost. When the connector is restarted, the MySQL server no longer has the starting point and the connector performs another initial snapshot. If the snapshot is disabled, the connector fails with an error.
TIP: See xref:assemblies/cdc-mysql-connector/as_overview-of-how-the-mysql-connector-works.adoc#how-the-mysql-connector-performs-database-snapshots_{context}[How the MySQL connector performs database snapshots] for more information on initial snapshots.
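
If re-snapshotting after a purge is acceptable, one mitigation is to let the connector decide when a new snapshot is required — a sketch, assuming the `when_needed` mode fits your table sizes and recovery-time goals:

[source,json]
----
{
  "snapshot.mode": "when_needed"
}
----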

View File

@@ -7,33 +7,22 @@
The {prodname} MySQL connector supports the following MySQL topologies:
Standalone::
-====
-When a single MySQL server is used, the server must have the binlog enabled (_and optionally GTIDs enabled_) so the {prodname} MySQL connector can monitor the server. This is often acceptable, since the binary log can also be used as an incremental link:https://dev.mysql.com/doc/refman/5.7/en/backup-methods.html[backup]. In this case, the MySQL connector always connects to and follows this standalone MySQL server instance.
-====
+When a single MySQL server is used, the server must have the binlog enabled (_and optionally GTIDs enabled_) so the {prodname} MySQL connector can monitor the server. This is often acceptable, since the binary log can also be used as an incremental link:https://dev.mysql.com/doc/refman/{mysql-version}/en/backup-methods.html[backup]. In this case, the MySQL connector always connects to and follows this standalone MySQL server instance.
Master and slave::
-====
The {prodname} MySQL connector can follow one of the masters or one of the slaves (_if that slave has its binlog enabled_), but the connector only sees changes in the cluster that are visible to that server. Generally, this is not a problem except for the multi-master topologies.
+
The connector records its position in the server's binlog, which is different on each server in the cluster. Therefore, the connector needs to follow just one MySQL server instance. If that server fails, it must be restarted or recovered before the connector can continue.
-====
Highly available clusters::
-====
A variety of link:https://dev.mysql.com/doc/mysql-ha-scalability/en/[high availability] solutions exist for MySQL, and they make it far easier to tolerate and almost immediately recover from problems and failures. Most HA MySQL clusters use GTIDs so that slaves are able to keep track of all changes on any of the masters.
-====
Multi-master::
-====
-A link:https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-replication-multi-master.html[multi-master MySQL topology] uses one or more MySQL slaves that each replicate from multiple masters. This is a powerful way to aggregate the replication of multiple MySQL clusters, and requires using GTIDs.
+A link:https://dev.mysql.com/doc/refman/{mysql-version}/en/mysql-cluster-replication-multi-master.html[multi-master MySQL topology] uses one or more MySQL slaves that each replicate from multiple masters. This is a powerful way to aggregate the replication of multiple MySQL clusters, and requires using GTIDs.
+
The {prodname} MySQL connector can use these multi-master MySQL slaves as sources, and can fail over to different multi-master MySQL slaves as long as the new slave is caught up to the old slave (_e.g., the new slave has all of the transactions that were last seen on the first slave_). This works even if the connector is only using a subset of databases and/or tables, as the connector can be configured to include or exclude specific GTID sources when attempting to reconnect to a new multi-master MySQL slave and find the correct position in the binlog (see the configuration sketch after this list).
-====
Hosted::
-====
The {prodname} MySQL connector can use hosted options such as Amazon RDS and Amazon Aurora.
+
IMPORTANT: Because these hosted options do not allow a *global read lock*, table-level locks are used to create the _consistent snapshot_.
-====
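
For the multi-master failover case described above, the GTID-source filtering is configured with properties along these lines — a sketch; the UUID is an invented placeholder:

[source,json]
----
{
  "gtid.source.includes": "36eb0b49-d7e3-11e9-b4de-7a0004011077"
}
----

`gtid.source.excludes` is the complementary property; only one of the two may be set.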

View File

@@ -0,0 +1,16 @@
[id="metrics-monitoring-connectors"]
= Metrics for monitoring {prodname} connectors
In addition to the built-in support for JMX metrics in Kafka, Zookeeper, and Kafka Connect,
each connector provides additional metrics that you can use to monitor their activities.
* {link-prefix}:{link-deploy-mysql-connector}#mysql-connector-monitoring-metrics_{context}[MySQL connector metrics]
* {link-prefix}:{link-mongodb-connector}#mongodb-connector-monitoring[MongoDB connector metrics]
* {link-prefix}:{link-postgresql-connector}#monitoring[PosgreSQL connector metrics]
* {link-prefix}:{link-sqlserver-connector}#monitoring[SQL Server connector metrics]
ifndef::cdc-product[]
* {link-prefix}:{link-oracle-connector}#monitoring[Oracle connector metrics]
* {link-prefix}:{link-db2-connector}#monitoring[Db2 connector metrics]
* {link-prefix}:{link-cassandra-connector}#monitoring[Cassandra connector metrics]
endif::cdc-product[]

View File

@@ -1,4 +1,3 @@
[id="viewing-create-event"]
= Viewing a _create_ event
@@ -334,12 +333,15 @@ However, the Kafka topic containing all of the events for a single table might h
The JSON converter includes the key and value schemas in every message,
so it does produce very verbose events.
+// The following condition can be removed when the downstream supports Avro.
+ifndef::cdc-product[]
Alternatively, you can use the link:http://docs.confluent.io/3.1.2/schema-registry/docs/index.html[Avro converter], which results in far smaller event messages.
This is because it transforms each Kafka Connect schema into an Avro schema and stores the Avro schemas in a separate Schema Registry service.
Thus, when the Avro converter serializes an event message,
it places only a unique identifier for the schema along with an Avro-encoded binary representation of the value.
As a result, the serialized messages that are transferred over the wire and stored in Kafka are far smaller than what you have seen here.
In fact, the Avro Converter is able to use Avro schema evolution techniques to maintain the history of each schema in the Schema Registry.
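
As a sketch of the converter settings this alternative implies — the registry URL is a placeholder for your own Schema Registry endpoint:

[source,json]
----
{
  "key.converter": "io.confluent.connect.avro.AvroConverter",
  "key.converter.schema.registry.url": "http://schema-registry:8081",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "value.converter.schema.registry.url": "http://schema-registry:8081"
}
----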
+endif::cdc-product[]
====
--

View File

@@ -22,4 +22,6 @@ include::../assemblies/monitoring/as_enabling-jmx-local-installations.adoc[level
include::../assemblies/monitoring/as_enabling-jmx-docker.adoc[leveloffset=+1]
+include::../modules/monitoring/c_metrics-monitoring-connectors.adoc[leveloffset=+1]
include::../modules/monitoring/c_using-prometheus-grafana.adoc[leveloffset=+1]