DBZ-5812 Update snapshot.mode description and extended stoppage topic

2022-11-29 17:20:37 -05:00 · 2022-11-29 17:20:37 -05:00 · c46d467cf1
commit c46d467cf1
parent 97ea30f25d
1 changed files with 21 additions and 13 deletions
--- a/documentation/modules/ROOT/pages/connectors/mongodb.adoc
+++ b/documentation/modules/ROOT/pages/connectors/mongodb.adoc
@@ -1405,7 +1405,14 @@ If you include this property in the configuration, do not set the `collection.in

 |[[mongodb-property-snapshot-mode]]<<mongodb-property-snapshot-mode, `+snapshot.mode+`>>
 |`initial`
-|Specifies the criteria for running a snapshot upon startup of the connector. The default is *initial*, and specifies that the connector reads a snapshot when either no offset is found or if the change stream no longer contains the previous offset. The *never* option specifies that the connector should never use snapshots, instead the connector should proceed to tail the log.
+|Specifies the criteria for performing a snapshot when the connector starts.
+Set the property to one of the following values: +
+
+`initial`::
+The connector performs a snapshot when it starts if it does not detect an offset in the oplog.
+
+`never`:: When the connector starts, it does not perform a snapshot.
+Instead of running a snapshot, the connector tails the oplog, and emits `read` events for the most recently recorded transactions.

 |[[mongodb-property-capture-mode]]<<mongodb-property-capture-mode, `+capture.mode+`>>
 |`change_streams_update_full`
@@ -1817,21 +1824,22 @@ Because there is a chance that some events may be duplicated during a recovery f
 As the connector generates change events, the Kafka Connect framework records those events in Kafka using the Kafka producer API. Kafka Connect will also periodically record the latest offset that appears in those change events, at a frequency that you have specified in the Kafka Connect worker configuration. If the Kafka brokers become unavailable, the Kafka Connect worker process running the connectors will simply repeatedly attempt to reconnect to the Kafka brokers. In other words, the connector tasks will simply pause until a connection can be reestablished, at which point the connectors will resume exactly where they left off.

 [id="debezium-mongodb-connector-is-stopped-for-a-long-interval"]
-=== Connector is stopped for a long interval
+=== Connector fails after it is stopped for a long interval

-If the connector is gracefully stopped, the replica sets can continue to be used and any new changes are recorded in MongoDB's oplog.
-When the connector is restarted, it will resume streaming changes for each replica set where it last left off, recording change events for all of the changes that were made while the connector was stopped.
-If the connector is stopped long enough such that MongoDB purges from its oplog some operations that the connector has not read, then upon startup the connector will perform a snapshot.
+If the connector is gracefully stopped, replica sets can continue to be used.
+Changes that occur while the connector is offline continue to be recorded in MongoDB's oplog.
+In most cases, after the connector is restarted, it reads the offset value in the oplog to determine the last operation that it streamed for each replica set, and then resumes streaming changes from that point.
+Database operations that occurred while the connector was stopped are emitted to Kafka as usual, and after some time, the connector catches up with the database.
+The amount of time required for the connector to catch up depends upon the capabilities and performance of Kafka and the volume of changes that occurred in the database.

-A properly configured Kafka cluster is capable of massive throughput.
-Kafka Connect is written with Kafka best practices, and given enough resources will also be able to handle very large numbers of database change events.
-Because of this, when a connector has been restarted after a while, it is very likely to catch up with the database, though how quickly will depend upon the capabilities and performance of Kafka and the volume of changes being made to the data in MongoDB.
+However, if the connector remains stopped for a long enough interval, MongoDB might purge the oplog while the connector is inactive, so that the connector's last position is lost.
+After the connector restarts, it cannot read the previous offset value to determine where to resume streaming.
+Typically, if the connector's `snapshot.mode` property is set to the default value (`initial`), the connector runs a snapshot when it finds no offset in the data source.
+In this situation, however, the connector detects stored offset values in its Kafka topic but cannot find a matching value in the database.
+An error results and the connector fails.

-[NOTE]
-====
-If the connector remains stopped for long enough, MongoDB might purge older oplog files and the connector's last position may be lost.
-In this case, when the connector configured with _initial_ snapshot mode (the default) is finally restarted, the MongoDB server will no longer have the starting point and the connector will fail with an error.
-====
+To recover from the failure, delete the failed connector, and create a new connector with the same configuration but with a different connector name.
+When you start the new connector, it performs a snapshot to ingest the state of the database, and then resumes streaming.

 [id="mongodb-crash-results-in-lost-commits"]
 === MongoDB loses writes
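
Note on the recovery step added in this commit (delete the failed connector, then create a new one with the same configuration under a different name): the following is a minimal sketch of how that step might be scripted against the Kafka Connect REST interface, which exposes `DELETE /connectors/{name}` and `POST /connectors`. The function name `plan_connector_recreation`, the connector names, and the default URL `http://localhost:8083` are illustrative assumptions, not part of the documentation change. The sketch only builds the two requests and sends nothing, so it can be reviewed before use.

```python
import copy
import json
from urllib.parse import quote


def plan_connector_recreation(old_name, new_name, config,
                              connect_url="http://localhost:8083"):
    """Build the two REST requests that replace a failed connector.

    Hypothetical helper: it returns request descriptions and performs
    no network I/O. The default connect_url is an assumption.
    """
    if new_name == old_name:
        # The procedure relies on the replacement having a different name.
        raise ValueError("the replacement connector needs a different name")

    # Reuse the existing configuration unchanged, except for any embedded
    # "name" entry, which would conflict with the new connector name.
    new_config = copy.deepcopy(config)
    new_config.pop("name", None)

    base = connect_url.rstrip("/")
    return [
        {
            "method": "DELETE",
            "url": f"{base}/connectors/{quote(old_name, safe='')}",
            "body": None,
        },
        {
            "method": "POST",
            "url": f"{base}/connectors",
            "body": json.dumps({"name": new_name, "config": new_config}),
        },
    ]


if __name__ == "__main__":
    # Example values only; substitute the real connector configuration.
    requests_to_send = plan_connector_recreation(
        "inventory-connector",
        "inventory-connector-v2",
        {
            "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
            "snapshot.mode": "initial",
        },
    )
    for request in requests_to_send:
        print(request["method"], request["url"])
```

Each returned entry can be sent with any HTTP client. The rest of the configuration, including `snapshot.mode`, is carried over as-is, which matches the "same configuration but with a different connector name" wording in the new text.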