tet123/debezium-connector-mongodb/NOTES.md
Randall Hauch 12e7cfb8d3 DBZ-2 Created initial Maven module with a MongoDB connector
Added a new `debezium-connector-mongodb` module that defines a MongoDB connector. The MongoDB connector can capture and record the changes within a MongoDB replica set, or when seeded with addresses of the configuration server of a MongoDB sharded cluster, the connector captures the changes from each replica set used as a shard. In the latter case, the connector even discovers the addition or removal of shards.

The connector monitors each replica set using multiple tasks and, if needed, separate threads within each task. When a replica set is being monitored for the first time, the connector performs an "initial sync" of that replica set's databases and collections. Once the initial sync has completed, the connector begins tailing the oplog of the replica set, starting at the exact point in time at which it started the initial sync. This is equivalent to how MongoDB replication works.
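The ordering described above (record the start time, copy the existing documents, then replay the oplog from the recorded start time) can be sketched in a few lines. This is a minimal in-memory illustration, not the connector's implementation: `OplogEntry`, `initial_sync_then_tail`, the integer timestamps, and the event tuples are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class OplogEntry:
    ts: int        # hypothetical integer stand-in for an oplog timestamp
    op: str        # illustrative operation name, e.g. "update"
    doc_id: str
    fields: dict   # payload carried by the entry


def initial_sync_then_tail(snapshot, oplog, sync_started_at):
    """Emit read events for the snapshot, then replay the oplog
    from the moment the initial sync started."""
    events = []
    # Phase 1: initial sync, one "read" event per existing document.
    for doc_id in sorted(snapshot):
        events.append(("read", doc_id, dict(snapshot[doc_id])))
    # Phase 2: tail the oplog, skipping entries older than the sync start.
    for entry in oplog:
        if entry.ts >= sync_started_at:
            events.append((entry.op, entry.doc_id, dict(entry.fields)))
    return events
```

Entries written while the copy was in progress are replayed in the second phase, so nothing that happened during the initial sync is skipped.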

The connector always uses the replica set's primary node to tail the oplog. If the replica set undergoes an election and a different node becomes primary, the connector will immediately stop tailing the oplog, connect to the new primary, and start tailing the oplog using the new primary node. Likewise, if the connector experiences any problems communicating with the replica set members, it will try to reconnect (using exponential backoff so as to not overwhelm the replica set) and continue tailing the oplog from where it last left off. In this way the connector is able to dynamically adjust to changes in replica set membership and to automatically handle communication failures.
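As a rough illustration of reconnecting with exponential backoff, the sketch below doubles the wait between attempts up to a cap. The function name, the default delays, and the attempt limit are assumptions made for this example and are not taken from the connector.

```python
import time


def connect_with_backoff(connect, max_attempts=5, initial_delay=1.0,
                         max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(delay)
            # Double the delay, but never wait longer than max_delay.
            delay = min(delay * 2, max_delay)
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; a caller would normally leave it at the default.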

The MongoDB oplog contains limited information, and in particular the events describing updates and deletes do not actually have the before or after state of the documents. Instead, the oplog events are all idempotent, so updates contain the effective changes that were made during an update, and deletes merely contain the deleted document identifier. Consequently, the connector is limited in the information it includes in its output events. Create and read events do contain the initial state, but update events contain only the changes (rather than the before and/or after states of the document), and delete events do not have the before state of the deleted document. All connector events, however, do contain the local system timestamp at which the event was processed and _source_ information detailing the origins of the event, including the replica set name, the MongoDB transaction timestamp of the event, and the transaction identifier, among other things.
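To make the idempotency point concrete, here is a small sketch that applies an update patch to a plain dictionary. It assumes the patch carries only `$set` and `$unset` operators over top-level fields; `apply_oplog_update` is a hypothetical helper, not connector or driver code.

```python
def apply_oplog_update(document, patch):
    """Return a copy of document with the patch's effective changes applied."""
    updated = dict(document)
    # "$set" carries the resulting values, not the operation that produced them.
    updated.update(patch.get("$set", {}))
    # "$unset" names fields to remove; removing an absent field is a no-op.
    for field_name in patch.get("$unset", {}):
        updated.pop(field_name, None)
    return updated
```

Because the patch records resulting values rather than relative operations, applying it a second time leaves the document unchanged. The same shape explains why an update event alone cannot reveal the full before or after state: fields the update did not touch never appear in the patch.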

It is possible for MongoDB to lose commits in specific failure situations. For example, if the primary applies a change and records it in its oplog before it then crashes unexpectedly, the secondary nodes may not have had a chance to read those changes from the primary's oplog before the primary crashed. If one such secondary is then elected as primary, its oplog is missing the last changes that the old primary had recorded. In these cases where MongoDB loses changes recorded in a primary's oplog, the MongoDB connector may or may not capture these lost changes.
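Whether a lost change shows up in the connector's output depends on how far the connector had read in the old primary's oplog before the crash. The toy model below illustrates that distinction only; integer timestamps and the function name are hypothetical.

```python
def classify_lost_changes(old_primary_oplog, last_replicated_ts,
                          last_read_by_connector_ts):
    """Split changes lost in a failover into those the connector had
    already captured and those it never saw."""
    # Anything the secondaries had not yet copied is lost after failover.
    lost = [ts for ts in old_primary_oplog if ts > last_replicated_ts]
    captured = [ts for ts in lost if ts <= last_read_by_connector_ts]
    missed = [ts for ts in lost if ts > last_read_by_connector_ts]
    return captured, missed
```

In this model, a captured-but-lost change appears in the connector's output even though the new primary no longer has it, while a missed change appears nowhere.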
2016-07-14 13:02:36 -05:00


This module builds and runs two containers based upon the mongo:3.2 Docker image. The first primary container starts MongoDB, while the second initiator container initializes the replica set and then terminates.

Using MongoDB

As mentioned in the README.md file, our Maven build can be used to start these containers, which use the mongo:3.2 image:

$ mvn docker:start

The command leaves the primary container running so that you can use the running MongoDB server. For example, you can establish a bash shell inside the container (named mongo1) by using Docker in another terminal:

$ docker exec -it mongo1 bash

Or you can run integration tests from your IDE, as described in detail in the README.md file.

To stop and remove the mongo1 container, simply use the following Maven command:

$ mvn docker:stop

or use the following Docker commands:

$ docker stop mongo1
$ docker rm mongo1

Using Docker directly

Although using the Maven command is far simpler, it really just runs (via the Jolokia Maven plugin) a Docker command to start the container, so it's equivalent to:

$ docker run -it --rm --name mongo mongo:latest --replSet rs0 --oplogSize=2 --enableMajorityReadConcern

This will use the mongo:latest image to start a new container named mongo (substitute mongo:3.2 to match the image used by the Maven build). This can be repeated multiple times to start multiple MongoDB secondary nodes:

$ docker run -it --rm --name mongo1 mongo:latest --replSet rs0 --oplogSize=2 --enableMajorityReadConcern

$ docker run -it --rm --name mongo2 mongo:latest --replSet rs0 --oplogSize=2 --enableMajorityReadConcern

Then, run the initiator container to initialize the replica set by assigning the mongo container as primary and the other containers as secondary nodes:

$ docker run -it --rm --name mongoinit --link mongo:mongo --link mongo1:mongo1 --link mongo2:mongo2 -e REPLICASET=rs0 debezium/mongo-replicaset-initiator:3.2

Once the replica set is initialized, the mongoinit container will complete and be removed.
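For replica sets with a different number of members, the commands can be composed programmatically. The sketch below is patterned on the commands in this section; the helper names are hypothetical, and the `mongo:3.2` default image is an assumption chosen to match the Maven build. It only builds the argument lists and does not run Docker.

```python
import shlex


def member_command(name, replica_set="rs0", image="mongo:3.2"):
    """Build the docker command that starts one replica set member."""
    return ["docker", "run", "-it", "--rm", "--name", name, image,
            "--replSet", replica_set, "--oplogSize=2",
            "--enableMajorityReadConcern"]


def initiator_command(members, replica_set="rs0",
                      image="debezium/mongo-replicaset-initiator:3.2"):
    """Build the docker command for the initiator, linked to every member."""
    command = ["docker", "run", "-it", "--rm", "--name", "mongoinit"]
    for member in members:
        command += ["--link", f"{member}:{member}"]
    command += ["-e", f"REPLICASET={replica_set}", image]
    return command


if __name__ == "__main__":
    members = ["mongo", "mongo1", "mongo2"]
    for member in members:
        print(shlex.join(member_command(member)))
    print(shlex.join(initiator_command(members)))
```

Running the script prints one command per member followed by the initiator command, which can then be pasted into separate terminals.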

Use MongoDB client

The following command can be used to manually start up a Docker container to run the MongoDB command line client:

$ docker run -it --link mongo:mongo --rm mongo:3.2 sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT"'

Note that it must be linked to the Mongo container to which it will connect.