f632fa081e
It turns out that the existing code for chunking a table when taking an incremental snapshot was buggy: it did not correctly handle NULL values when building the chunk query. One example is when the user has set "message.key.columns" to reference a column that is part of a PostgreSQL UNIQUE INDEX created with the NULLS NOT DISTINCT option.

This commit updates the new AbstractChunkQueryBuilder so that it checks whether each key column is optional. If it is, the builder additionally considers NULL values when generating a chunk query, using "IS [NOT] NULL" clauses.

One complication is that database engines differ in how ORDER BY sorts NULL values, because the SQL standard leaves this implementation-defined. Some databases treat NULL values as higher than any non-NULL value, and others treat them as lower. To handle this, a new nullsSortLast() function is added to the JdbcConnection class. By default it returns an empty value, indicating that the behavior of the database engine is unknown. If AbstractChunkQueryBuilder encounters an optional key column in that situation, it throws an error, because there is no safe assumption about how to chunk the query correctly. Derived JdbcConnection classes can override nullsSortLast() to return a value describing the actual behavior of their database engine, which lets AbstractChunkQueryBuilder build chunk queries that handle NULL values correctly.

To help test this, new tests have been added to AbstractIncrementalSnapshotTest. First, the existing insertsWithoutPks test has been moved out of the MySQL- and PostgreSQL-specific tests and deduplicated, so that the test case can be reused on other engines. Second, a new insertsWithoutPksAndNull test inserts data with NULL values in the message key columns. To exercise chunk query generation in practically every case, INCREMENTAL_SNAPSHOT_CHUNK_SIZE is set to 1; otherwise NULL values could be returned in the middle of a chunk, which would skip the code path under test.
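The NULL-aware chunk predicate described above can be sketched as follows. This is a hypothetical illustration, not Debezium's AbstractChunkQueryBuilder: the class ChunkPredicateSketch, the KeyColumn record, and the afterRow/equalTo/sortsAfter helpers are invented here, and nullsSortLast is assumed to be an Optional<Boolean> (empty means unknown, true means NULLs sort after all non-NULL values). Given the key values of the last row of the previous chunk, it builds the WHERE clause for the next chunk under ORDER BY on the key columns, and throws when an optional column is met while the sort order is unknown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ChunkPredicateSketch {

    /** Hypothetical key-column descriptor: a name and whether the column may hold NULL. */
    record KeyColumn(String name, boolean optional) {}

    /**
     * Builds a WHERE clause selecting rows that sort strictly after lastRow under
     * ORDER BY keys[0], keys[1], ... Non-NULL values become '?' placeholders whose
     * values are appended to params in placeholder order.
     */
    static String afterRow(List<KeyColumn> keys, List<Object> lastRow,
                           Optional<Boolean> nullsSortLast, List<Object> params) {
        List<String> branches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            List<String> terms = new ArrayList<>();
            List<Object> branchParams = new ArrayList<>();
            // Columns before i must equal the last row's values.
            for (int j = 0; j < i; j++) {
                terms.add(equalTo(keys.get(j), lastRow.get(j), branchParams));
            }
            // Column i must sort strictly after the last row's value.
            String after = sortsAfter(keys.get(i), lastRow.get(i), nullsSortLast, branchParams);
            if (after == null) {
                continue; // nothing sorts after NULL when NULLs sort last; drop this branch
            }
            terms.add(after);
            String branch = String.join(" AND ", terms);
            branches.add(terms.size() > 1 ? "(" + branch + ")" : branch);
            params.addAll(branchParams); // keep only the parameters of branches we emit
        }
        // No branch can match: the last row was the final row in key order.
        return branches.isEmpty() ? "1 = 0" : String.join(" OR ", branches);
    }

    static String equalTo(KeyColumn col, Object value, List<Object> params) {
        if (value == null) {
            return col.name() + " IS NULL"; // "= NULL" would never be true
        }
        params.add(value);
        return col.name() + " = ?";
    }

    static String sortsAfter(KeyColumn col, Object value,
                             Optional<Boolean> nullsSortLast, List<Object> params) {
        if (!col.optional()) {
            params.add(value);
            return col.name() + " > ?";
        }
        // Optional column: we must know where NULLs sort, otherwise refuse to chunk.
        boolean last = nullsSortLast.orElseThrow(() -> new IllegalStateException(
                "NULL sort order unknown; cannot chunk on optional column " + col.name()));
        if (value == null) {
            // NULL is either the highest value (nothing after it) or the lowest (all non-NULLs follow).
            return last ? null : col.name() + " IS NOT NULL";
        }
        params.add(value);
        // With NULLs last, NULL rows still follow any non-NULL value.
        return last ? "(" + col.name() + " > ? OR " + col.name() + " IS NULL)" : col.name() + " > ?";
    }
}
```

For example, with a nullable key column a, a NOT NULL column b, a last row of (NULL, 7), and NULLs sorting last, the sketch yields (a IS NULL AND b > ?) with the single parameter 7, because no value of a can sort after NULL. With NULLs sorting first, the same input yields a IS NOT NULL OR (a IS NULL AND b > ?).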
Ingesting MySQL change events
This module defines the connector that ingests change events from MySQL databases.
Using the MySQL connector with Kafka Connect
The MySQL connector is designed to work with Kafka Connect and to be deployed to a Kafka Connect runtime service. The deployed connector monitors one or more databases and writes all change events to Kafka topics, which can be independently consumed by one or more clients. Kafka Connect can be run in distributed mode to provide fault tolerance, ensuring that the connectors keep running and continually keep up with changes in the database.
Kafka Connect can also be run standalone as a single process, although doing so is not tolerant of failures.
Embedding the MySQL connector
The MySQL connector can also be used as a library without Kafka or Kafka Connect, enabling applications and services to connect directly to a MySQL database and obtain the ordered change events. This approach requires the application to record the progress of the connector so that upon restart the connector can continue where it left off. Therefore, this may be a useful approach for less critical use cases. For production use cases, we strongly recommend using this connector with Kafka and Kafka Connect.
Testing
This module contains both unit tests and integration tests.
A unit test is a JUnit test class named *Test.java or Test*.java that never requires or uses external services, though it can use the file system and can run any components within the same JVM process. Unit tests should run very quickly, be independent of each other, and clean up after themselves.
An integration test is a JUnit test class named *IT.java or IT*.java that uses a MySQL database server running in a custom Docker container based upon the container-registry.oracle.com/mysql/community-server:8.2 Docker image maintained by the MySQL team. The build will automatically start the MySQL container before the integration tests are run and automatically stop and remove it after all of the integration tests complete (regardless of whether they succeed or fail). All databases used in the integration tests are defined and populated using *.sql files and *.sh scripts in the src/test/docker/init directory, which are copied into the Docker image and run in lexicographical order by MySQL upon startup. Multiple test methods within a single integration test class can reuse the same database, but generally each integration test class should use its own dedicated database(s).
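The naming rule above can be expressed as a small classifier. This is only an illustration: TestKindSketch and classify are hypothetical names, and in the real build the decision about which classes run as unit or integration tests is made by the Maven build configuration, not by code like this.

```java
public class TestKindSketch {

    enum Kind { UNIT, INTEGRATION, OTHER }

    /** Classifies a source file name using the naming rules described in this README. */
    static Kind classify(String fileName) {
        if (!fileName.endsWith(".java")) {
            return Kind.OTHER;
        }
        String base = fileName.substring(0, fileName.length() - ".java".length());
        // Integration-test patterns are checked first, so a name matching both
        // rules (such as "ITest") is treated as an integration test here.
        if (base.endsWith("IT") || base.startsWith("IT")) {
            return Kind.INTEGRATION; // *IT.java or IT*.java
        }
        if (base.endsWith("Test") || base.startsWith("Test")) {
            return Kind.UNIT; // *Test.java or Test*.java
        }
        return Kind.OTHER;
    }
}
```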
Running mvn install will compile all code and run the unit and integration tests. If there are any compile problems or any of the unit tests fail, the build will stop immediately. Otherwise, the command will continue to create the module's artifacts, create the Docker image with MySQL and custom scripts, start the Docker container, run the integration tests, stop the container (even if there are integration test failures), and run checkstyle on the code. If there are still no problems, the build will then install the module's artifacts into the local Maven repository.
You should always default to using mvn install, especially prior to committing changes to Git. However, there are a few situations where you may want to run a different Maven command.
Running some tests
If you are trying to get the test methods in a single integration test class to pass and would rather not run all of the integration tests, you can instruct Maven to run just that one integration test class and to skip all of the others. For example, use the following command to run the tests in the ConnectionIT.java class:
$ mvn -Dit.test=ConnectionIT install
Of course, wildcards also work:
$ mvn -Dit.test=Connect*IT install
These commands will automatically manage the MySQL Docker container.
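In the wildcard form, * stands for any sequence of characters in the test class name. The sketch below approximates that matching for the simple pattern shown above; it is not how Maven Failsafe actually implements it.test selection, ItTestFilterSketch with its toRegex/select helpers is a hypothetical name, and the class names in the usage example are made up.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ItTestFilterSketch {

    /** Turns a class-name pattern in which '*' matches any run of characters into a regex. */
    static Pattern toRegex(String glob) {
        // The -1 limit keeps empty leading/trailing parts, so "*IT" still starts with a wildcard.
        String[] parts = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' becomes "any characters"
            }
            regex.append(Pattern.quote(parts[i])); // everything else is matched literally
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the class names that the whole pattern matches, preserving input order. */
    static List<String> select(List<String> classNames, String glob) {
        Pattern pattern = toRegex(glob);
        return classNames.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

For instance, select(List.of("ConnectionIT", "ConnectorConfigIT", "MySqlConnectorIT", "ConnectionTest"), "Connect*IT") keeps ConnectionIT and ConnectorConfigIT: the pattern must match the whole name, so MySqlConnectorIT (wrong prefix) and ConnectionTest (wrong suffix) are excluded.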
Debugging tests
If you want to debug integration tests by stepping through them in your IDE, using the mvn install command will be problematic since it will not wait for your IDE's breakpoints. There are ways of doing this, but it is typically far easier to simply start the Docker container and leave it running so that it is available when you run the integration test(s). The following command:
$ mvn docker:start
will start the default MySQL container and run the database server. Now you can use your IDE to run/debug one or more integration tests. Just be sure that the integration tests clean up their database before (and after) each test, and that you run the tests with VM arguments that define the required system properties, including:
database.dbname - the name of the database that your integration test will use; there is no default
database.hostname - the IP address or name of the host where the Docker container is running; defaults to localhost, which is likely correct on Linux, but with Docker on OS X and Windows it will have to be set to the IP address of the VM that runs Docker (which you can find by looking at the DOCKER_HOST environment variable)
database.port - the port on which MySQL is listening; defaults to 3306, which is what this module's Docker container uses
database.user - the name of the database user; defaults to mysql, which is correct unless your database script uses something different
database.password - the password of the database user; defaults to mysqlpw, which is correct unless your database script uses something different
For example, you can define these properties by passing these arguments to the JVM:
-Ddatabase.dbname=<DATABASE_NAME> -Ddatabase.hostname=<DOCKER_HOST> -Ddatabase.port=3306 -Ddatabase.user=mysqluser -Ddatabase.password=mysqlpw
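As a rough illustration of how a test might consume these properties, the sketch below resolves the connection settings with the defaults listed above and builds a MySQL Connector/J-style JDBC URL. ItConnectionSketch and its jdbcUrl helper are hypothetical, not part of this module's test utilities, and user and password handling is left out.

```java
import java.util.Properties;

public class ItConnectionSketch {

    /** Builds a JDBC URL from the test properties, applying the defaults listed in this README. */
    static String jdbcUrl(Properties props) {
        String dbName = props.getProperty("database.dbname");
        if (dbName == null) {
            // database.dbname has no default, so fail fast with a clear message.
            throw new IllegalArgumentException("database.dbname must be set");
        }
        String host = props.getProperty("database.hostname", "localhost");
        String port = props.getProperty("database.port", "3306");
        return "jdbc:mysql://" + host + ":" + port + "/" + dbName;
    }
}
```

In an integration test launched from the IDE, calling jdbcUrl(System.getProperties()) would pick up the -D arguments shown above.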
When you are finished running the integration tests from your IDE, you have to stop and remove the Docker container before you can run the next build:
$ mvn docker:stop
Please note that when running the MySQL database Docker container, the output is written to the Maven build output and includes several lines containing "[Warning] Using a password on the command line interface can be insecure."
You can ignore these warnings, since we don't need a secure database server for our transient database testing.
Analyzing the database
Sometimes you may want to inspect the state of the database(s) after one or more integration tests are run. The mvn install command runs the tests but shuts down and removes the container after the integration tests complete. To keep the container running after the integration tests complete, use this Maven command:
$ mvn integration-test
Stopping the Docker container
Running mvn integration-test instructs Maven to run the normal Maven lifecycle through the integration-test phase and to stop before the post-integration-test phase, which is when the Docker container is normally shut down and removed. Be aware that you will need to manually stop and remove the container before running the build again:
$ mvn docker:stop
Using MySQL with GTIDs
By default, the build will run a MySQL server instance that is not configured to use GTIDs. However, we've provided a Maven profile that will instead run all the same integration tests against a MySQL instance that does use GTIDs. Simply use the mysql-gtids profile on each of your normal Maven commands. For example, to run a build:
$ mvn clean install -Pmysql-gtids
or to manually start the Docker container and keep it running:
$ mvn docker:start -Pmysql-gtids
or to stop and remove the Docker container:
$ mvn docker:stop -Pmysql-gtids
Using an alternative MySQL Server
All of the above commands will start the MySQL Docker container that is built upon the container-registry.oracle.com/mysql/community-server:8.2 Docker image maintained by the MySQL team. This image has an "optimized" MySQL server that includes only a portion of the full installation (e.g., it excludes some tools such as mysqlbinlog). However, it starts a little faster and is less verbose in its output.
This module defines the alt-mysql Maven profile that will instead use the mysql Docker image maintained by Docker (the company). This is a bit more verbose, but it includes all of the MySQL utilities, including mysqlbinlog, which may be necessary to properly debug and analyze the behavior of the integration tests.
To use the alternative Docker image, simply specify the alt-mysql Maven profile on all of the Maven commands, including a build:
$ mvn clean install -Palt-mysql
or to manually start the Docker container and keep it running:
$ mvn docker:start -Palt-mysql
or to stop and remove the Docker container:
$ mvn docker:stop -Palt-mysql
Testing all MySQL configurations
In Debezium builds, the assembly profile is used when issuing a release and in our continuous integration builds. In addition to the normal steps, it also creates several additional artifacts (including the connector plugin's ZIP and TAR archives) and runs the whole integration test suite once for each of the MySQL configurations. If you want to make sure that your changes work on all MySQL configurations, add -Passembly to your Maven commands.