f632fa081e
The existing code for chunking a table during an incremental snapshot was buggy: it did not correctly handle NULL values when building the chunk query. This can happen, for example, when the user has set "message.key.columns" to reference a column that is part of a PostgreSQL UNIQUE INDEX created with the NULLS NOT DISTINCT option.

This commit updates the new AbstractChunkQueryBuilder so that it checks whether each key column is optional. If it is, the builder takes NULL values into account when generating the chunk query, using "IS [NOT] NULL" clauses.

One complication is that database engines differ in where ORDER BY places NULL values, because the SQL standard leaves this to the implementation: some engines treat NULL as higher than any non-NULL value, others as lower. To handle this, a new nullsSortLast() function is added to the JdbcConnection class. By default it returns an empty value, indicating that the engine's behavior is unknown. If AbstractChunkQueryBuilder encounters an optional key column in that case, it throws an error, because there is no safe assumption that would let it chunk the query correctly. Derived JdbcConnection classes can override nullsSortLast() to report the engine's actual behavior, which lets AbstractChunkQueryBuilder build a chunk query that handles NULL values correctly.

To help test this, new tests have been added to AbstractIncrementalSnapshotTest. First, the existing insertsWithoutPks test has been moved there and deduplicated from the MySQL and PostgreSQL test suites, so that the test case can be reused on other engines. Second, a new insertsWithoutPksAndNull test inserts data with NULL values in the message key columns.
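The IS [NOT] NULL handling described above can be sketched in Java. This is a minimal illustration, not the actual AbstractChunkQueryBuilder code: the NullAwareChunkPredicate class, the KeyColumn record, and the afterBoundary() helper are hypothetical, and the nullsSortLast parameter is assumed to be an Optional<Boolean> standing in for the value of JdbcConnection.nullsSortLast(). It builds the usual keyset predicate (k1 > v1) OR (k1 = v1 AND k2 > v2) ... for rows after the previous chunk's last key, rewrites each comparison on an optional column according to where the engine sorts NULLs, and throws when that order is unknown. For reference, PostgreSQL sorts NULLs as larger than any non-NULL value in ascending order while MySQL sorts them first, so overrides for those engines would presumably report true and false respectively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class NullAwareChunkPredicate {

    /** Hypothetical key-column descriptor: column name plus whether it may hold NULL. */
    record KeyColumn(String name, boolean optional) {}

    /**
     * Builds the keyset predicate selecting rows that sort after the previous chunk's last key:
     * (k1 > v1) OR (k1 = v1 AND k2 > v2) OR ...
     * A null entry in lastKey means the boundary row had NULL in that column.
     * Each "?" is a bind parameter for the corresponding non-NULL boundary value.
     */
    static String afterBoundary(List<KeyColumn> keys, List<?> lastKey, Optional<Boolean> nullsSortLast) {
        List<String> disjuncts = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            String greater = greaterThan(keys.get(i), lastKey.get(i), nullsSortLast);
            if (greater == null) {
                continue; // no value of this column sorts after the boundary value
            }
            List<String> conjuncts = new ArrayList<>();
            for (int j = 0; j < i; j++) {
                conjuncts.add(equalTo(keys.get(j), lastKey.get(j))); // earlier columns tie
            }
            conjuncts.add(greater);
            disjuncts.add(conjuncts.size() == 1 ? greater : "(" + String.join(" AND ", conjuncts) + ")");
        }
        // If nothing can sort after the boundary, the predicate matches no rows.
        return disjuncts.isEmpty() ? "1 = 0" : String.join(" OR ", disjuncts);
    }

    // "k = v" is never true for NULL in SQL, so a NULL boundary value needs IS NULL.
    static String equalTo(KeyColumn col, Object value) {
        return value == null ? col.name() + " IS NULL" : col.name() + " = ?";
    }

    // "k > v" in the ORDER BY sense; returns null when no value can sort after the boundary.
    static String greaterThan(KeyColumn col, Object value, Optional<Boolean> nullsSortLast) {
        if (!col.optional()) {
            return col.name() + " > ?";
        }
        // Unknown NULL ordering: there is no safe query, so fail as the commit describes.
        boolean nullsLast = nullsSortLast.orElseThrow(() -> new IllegalStateException(
                "Cannot chunk on optional column " + col.name() + ": NULL sort order is unknown"));
        if (value == null) {
            // NULLs last: nothing sorts after NULL. NULLs first: every non-NULL value does.
            return nullsLast ? null : col.name() + " IS NOT NULL";
        }
        // NULLs last: NULL rows also sort after a non-NULL boundary, so include them.
        return nullsLast
                ? "(" + col.name() + " > ? OR " + col.name() + " IS NULL)"
                : col.name() + " > ?";
    }

    public static void main(String[] args) {
        List<KeyColumn> keys = List.of(new KeyColumn("a", true), new KeyColumn("b", false));
        // prints "(a > ? OR a IS NULL) OR (a = ? AND b > ?)"
        System.out.println(afterBoundary(keys, List.of(7, 3), Optional.of(true)));
        // prints "a IS NOT NULL OR (a IS NULL AND b > ?)"
        System.out.println(afterBoundary(keys, Arrays.asList(null, 3), Optional.of(false)));
    }
}
```

The bind values follow the order of the ? placeholders, so the first call binds 7, 7 and 3. When the boundary value is NULL and NULLs sort last, that column contributes no disjunct at all, since no value can sort after it.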
To show that chunk queries are generated correctly for practically every case, these tests set INCREMENTAL_SNAPSHOT_CHUNK_SIZE to 1. With a larger chunk size, a row with NULL key values could be returned in the middle of a chunk rather than at a chunk boundary, which would skip the NULL-handling code the tests are meant to exercise.
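Why a chunk size of 1 matters can be seen in a small in-memory simulation. The ChunkSizeSimulation class below is hypothetical and not part of the test suite: it models a single optional key column on an engine that sorts NULLs last, and reads a four-row table chunk by chunk using either a naive `k > ?` comparison (which SQL evaluates to UNKNOWN for NULL, so the row is filtered out) or a null-aware `(k > ? OR k IS NULL)` comparison. With a large chunk, the NULL row arrives inside the first chunk and both variants return every row; with a chunk size of 1, the NULL row can only be reached through a boundary query, so the naive variant silently drops it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiPredicate;

public class ChunkSizeSimulation {

    // Ascending order with NULLs last, as on an engine whose nullsSortLast() would report true.
    static final Comparator<Integer> ORDER = Comparator.nullsLast(Comparator.naturalOrder());

    // Naive SQL "k > ?": any comparison involving NULL is not true, so NULL rows are skipped.
    static boolean naiveAfter(Integer value, Integer boundary) {
        return value != null && boundary != null && value > boundary;
    }

    // Null-aware "(k > ? OR k IS NULL)"; nothing sorts after a NULL boundary when NULLs sort last.
    static boolean nullAwareAfter(Integer value, Integer boundary) {
        if (boundary == null) {
            return false;
        }
        return value == null || value > boundary;
    }

    /** Reads the table chunk by chunk and returns every row visited, in order. */
    static List<Integer> snapshot(List<Integer> table, int chunkSize, BiPredicate<Integer, Integer> after) {
        List<Integer> sorted = new ArrayList<>(table);
        sorted.sort(ORDER);
        List<Integer> visited = new ArrayList<>();
        boolean first = true;
        Integer boundary = null;
        while (true) {
            List<Integer> chunk = new ArrayList<>();
            for (Integer row : sorted) {
                // The first chunk has no lower bound; later chunks start after the boundary key.
                if ((first || after.test(row, boundary)) && chunk.size() < chunkSize) {
                    chunk.add(row);
                }
            }
            if (chunk.isEmpty()) {
                return visited;
            }
            visited.addAll(chunk);
            boundary = chunk.get(chunk.size() - 1); // last key of this chunk
            first = false;
        }
    }

    public static void main(String[] args) {
        List<Integer> table = Arrays.asList(3, null, 1, 2);
        System.out.println(snapshot(table, 10, ChunkSizeSimulation::naiveAfter));    // [1, 2, 3, null]
        System.out.println(snapshot(table, 1, ChunkSizeSimulation::naiveAfter));     // [1, 2, 3]
        System.out.println(snapshot(table, 1, ChunkSizeSimulation::nullAwareAfter)); // [1, 2, 3, null]
    }
}
```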