There is a small chance the connector doesn't advance and re-reads
the same LSN range. This happens under the following conditions:
* a new capture instance has been added in the current LSN range;
* while reading CDC changes, one of the existing capture instances
disappears.
The disappeared capture instance causes an exception, which is caught
and processed in `processErrorFromChangeTableQuery`. This causes the
current connector iteration to exit correctly without advancing. On
the next iteration the connector starts from the same LSN as the
previous iteration and finds the same new capture instance.
Although a `Set` was used to track the tables to be removed,
`SqlServerChangeTable` doesn't implement `hashCode`, so the same table
could be added to the set multiple times.
The fix is to implement the `hashCode` and `equals` methods in
`ChangeTable`, the parent class of `SqlServerChangeTable`.
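A minimal sketch of what such a pair could look like, assuming a change
table's identity is given by its capture instance name and change table
object id (the real class may compare more state):

```java
import java.util.Objects;

public class ChangeTable {
    private final String captureInstance;
    private final int changeTableObjectId;

    public ChangeTable(String captureInstance, int changeTableObjectId) {
        this.captureInstance = captureInstance;
        this.changeTableObjectId = changeTableObjectId;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        ChangeTable other = (ChangeTable) obj;
        return changeTableObjectId == other.changeTableObjectId
                && Objects.equals(captureInstance, other.captureInstance);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals() so that duplicate instances
        // collapse to a single entry in a HashSet or HashMap.
        return Objects.hash(captureInstance, changeTableObjectId);
    }
}
```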
Additionally, a synchronisation block is needed where the tables are
added to the hash map, because the additions happen in a different
thread from the one that removes the tables from the hash map.
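A minimal sketch of the idea, with illustrative names (the real fields
live inside the SQL Server streaming logic), where both threads
synchronise on the same monitor:

```java
import java.util.HashMap;
import java.util.Map;

public class ChangeTableRegistry {
    // Shared between the discovery thread and the streaming thread.
    private final Map<String, ChangeTable> changeTables = new HashMap<>();

    // Called from the thread that discovers new capture instances.
    public void add(String captureInstance, ChangeTable table) {
        synchronized (changeTables) {
            changeTables.put(captureInstance, table);
        }
    }

    // Called from the streaming thread when a capture instance vanishes.
    public ChangeTable remove(String captureInstance) {
        synchronized (changeTables) {
            return changeTables.remove(captureInstance);
        }
    }
}
```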
When a column name contains a `#`, the way the CDC data query is built
causes it to be replaced with the capture instance name. This patch
makes the replacement more specific by changing the replacement
placeholder from `#` to `#table` in this particular case.
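For illustration only (the connector's real query template is more
involved), the difference amounts to substituting a dedicated token
instead of every `#`:

```java
// Illustrative sketch of the fixed behavior. The query interpolates
// both column names and the capture instance; with a bare '#'
// placeholder, a column literally named "my#col" would also have its
// '#' replaced. A dedicated '#table' token avoids that.
String buildChangesQuery(String captureInstance, String columnList) {
    String template = "SELECT " + columnList
            + " FROM [cdc].[fn_cdc_get_all_changes_#table](?, ?, N'all update old')";
    return template.replace("#table", captureInstance);
}
```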
It turns out that the existing code for chunking a table when taking
an incremental snapshot was buggy and did not correctly handle NULL
values when building the chunk query. An example of such a situation
would be when the user has specified "message.key.columns" to reference
a column that is part of a PostgreSQL UNIQUE INDEX that was created with
the NULLS NOT DISTINCT option.
This commit updates the new AbstractChunkQueryBuilder so that it checks
whether a key column is optional. If it is, the builder will
additionally consider NULL values when generating a chunk query, using
"IS [NOT] NULL" clauses.
One complication is that different database engines have different
sorting behavior of ORDER BY. It is apparently not well-defined by the
SQL standard. Some databases consider NULL values to be higher than any
non-NULL values, and others consider them to be lower.
To handle this situation, a new nullsSortLast() function is added to the
JdbcConnection class. By default, it returns an empty value, indicating
that the behavior of the database engine is unknown. When an optional
field is encountered by AbstractChunkQueryBuilder in this situation, we
throw an error because we don't actually know how to correctly chunk the
query: there's no safe assumption that can be made here.
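A sketch of the shape this takes, assuming the function returns an
`Optional<Boolean>` (the text above only says it returns an empty value
by default); the wrapper class, method name `requireKnownNullOrdering`,
and exception message are illustrative:

```java
import java.util.Optional;

import io.debezium.DebeziumException;

abstract class ChunkQueryBuilderSketch {

    // Mirrors the default on JdbcConnection: the base class cannot know
    // the engine's NULL sort behavior, so it reports "unknown".
    Optional<Boolean> nullsSortLast() {
        return Optional.empty();
    }

    // Called when an optional key column is encountered.
    void requireKnownNullOrdering() {
        if (!nullsSortLast().isPresent()) {
            throw new DebeziumException(
                    "Cannot build a chunk query over an optional key column: "
                            + "the NULL sort order of this database is unknown");
        }
    }
}
```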
Derived JdbcConnection classes can then override the nullsSortLast
function, and return a value indicating the actual behavior of that
database engine. When this is done, the AbstractChunkQueryBuilder then
knows how to correctly build a chunk query that can handle NULL values.
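For instance, a PostgreSQL-flavored connection class could override it
as follows (PostgreSQL treats NULLs as larger than any non-NULL value
under ascending ORDER BY, so they sort last); this override is a sketch
rather than the shipped code:

```java
@Override
public Optional<Boolean> nullsSortLast() {
    // PostgreSQL sorts NULLs after all non-NULL values in ascending order.
    return Optional.of(true);
}
```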
To help test this, new tests have been added to
AbstractIncrementalSnapshotTest. First, the existing insertsWithoutPks
test has been moved and deduplicated from MySQL and PostgreSQL so that
the test case can be reused on other engines. Second, a new
insertsWithoutPksAndNull test is run, which inserts data with NULL
values in the message key columns. To demonstrate that chunk queries
are generated correctly for practically every case, the
INCREMENTAL_SNAPSHOT_CHUNK_SIZE is set to 1, so that NULL values cannot
land in the middle of a chunk, where they would bypass the code under
test.
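As a sketch of that setup (the `baseConnectorConfig()` helper is
hypothetical; the option itself is Debezium's
"incremental.snapshot.chunk.size"):

```java
// Force one-row chunks so that every chunk boundary, including those
// adjacent to NULL key values, exercises the predicate generation.
Configuration config = baseConnectorConfig() // hypothetical helper
        .with(RelationalDatabaseConnectorConfig.INCREMENTAL_SNAPSHOT_CHUNK_SIZE, 1)
        .build();
```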