Commit Graph

1939 Commits

Author SHA1 Message Date
James Johnston
3d93516b53 DBZ-5071 Create new RowValueConstructorChunkQueryBuilder
The default query builder has maximum SQL compatibility, but the query
plans it makes are not always optimal, especially in the case of
multi-column keys.  For example, PostgreSQL is unable to to effectively
create a query plan that does index scanning when faced with such a
query.

Some databases support the concept of row value constructors.  In
these cases, we can use these as an alternative to make a much more
simple and easier to understand query.  Not only is it easier for
humans to understand, but most importantly, the query planner also gets
the hint and finally uses the relevant multi-column index!  This commit
starts out with support for PostgreSQL and MySQL.
2024-02-29 13:36:26 +01:00
James Johnston
f632fa081e DBZ-5071 Correctly handle NULL values in incremental snapshots
It turns out that the existing code for chunking a table when taking
an incremental snapshot was buggy and did not correctly handle NULL
values when building the chunk query.  An example of such a situation
would be when the user has specified "message.key.columns" to reference
a column that is part of a PostgreSQL UNIQUE INDEX that was created with
the NULLS NOT DISTINCT option.

This commit updates the new AbstractChunkQueryBuilder so that it checks
whether a key column is optional.  If it is, then additional will
appropriately consider NULL values when generating a chunk query using
"IS [NOT] NULL" clauses.

One complication is that different database engines have different
sorting behavior of ORDER BY.  It is apparently not well-defined by the
SQL standard.  Some databases consider NULL values to be higher than any
non-NULL values, and others consider them to be lower.

To handle this situation, a new nullsSortLast() function is added to the
JdbcConnection class.  By default, it returns an empty value, indicating
that the behavior of the database engine is unknown.  When an optional
field is encountered by AbstractChunkQueryBuilder in this situation, we
throw an error because we don't actually know how to correctly chunk the
query: there's no safe assumption that can be made here.

Derived JdbcConnection classes can then override the nullsSortLast
function, and return a value indicating the actual behavior of that
database engine.  When this is done, the AbstractChunkQueryBuilder then
knows how to correctly build a chunk query that can handle NULL values.

To help test this, new tests have been added to
AbstractIncrementalSnapshotTest.  First, the existing insertsWithoutPks
test has been moved and deduplicated from MySQL and PostgreSQL so that
the test case can be reused on other engines.  Second, a new
insertsWithoutPksAndNull test is run, which inserts data with NULL
values in the message key columns.  To demonstrate that chunk queries
are being correctly generated for practically every case, the
INCREMENTAL_SNAPSHOT_CHUNK_SIZE is set to 1 so that NULL values are not
returned in the middle of a chunk, which can cause us to skip testing
the code we need to test.
2024-02-29 13:36:26 +01:00
James Johnston
9352957244 DBZ-5071 Refactor buildChunkQuery into ChunkQueryBuilder
This commit prepares the way for further dialect-specific improvements
to incremental snapshot chunk queries by extracting the buildChunkQuery
and related functions behind a new ChunkQueryBuilder interface.  The
implementation is mostly in an AbstractChunkQueryBuilder, and a
DefaultChunkQueryBuilder inherits from that.

The chunk builders are instantiated by JdbcConnection via a new
chunkQueryBuilder function.  The base class uses the default, but
derived JdbcConnection classes could use a dialect-specific chunk query
builder.
2024-02-29 13:36:26 +01:00
Chris Cranford
66e23613ce DBZ-7072 Retry Oracle Flashback-based snapshot queries 2024-02-29 13:33:10 +01:00
ani-sha
cb81f75481 DBZ-6858 Add suggestions from code review 2024-02-29 11:55:51 +01:00
ani-sha
7feeed855f DBZ-6858 Simplify metadata logic 2024-02-29 11:55:51 +01:00
ani-sha
3bb5cb4c8e DBZ-6858 Use SignalMetadata class to provide open/close timestamps 2024-02-29 11:55:51 +01:00
ani-sha
f8cd1a6353 DBZ-6858 Timestamp metadata for watermarking signals 2024-02-29 11:55:51 +01:00
Chris Cranford
f7bacc64f1 DBZ-7534 Guarantee per-thread parallel snapshot dispatch order 2024-02-23 13:56:26 -05:00
rkerner
1e69e40ec2 DBZ-7416 Fix duplicate SMTs sometimes returned by Kafka Connect. Moved deduplication from Map to LinkedHashSet.
+ minor fixes added for cleanup and centralization of common code

closes https://issues.redhat.com/browse/DBZ-7416
2024-02-22 13:34:08 -05:00
Jiri Pechanec
68b6591142 DBZ-7416 Fix duplicate SMTs sometimes returned by Kafka Connect. Moved deduplication from Map to LinkedHashSet.
closes https://issues.redhat.com/browse/DBZ-7416
2024-02-22 13:34:08 -05:00
mfvitale
08e46815e4 DBZ-7508 Exit from readChunk after createDataEventsForTable if snapshot is not running anymore 2024-02-22 12:13:37 -05:00
Chris Cranford
4c15d2bd3f DBZ-6236 Fix test failures 2024-02-22 00:14:26 -05:00
Chris Cranford
0516c0cdbb DBZ-6236 Reset ErrorHandler retry counter on successful poll 2024-02-22 00:14:26 -05:00
mfvitale
7ed5649e07 DBZ-7302 Implement Snapshotter for Oracle 2024-02-20 14:45:59 +01:00
mfvitale
7a0ee72b31 DBZ-7302 Improve Snapshotter and SnapshotQuery Java docs 2024-02-20 14:45:59 +01:00
mfvitale
9fe60a698d DBZ-7302 Move snapshot.locking.mode.custom.name, snapshot.query.mode and snapshot.query.mode.custom.name to CommonConnectorConfig 2024-02-20 14:45:59 +01:00
harveyyue
82f5e6ea77 DBZ-7480 Allow special characters in signal table name 2024-02-19 11:43:20 +01:00
Vojtech Juranek
ae53895cd8 DBZ-7495 Define constant for executor shutdown timeout
Unify executor shutdown timeout for executor services in the code base.
2024-02-19 08:45:33 +01:00
jchipmunk
79763211cb DBZ-7479 Refactor code to support re-selection without flashback
Because OracleConnection uses flashback query (AS OF SCN) to re-select row, it is potentially possible to get "ORA-01555 Snapshot too old" error, which can be solved by performing reselection without flashback to get at least its latest row state.
2024-02-17 11:21:45 -05:00
jchipmunk
f50aa7a987 DBZ-7479 PreparedStatement leak in Oracle ReselectColumnsProcessor
Each time, Oracle connector creates a new instance of PreparedStatement because value of commit SCN is added directly to SQL query to reselect column values.
2024-02-17 11:21:45 -05:00
Chris Cranford
6862d04987 DBZ-7107 Add micro/nano second timestamps to source info block 2024-02-16 12:52:20 +01:00
Chris Cranford
f9971cf9cc DBZ-7107 Bump envelope schema from version 1 to 2 2024-02-16 12:52:20 +01:00
Chris Cranford
659fc8df4e DBZ-7107 Fix SerdeTest failure 2024-02-16 12:52:20 +01:00
Chris Cranford
cac38fc484 DBZ-7107 Introduce micro/nano second based envelope timestamps 2024-02-16 12:52:20 +01:00
Chris Cranford
09e1bf1df0 DBZ-7488 Skip re-selection on r (read) events 2024-02-16 12:33:33 +01:00
Debezium Builder
10e327602c [maven-release-plugin] prepare for next development iteration 2024-02-13 09:20:04 +00:00
Debezium Builder
0c5b05738c [maven-release-plugin] prepare release v2.6.0.Alpha2 2024-02-13 09:20:04 +00:00
mfvitale
cb5a4d7a1a DBZ-7481 SnapshotterServiceProvider will check if snapshot mode class is related to the running connector. 2024-02-13 08:42:34 +01:00
Sergey Ivanov
d96c30ef3f DBZ-7437: ReselectColumnsPostProcessor filter not use exclude predicate. 2024-02-06 12:27:55 +01:00
mfvitale
a8a07e35f1 DBZ-7301 Implement SnapshotLock for MySQL connector 2024-02-06 07:12:42 +01:00
mfvitale
1cdf2836dd DBZ-7301 Implement Snapshotter for MySQL connector 2024-02-06 07:12:42 +01:00
mfvitale
c9458f4f58 DBZ-7300 Snapshotter, SnapshotLock and SnapshotQuery are now services registered in the ServiceRegistry 2024-02-06 07:12:42 +01:00
mfvitale
d0e4ad7e14 DBZ-7441 Postpone SignalProcessor start after streaming is initialized
This will avoid that channels that not depends on the event streaming like the 'source' channel, will start processing signals before the IncrementalSnapshotChangeEventSource is initialized.
2024-02-05 14:06:40 +01:00
mfvitale
881442de3d DBZ-7300 Fix duplicate field assignment for CONNECTOR_SNAPSHOT group 2024-02-02 13:41:45 +01:00
mfvitale
3b92786a41 DBZ-7300 Snapshotter, SnapshotLock and SnapshotQuery are now services registered in the ServiceRegistry 2024-02-02 13:41:45 +01:00
mfvitale
fa338b61c4 DBZ-7436 Add empty statistics for SingleDuration to avoid NPE 2024-02-02 12:36:24 +01:00
Chris Cranford
2bfa92e6af DBZ-7429 Resolve primary key values from after struct 2024-01-31 12:05:07 +01:00
Fiore Mario Vitale
1c93a92b84 DBZ-7421 Change access modifier for SnapshotDataCollection to private
Co-authored-by: Jiri Pechanec <jpechane@redhat.com>
2024-01-31 11:05:04 +01:00
mfvitale
f9034f2108 DBZ-7421 Encapsulate data collection queue and its json version into a dedicated class 2024-01-31 11:05:04 +01:00
mfvitale
81298865a5 DBZ-7421 Improve incremental snapshot performance with increasing number of collections to snapshot 2024-01-31 11:05:04 +01:00
nicholas-fwang
d710ee6b9f DBZ-7143 Rollback ValueConverter and handle when parse default value 2024-01-29 13:48:19 +01:00
nicholas-fwang
5aae5f51f4 DBZ-7143 Move event converting failure handler to ValueConverter interface. 2024-01-29 13:48:19 +01:00
nicholas-fwang
2bdeec099a DBZ-7143 Add document for event.converting.failure.handling.mode 2024-01-29 13:48:19 +01:00
nicholas-fwang
7d99605886 DBZ-7143 Add case when EventConvertingFailureHandlingMode is null 2024-01-29 13:48:19 +01:00
nicholas-fwang
4f2bbb023a DBZ-7143 fix checkstyle format 2024-01-29 13:48:19 +01:00
nicholas-fwang
3a5f35594c DBZ-7143 Add description of event.converting.failure.handling.mode 2024-01-29 13:48:19 +01:00
nicholas-fwang
57a46943af DBZ-7143 throw exception in JdbcValueConverters 2024-01-29 13:48:19 +01:00
nicholas-fwang
eeea0f1e70 DBZ-7143 Add event.converting.failure.handling.mode option 2024-01-29 13:48:19 +01:00
Animesh Kumar
2f8114029e DBZ-7380 Offset transaction id only when it's non null 2024-01-24 15:32:34 +01:00