Debezium Server
1. Support string and binary serialization formats in the Debezium API.
2. Allow configuring separate key and value formats on the embedded
   engine.
This change fixes the following issue when using the outbox event router
with the embedded engine:
The outbox event router supports arbitrary payload formats when
BinaryDataConverter is used as the value.converter, which passes the
payload through transparently. However, this is currently not supported
by the embedded engine, which handles message conversion using
value.format to specify the format.
In addition, when we want to pass the payload transparently, it makes
sense to also pass aggregateid, i.e. the event key, transparently. The
default outbox table configuration specifies aggregateid as a
varchar, which is also not supported by the embedded engine.
The default query builder has maximum SQL compatibility, but the query
plans it produces are not always optimal, especially in the case of
multi-column keys. For example, PostgreSQL is unable to effectively
create a query plan that uses an index scan when faced with such a
query.
Some databases support row value constructors. In these cases, we can
use them as an alternative to produce a much simpler query. Not only is
it easier for humans to understand but, more importantly, the query
planner also gets the hint and uses the relevant multi-column index.
This commit starts out with support for PostgreSQL and MySQL.
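
To make the difference concrete, here is a minimal sketch of the two
condition shapes for a "rows after the last seen key" predicate. The
class and method names are hypothetical and are not Debezium's actual
query builder; only the generated SQL shape is the point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only, not Debezium's query builder.
final class ChunkConditionSketch {

    // Portable form: (a > ?) OR (a = ? AND b > ?) OR ...
    static String expandedCondition(List<String> keyColumns) {
        List<String> alternatives = new ArrayList<>();
        for (int i = 0; i < keyColumns.size(); i++) {
            List<String> terms = new ArrayList<>();
            // All earlier key columns must equal the last seen value...
            for (int j = 0; j < i; j++) {
                terms.add(keyColumns.get(j) + " = ?");
            }
            // ...and the current column must be strictly greater.
            terms.add(keyColumns.get(i) + " > ?");
            alternatives.add("(" + String.join(" AND ", terms) + ")");
        }
        return String.join(" OR ", alternatives);
    }

    // Row value constructor form: (a, b) > (?, ?)
    static String rowValueCondition(List<String> keyColumns) {
        String columns = String.join(", ", keyColumns);
        String placeholders = String.join(", ",
                Collections.nCopies(keyColumns.size(), "?"));
        return "(" + columns + ") > (" + placeholders + ")";
    }
}
```

For the key columns a and b, the first method yields
"(a > ?) OR (a = ? AND b > ?)" and the second yields "(a, b) > (?, ?)".
Note that the expanded form also needs more bind parameters.
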
It turns out that the existing code for chunking a table when taking
an incremental snapshot was buggy and did not correctly handle NULL
values when building the chunk query. An example of such a situation
would be when the user has specified "message.key.columns" to reference
a column that is part of a PostgreSQL UNIQUE INDEX that was created with
the NULLS NOT DISTINCT option.
This commit updates the new AbstractChunkQueryBuilder so that it checks
whether a key column is optional. If it is, then the builder
additionally accounts for NULL values when generating a chunk query by
using "IS [NOT] NULL" clauses.
One complication is that different database engines sort NULL values
differently under ORDER BY. It is apparently not well-defined by the
SQL standard. Some databases consider NULL values to be higher than any
non-NULL values, and others consider them to be lower.
To handle this situation, a new nullsSortLast() function is added to the
JdbcConnection class. By default, it returns an empty value, indicating
that the behavior of the database engine is unknown. When an optional
field is encountered by AbstractChunkQueryBuilder in this situation, we
throw an error because we don't actually know how to correctly chunk the
query: there's no safe assumption that can be made here.
Derived JdbcConnection classes can then override the nullsSortLast
function, and return a value indicating the actual behavior of that
database engine. When this is done, the AbstractChunkQueryBuilder then
knows how to correctly build a chunk query that can handle NULL values.
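
As a rough illustration of this decision logic, the sketch below builds
the "rows after the lower bound" condition for a single optional key
column in ascending order. It is a simplification under stated
assumptions: the class and method names are hypothetical, only one
column is handled, and the Optional<Boolean> parameter merely stands in
for the value returned by nullsSortLast().

```java
import java.util.Optional;

// Hypothetical single-column sketch, not the AbstractChunkQueryBuilder code.
final class NullAwareChunkSketch {

    static String afterLowerBound(String column,
                                  boolean lowerBoundIsNull,
                                  Optional<Boolean> nullsSortLast) {
        // Unknown NULL ordering: there is no safe assumption, so fail.
        boolean nullsLast = nullsSortLast.orElseThrow(
                () -> new IllegalStateException(
                        "NULL sort order is unknown for column " + column));

        if (lowerBoundIsNull) {
            // After a NULL bound: nothing follows if NULLs sort last,
            // every non-NULL row follows if NULLs sort first.
            return nullsLast ? "1 = 0" : column + " IS NOT NULL";
        }
        // After a non-NULL bound: NULL rows still follow if NULLs sort last.
        return nullsLast
                ? "(" + column + " > ? OR " + column + " IS NULL)"
                : column + " > ?";
    }
}
```

Passing Optional.empty() raises an exception, which mirrors the choice
described above to throw an error rather than guess the ordering.
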
To help test this, new tests have been added to
AbstractIncrementalSnapshotTest. First, the existing insertsWithoutPks
test has been moved and deduplicated from MySQL and PostgreSQL so that
the test case can be reused on other engines. Second, a new
insertsWithoutPksAndNull test is run, which inserts data with NULL
values in the message key columns. To demonstrate that chunk queries
are being correctly generated for practically every case, the
INCREMENTAL_SNAPSHOT_CHUNK_SIZE is set to 1 so that NULL values are
never returned in the middle of a chunk, which would cause us to skip
the code we need to test.
If the logical replication slot has absolutely zero events in it, then
searchWalPosition could loop for quite some time. During this time, it
did not send heartbeats.
This commit fixes that function to send heartbeats.
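
The idea can be pictured with a small polling loop that emits a
heartbeat on every empty poll. This is only a sketch: the class, the
method signature, and the maxPolls limit are assumptions made for
illustration and do not reflect the real searchWalPosition code.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of "heartbeat while searching", not connector code.
final class WalSearchSketch {

    // Polls until a position is found or maxPolls is reached,
    // running the heartbeat callback after every empty poll.
    static Optional<Long> search(Supplier<Optional<Long>> nextPosition,
                                 Runnable heartbeat,
                                 int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            Optional<Long> position = nextPosition.get();
            if (position.isPresent()) {
                return position;
            }
            // No event yet: keep the heartbeat flowing during the wait.
            heartbeat.run();
        }
        return Optional.empty();
    }
}
```
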
As a part of this work to handle injection in a cleaner way, this commit
adds two new broad concepts called `BeanRegistry` and `ServiceRegistry`.
A BeanRegistry is a glorified registry of different objects that are not
necessarily services but may be desired by a service. This contract will
allow Debezium to integrate in the future with other CDI providers.
A ServiceRegistry is more of an internal concept, where various systems
can be started based on their dependency order. It provides a universal
way to split larger parts of the code into smaller, focused modules that
can be accessed using the Service Locator pattern.
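
For readers unfamiliar with the Service Locator pattern, a minimal
sketch of how two such registries could cooperate is shown below. All
types and method names here are hypothetical and are not the
`BeanRegistry` or `ServiceRegistry` interfaces added by this commit. As a
simplification, services are created lazily on first lookup instead of
being started in dependency order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of the pattern, not Debezium's interfaces.
final class RegistrySketch {

    // Named objects that are not services but may be needed by one.
    static final class Beans {
        private final Map<String, Object> beans = new HashMap<>();

        void add(String name, Object bean) {
            beans.put(name, bean);
        }

        <T> T lookup(String name, Class<T> type) {
            return type.cast(beans.get(name));
        }
    }

    // Service locator: callers ask for a service by its type.
    static final class Services {
        private final Beans beans;
        private final Map<Class<?>, Function<Beans, ?>> providers = new HashMap<>();
        private final Map<Class<?>, Object> started = new HashMap<>();

        Services(Beans beans) {
            this.beans = beans;
        }

        <T> void register(Class<T> type, Function<Beans, T> provider) {
            providers.put(type, provider);
        }

        <T> T getService(Class<T> type) {
            // Create the service once, handing it the bean registry.
            Object service = started.computeIfAbsent(type, t -> {
                Function<Beans, ?> provider = providers.get(t);
                if (provider == null) {
                    throw new IllegalArgumentException("No provider for " + t);
                }
                return provider.apply(beans);
            });
            return type.cast(service);
        }
    }
}
```
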
This config may be reused by other implementations of the
DebeziumEngine API in the embedded package. As the DebeziumEngine API
can have completely different implementations, and thus also configs,
the class is called `EmbeddedEngineConfig`, as it is assumed to be used
only by the embedded engine "family" of implementations.
To keep backward compatibility, the config options are extracted into
an interface and `EmbeddedEngine` implements this interface, so these
options can still be used in custom classes without any code changes.
Currently, a newly created `ElapsedTimeStrategy` is uninitialized, and
its `hasElapsed()` has to be called once after the `ElapsedTimeStrategy`
is created to initialize the strategy. This is confusing and
error-prone.
Move initialization of `ElapsedTimeStrategy` into its constructor, so
it is initialized as soon as it is created.
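
The intended behavior can be sketched with a simplified constant-delay
strategy. The class below is hypothetical and is not the real
`ElapsedTimeStrategy` implementation. It only shows the first deadline
being set in the constructor, so no priming call is needed.

```java
import java.util.function.LongSupplier;

// Hypothetical constant-delay strategy, for illustration only.
final class ConstantElapsedTimeSketch {
    private final LongSupplier clockMillis;
    private final long delayMillis;
    private long nextTimestamp;

    ConstantElapsedTimeSketch(LongSupplier clockMillis, long delayMillis) {
        this.clockMillis = clockMillis;
        this.delayMillis = delayMillis;
        // Initialized on creation: the first deadline is known right away.
        this.nextTimestamp = clockMillis.getAsLong() + delayMillis;
    }

    // Returns true once the delay has passed, then schedules the next one.
    boolean hasElapsed() {
        long now = clockMillis.getAsLong();
        if (now < nextTimestamp) {
            return false;
        }
        nextTimestamp = now + delayMillis;
        return true;
    }
}
```

With this shape, the first call to `hasElapsed()` already gives a
meaningful answer relative to the time of construction.
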
PostgresConnectorIT#shouldAddNewFieldToSourceInfo fails only when run
together with other tests, and the failure is random. It seems there is
a caching issue in Apicurio: the `test_server.s1.a-value` artifact
references `io.debezium.connector.postgresql.Source` version 1, which
does not have the `newField` field, and this reference is also used in
`shouldAddNewFieldToSourceInfo`, where the artifact with version 2
should be used. Using a dedicated table, and thus creating a new
artifact in Apicurio, should fix this issue.
Also remove an unused variable from `CustomPostgresSourceInfoStructMaker`.
We should send heartbeats regardless of whether any DB record has
already been processed. This prevents the situation where, after
startup, there is no new record in the DB and Debezium sends neither DB
events nor heartbeats.
The root cause of DBZ-6648 is using a long number as a fallback value
where a String is expected, which makes the conversion fail and return
`null`.
Use a String fallback when the interval handling mode is String.