This commit does a few things:
- Refactors snapshot modes to be encapsulated by an interface, and uses
only that interface to determine when to snapshot and which type of
`RecordProducer` to instantiate
- Refactors the configuration of existing snapshot modes to tie each
mode to its corresponding implementation
- Adds a new snapshot.mode, custom, and a new configuration option to
specify a custom implementation that will be loaded by the class loader
- Changes the visibility of some classes to allow for custom snapshot
modes to get enough context to make an informed choice
- Adds some metadata about slots (the catalog_xmin) to give a fuller
picture of slot state, which can be useful when implementing snapshot
modes (collecting it is configurable, as it can add some overhead)
Together, these changes give end users much more flexibility to
implement a snapshot mode that can do more advanced snapshots, such as
partial recovery or partial snapshots of tables where not all records
are needed.
This could also be seen as superseding
`snapshot.select.statement.overrides`, since it allows users to
dynamically build queries based on the state of the slot and the offsets
consumed.
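
As a rough illustration of what a custom snapshot mode could look like, here is a minimal sketch. The `SlotState` and `SnapshotMode` types, their method names, and the table names are all hypothetical stand-ins invented for this example; they are not the interface introduced by this commit.

```java
import java.util.Optional;

/** Hypothetical view of replication-slot state; not Debezium's actual class. */
final class SlotState {
    final boolean exists;
    final Long catalogXmin; // null when collection of this metadata is disabled

    SlotState(boolean exists, Long catalogXmin) {
        this.exists = exists;
        this.catalogXmin = catalogXmin;
    }
}

/** Hypothetical contract for a snapshot mode. */
interface SnapshotMode {
    boolean shouldSnapshot(SlotState slot, boolean hasStoredOffset);

    boolean shouldStream();

    /** Returns the SELECT to run for a table, or empty to skip the table. */
    Optional<String> snapshotQuery(String tableId);
}

/** Example custom mode: snapshot only on a fresh start, and only part of the data. */
final class RecentEventsSnapshotMode implements SnapshotMode {
    @Override
    public boolean shouldSnapshot(SlotState slot, boolean hasStoredOffset) {
        // No slot and no stored offset means nothing has been consumed yet.
        return !slot.exists && !hasStoredOffset;
    }

    @Override
    public boolean shouldStream() {
        return true;
    }

    @Override
    public Optional<String> snapshotQuery(String tableId) {
        if ("public.events".equals(tableId)) {
            // Partial snapshot: only recent rows are needed downstream.
            return Optional.of(
                "SELECT * FROM public.events WHERE created_at > now() - interval '7 days'");
        }
        if ("public.audit_log".equals(tableId)) {
            return Optional.empty(); // skip this table entirely
        }
        return Optional.of("SELECT * FROM " + tableId);
    }
}
```

The point of the sketch is that both the decision to snapshot and the per-table query are computed from slot and offset state at runtime, which a static override property cannot express.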
- wal2json sends the txn commitTime using a function from PostgreSQL's C
library. The value that Debezium receives is in nanoseconds.
- decoderbufs sends the txn commitTime in microseconds.
- RecordsSnapshotProducer updates SourceInfo.ts_usec by converting
System.currentTimeMillis() to microseconds.
- RecordsStreamProducer updates the SourceInfo's ts_usec field using
message.getCommitTime().
This means that when using wal2json, the value of SourceInfo.ts_usec
is in microseconds since epoch during snapshot but is in nanoseconds
during streaming. To fix this, we changed
Wal2JsonReplicationMessage.getCommitTime() to return microseconds.
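
The unit normalization amounts to the following. This is a sketch only: `CommitTimeUnits` is a hypothetical helper name, not the code in this commit; only `java.util.concurrent.TimeUnit` is a real API.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing how both code paths end up in microseconds. */
final class CommitTimeUnits {
    /** Streaming path: a nanosecond commit time converted to microseconds. */
    static long nanosToMicros(long nanos) {
        return TimeUnit.NANOSECONDS.toMicros(nanos);
    }

    /** Snapshot path: System.currentTimeMillis() converted to microseconds. */
    static long millisToMicros(long millis) {
        return TimeUnit.MILLISECONDS.toMicros(millis);
    }
}
```

With both paths converted, the same instant yields the same ts_usec value whether it was recorded during snapshot or during streaming.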
* Removing redundant check for date mapping type
* Always using String as fallback value for temporal values where needed
* Pulling fallback temporal values up to JdbcValueConverters
This introduces a new API to the EmbeddedEngine, the ChangeConsumer,
which gives the user a more flexible option for consuming changes by
exposing groups of records as well as the ability to control the
committing of those records.
This remains completely backwards compatible with the old API as the
ChangeConsumer wraps the existing Consumer interface with a default
implementation.
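
The wrapping idea can be sketched as follows. The type and method names here (`BatchChangeConsumer`, `Committer`, `wrap`, `CountingCommitter`) are hypothetical and chosen for illustration; they are not the signatures added to the EmbeddedEngine.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical handle that lets the consumer control when records are committed. */
interface Committer<R> {
    void markProcessed(R record);

    void markBatchFinished();
}

/** Hypothetical batch-oriented consumer. */
interface BatchChangeConsumer<R> {
    void handleBatch(List<R> records, Committer<R> committer);

    /** Adapts a per-record Consumer so existing callers keep their behavior. */
    static <R> BatchChangeConsumer<R> wrap(Consumer<R> perRecord) {
        return (records, committer) -> {
            for (R record : records) {
                perRecord.accept(record);
                committer.markProcessed(record); // commit each record once handled
            }
            committer.markBatchFinished();
        };
    }
}

/** Trivial committer used only to demonstrate the call sequence. */
final class CountingCommitter<R> implements Committer<R> {
    int processed;
    int batches;

    @Override
    public void markProcessed(R record) {
        processed++;
    }

    @Override
    public void markBatchFinished() {
        batches++;
    }
}
```

A caller that needs batching implements `handleBatch` directly and decides when to mark records as processed; a caller with an existing per-record `Consumer` goes through `wrap` and sees no change.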
The "include-unchanged-toast" option was removed in recent wal2json versions, without a transition phase. So the connector now first tries to connect with the option and, if that fails, connects without it.
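
The try-then-fall-back pattern looks roughly like this. `ToastOptionFallback` and `openStream` are hypothetical names, and the sketch catches `RuntimeException` for brevity; real connector code would catch the exception type thrown by the JDBC driver instead.

```java
import java.util.function.Function;

/** Hypothetical helper: attempt with the option first, then retry without it. */
final class ToastOptionFallback {
    /**
     * @param open opens the replication stream; the Boolean argument says
     *             whether to pass the include-unchanged-toast option
     */
    static <T> T openStream(Function<Boolean, T> open) {
        try {
            return open.apply(true);
        } catch (RuntimeException optionRejected) {
            // Newer wal2json versions reject the unknown option, so retry without it.
            return open.apply(false);
        }
    }
}
```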
This will allow consumers to recognize the Debezium connector used for creating a given message, helping them to adjust their behavior for a variety of connectors.
Improves PostgreSQL RecordsStreamProducer performance for processing
updates to a table with TOASTable columns, where those updates do not
affect said columns. Prior to this fix, these updates would trigger a
refresh of the in-memory table schema. In the worst case, this means a
query for every update. This puts significant load on the database
server and adds tens of milliseconds to the processing of each update
record.
The fix requires a new configuration option, called
'schema.refresh.mode'. This option has values 'columns_diff' (the
default) and 'columns_diff_exclude_unchanged_toast'. 'columns_diff'
maintains the pre-fix behavior. 'columns_diff_exclude_unchanged_toast'
activates the fix.
The fix must be toggleable because it decreases the consistency guarantees
Debezium provides, since in-memory table schemas may not stay
synchronized with their remote counterparts. With type metadata included
in the replication message, inconsistencies are limited to the unchanged
toast columns.
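
One way to picture the difference between the two modes is the check below. It is a sketch under stated assumptions, not the connector's code: `SchemaRefreshCheck` and `needsRefresh` are hypothetical names, and the logic assumes that an unchanged TOASTed column is simply absent from the replication message.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical decision logic for the two schema.refresh.mode values. */
final class SchemaRefreshCheck {
    enum Mode { COLUMNS_DIFF, COLUMNS_DIFF_EXCLUDE_UNCHANGED_TOAST }

    static boolean needsRefresh(Mode mode,
                                Set<String> schemaColumns,
                                Set<String> messageColumns,
                                Set<String> toastableColumns) {
        if (schemaColumns.equals(messageColumns)) {
            return false; // nothing differs
        }
        if (mode == Mode.COLUMNS_DIFF) {
            return true; // pre-fix behavior: any difference triggers a refresh
        }
        if (!schemaColumns.containsAll(messageColumns)) {
            return true; // the message has a column the in-memory schema lacks
        }
        // Columns present in the schema but absent from the message.
        Set<String> missing = new HashSet<>(schemaColumns);
        missing.removeAll(messageColumns);
        // Skip the refresh only if every missing column could be unchanged TOAST.
        return !toastableColumns.containsAll(missing);
    }
}
```

The trade-off described above shows up in the last line: a TOASTable column that was actually dropped is indistinguishable from one that was merely unchanged, so the in-memory schema can lag behind for those columns.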
* Using database encoding for string conversion
* Not making hstore schemas optional by default
* Using Jackson instead of GSon for JSON serialization
* Removing superfluous method and log messages
* Adjusting to naming and style conventions