The cache mechanism had to be adapted to support both non-versioned and
versioned schemas; a test now confirms that the same valueSchema
instance is created only once.
To allow more flexibility, we should proxy the new Event Schema fields
based on the original table columns that Debezium is capturing.
The Schema is now built once, upon the first Record, in order to
detect those types which are only available within the Record.
It's good practice to depend on the Kafka metadata instead of custom
dates in the payload; this way, for instance when using KStreams with
a tumbling window, the dates are correctly matched.
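As a sketch of the consumer-side benefit (the topic name is invented and default serdes are assumed to be configured), a KStreams tumbling-window aggregation uses the timestamp carried in the Kafka record metadata by default, with no need to extract a date from the payload:

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

StreamsBuilder builder = new StreamsBuilder();
// Hypothetical change-event topic; records are grouped by the timestamp
// in the Kafka record metadata, not by a date field in the payload.
KStream<String, String> changes = builder.stream("dbserver1.inventory.orders");
changes.groupByKey()
       .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
       .count();
```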
This commit does a few things:
- Refactors snapshot modes to be encapsulated by an interface and
to only use that interface in determining when to snapshot and in
determining the type of the `RecordProducer` interface to instantiate
- Refactors the configuration of existing snapshot modes to tie the
existing snapshot modes to their aligned implementation
- Adds a new snapshot.mode, custom, and a new configuration option to
specify a custom implementation that will be loaded by the class loader
- Changes the visibility of some classes to allow for custom snapshot
modes to get enough context to make an informed choice
- Adds some metadata about slots (the catalog_xmin) to give a full idea
of the state of slots which can be useful in implementing snapshot
modes (which is also configurable, as it can add some overhead)
Together, these changes allow much broader flexibility for end
users to implement a snapshot mode that can do more advanced snapshots,
such as partial recovery or partial snapshots for tables where not
all records are needed.
This could also be seen as superseding
`snapshot.select.statement.overrides`, allowing users to dynamically
build queries based on the state of the slot and the offsets consumed.
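A minimal sketch of what such a custom snapshot mode could look like; the interface name, its methods, and `TableId` are assumptions based on the description above, not the exact SPI:

```java
import java.util.Optional;

// Illustrative only: a custom snapshot mode doing a partial snapshot.
public class RecentOrdersSnapshotter implements Snapshotter {

    @Override
    public boolean shouldSnapshot() {
        return true;  // a real implementation would inspect offsets/slot state
    }

    @Override
    public boolean shouldStream() {
        return true;  // continue streaming once the snapshot completes
    }

    @Override
    public Optional<String> buildSnapshotQuery(TableId tableId) {
        // Hypothetical partial snapshot: only recent rows of one table
        if ("orders".equals(tableId.table())) {
            return Optional.of(
                "SELECT * FROM orders WHERE created_at > now() - interval '7 days'");
        }
        return Optional.of("SELECT * FROM " + tableId);
    }
}
```

The class name would then be supplied through the new configuration option alongside `snapshot.mode=custom`.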
* Removing redundant check for date mapping type
* Always using String as fallback value for temporal values where needed
* Pulling fallback temporal values up to JdbcValueConverters
Reason:
reference to prepareQuery is ambiguous
[ERROR] both method prepareQuery(java.lang.String,io.debezium.jdbc.JdbcConnection.StatementPreparer,io.debezium.jdbc.JdbcConnection.BlockingResultSetConsumer) in io.debezium.jdbc.JdbcConnection and method prepareQuery(java.lang.String,io.debezium.jdbc.JdbcConnection.StatementPreparer,io.debezium.jdbc.JdbcConnection.ResultSetConsumer) in io.debezium.jdbc.JdbcConnection match
* Renaming getTimeSinceLastEvent() to getMilliSecondsSinceLastEvent()
* Further unifying metrics implementation across connectors
* Emitting event in EventDispatcher also if event is filtered out
* Typo fixes
The method makes it possible to define steps which have to be taken just after the
database connection is created (e.g. setting the snapshot isolation level).
By default, no operation is executed.
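A minimal sketch of such a hook, with hypothetical class and method names; the base implementation is a no-op and a connector-specific subclass overrides it:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

abstract class AbstractSnapshotReader {
    // Hypothetical hook: called just after the JDBC connection is created.
    protected void connectionCreated(Connection connection) throws SQLException {
        // no-op by default
    }
}

class SqlServerSnapshotReader extends AbstractSnapshotReader {
    @Override
    protected void connectionCreated(Connection connection) throws SQLException {
        // e.g. set the snapshot isolation level (SQL Server syntax)
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("SET TRANSACTION ISOLATION LEVEL SNAPSHOT");
        }
    }
}
```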
This will allow consumers to recognize the Debezium connector used for creating a given message, helping them to adjust their behavior for a variety of connectors.
The "field.blacklist" configuration property is an optional comma-separated list of the fully-qualified names of fields that should be excluded from change event message values. Fully-qualified names for fields are of the form "databaseName.collectionName.fieldName.nestedFieldName", where "databaseName" and "collectionName" may contain the wildcard (*) which matches any characters.
Although the "field.blacklist" configuration property allows you to remove fields from the event values, the "_id" field is always included in the event’s key.
That's simpler to grasp than the approach of passing a supplier lambda to the constructor.
It also makes it possible to pass on the offset via local variables instead of instance fields in some cases.
If the value is binary or a string, it is better to compare the content
of the actual byte arrays.
According to ErrorProne, calling equals() on an array uses reference
equality.
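A quick illustration of the pitfall:

```java
import java.util.Arrays;

byte[] a = {1, 2, 3};
byte[] b = {1, 2, 3};

boolean byReference = a.equals(b);       // false: Object.equals, reference identity
boolean byContent = Arrays.equals(a, b); // true: compares the actual bytes
```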
Part of https://issues.jboss.org/browse/DBZ-759
This records the DDL for DDL events captured during streaming. For the
initial schema snapshot, a JSON-style representation of the captured
Table objects is used in a new field of HistoryRecord, as the DDL
returned by dbms_metadata.get_ddl() isn't fully parseable by our
grammar.
* Renaming ConnectorTaskContext to CdcSourceTaskContext
* Renaming ReplicationContext to MongoDbTaskContext
* Making relationship from MongoDbTaskContext to ConnectionContext has-a instead of is-a
TypeRegistry introduced for the Postgres connector:
* a JDBC column does not have a special componentType
* a JDBC column provides a database-specific type id
* OID is the primary type identifier to be used in Postgres connector code - dropping the JDBC/OID dichotomy
* ChangeEventQueue#enqueue() checks the interrupted state of the calling
thread now, raising an InterruptedException in case the interrupted flag
has been set (because the producer's thread executor has been stopped)
* RecordSnapshotProducer has been adjusted to check the interrupted flag
regularly, aborting if it has been set
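A rough sketch of the enqueue-side check described above (not the actual `ChangeEventQueue` code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.connect.source.SourceRecord;

// Sketch only; queue size and record type are stand-ins.
class EventQueue {
    private final BlockingQueue<SourceRecord> queue = new ArrayBlockingQueue<>(8192);

    public void enqueue(SourceRecord record) throws InterruptedException {
        // Fail fast if the producer's executor has interrupted this thread...
        if (Thread.currentThread().isInterrupted()) {
            throw new InterruptedException();
        }
        // ...and BlockingQueue#put also throws if interrupted while blocked.
        queue.put(record);
    }
}
```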
* Renaming ConfigurationHelper to Instantiator
* Doc improvements and typo fixes
* Bringing getInstance() methods into consistent order
* Raising an exception instead of logging an error if instantiation fails
* shutting down the snapshotting thread and the DB history producer client
if the connector is stopped while trying to write to the history topic
* reducing the time that KafkaProducer#send() will block if Kafka isn't
available; this will release the producer thread quicker in case the
connector is stopped during snapshotting
* not returning from the finally block (!) in case the TX is rolled back; this
prevented the failed state from being set by the outer catch clause in execute()
* Adding support for PostGIS geometry types
* Adding support for GEOMETRY, POLYGON and more in MySQL
* For all newly supported types, changes are represented using two new schema types, Geometry and Geography, containing the WKB (binary geo data) and srid (coordinate system identifier)
* The existing Point type also contains the new (optional) srid field
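A sketch of the resulting schema shape, with field names taken from the description above (the schema name is illustrative):

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

Schema geometrySchema = SchemaBuilder.struct()
        .name("io.debezium.data.geometry.Geometry")   // illustrative name
        .field("wkb", Schema.BYTES_SCHEMA)            // well-known binary geo data
        .field("srid", Schema.OPTIONAL_INT32_SCHEMA)  // coordinate system identifier
        .build();
```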
If you start a cluster (e.g. in a test) without specifying a port
you get a random port. Sometimes you might want to connect to the
embedded zookeeper instance (for instance, to make an assertion about
a znode). To do this you need to know the port number. So let's expose it.
MySQL has special handling of 2-digit years that it deems ambiguous, such as the year value `17` that is actually treated as `2017`. Apparently the 2-digit values are stored in MySQL and the interpretation is performed when the data is extracted, so the connector also needs to perform this adjustment of the year values. This commit uses the JDK's `TemporalAdjuster` interface and passes it down to the requisite temporal-related datatype handling code. The MySQL connector then provides its own `TemporalAdjuster` implementation that adjusts the year values via the excellent JDK `Temporal` methods.
A row in one of the MySQL test databases was changed to use a 2-digit year of `16` while the test method still checks that the year is `2016`, verifying that the year value is properly adjusted.
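A minimal sketch of such an adjuster, applying MySQL's documented 2-digit-year rules (values 00-69 become 2000-2069, values 70-99 become 1970-1999):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAdjuster;

TemporalAdjuster twoDigitYears = temporal -> {
    int year = temporal.get(ChronoField.YEAR);
    if (year >= 0 && year <= 69) {
        return temporal.with(ChronoField.YEAR, year + 2000);
    }
    if (year >= 70 && year <= 99) {
        return temporal.with(ChronoField.YEAR, year + 1900);
    }
    return temporal;  // 4-digit years pass through unchanged
};

LocalDate adjusted = LocalDate.of(17, 3, 15).with(twoDigitYears); // -> 2017-03-15
```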
Added a table and inserted rows attempting to replicate the problem reported in DBZ-195, but the test was unable to reproduce it. In fact, this really is no different than existing tests. Changed the log messages so that if/when this happens again it will be possible to know which row was problematic.
The MySQL parser now properly handles control blocks such as `BEGIN…END`, `IF…END IF`, `REPEAT…END REPEAT`, and `LOOP…END LOOP`, even in cases where the block is preceded by and terminated by a label.
Apparently not all reserved words must be quoted when using them as column names, so the MySQL DDL parser was refactored to better handle a variety of unquoted column names that are reserved words.
Changed the GTID source filters in the MySQL connector to be far more efficient when the filters specify literal UUIDs rather than regex patterns. In these cases, the predicate just checks whether a supplied value is in a hash set, and no regular expression patterns are used.
The GTID source filters can still be a combination of UUID literals and regular expressions, and the predicate will use the best implementation for each. For example, if the filters include all UUID literals, then regular expressions will never be used.
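A rough sketch of the two-tier predicate (the UUID values and pattern are invented for the example):

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Literal UUIDs go into a hash set; only real patterns fall back to regex.
Set<String> literals = Set.of("036d85a9-64e5-11e6-9b48-42010af0000c");
List<Pattern> patterns = List.of(Pattern.compile("7145bf69-.*"));

Predicate<String> includeGtidSource = uuid ->
        literals.contains(uuid)
        || patterns.stream().anyMatch(p -> p.matcher(uuid).matches());
```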
The MySQL DDL parser was not correctly handling `DEFINER` clauses within `CREATE TRIGGER` or `CREATE EVENT` statements. Support for `DEFINER` clauses was recently added for the various forms of `CREATE PROCEDURE`, `CREATE FUNCTION` and `CREATE VIEW` statements. These are the only kinds of statements that have the definer attribute, per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/stored-programs-security.html).
The KafkaDatabaseHistory class was not behaving well in tests using my local development environment. When restoring from the persisted Kafka topic, the class would set up a Kafka consumer and see repeated messages. It is unclear whether the repeats were due to our test environment and very short poll timeouts. Regardless, the restore logic was refactored to track offsets so as to only process messages once.
The version of the DB server required for this to work is at least 9.4. To be able to stream logical changes, the code relies on enhancements to the JDBC driver which are not yet public. Therefore, the current codebase includes the sources for the JDBC driver.
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver and the protobuf code for now
* adds support for debugging Surefire and Failsafe
The MySQL connector uses several threads, so previously upon connector shutdown these threads were simply cancelled. This is fine for the binlog reader (which can stop at any moment), but is a poor approach for the snapshot as we didn’t always properly release the database resources and also didn’t complete the writing of the DDL history.
With this change, the snapshot reader stops in a very controlled manner, basically by having the 10-step snapshot procedure frequently check whether the reader is to continue working, and to completely avoid thread interruption altogether. And, the snapshot procedure will always clean up its database resources (locks, transactions, etc.), even if the procedure is stopped before completion.
This change also refactors how the snapshot and binlog readers are managed. This is no longer done in the MySqlConnectorTask class (which is busy enough); rather, the logic has been encapsulated in a new `ChainedReader` that makes use of a new `Reader` interface. This makes testing of `ChainedReader` easier, and ensures that `ChainedReader` relies only upon the primary methods of `Reader` rather than upon `AbstractReader`. `ChainedReader` handles multiple readers generically, and ensures that when stopped the readers are all handled correctly and completely process all records, while avoiding accidentally starting subsequent readers when stopping the previous one.
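A rough sketch of the `Reader` abstraction (the method names are illustrative, not the exact API):

```java
import java.util.List;
import org.apache.kafka.connect.source.SourceRecord;

// Illustrative shape only; the real interface may differ.
public interface Reader {
    String name();
    void start();                 // begin producing records
    void stop();                  // request a controlled stop
    List<SourceRecord> poll() throws InterruptedException; // drain produced records
}
```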
The MySQL DDL parser was not properly consuming function declarations. For functions, the parser consumes the entire statement without handling the various expressions within the function declaration, but the parser was not properly finding the end of the statement and instead continued trying to consume values beyond the end of the statement.
Specifically, when the parser consumes a `BEGIN`, it looks for a corresponding `END`. However, if it encountered an `END IF`, the `IF` plus any remaining tokens were left on the token stream unprocessed. This confused the parser, which kept looking for statements and ultimately ended with a `No more content` error.
This case was replicated in integration tests, and the code fixed to properly find the end of the statements.
By default the MySQL connector handles `DECIMAL` and `NUMERIC` columns using `java.math.BigDecimal` values and describing them using the `org.apache.kafka.connect.data.Decimal` schema type, which serializes the values to a binary form.
This change adds a configuration option that retains the default behavior unless configured otherwise, allowing `DECIMAL` and `NUMERIC` values to instead be handled as Java `double` with a schema type of `FLOAT64`.
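For illustration, assuming the option is named `decimal.handling.mode` (as in later Debezium versions):

```java
import io.debezium.config.Configuration;

// "precise" keeps the default BigDecimal/Decimal handling;
// "double" switches to Java double / FLOAT64.
Configuration config = Configuration.create()
        .with("decimal.handling.mode", "double")
        .build();
```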
Added tests to verify whether the connector properly restarts in the binlog when it previously failed or stopped in the middle of a transaction. The tests showed that the connector was not able to restart properly, whether or not GTIDs are used, because restarting from an arbitrary binlog event causes problems: the TABLE_MAP events for the affected tables are skipped.
The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.
This has an unfortunate side effect, since the offsets capture that a transaction was completed only when a source record for the subsequent transaction is generated. This is because the connector generates source records (with their offsets) for the binlog events in the transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn't be a problem, and will only slow restarts for very large transactions.
Improved the error handling of the MySQL connector to ensure that we’re always stopping the connector when we have a problem handling a binlog event or if we have problems starting.
When the MySQL connector is reading the binlog, it outputs INFO log messages reporting status at an exponentially-increasing rate, starting at every 5 seconds and doubling until a max period of 1 hour. This output is useful when the connector starts to know that it is working, but thereafter the usefulness decreases. Once an hour is probably acceptable output.
This is not intended to replace the capturing of metrics, but is merely an aid to easily tell via the logs whether the connector continues to work.
Also improved the log message when the binlog reader stops to capture the total number of events recorded by Kafka Connect and the last recorded offset.
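A minimal sketch of the exponentially backed-off status logging (`running`, `logger`, and `lastOffset` are stand-ins):

```java
// Sketch only: report status every 5 seconds, doubling up to once per hour.
void reportStatusWithBackoff() throws InterruptedException {
    long delayMillis = 5_000L;               // start at 5 seconds
    final long maxDelayMillis = 3_600_000L;  // cap at 1 hour
    while (running) {
        logger.info("Binlog reader is running; last offset: {}", lastOffset);
        Thread.sleep(delayMillis);
        delayMillis = Math.min(delayMillis * 2, maxDelayMillis);
    }
}
```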
Corrected how the MySQL connector is treating columns of type `BIT(n)`, where _n_ is the number of bits in the value. When `n=1`, the resulting values are booleans; when `n>1`, the resulting values are little endian `byte[]` that have the minimum number of bytes to hold the `n` bits.
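For example, with the little-endian mapping described above, a `BIT(10)` value `0b10_0000_0101` would be emitted as:

```java
// Two bytes suffice for 10 bits; the least significant byte comes first.
byte[] bits = { (byte) 0b0000_0101, (byte) 0b0000_0010 };
```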
The `KafkaDatabaseHistory` was always creating a new producer whenever its `start()` method was called, even if it were called more than once. And, the `MySqlSchema` was calling `start()` twice, resulting in multiple producers being created and registered with JMX. Both issues were fixed.
Also, UUIDs were being used as the name of the JMX MBean for the producer, unless the `database.history.consumer.client.id` and `database.history.producer.client.id` properties were being explicitly set. Now, the MySQL connector will by default set the `client.id` property on both the database history's Kafka consumer and producer to `{connectorName}-dbhistory`. Of course, the `database.history.consumer.client.id` and `database.history.producer.client.id` properties can still be set to define the name of the producer and consumer.
MySQL supports "zero-value" dates and timestamps, but these cannot be represented as valid dates or timestamps using the Java types. For example, the zero-value `0000-00-00` for a date has what Java considers to be an invalid month and day-of-the-month.
This commit changes how the MySQL connector handles these values to not throw exceptions. When columns allow nulls, such values will be treated as nulls; when columns do not allow null values, these values will be converted to a "zero-value" for the corresponding Java representation (e.g., the epoch day or timestamp). A new test case verifies the behaviors.
Anytime we `toString()` a `Configuration`, any values for password properties should be masked. A password property is defined to be a property whose key ends in "password" in a case-insensitive manner.
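A minimal sketch of the masking rule:

```java
import java.util.Locale;

// Any key ending in "password" (case-insensitively) is masked when
// rendering the Configuration as a string.
static String displayValue(String key, String value) {
    return key.toLowerCase(Locale.ROOT).endsWith("password") ? "********" : value;
}
```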
The MySQL binlog events contain the binary representation of string-like values as encoded per the column's character set. Properly decoding these into Java strings requires capturing the column, table, and database character set when parsing the DDL statements.
Unfortunately, MySQL DDL allows columns (at the time the columns are created or modified) to inherit the default character set for the table, or if that is not defined the default character set for the database, or if that is not defined the character set for the server. So, in addition to modifying the MySQL DDL parser to support capturing the character set name for each column, it also had to be changed to know what these default character set names are.
The default character sets are all available via MySQL server/session/local variables. Although strictly speaking the character set variables cannot be set globally, MySQL DDL does allow session and local variables to be set with `SET` statements. Therefore, this commit enhances the MySQL DDL parser to parse `SET` statements and to track the various global, session, and local variables as seen by the DDL parser. Upon connector startup, a subset of server variables (related to character sets and collations) are read from the database via JDBC and used to initialize the DDL parser via `SET` methods.
In addition to initializing the DDL parser with the system variables related to character sets and collation, it is important to also capture the server and database default character sets in the database history so that the correct character sets are used for columns even when the default character sets have changed on the database and/or the server. Therefore, upon startup or snapshot the MySQL connector records in the database history a `SET` statement for the `character_set_server` and `collation_server` system variables so that, upon a later restart, the history's DDL statements can be re-parsed with the correct default server and database character sets. Also, when the MySQL connector reloads the database history (upon startup), the recorded default server character set is compared with the MySQL instance's current server character set, and if they are different the current character set is recorded with a new `SET` statement.
These extra steps ensure that the connector uses the correct character set for each column, even when the connector restarts and reloads the database history captured by a previous version of the connector. In other words, the MySQL connector can be safely upgraded, and the new version will correctly start using the columns' character sets to decode the string-like values.
The DDL parser and in-memory models of the relational schemas were changed to capture the character set for each column whose type is a string (e.g., `CHAR`, `VARCHAR`, etc.). This required handling `SET` statements used to change the system variables that hold the names of the default character set for the server and for each database. So, even if a column does not explicitly define the character set, the column's actual character set is identified from the table's character set, which might default to the current database's character set, which if not set defaults to the system character set.
These changes merely affect how MySQL DDL is parsed and the in-memory relational schema representation to accommodate the character set at various levels. It does not change the behavior of the MySQL connector; that will be done in a subsequent commit.
All tests pass with these changes, including quite a few additional tests for the new functionality.
Changed the MySQL connector to have several new configuration properties for setting up the SSL key store and trust store (which can be used in place of System or JDK properties) used for MySQL secure connections, and another property to specify what kind of SSL connection be used.
Modified several integration tests to ensure all MySQL connections are made with `useSSL=false`.