It also updates EmbeddedEngine to use the Kafka commit callbacks introduced after 0.10 and updates AbstractConnectorTest to better synchronize with the embedded engine
By default the MySQL connector handles `DECIMAL` and `NUMERIC` columns using `java.math.BigDecimal` values and describing them using the `org.apache.kafka.connect.data.Decimal` schema type, which serializes the values to a binary form.
This change adds a configuration option that will keep the default behavior, but will instead allow handling `DECIMAL` adn `NUMERIC` values as Java `double` and a schema type of `FLOAT64`.
Added tests to verify whether the connector is properly restarting in the binlog when previously the connector failed or stopped in the middle of a transaction. The tests showed that the connector is not able to properly start when using or not using GTIDs, since restarting from an arbitrary binlog event causes problems since the TABLE_MAP events for the affected tables are skipped.
The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.
This has an unforunate side effect since the offsets capture a transaction was completed only when it generates a source record for the subsequent transaction. This is because the connector generates source records (with their offsets) for the binlog events in the transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn’t be a problem, and will only slow restarts for very large transactions.
The MySQL connector (or rather the DDL parser used in the connector) improperly assumed a `CHAR` JDBC type (and Avro schema `STRING` type) for MySQL columns of type `BINARY`. This corrects the error.
Improved the error handling of the MySQL connector to ensure that we’re always stopping the connector when we have a problem handling a binlog event or if we have problems starting.
Make Debezium merge its GTID set with the GTID set on the server that
it's connecting to. This allows Debezium to consume from a MySQL server
that might have a different set of channels (upstream masters), provided
that the server has the data that Debezium needs.
Snapshot Reader will have a dataInclude flag, which will determine whether initial data in whitelisted database and tables have to
read or not. In schema only mode, will not read inital data, will capture only database table schema
Added unit test for validating checks that initial data is not copied
MySQL records the timestamp with second precision in binlog events, but the library we use multiplies by 1000 to return the padded value in milliseconds (even though the value still has second precision). The BinlogReader converts this back to seconds, so the SourceInfo should not also be dividing by 1000.
Adds support for MySQL 5.7's `JSON` type, which is capable of holding JSON objects, JSON arrays, and scalar values. The Debezium MySQL connector represents `JSON` values as string with a `io.debezium.data.Json` semantic type (which is basically a string schema that has a special name to denote the semantics), and the _contents_ of that string will be the JSON representation of the object, array, or scalar value.
When a connector is originally connected to a MySQL server, it will record the GTID set that identifies the position in the binlog. When all of the interesting transactions originate on a different server (i.e., the server we're listening to is a replica), the server we're listening to will still include some transactions in the binlog (e.g., for the information schema, performance, or other internal databases), and so the GTID set will include a GTID range for our server. If we stop the connector and want to point it to a different MySQL server, asking MySQL to position the binlog using the complete GTID set (including the GTID range for our old replica) will cause an error, since the new server does not have any GTID ranges from the old replica. Therefore, the connector needs to be able to exclude some GTID ranges that originated on the original replica, using the `server_uuid` property of the replica server.
This change adds two configuration properties: `gtid.source.includes` and `gtid.source.excludes`. Both are optional, but at most only one of these can be used. These properties contain a comma-separated list of GTID sources (i.e., the `server_uuid` value for the server where the transaction originated) or regular expressions matching GTID sources, and upon connector startup the connector uses the list to filter the previously-recorded GTID set against the available GTID set in the current MySQL server. By including specific GTID sources, an administrator can control the subset of GTID ranges that govern the binlog position.
These properties will not be useful in some topologies, especially when the MySQL server from which the binlog is being read is the originating server for some of the transactions. However, these properties may be very useful in any topology where the connector is _only_ reading from replicas, so that the connector can be switched to another replica at any time. In some cases it may be easier to exclude all of the replicas' `server_uuid` values, while in other cases it may be easier to include all of the `server_uuid` values where transactions can originate.
Updated MysqlParser to return list of String for allowed enum and set values
And also added code fix to get a enum value at a particular index and for set option too.
Used debezium string utility to join list of string into deliminator seperated String.
Updating old test cases as per required to handle list of strings.
When the MySQL connector is reading the binlog, it outputs INFO log messages reporting status at an exponentially-increasing rate, starting at every 5 seconds and doubling until a max period of 1 hour. This output is useful when the connector starts to know that it is working, but thereafter the usefulness decreases. Once an hour is probably acceptable output.
This is not intended to replace the capturing of metrics, but is merely an aid to easily tell via the logs whether the connector continues to work.
Also improved the log message when the binlog reader stops to capture the total number of events recorded by Kafka Connect and the last recorded offset.
Added logic to verify that MySQL's row-level binlog is enabled, and whether it is likely that when snapshots are not performed that the binlog is likely to have been purged. Some situations will result in an error, while others are logged as warnings.