Commit Graph

144 Commits

Author SHA1 Message Date
Randall Hauch
094f9a4925 DBZ-139 Corrected binlog timestamp handling
MySQL records the timestamp with second precision in binlog events, but the library we use multiplies it by 1000 to return a value in milliseconds (even though the value still has only second precision). The BinlogReader converts this back to seconds, so the SourceInfo should not divide by 1000 again.
2016-10-20 09:31:02 -05:00
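A minimal Java sketch of converting the event-header timestamp exactly once. It assumes the binlog client exposes the header timestamp as a `long` of milliseconds that is really whole seconds multiplied by 1000; `BinlogTimestamps` and its methods are hypothetical names, not the connector's `BinlogReader` or `SourceInfo` code.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper illustrating the single conversion described in DBZ-139. */
final class BinlogTimestamps {
    private BinlogTimestamps() {}

    /**
     * The client library reports the header timestamp in milliseconds, but the value is
     * whole seconds multiplied by 1000. Convert it back to seconds exactly once.
     */
    static long eventSeconds(long headerTimestampMillis) {
        return TimeUnit.MILLISECONDS.toSeconds(headerTimestampMillis);
    }

    /** The buggy variant: dividing an already second-based value by 1000 a second time. */
    static long doubleConverted(long headerTimestampMillis) {
        return eventSeconds(headerTimestampMillis) / 1000;
    }
}
```

`doubleConverted` reproduces the bug: the extra division shrinks an October 2016 timestamp to a value in January 1970.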
Randall Hauch
25b8055642 DBZ-134 Enabled JMX metrics for MySQL connector
Added an MXBean for the MySQL connector that captures various metrics while reading the binlog.
2016-10-19 16:48:11 -05:00
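A minimal sketch of how such an MXBean could be defined and registered with the platform MBean server, using only `java.lang.management` and `javax.management`. The interface, its attribute names, and the `ObjectName` are hypothetical illustrations, not the metrics the connector actually exposes; the JMX conventions relied on are that the interface name ends in `MXBean` and that the interface is public.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

final class BinlogMetricsExample {
    private BinlogMetricsExample() {}

    /** JMX treats a public interface whose name ends in "MXBean" as an MXBean interface. */
    public interface BinlogReaderMetricsMXBean {
        long getTotalNumberOfEventsSeen();  // exposed as attribute "TotalNumberOfEventsSeen"
        long getMillisSinceLastEvent();     // exposed as attribute "MillisSinceLastEvent"
    }

    /** Thread-safe counters that a (hypothetical) binlog reading thread would update. */
    static final class BinlogReaderMetrics implements BinlogReaderMetricsMXBean {
        private final AtomicLong totalEvents = new AtomicLong();
        private final AtomicLong lastEventMillis = new AtomicLong(System.currentTimeMillis());

        void onEvent() {
            totalEvents.incrementAndGet();
            lastEventMillis.set(System.currentTimeMillis());
        }

        @Override
        public long getTotalNumberOfEventsSeen() {
            return totalEvents.get();
        }

        @Override
        public long getMillisSinceLastEvent() {
            return System.currentTimeMillis() - lastEventMillis.get();
        }
    }

    /** Registers the metrics with the platform MBean server under a hypothetical name. */
    static ObjectName register(BinlogReaderMetrics metrics, String serverName) {
        try {
            ObjectName name = new ObjectName("debezium.mysql:type=binlog-metrics,server=" + serverName);
            ManagementFactory.getPlatformMBeanServer().registerMBean(metrics, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Unable to register metrics MXBean", e);
        }
    }

    /** Reads an attribute back through JMX, the way JConsole or another JMX client would. */
    static Object readAttribute(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Unable to read " + attribute, e);
        }
    }
}
```

Once registered, the attributes appear in any JMX client under the chosen `ObjectName`, so operators can watch progress without parsing logs.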
Randall Hauch
4a62b09ead DBZ-126 Added support for MySQL JSON type
Adds support for MySQL 5.7's `JSON` type, which is capable of holding JSON objects, JSON arrays, and scalar values. The Debezium MySQL connector represents `JSON` values as strings with an `io.debezium.data.Json` semantic type (which is basically a string schema that has a special name to denote the semantics), and the _contents_ of that string will be the JSON representation of the object, array, or scalar value.
2016-10-18 17:32:55 -05:00
Randall Hauch
2f5772712a DBZ-129 Fix for GTID updates
Workaround for https://github.com/shyiko/mysql-binlog-connector-java/issues/122.
2016-10-18 14:32:06 -05:00
Randall Hauch
7387654bfa DBZ-129 Additional improvements for MySQL connector GTID-based startup
Added more integration tests to verify the behavior of the MySQL connector when it is (re)starting using GTIDs.
2016-10-18 14:30:10 -05:00
Randall Hauch
305c4c5ac6 DBZ-129 MySQL connector can now use subset of GTID set when reconnecting to MySQL
When a connector is originally connected to a MySQL server, it will record the GTID set that identifies the position in the binlog. When all of the interesting transactions originate on a different server (i.e., the server we're listening to is a replica), the server we're listening to will still include some transactions in the binlog (e.g., for the information schema, performance, or other internal databases), and so the GTID set will include a GTID range for our server. If we stop the connector and want to point it to a different MySQL server, asking MySQL to position the binlog using the complete GTID set (including the GTID range for our old replica) will cause an error, since the new server does not have any GTID ranges from the old replica. Therefore, the connector needs to be able to exclude some GTID ranges that originated on the original replica, using the `server_uuid` property of the replica server.

This change adds two configuration properties: `gtid.source.includes` and `gtid.source.excludes`. Both are optional, but at most one of them can be used. These properties contain a comma-separated list of GTID sources (i.e., the `server_uuid` value for the server where the transaction originated) or regular expressions matching GTID sources, and upon startup the connector uses the list to filter the previously-recorded GTID set against the available GTID set in the current MySQL server. By including specific GTID sources, an administrator can control the subset of GTID ranges that govern the binlog position.

These properties will not be useful in some topologies, especially when the MySQL server from which the binlog is being read is the originating server for some of the transactions. However, these properties may be very useful in any topology where the connector is _only_ reading from replicas, so that the connector can be switched to another replica at any time. In some cases it may be easier to exclude all of the replicas' `server_uuid` values, while in other cases it may be easier to include all of the `server_uuid` values where transactions can originate.
2016-10-18 14:29:58 -05:00
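The include/exclude filtering can be sketched as follows. This is a simplified, hypothetical illustration (`GtidSourceFilter` is not a connector class): it only filters a recorded GTID set by source UUID and does not compare the result against the new server's available GTID set. It strips whitespace first, since MySQL may return GTID sets containing newlines (see DBZ-107 and DBZ-111 below).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class GtidSourceFilter {
    private GtidSourceFilter() {}

    /** Splits "uuid:1-5:7-9,uuid2:1-3" into source UUID -> interval text, ignoring all whitespace. */
    static Map<String, String> parse(String gtidSet) {
        Map<String, String> bySource = new LinkedHashMap<>();
        String compact = gtidSet.replaceAll("\\s", ""); // MySQL may insert newlines after commas
        if (compact.isEmpty()) {
            return bySource;
        }
        for (String entry : compact.split(",")) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Malformed GTID set entry: " + entry);
            }
            bySource.put(entry.substring(0, colon), entry.substring(colon + 1));
        }
        return bySource;
    }

    /** Builds a predicate from a comma-separated list of server UUIDs or regular expressions. */
    static Predicate<String> matchesAny(String commaSeparatedPatterns) {
        List<Pattern> patterns = new ArrayList<>();
        for (String text : commaSeparatedPatterns.split(",")) {
            if (!text.trim().isEmpty()) {
                patterns.add(Pattern.compile(text.trim(), Pattern.CASE_INSENSITIVE));
            }
        }
        return source -> patterns.stream().anyMatch(p -> p.matcher(source).matches());
    }

    /** Keeps the ranges whose source is included (or not excluded); at most one list may be set. */
    static String filter(String gtidSet, String includes, String excludes) {
        if (includes != null && excludes != null) {
            throw new IllegalArgumentException(
                    "Only one of gtid.source.includes and gtid.source.excludes may be set");
        }
        Predicate<String> keep;
        if (includes != null) {
            keep = matchesAny(includes);
        } else if (excludes != null) {
            keep = matchesAny(excludes).negate();
        } else {
            keep = source -> true;
        }
        StringBuilder result = new StringBuilder();
        parse(gtidSet).forEach((source, ranges) -> {
            if (keep.test(source)) {
                if (result.length() > 0) {
                    result.append(',');
                }
                result.append(source).append(':').append(ranges);
            }
        });
        return result.toString();
    }
}
```

Excluding the old replica's `server_uuid` (or a regular expression matching it) drops that replica's range, leaving only the ranges from the servers where the interesting transactions originated.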
Horia Chiorean
1a99f5bbc7 DBZ-135 Fixes the parsing of line separators by GtidSet (#118) 2016-10-13 10:18:33 -05:00
Randall Hauch
d955ed2e4b DBZ-132 Cleanup of code (#117)
Additional cleanup of changes made for DBZ-132.
2016-10-11 15:36:07 -05:00
Prannoy Mittal
301d60411f Using the Debezium string library to join a list of strings 2016-10-12 00:53:36 +05:30
Prannoy Mittal
a36700e51b Enum and Set were assumed to be single characters.
Updated MysqlParser to return a list of strings for the allowed enum and set values.
Also fixed the code that gets an enum value at a particular index, and likewise for set options.
Used the Debezium string utility to join the list of strings into a delimiter-separated string.
Updated old test cases as required to handle lists of strings.
2016-10-12 00:41:08 +05:30
Willie
fda76c875e DBZ-115 Add support to recognize older row_event formats 2016-10-08 11:42:12 -07:00
Randall Hauch
99a86ad289 Merge pull request #112 from rhauch/dbz-123
DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
2016-10-07 17:16:37 -05:00
Randall Hauch
beb47dd2de DBZ-131 Improved logging while reading binlog
When the MySQL connector is reading the binlog, it outputs INFO log messages reporting status at exponentially increasing intervals, starting every 5 seconds and doubling until reaching a maximum period of 1 hour. This output is useful right after the connector starts, to confirm that it is working, but its usefulness decreases thereafter. Once an hour is probably sufficient.

This is not intended to replace the capturing of metrics, but is merely an aid to easily tell via the logs whether the connector continues to work.

Also improved the log message when the binlog reader stops to capture the total number of events recorded by Kafka Connect and the last recorded offset.
2016-10-07 17:10:01 -05:00
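A minimal sketch of the doubling interval, assuming the caller passes the current wall-clock time in milliseconds on each iteration of its read loop; `StatusLogSchedule` is a hypothetical name, not the connector's class.

```java
import java.time.Duration;

final class StatusLogSchedule {
    private static final Duration INITIAL_PERIOD = Duration.ofSeconds(5);
    private static final Duration MAX_PERIOD = Duration.ofHours(1);

    private Duration period = INITIAL_PERIOD;
    private long nextLogAtMillis;

    StatusLogSchedule(long startMillis) {
        this.nextLogAtMillis = startMillis + INITIAL_PERIOD.toMillis();
    }

    /** Returns true when a status message is due, then doubles the period up to the one-hour cap. */
    boolean shouldLog(long nowMillis) {
        if (nowMillis < nextLogAtMillis) {
            return false;
        }
        Duration doubled = period.multipliedBy(2);
        period = doubled.compareTo(MAX_PERIOD) > 0 ? MAX_PERIOD : doubled;
        nextLogAtMillis = nowMillis + period.toMillis();
        return true;
    }

    /** The wait that will follow the most recent status message. */
    Duration currentPeriod() {
        return period;
    }
}
```

The resulting waits are 5, 10, 20, 40 seconds and so on, until the cap turns 5,120 seconds into 3,600 seconds and every later message is an hour apart.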
Randall Hauch
50eb4094ac DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
The DDL parser now properly handles bit-set literals, and several minor case-sensitivity bugs dealing with other escaped literals were fixed.
2016-10-06 13:25:38 -05:00
Randall Hauch
64bab3b3cf DBZ-104 Added test to verify behavior of CREATE TABLE LIKE expressions with and without snapshot 2016-09-23 12:11:38 -05:00
Randall Hauch
dc03335049 DBZ-128 Additional fix to MySQL compatibility message. 2016-09-23 11:03:40 -05:00
Randall Hauch
7654321cfd DBZ-128 Improved checking of MySQL status and configuration
Added logic to verify that MySQL's row-level binlog is enabled and, when snapshots are not performed, to check whether the binlog is likely to have been purged. Some situations result in an error, while others are logged as warnings.
2016-09-22 17:06:14 -05:00
Randall Hauch
730603976d Merge pull request #107 from rhauch/dbz-123
DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
2016-09-21 15:22:00 -05:00
Randall Hauch
bcf60940db DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
Corrected how the MySQL connector treats columns of type `BIT(n)`, where _n_ is the number of bits in the value. When `n=1`, the resulting values are booleans; when `n>1`, the resulting values are little-endian `byte[]` values that have the minimum number of bytes to hold the `n` bits.
2016-09-21 15:04:20 -05:00
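A sketch of the `BIT(n)` mapping, assuming the column value has already been read into a `long` (MySQL limits `BIT` columns to 64 bits). `BitColumns` is hypothetical, and the connector may receive the value from the binlog library in a different form.

```java
final class BitColumns {
    private BitColumns() {}

    /** Minimum number of bytes that can hold the given number of bits. */
    static int byteLength(int numBits) {
        return (numBits + 7) / 8;
    }

    /**
     * BIT(1) becomes a Boolean; BIT(n) with n > 1 becomes a little-endian byte[]
     * (least significant byte first) of the minimum length.
     */
    static Object convert(long bits, int numBits) {
        if (numBits == 1) {
            return (bits & 1L) == 1L;
        }
        byte[] result = new byte[byteLength(numBits)];
        for (int i = 0; i < result.length; i++) {
            result[i] = (byte) (bits >>> (8 * i));
        }
        return result;
    }
}
```

For example, a `BIT(10)` value of `0x0102` needs two bytes and is emitted as `{0x02, 0x01}`.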
Randall Hauch
9aae6c62d9 DBZ-124 Eliminated the JMX "already registered" warning in the MySQL connector
The `KafkaDatabaseHistory` was always creating a new producer whenever its `start()` method was called, even if it were called more than once. And, the `MySqlSchema` was calling `start()` twice, resulting in multiple producers being created and registered with JMX. Both issues were fixed.

Also, UUIDs were being used as the name of the JMX MBean for the producer, unless the `database.history.consumer.client.id` and `database.history.producer.client.id` properties were explicitly set. Now, the MySQL connector will by default set the `client.id` property on both the database history's Kafka consumer and producer to `{connectorName}-dbhistory`. Of course, the `database.history.consumer.client.id` and `database.history.producer.client.id` properties can still be set to define the name of the producer and consumer.
2016-09-21 10:05:15 -05:00
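A sketch of the defaulting rule, assuming the user-supplied consumer or producer settings have already had their `database.history.consumer.` or `database.history.producer.` prefix stripped; `HistoryClientIds` is a hypothetical helper.

```java
import java.util.Properties;

final class HistoryClientIds {
    private HistoryClientIds() {}

    /** Copies the user's settings and adds client.id = "{connectorName}-dbhistory" only if absent. */
    static Properties withDefaultClientId(Properties userConfig, String connectorName) {
        Properties result = new Properties();
        result.putAll(userConfig);
        result.putIfAbsent("client.id", connectorName + "-dbhistory");
        return result;
    }
}
```

Because an explicit `client.id` always wins, the JMX name stays stable across restarts without overriding what an administrator configured.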
Randall Hauch
54b737edc1 DBZ-114 MySQL connector now handles "zero-value" dates and timestamps
MySQL supports "zero-value" dates and timestamps, but these cannot be represented as valid dates or timestamps using the Java types. For example, the zero-value `0000-00-00` for a date has what Java considers to be an invalid month and day-of-the-month.

This commit changes how the MySQL connector handles these values to not throw exceptions. When columns allow nulls, such values will be treated as nulls; when columns do not allow null values, these values will be converted to a "zero-value" for the corresponding Java representation (e.g., the epoch day or timestamp). A new test case verifies the behaviors.
2016-09-21 09:23:12 -05:00
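A sketch of the rule for `DATE` columns, assuming the value arrives as MySQL's `YYYY-MM-DD` text; `ZeroDates` is hypothetical. `java.time.LocalDate.parse` would reject `0000-00-00` because month 0 and day 0 are out of range, so the zero value is checked explicitly first.

```java
import java.time.LocalDate;

final class ZeroDates {
    private ZeroDates() {}

    private static final String ZERO_DATE = "0000-00-00";

    /**
     * Converts a MySQL DATE string to days since 1970-01-01. The zero value becomes null
     * for nullable columns and epoch day 0 for NOT NULL columns.
     */
    static Integer toEpochDay(String mysqlDate, boolean columnIsOptional) {
        if (ZERO_DATE.equals(mysqlDate)) {
            return columnIsOptional ? null : 0;
        }
        return Math.toIntExact(LocalDate.parse(mysqlDate).toEpochDay());
    }
}
```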
Akshath
8a1a9c3542 Changed server.id to support Long instead of Int 2016-09-06 15:09:05 -07:00
Randall Hauch
de1edce895 DBZ-116 Improved logging when MySQL connector is reading binlog
The MySQL connector now outputs an INFO log message whenever its task's `poll()` method returns a non-empty list of `SourceRecord` objects, where the message includes the number of records and the offset of the last record.
2016-09-06 11:31:54 -05:00
Randall Hauch
330a27ce52 Merge pull request #97 from rhauch/dbz-102
DBZ-102 MySQL connector support for column charsets
2016-08-29 15:12:24 -05:00
Randall Hauch
cc8f45309a Merge pull request #98 from rhauch/dbz-112
DBZ-112 Corrected the logic of setting the MySQL driver's SSL-related system properties
2016-08-29 15:00:34 -05:00
Randall Hauch
5cef237aac DBZ-111 Corrected GTID set comparison logic of the MySQL connector
The MySQL connector was improperly comparing the GTID set required by the connector to the GTID set of the MySQL instance. In particular, when the GTID set of the MySQL server contained a newline character, the comparison logic failed. (This should have been fixed as part of DBZ-107.)
2016-08-29 14:53:21 -05:00
Randall Hauch
0861518788 DBZ-112 Corrected the logic of setting the MySQL driver's SSL-related system properties 2016-08-29 14:27:43 -05:00
Randall Hauch
a46a427b57 DBZ-102 Added MySQL integration test that verifies character encodings
Added a table with data to one of the MySQL databases used in the integration tests. The new test verifies that the UTF-8 data stored in the table is handled properly when obtaining a snapshot and when reading the binlog.
2016-08-29 13:42:10 -05:00
Randall Hauch
cc94bbc697 DBZ-102 MySQL connector now processes character sets
The MySQL binlog events contain the binary representation of string-like values as encoded per the column's character set. Properly decoding these into Java strings requires capturing the column, table, and database character set when parsing the DDL statements.

Unfortunately, MySQL DDL allows columns (at the time the columns are created or modified) to inherit the default character set for the table, or, if that is not defined, the default character set for the database, or, if that is not defined, the character set for the server. So, in addition to capturing the character set name for each column, the MySQL DDL parser also had to be changed to know what these default character set names are.

The default character sets are all available via MySQL server/session/local variables. Although strictly speaking the character set variables cannot be set globally, MySQL DDL does allow session and local variables to be set with `SET` statements. Therefore, this commit enhances the MySQL DDL parser to parse `SET` statements and to track the various global, session, and local variables as seen by the DDL parser. Upon connector startup, a subset of server variables (related to character sets and collations) are read from the database via JDBC and used to initialize the DDL parser via `SET` methods.

In addition to initializing the DDL parser with the system variables related to character sets and collation, it is important to also capture the server and database default character sets in the database history so that the correct character sets are used for columns even when the default character sets have changed on the database and/or the server. Therefore, upon startup or snapshot the MySQL connector records in the database history a `SET` statement for the `character_set_server` and `collation_server` system variables so that, upon a later restart, the history's DDL statements can be re-parsed with the correct default server and database character sets. Also, when the MySQL connector reloads the database history (upon startup), the recorded default server character set is compared with the MySQL instance's current server character set, and if they are different the current character set is recorded with a new `SET` statement.

These extra steps ensure that the connector uses the correct character set for each column, even when the connector restarts and reloads the database history captured by a previous version of the connector. In other words, the MySQL connector can be safely upgraded, and the new version will correctly start using the columns' character sets to decode the string-like values.
2016-08-29 12:19:24 -05:00
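The inheritance chain described above can be sketched as a small resolver. This is a hypothetical illustration, not the connector's DDL parser: variables set by parsed `SET` statements and per-database defaults are kept in plain maps, and only `character_set_server` is consulted as the final fallback.

```java
import java.util.HashMap;
import java.util.Map;

final class CharsetResolver {
    private final Map<String, String> systemVariables = new HashMap<>();
    private final Map<String, String> databaseDefaults = new HashMap<>();

    /** Records the effect of a parsed statement such as SET character_set_server = 'latin1'. */
    void setVariable(String name, String value) {
        systemVariables.put(name, value);
    }

    /** Records a database's default character set, e.g. from CREATE DATABASE ... CHARACTER SET. */
    void setDatabaseDefault(String database, String charset) {
        databaseDefaults.put(database, charset);
    }

    /** Column charset, else table default, else database default, else server default. */
    String resolve(String columnCharset, String tableCharset, String database) {
        if (columnCharset != null) {
            return columnCharset;
        }
        if (tableCharset != null) {
            return tableCharset;
        }
        String databaseCharset = databaseDefaults.get(database);
        if (databaseCharset != null) {
            return databaseCharset;
        }
        return systemVariables.get("character_set_server");
    }
}
```

Recording the server's `SET` statement in the history matters because the last fallback would otherwise depend on whatever the server's current default happens to be when the history is replayed.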
Randall Hauch
257e81c540 DBZ-102 MySQL in-memory models of tables capture column character sets
The DDL parser and in-memory models of the relational schemas were changed to capture the character set for each column whose type is a string (e.g., `CHAR`, `VARCHAR`, etc.). This required handling `SET` statements used to change the system variables that hold the names of the default character set for the server and for each database. So, even if a column does not explicitly define the character set, the column's actual character set is identified from the table's character set, which might default to the current database's character set, which if not set defaults to the system character set.

These changes merely affect how MySQL DDL is parsed and the in-memory relational schema representation to accommodate the character set at various levels. It does not change the behavior of the MySQL connector; that will be done in a subsequent commit.

All tests pass with these changes, including quite a few additional tests for the new functionality.
2016-08-29 11:50:51 -05:00
Randall Hauch
93d0fae02b DBZ-109 Captured MySQL error code and SQLSTATE code in exceptions
The binlog reader and JDBC operations might throw exceptions with this information, so in these cases the connector now captures the error code and SQLSTATE code from the exception and includes them in the message.
2016-08-25 08:11:50 -05:00
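A sketch of such a message using the standard `java.sql.SQLException` accessors `getErrorCode()` and `getSQLState()`; `ErrorMessages` and the exact wording are hypothetical.

```java
import java.sql.SQLException;

final class ErrorMessages {
    private ErrorMessages() {}

    /** Prefixes the driver's message with MySQL's vendor error code and the SQLSTATE code. */
    static String describe(String context, SQLException e) {
        return context + " (error code " + e.getErrorCode()
                + ", SQLSTATE " + e.getSQLState() + "): " + e.getMessage();
    }
}
```

For MySQL, `getErrorCode()` returns the server's numeric error (for example, 1045 for access denied) and `getSQLState()` the five-character SQLSTATE (28000 for that error), which makes the failure searchable in the MySQL documentation.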
Randall Hauch
638b459484 DBZ-108 Removed the TimeZoneAdapter and test, which is no longer used 2016-08-24 16:31:35 -05:00
Randall Hauch
4de56fd657 Merge pull request #94 from hchiorean/DZB-header-fix
Fixes the DBZ header required by checkstyle
2016-08-24 14:28:43 -05:00
Randall Hauch
ce2b2db80c DBZ-99 Added support for MySQL connector to connect securely to MySQL
Changed the MySQL connector to have several new configuration properties for setting up the SSL key store and trust store (which can be used in place of System or JDK properties) used for MySQL secure connections, and another property to specify what kind of SSL connection should be used.

Modified several integration tests to ensure all MySQL connections are made with `useSSL=false`.
2016-08-24 13:27:35 -05:00
Horia Chiorean
2732d26ff0 Fixes the DBZ header required by checkstyle
This commit removes an extra space character from the first blank line of the header
2016-08-24 13:41:15 +03:00
Randall Hauch
40318f87a3 Merge pull request #92 from rhauch/dbz-107
DBZ-107 MySQL Connector should tolerate newlines in GTID sets read during snapshot
2016-08-23 17:45:58 -05:00
Randall Hauch
3051e3b2d7 DBZ-107 MySQL Connector should tolerate newlines in GTID sets read during snapshot 2016-08-23 17:37:48 -05:00
Randall Hauch
448d514c81 DBZ-106 Corrected the MySQL DDL parser to properly handle quoted keywords as column names. 2016-08-23 17:03:53 -05:00
Randall Hauch
e86fb83459 [maven-release-plugin] prepare for next development iteration 2016-08-16 09:56:47 -05:00
Randall Hauch
ccdb0a1a63 [maven-release-plugin] prepare release v0.3.0 2016-08-16 09:56:47 -05:00
Randall Hauch
83b2d2eab6 DBZ-92 Added a few log statements upon MySQL connector task startup 2016-08-15 21:57:09 -05:00
Randall Hauch
b1277ed196 DBZ-94 Corrections to mark the end of the snapshot 2016-08-15 21:56:20 -05:00
Randall Hauch
2fea8c8ee0 DBZ-100 Corrected JavaDoc and renamed methods to be more explicit 2016-08-15 12:37:39 -05:00
Randall Hauch
d8a5d2b50f DBZ-100 Corrected MySQL connector's use of ENUM and SET values
The ENUM and SET values read from the binlog contain the indexes of the options that are included in the value, but these don't match the string values returned by MySQL and JDBC, which contain the comma-separated options. With this change, the values read from the binlog will also be comma-separated strings.
2016-08-15 12:11:35 -05:00
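A sketch of the index-to-string conversion, assuming the column's option list is known from the parsed DDL. It relies on MySQL's documented encoding: an `ENUM` value is a 1-based index into the options (0 denotes the empty-string error value), and a `SET` value is a bitmask in which bit *i* selects option *i*. `EnumSetValues` is hypothetical.

```java
import java.util.List;
import java.util.StringJoiner;

final class EnumSetValues {
    private EnumSetValues() {}

    /** ENUM: 1-based index into the options; 0 is MySQL's empty-string error value. */
    static String enumValue(int index, List<String> options) {
        return index == 0 ? "" : options.get(index - 1);
    }

    /** SET: each set bit i includes option i, joined with commas as JDBC returns them. */
    static String setValue(long bitmask, List<String> options) {
        StringJoiner joined = new StringJoiner(",");
        for (int i = 0; i < options.size(); i++) {
            if ((bitmask & (1L << i)) != 0) {
                joined.add(options.get(i));
            }
        }
        return joined.toString();
    }
}
```

A `long` suffices for the bitmask because MySQL allows at most 64 members in a `SET`.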
Randall Hauch
629542458e DBZ-91 Added option to force use Kafka Connect temporal types. 2016-08-11 10:48:07 -05:00
Randall Hauch
31641fb43e DBZ-91 Changed how temporal values are treated in MySQL connector
Rewrote how the MySQL connector converts temporal values to use schemas with names that identify the semantic
type of temporal value, and customized how the MySQL binlog client library creates Java object values from the
raw binlog events.

Several new "semantic" schema types were defined:

* `io.debezium.time.Year` represents a year number as an INT32 value (e.g., 2016, -345, etc.).
* `io.debezium.time.Date` represents a date by storing the epoch seconds (that is, the number of seconds past the epoch) as an INT64 value.
* `io.debezium.time.Time` represents a time by storing the milliseconds past midnight as an INT32 value.
* `io.debezium.time.MicroTime` represents a time by storing the microseconds past midnight as an INT64 value.
* `io.debezium.time.NanoTime` represents a time by storing the nanoseconds past midnight as an INT64 value.
* `io.debezium.time.Timestamp` represents a date and time (without timezone information) by storing the milliseconds past epoch as an INT64 value.
* `io.debezium.time.MicroTimestamp` represents a date and time (without timezone information) by storing the microseconds past epoch as an INT64 value.
* `io.debezium.time.NanoTimestamp` represents a date and time (without timezone information) by storing the nanoseconds past epoch as an INT64 value.
* `io.debezium.time.ZonedTime` represents a time with timezone and optional fractions of a second (but no date) by storing the ISO8601 form as a STRING value (e.g., `10:15:30+01:00`)
* `io.debezium.time.ZonedTimestamp` represents a date and time with timezone and optional fractions of a second by storing the ISO8601 form as a STRING value (e.g., `2011-12-03T10:15:30.030431+01:00`)

This range of semantic types allows for a far more accurate representation in the events of the temporal values stored within the database. The MySQL connector chooses the semantic type based upon the precision of the MySQL type (e.g., `TIMESTAMP(6)` will be represented with `io.debezium.time.MicroTimestamp`, whereas `TIMESTAMP(3)` will be represented with `io.debezium.time.Timestamp`). This ensures that the events do not lose precision and that the semantics of the database column values are retained in the events even though the values are represented with primitive values.

Obviously these Kafka Connect schema representations are different and more precise than the built-in `org.apache.kafka.connect.data.Date`, `org.apache.kafka.connect.data.Time`, and `org.apache.kafka.connect.data.Timestamp` logical types provided by Kafka Connect and used by the MySQL connector in all 0.2.x and 0.1.x versions. Migration to the new MySQL connector should be possible, although consumers may still need to know about these types to properly handle temporal values and the correct precision (i.e., consumers cannot simply assume that all date INT64 values represent milliseconds).

The MySQL binlog client library converted the raw binary event information to JDBC types using a local Calendar instance, which obviously incorporates the local timezone and cannot retain more than millisecond precision. This change extends the library's deserializers to instead use the Java 8 `java.time` classes, which retain the exact semantics of the database values without losing any precision (since the `java.time` classes have nanosecond precision).

The same logic is also used to convert the JDBC values obtained during a snapshot from the MySQL Connector/J JDBC driver. The latter has a few quirks, such as not returning any fractional seconds for `TIME` columns, even though `java.sql.Time` can store up to milliseconds.

Most of the logic of the conversions of values and mapping to Kafka Connect schemas is handled in the new `JdbcValueConverters`, which was extracted from the existing `TableSchemaBuilder`. The MySQL connector reuses and actually extends the `JdbcValueConverters` class with its own `MySqlValueConverters` class that also adds support for MySQL-specific types such as `YEAR`. Other connectors whose values are based on JDBC types should be able to reuse and/or extend the `JdbcValueConverters` class.

Integration tests that deal with temporal types were modified to use proper expected values and comparisons.
2016-08-10 15:51:07 -05:00
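A sketch of choosing the semantic type by fractional-second precision and computing a microsecond value with `java.time`. The cutoffs (0 to 3 digits to `Timestamp`, 4 to 6 to `MicroTimestamp`) are an assumption extrapolated from the `TIMESTAMP(3)` and `TIMESTAMP(6)` examples above, and `TemporalTypes` is hypothetical rather than part of `JdbcValueConverters` or `MySqlValueConverters`.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

final class TemporalTypes {
    private TemporalTypes() {}

    private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    /** Picks the semantic schema name from the column's fractional-second digits. */
    static String timestampSchemaName(int fractionalDigits) {
        if (fractionalDigits <= 3) {
            return "io.debezium.time.Timestamp";       // milliseconds past epoch
        }
        if (fractionalDigits <= 6) {
            return "io.debezium.time.MicroTimestamp";  // microseconds past epoch
        }
        return "io.debezium.time.NanoTimestamp";       // MySQL stops at 6 digits; kept for completeness
    }

    /** Microseconds past the epoch for a date-time without time zone information. */
    static long microsPastEpoch(LocalDateTime value) {
        return ChronoUnit.MICROS.between(EPOCH, value);
    }
}
```

Using `LocalDateTime` rather than `java.sql.Timestamp` keeps the value free of the JVM's default time zone, which is the issue with the local `Calendar` described above.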
Randall Hauch
774f105670 Merge pull request #85 from hchiorean/DBZ-95
DBZ-95 Adds support for `null` binlog filename in certain cases
2016-08-10 15:15:24 -05:00
Horia Chiorean
616e7dea72 DBZ-95 Adds support for null binlog filename in certain cases 2016-08-09 20:03:43 +03:00
Horia Chiorean
008263ea00 DBZ-92, DBZ-97 Makes logging more verbose and changes the snapshot reader to produce separate events for each DDL change 2016-08-09 19:04:51 +03:00
Horia Chiorean
ab24f013d1 DBZ-96 Removes some asserts on tables created by another test case 2016-08-08 14:25:38 +03:00