3052 Commits

Author SHA1 Message Date
Randall Hauch
83b2d2eab6 DBZ-92 Added a few log statements upon MySQL connector task startup 2016-08-15 21:57:09 -05:00
Randall Hauch
b1277ed196 DBZ-94 Corrections to mark the end of the snapshot 2016-08-15 21:56:20 -05:00
Randall Hauch
b8fec14f7a Upgraded the Docker Maven Plugin 2016-08-15 13:02:37 -05:00
Randall Hauch
2792aa4555 Merge pull request #88 from rhauch/dbz-100
DBZ-100 Corrected MySQL connector's use of ENUM and SET values
2016-08-15 12:40:25 -05:00
Randall Hauch
ed7d1ee8e6 Merge pull request #87 from rhauch/dbz-62
DBZ-62 Upgraded to Kafka 0.10.0.1 and Zookeeper 3.4.8
2016-08-15 12:39:11 -05:00
Randall Hauch
2fea8c8ee0 DBZ-100 Corrected JavaDoc and renamed methods to be more explicit 2016-08-15 12:37:39 -05:00
Randall Hauch
918a523f12 DBZ-100 Changed the MongoDB connector to use a new JSON semantic type
Added a semantic type for JSON strings, and used it in the MongoDB connector.
2016-08-15 12:11:35 -05:00
Randall Hauch
db49f0b17b DBZ-100 Removed unused IsoTimestamp and IsoTime semantic types 2016-08-15 12:11:35 -05:00
Randall Hauch
d8a5d2b50f DBZ-100 Corrected MySQL connector's use of ENUM and SET values
The ENUM and SET values read from the binlog contain the indexes of the options that are included in the value, but this does not match the string values returned by MySQL and JDBC, which contain the comma-separated options. With this change, the values read from the binlog will also be comma-separated strings.
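
The index-to-string conversion described above could be sketched roughly as follows. This is not the connector's code: the class and method names are hypothetical, and it assumes that an ENUM value arrives as a 1-based index into the declared options and that a SET value arrives as a bitmask in which bit `i` marks option `i` as included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
final class EnumSetSketch {

    // Assumption: ENUM is a 1-based index; anything out of range maps to the empty string.
    static String enumToString(List<String> options, int index) {
        if (index < 1 || index > options.size()) {
            return "";
        }
        return options.get(index - 1);
    }

    // Assumption: SET is a bitmask; bit i set means options.get(i) is part of the value.
    static String setToString(List<String> options, long mask) {
        List<String> included = new ArrayList<>();
        for (int i = 0; i < options.size(); i++) {
            if ((mask & (1L << i)) != 0) {
                included.add(options.get(i));
            }
        }
        return String.join(",", included);
    }

    public static void main(String[] args) {
        List<String> options = Arrays.asList("a", "b", "c");
        System.out.println(enumToString(options, 2));
        System.out.println(setToString(options, 5L));
    }
}
```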
2016-08-15 12:11:35 -05:00
Randall Hauch
478e7f8ca6 Merge pull request #86 from rhauch/dbz-91b
DBZ-91 Changed how temporal values are treated in MySQL connector
2016-08-15 12:10:45 -05:00
Randall Hauch
6b591fc9b0 DBZ-91 Added a unit test for temporal conversions
Also removed a non-unit-test test.
2016-08-15 10:29:16 -05:00
Randall Hauch
c2d210bbda DBZ-62 Upgraded to Kafka 0.10.0.1 and Zookeeper 3.4.8. 2016-08-11 12:31:59 -05:00
Randall Hauch
ba553c91e8 DBZ-91 Changed MicroTime to use INT64
There are more microseconds per day than can be represented with INT32, so this was changed to INT64.
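
The arithmetic behind this change can be checked with a few lines of Java. The class name is made up for the illustration; only standard JDK calls are used.

```java
import java.util.concurrent.TimeUnit;

// Illustrative check: does a "units past midnight" value fit into a signed 32-bit int?
final class MicroTimeRange {

    static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);
    static final long MICROS_PER_DAY = TimeUnit.DAYS.toMicros(1);

    static boolean fitsInInt32(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("millis per day fit in INT32: " + fitsInInt32(MILLIS_PER_DAY));
        System.out.println("micros per day fit in INT32: " + fitsInInt32(MICROS_PER_DAY));
    }
}
```

A day has 86,400,000,000 microseconds, while the largest INT32 value is 2,147,483,647, so only the millisecond count fits.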
2016-08-11 12:09:24 -05:00
Randall Hauch
19fc95fe08 DBZ-91 Simplified the temporal conversion functions to use primitives. 2016-08-11 10:48:38 -05:00
Randall Hauch
629542458e DBZ-91 Added option to force use Kafka Connect temporal types. 2016-08-11 10:48:07 -05:00
Randall Hauch
31641fb43e DBZ-91 Changed how temporal values are treated in MySQL connector
Rewrote how the MySQL connector converts temporal values to use schemas with names that identify the semantic
type of temporal value, and customized how the MySQL binlog client library creates Java object values from the
raw binlog events.

Several new "semantic" schema types were defined:

* `io.debezium.time.Year` represents a year number as an INT32 value (e.g., 2016, -345, etc.).
* `io.debezium.time.Date` represents a date by storing the epoch seconds (that is, the number of seconds past the epoch) as an INT64 value.
* `io.debezium.time.Time` represents a time by storing the milliseconds past midnight as an INT32 value.
* `io.debezium.time.MicroTime` represents a time by storing the microseconds past midnight as an INT32 value.
* `io.debezium.time.NanoTime` represents a time by storing the nanoseconds past midnight as an INT32 value.
* `io.debezium.time.Timestamp` represents a date and time (without timezone information) by storing the milliseconds past epoch as an INT64 value.
* `io.debezium.time.MicroTimestamp` represents a date and time (without timezone information) by storing the microseconds past epoch as an INT64 value.
* `io.debezium.time.NanoTimestamp` represents a date and time (without timezone information) by storing the nanoseconds past epoch as an INT64 value.
* `io.debezium.time.ZonedTime` represents a time with timezone and optional fractions of a second (but no date) by storing the ISO8601 form as a STRING value (e.g., `10:15:30+01:00`)
* `io.debezium.time.ZonedTimestamp` represents a date and time with timezone and optional fractions of a second by storing the ISO8601 form as a STRING value (e.g., `2011-12-03T10:15:30.030431+01:00`)

This range of semantic types allows for a far more accurate representation in the events of the temporal values stored within the database. The MySQL connector chooses the semantic type based upon the precision of the MySQL type (e.g., `TIMESTAMP(6)` will be represented with `io.debezium.time.MicroTimestamp`, whereas `TIMESTAMP(3)` will be represented with `io.debezium.time.Timestamp`). This ensures that the events do not lose precision and that the semantics of the database column values are retained in the events even though the values are represented with primitive values.
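
As a rough sketch of the precision-based choice, a selector for `TIMESTAMP(n)` columns might look like the following. The class and method are hypothetical. Only the `TIMESTAMP(3)` and `TIMESTAMP(6)` cases come from the description above; the cut-offs for other precisions are an assumption.

```java
// Hypothetical selector, for illustration only.
final class TemporalTypeChooser {

    // Picks the semantic schema name from the number of fractional-second digits.
    // Assumption: up to 3 digits means milliseconds, up to 6 means microseconds, more means nanoseconds.
    static String timestampSchemaName(int fractionalDigits) {
        if (fractionalDigits <= 3) {
            return "io.debezium.time.Timestamp";
        }
        if (fractionalDigits <= 6) {
            return "io.debezium.time.MicroTimestamp";
        }
        return "io.debezium.time.NanoTimestamp";
    }

    public static void main(String[] args) {
        System.out.println("TIMESTAMP(3) -> " + timestampSchemaName(3));
        System.out.println("TIMESTAMP(6) -> " + timestampSchemaName(6));
    }
}
```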

These Kafka Connect schema representations are different from and more precise than the built-in `org.apache.kafka.connect.data.Date`, `org.apache.kafka.connect.data.Time`, and `org.apache.kafka.connect.data.Timestamp` logical types provided by Kafka Connect and used by the MySQL connector in all 0.2.x and 0.1.x versions. Migration to the new MySQL connector should be possible, although consumers may still need to know about these types to properly handle temporal values and the correct precision (i.e., consumers cannot just assume all date INT64 values represent milliseconds).
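
On the consumer side, handling the precision correctly could mean branching on the schema name before interpreting an INT64 value. The sketch below is illustrative only: the class is hypothetical, and because these types carry no timezone information, placing the value on the UTC timeline as an `Instant` is an assumption made for the example.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical consumer-side decoder, for illustration only.
final class TimestampDecoder {

    // Interprets an INT64 value according to the semantic schema name.
    // Assumption: the value is treated as an offset from the epoch in UTC.
    static Instant toInstant(String schemaName, long value) {
        switch (schemaName) {
            case "io.debezium.time.Timestamp":
                return Instant.ofEpochMilli(value);
            case "io.debezium.time.MicroTimestamp":
                return Instant.EPOCH.plus(value, ChronoUnit.MICROS);
            case "io.debezium.time.NanoTimestamp":
                return Instant.EPOCH.plusNanos(value);
            default:
                throw new IllegalArgumentException("Unknown temporal schema: " + schemaName);
        }
    }

    public static void main(String[] args) {
        // The same point in time, expressed at three different precisions.
        System.out.println(toInstant("io.debezium.time.Timestamp", 1_000L));
        System.out.println(toInstant("io.debezium.time.MicroTimestamp", 1_000_000L));
        System.out.println(toInstant("io.debezium.time.NanoTimestamp", 1_000_000_000L));
    }
}
```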

The MySQL binlog client library converted the raw binary event information to JDBC types using a local Calendar instance, which incorporates the local timezone and cannot retain more than millisecond precision. This change extends the library's deserializers to instead use the Java 8 `java.time` classes, retaining the exact semantics of the database values without losing any precision (since the `java.time` classes have nanosecond precision).
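
The precision difference can be illustrated with `java.time.LocalTime`, which keeps nanoseconds. The helper class is hypothetical and only shows what is lost when a time of day is reduced to milliseconds.

```java
import java.time.LocalTime;

// Illustrative helper: time-of-day at microsecond versus millisecond precision.
final class TimeOfDayPrecision {

    static long microsPastMidnight(LocalTime time) {
        return time.toNanoOfDay() / 1_000L;
    }

    static long millisPastMidnight(LocalTime time) {
        return time.toNanoOfDay() / 1_000_000L;
    }

    public static void main(String[] args) {
        // 10:15:30.123456, i.e. a value a TIME(6) column could hold
        LocalTime time = LocalTime.of(10, 15, 30, 123_456_000);
        System.out.println(microsPastMidnight(time)); // keeps all six fractional digits
        System.out.println(millisPastMidnight(time)); // drops the last three
    }
}
```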

The same logic is also used to convert the JDBC values obtained during a snapshot from the MySQL Connector/J JDBC driver. The latter has a few quirks, such as not returning any fractional seconds for `TIME` columns, even though `java.sql.Time` can store up to milliseconds.

Most of the logic of the conversions of values and mapping to Kafka Connect schemas is handled in the new `JdbcValueConverters`, which was extracted from the existing `TableSchemaBuilder`. The MySQL connector reuses and actually extends the `JdbcValueConverters` class with its own `MySqlValueConverters` class that also adds support for MySQL-specific types such as `YEAR`. Other connectors whose values are based on JDBC types should be able to reuse and/or extend the `JdbcValueConverters` class.

Integration tests that deal with temporal types were modified to use proper expected values and comparisons.
2016-08-10 15:51:07 -05:00
Randall Hauch
774f105670 Merge pull request #85 from hchiorean/DBZ-95
DBZ-95 Adds support for `null` binlog filename in certain cases
2016-08-10 15:15:24 -05:00
Randall Hauch
bcaacbd29b Merge pull request #84 from hchiorean/DBZ-92-97
DBZ-92, DBZ-97 Makes logging more verbose and changes the snapshot reader to produce separate events for each DDL change
2016-08-10 15:11:18 -05:00
Horia Chiorean
616e7dea72 DBZ-95 Adds support for null binlog filename in certain cases 2016-08-09 20:03:43 +03:00
Horia Chiorean
008263ea00 DBZ-92, DBZ-97 Makes logging more verbose and changes the snapshot reader to produce separate events for each DDL change 2016-08-09 19:04:51 +03:00
Randall Hauch
1e2cadf30e Merge pull request #83 from hchiorean/DBZ-96
DBZ-96 Removes some asserts on tables created by another test case
2016-08-08 09:38:44 -05:00
Horia Chiorean
ab24f013d1 DBZ-96 Removes some asserts on tables created by another test case 2016-08-08 14:25:38 +03:00
Randall Hauch
cc496201d2 Merge pull request #82 from criccomini/patch-1
Update README.md
2016-08-05 15:32:44 -05:00
Chris Riccomini
265c2e8c88 Update README.md 2016-08-05 13:31:46 -07:00
Randall Hauch
5cd6887d92 Merge pull request #81 from rhauch/dbz-94
DBZ-94 Added support for copying very large tables during snapshot
2016-08-04 16:34:09 -05:00
Randall Hauch
2ae26819af DBZ-94 Added support for copying very large tables during snapshot
By default the MySQL JDBC driver will put the entire result set into memory, which obviously doesn't work for tables of even moderate sizes. This change adds support for streaming rows in result sets when the tables have more than a configurable number of rows (defaults to 1,000).
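
A minimal sketch of the streaming decision, assuming the documented MySQL Connector/J convention that a forward-only, read-only statement with a fetch size of `Integer.MIN_VALUE` streams rows one at a time. The class, method, and constant names are hypothetical, not the connector's own.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, for illustration only.
final class StreamingStatements {

    // Mirrors the default threshold mentioned above.
    static final long DEFAULT_ROW_THRESHOLD = 1_000L;

    // Stream only when the table has more rows than the threshold.
    static boolean shouldStream(long rowCount, long threshold) {
        return rowCount > threshold;
    }

    // Creates a statement whose result sets are streamed when requested.
    static Statement createStatement(Connection connection, boolean stream) throws SQLException {
        Statement statement = connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        if (stream) {
            // Connector/J treats this fetch size as the signal to stream row by row.
            statement.setFetchSize(Integer.MIN_VALUE);
        }
        return statement;
    }
}
```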

This posed a problem for how we were previously finding the last row in the last table; the MySQL driver does not support `ResultSet.isLast()` on result sets that are streamed. Instead, this commit wraps the consumer to which the snapshot reader writes all source records, with a consumer that buffers the last record. When the snapshot completes, the offset is updated (denoting the end of the snapshot) and set on the last buffered record before that record is flushed to the normal consumer. This should add minimal overhead while simplifying the logic to ensure the last source record has the updated offset.
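
The buffering idea can be sketched as a small wrapper around any `Consumer`. This illustrates the technique rather than the class used in the connector; the names are hypothetical, and the record type is left generic.

```java
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical wrapper: holds back the most recent record so it can be adjusted before delivery.
final class BufferedLastConsumer<T> implements Consumer<T> {

    private final Consumer<T> delegate;
    private T buffered;

    BufferedLastConsumer(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(T record) {
        // Forward the previously buffered record, then hold on to the new one.
        if (buffered != null) {
            delegate.accept(buffered);
        }
        buffered = record;
    }

    // Called once at the end: lets the caller rewrite the last record (e.g. update its offset) and flushes it.
    void flush(UnaryOperator<T> finalizeLast) {
        if (buffered != null) {
            delegate.accept(finalizeLast.apply(buffered));
            buffered = null;
        }
    }
}
```

The wrapper keeps at most one record in memory, so the overhead stays constant regardless of table size.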

This also improves the log output of the snapshot process.
2016-08-04 16:06:50 -05:00
Randall Hauch
c159ca88cb Merge pull request #79 from hchiorean/DBZ-92
DBZ-92 Adds more logging information during MySQL snapshot recreation
2016-08-03 09:22:07 -05:00
Horia Chiorean
bb1b7d5734 DBZ-92 Adds more logging information during MySQL snapshot recreation 2016-08-03 16:54:17 +03:00
Randall Hauch
19773fc454 Merge pull request #80 from rhauch/fix-alt-mysql-build
Corrected build to look for updated output of alt-mysql container
2016-08-03 08:26:02 -05:00
Randall Hauch
b9e9f0fdf9 Corrected build to look for updated output of alt-mysql container
The mysql:5.7 docker image changed its output to be more like mysql/mysql-server:5.7, and this broke our build because of what our build is looking for while waiting for the server to completely initialize. Simply changing the pattern corrects the problem.
2016-08-03 08:24:13 -05:00
Randall Hauch
6894e9c30d Merge pull request #78 from hchiorean/mysql-tests-fix
Fixes some more tests around date handling in the MySQL connector
2016-08-02 20:12:04 -05:00
Randall Hauch
8cb39eacf0 Reverted back to 0.3.0-SNAPSHOT, since the 0.3 candidate release was not acceptable. 2016-08-01 12:25:58 -05:00
Horia Chiorean
eaf295fbf0 Fixes some more tests around date handling in the MySQL connector 2016-07-29 08:57:47 +03:00
Randall Hauch
096885ec8d Merge pull request #77 from hchiorean/tests-fix-core
Fixes a couple of test related issues for debezium-core
2016-07-26 09:00:41 -05:00
Horia Chiorean
a6dddaed92 Fixes a couple of test related issues for debezium-core
* fixes a java.sql.Date conversion test to take into account zone offsets
* makes sure the ZK DB is closed during testing, otherwise file handles may leak and cause test failures
2016-07-26 14:17:31 +03:00
Randall Hauch
a8efb99b7d Updated changelog 2016-07-25 18:32:58 -05:00
Randall Hauch
00226a4591 Updated changelog 2016-07-25 18:32:07 -05:00
Randall Hauch
7993cfdb1e Updated changelog 2016-07-25 18:29:49 -05:00
Randall Hauch
517272278d [maven-release-plugin] prepare for next development iteration 2016-07-25 17:50:31 -05:00
Randall Hauch
b89296e646 [maven-release-plugin] prepare release v0.3.0 2016-07-25 17:50:31 -05:00
Randall Hauch
cb8904819c Upgraded Docker Maven Plugin to 0.15.12 2016-07-25 17:46:35 -05:00
Randall Hauch
a8fa33e44b DBZ-85 Corrected log statements to be debug 2016-07-25 16:59:46 -05:00
Randall Hauch
e3a00e1992 DBZ-87 Added support for SIGNED in all numeric types in MySQL 2016-07-25 16:07:56 -05:00
Randall Hauch
c14cacc059 Merge pull request #75 from rhauch/dbz-62
DBZ-62 Upgraded to Kafka and Kafka Connect 0.10.0.0
2016-07-25 15:56:56 -05:00
Randall Hauch
447acb797d DBZ-62 Upgraded to Kafka and Kafka Connect 0.10.0.0
Upgraded from Kafka 0.9.0.1 to Kafka 0.10.0. The only required change was to override the `Connector.config()` method, which returns `null` or a `ConfigDef` instance that contains detailed metadata for each of the configuration fields, including supporting recommended values and marking fields as not visible (e.g., if they don't make sense given other configuration field values). This can be used by user interfaces to data-drive the configuration of a connector. Also, the default validation logic of the Connector implementations uses a `Validator` that is pretty restrictive in its functionality.

Debezium already had a fairly decent and simple `Configuration` framework. After several attempts to try and merge these concepts, reconciling the two validation mechanisms was very complicated and involved a lot of changes. It was easier to simply continue Debezium-specific validation and to override the `Connector.validate(...)` method to use Debezium's `Configuration`-based validation. Connector-based validation logic includes determining recommended values, so Debezium's `Field` class (used to define each configuration property) was enhanced with a new `Recommender` class that is similar to Kafka's.

Additional integration tests were added to verify that the `ConfigDef` result is acceptable and that the new connector validation logic works as expected, including getting recommended values for some fields (e.g., database names, table/collection names) from MySQL and MongoDB by connecting and dynamically reading the values. This was done in a way that remains backward compatible with the regular expression formats of these fields, but in a user interface that uses the `ConfigDef` mechanism the user can simply select the databases and table/collection identifiers.
2016-07-25 14:21:31 -05:00
Randall Hauch
4f749e84e2 Merge pull request #74 from rhauch/dbz-85
DBZ-85 Added test case and made small correction to temporal values
2016-07-21 09:03:49 -05:00
Randall Hauch
30777e3345 DBZ-85 Added test case and made correction to temporal values
Added an integration test case to diagnose the loss of the fractional seconds from MySQL temporal values. The problem appears to be a bug in the MySQL Binary Log Connector library that we used, and this bug was reported as https://github.com/shyiko/mysql-binlog-connector-java/issues/103. That was fixed in version 0.3.2 of the library, which Stanley was kind enough to release for us.

During testing, though, several issues were discovered in how temporal values are handled and converted from the MySQL events, through the MySQL Binary Log client library, and through the Debezium MySQL connector to conform with Kafka Connect's various temporal logical schema types. Most of the issues involved converting most of the temporal values from local time zone (which is how they are created by the MySQL Binary Log client) into UTC (which is how Kafka Connect expects them). Really, java.util.Date doesn't have time zone information and instead tracks the number of milliseconds past epoch, but the conversion of normal timestamp information to the milliseconds past epoch in UTC depends on the time zone in which that conversion happens.
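
The time zone dependence is easy to reproduce with `java.time`. In this illustrative snippet (hypothetical class name), the same wall-clock value yields different epoch milliseconds depending on the zone used for the conversion.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Illustrative helper: converts a zone-less date-time to epoch millis in a given zone.
final class EpochMillisByZone {

    static long toEpochMillis(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2016, 7, 20, 12, 0);
        long inUtc = toEpochMillis(local, ZoneOffset.UTC);
        long inMinusFive = toEpochMillis(local, ZoneOffset.ofHours(-5));
        // Noon at -05:00 is 17:00 UTC, so the two results are five hours apart.
        System.out.println(inMinusFive - inUtc);
    }
}
```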
2016-07-20 17:07:56 -05:00
Randall Hauch
4a84a1d8d9 Merge pull request #73 from rhauch/dbz-87
DBZ-87 Changed mapping of MySQL TINYINT and SMALLINT columns from INT32 to INT16
2016-07-19 11:25:01 -05:00
Randall Hauch
a5f4d0bf31 DBZ-87 Changed mapping of MySQL TINYINT and SMALLINT columns from INT32 to INT16
The MySQL connector now maps TINYINT and SMALLINT columns to INT16 (rather than INT32) because INT16 is smaller and yet still large enough for all TINYINT and SMALLINT values. Note that the range of TINYINT values is either -128 to 127 for signed or 0 to 255 for unsigned, and thus INT8 is not an acceptable choice since it can only handle values in the range -128 to 127. Additionally, the JDBC Specification also suggests the proper Java type for SQL-99's TINYINT is short, which maps to Kafka Connect's INT16.
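
The range argument can be checked directly, given that Kafka Connect's INT8 corresponds to a signed Java `byte` and INT16 to a Java `short`. The class below is a hypothetical illustration.

```java
// Illustrative range checks for signed 8-bit and 16-bit integers.
final class TinyIntRange {

    static boolean fitsInt8(int value) {
        return value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE;
    }

    static boolean fitsInt16(int value) {
        return value >= Short.MIN_VALUE && value <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 255 is the largest unsigned TINYINT value; -128 is the smallest signed one.
        System.out.println("255 fits INT8:   " + fitsInt8(255));
        System.out.println("255 fits INT16:  " + fitsInt16(255));
        System.out.println("-128 fits INT16: " + fitsInt16(-128));
    }
}
```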

This change will be backward compatible, although the generated Kafka Connect schema will be different than in previous versions. This shouldn't cause a problem, since clients should expect to handle schema changes, and this schema change does comply with Avro schema evolution rules.
2016-07-19 11:11:05 -05:00
Randall Hauch
fc36fe1d54 Merge pull request #72 from rhauch/dbz-84
DBZ-84 Tried to replicate error with MySQL TINYINT columns
2016-07-19 11:05:55 -05:00