Commit Graph

340 Commits

Author SHA1 Message Date
Randall Hauch
1c7aabf14f Changed MySQL file comment format to use standard prefix 2016-06-22 18:19:50 -05:00
Randall Hauch
49150689af Updated changelog for the 0.2.2 release 2016-06-22 16:15:06 -05:00
Randall Hauch
83c44ba046 Merge pull request #65 from rhauch/dbz-79
DBZ-79 Changed public methods in GtidSet to reflect the MySQL Binary Log Connector's class
2016-06-16 11:06:16 -05:00
Randall Hauch
a589d9ea84 DBZ-79 Changed public methods in GtidSet to reflect the MySQL Binary Log Connector's class
Removed several of the `GtidSet` convenience methods that are not in the [improved](https://github.com/shyiko/mysql-binlog-connector-java/pull/100) `com.github.shyiko.mysql.binlog.GtidSet` class. Getting these out of our API will make it easier to reuse the improved `com.github.shyiko.mysql.binlog.GtidSet` class.
2016-06-16 10:04:02 -05:00
Randall Hauch
88ceab3c48 Merge pull request #63 from rhauch/dbz-73
DBZ-73, DBZ-77 Added offset tests and fix for incomplete snapshot bug
2016-06-15 12:23:16 -05:00
Randall Hauch
d9cca5d254 DBZ-77 Corrected completion of offset snapshot mode
The snapshot mode within the offsets is now marked as complete with the last source record produced during the snapshot. This is the only sure way to update the offset.

Note that the `source` field shows the snapshot is in effect for _all_ records produced during the snapshot, including the very last one. This distinction w/r/t the offset was made possible due to recent changes for DBZ-73.

Previously, when the snapshot reader completed all generation of records, it then attempted to record an empty DDL statement. However, since this statement had no net effect on the schemas, no source record was produced and thus the offset's snapshot mode was never changed. Consequently, if the connector were stopped immediately after the snapshot completed but before other events could be read or produced, upon restart the connector would perform another snapshot.
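Schematically, the fix can be pictured as the following minimal sketch. It is hypothetical and only illustrative (the `file`, `pos`, and `snapshot` keys stand in for the connector's actual offset layout): the snapshot flag stays on every offset except the one attached to the very last record produced during the snapshot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every offset recorded during the snapshot keeps the
// "snapshot" flag, and only the offset of the LAST snapshot record omits it,
// marking the snapshot as complete. The "source" block, by contrast, reports
// snapshot=true for every record produced during the snapshot.
public class SnapshotOffsetSketch {

    static Map<String, Object> offsetFor(long binlogPos, boolean lastSnapshotRecord) {
        Map<String, Object> offset = new HashMap<>();
        offset.put("file", "mysql-bin.000003");   // example binlog file name
        offset.put("pos", binlogPos);
        if (!lastSnapshotRecord) {
            offset.put("snapshot", true);          // restarting here would redo the snapshot
        }
        return offset;                             // last record: snapshot considered complete
    }

    public static void main(String[] args) {
        System.out.println(offsetFor(154L, false)); // {file=..., pos=154, snapshot=true}
        System.out.println(offsetFor(154L, true));  // {file=..., pos=154}
    }
}
```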
2016-06-15 12:01:16 -05:00
Randall Hauch
ed27faa5f6 DBZ-73 Added unit tests to verify behavior of SourceInfo 2016-06-15 11:51:42 -05:00
Randall Hauch
84427e3648 Merge pull request #61 from rhauch/dbz-73
DBZ-73, DBZ-76 Corrected how binlog coordinates are recorded and put into change events
2016-06-14 17:57:12 -05:00
Randall Hauch
49322dc9c1 DBZ-73, DBZ-76 Corrected how binlog coordinates are recorded and put into change events
Fixes two issues with how the binlog coordinates are handled.

The first, DBZ-73, fixes how the offsets record the _next_ binlog coordinates, which is fine for single-row events but can result in dropped events should Kafka Connect flush the offsets of some but not all of the rows before Kafka Connect crashes. Upon restart, the offset contains the binlog coordinates for the _next_ event, so any remaining rows from the previous event will be lost.

With this fix, the offset used with all but the last row (in the binlog event) has the binlog coordinates of the current event, with the event row number set to be the next row that needs to be processed. The offset for the last row will have the binlog coordinates of the next event.

The second issue, DBZ-76, is somewhat related: the `source` field of the change events has the binlog coordinates of the _next_ event. The fix involves putting the binlog coordinates for the _current_ event into the `source` field.

Both issues are related and were addressed with a single fix. Essentially, the `SourceInfo` now records the previous and next positions as well as the previous and next row numbers. The offset is created with parameters that specify the row number and the total number of rows, so the binlog coordinates of the offset can be adjusted correctly. The `struct` field produces the value for the `source` field, and it always uses the previous position and previous row number so that they reflect the change event in which they are used.
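To make the rule concrete, here is a simplified sketch (hypothetical method and key names, not the connector's actual code) of how the offset for each row of a multi-row binlog event could be chosen:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the rule described above: rows 0..n-2 of a multi-row
// binlog event record the CURRENT event's position plus the row that still
// needs processing; only the last row advances the offset to the NEXT event.
public class RowOffsetSketch {

    static Map<String, Object> offsetForRow(String file, long eventPos, long nextEventPos,
                                            int row, int totalRows) {
        Map<String, Object> offset = new HashMap<>();
        offset.put("file", file);
        boolean lastRow = (row == totalRows - 1);
        if (lastRow) {
            offset.put("pos", nextEventPos);  // safe to resume at the following event
        } else {
            offset.put("pos", eventPos);      // stay on this event ...
            offset.put("row", row + 1);       // ... and remember the next unprocessed row
        }
        return offset;
    }

    public static void main(String[] args) {
        for (int row = 0; row < 3; row++) {
            System.out.println(offsetForRow("mysql-bin.000003", 4711L, 4892L, row, 3));
        }
    }
}
```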
2016-06-14 17:43:58 -05:00
Randall Hauch
f02c1458ce Updated RELEASE.md with additional validation steps 2016-06-10 14:03:47 -05:00
Randall Hauch
f565932dd2 Added commit log for 0.2.1 and placeholder for 0.3 changes 2016-06-10 10:06:22 -05:00
Randall Hauch
270150bcad DBZ-72 Corrected the naming of the Schemas for the keys and values 2016-06-09 21:30:29 -05:00
Randall Hauch
0f3ed9f50f DBZ-71 Corrected MySQL connector plugin archives and upgraded MySQL JDBC driver from 5.1.38 to 5.1.39 (the latest) 2016-06-09 21:15:34 -05:00
Randall Hauch
d2e930847c Documented the release process 2016-06-08 14:38:39 -05:00
Randall Hauch
6749518f66 [maven-release-plugin] prepare for next development iteration 2016-06-08 13:00:50 -05:00
Randall Hauch
d5bbb116ed [maven-release-plugin] prepare release v0.2.0 2016-06-08 13:00:50 -05:00
Randall Hauch
3b7db43bf9 Updated change log for 0.2 2016-06-08 12:56:50 -05:00
Randall Hauch
cc11f23cd0 Merge pull request #58 from rhauch/dbz-37
DBZ-37 Changed build to support running integration tests against multiple MySQL configurations
2016-06-08 12:00:42 -05:00
Randall Hauch
ff49ba1742 DBZ-37 Renamed MySQL Docker images used in integration tests 2016-06-08 11:45:35 -05:00
Randall Hauch
d63a2e17a0 DBZ-37 Added documentation of various profiles to the MySQL module's README 2016-06-08 11:19:03 -05:00
Randall Hauch
825dee3eab Changed Travis build to use assembly profile 2016-06-08 11:03:43 -05:00
Randall Hauch
3c7882ee9d DBZ-37 Run integration tests against MySQL and MySQL w/ GTIDs
Changed the build so that the `assembly` profile runs the MySQL integration tests three times, once against each of the three MySQL configurations:

* MySQL server w/o GTIDs
* MySQL server w/ GTIDs
* The Docker team's MySQL server image w/o GTIDs

The normal profiles are still available:

* The default profile runs the integration tests once against MySQL server w/o GTIDs
* `gtid-mysql` runs the integration tests against MySQL server w/ GTIDs
* `alt-mysql` runs the integration tests against the Docker team's MySQL server image w/o GTIDs
* `skip-integration-tests` (or `-DskipITs`) skips the integration tests altogether
2016-06-08 11:03:03 -05:00
Randall Hauch
b80ed3d5ed Merge pull request #57 from rhauch/pom-fix
Removed duplicate versions in POMs
2016-06-08 10:05:32 -05:00
Randall Hauch
cf26a5c4e0 Removed duplicate versions in POMs 2016-06-08 09:46:05 -05:00
Randall Hauch
0a9133d276 Merge pull request #56 from rhauch/dbz-61
DBZ-61 Improved MySQL connector's handling of binary values
2016-06-07 20:32:28 -05:00
Randall Hauch
a143871abd DBZ-61 Improved MySQL connector's handling of binary values
Binary values read from the MySQL binlog may include strings, in which case they need to be converted to binary values.

Interestingly, work on this uncovered [KAFKA-3803](https://issues.apache.org/jira/browse/KAFKA-3803), whereby Kafka Connect's `Struct.equals` method does not properly handle comparing `byte[]` values. While researching the problem and a potential patch, it was discovered that the Kafka Connect codebase and the Avro converter both use `ByteBuffer` objects rather than `byte[]`. Consequently, the Debezium code that converts JDBC values to Kafka Connect values was changed to return `ByteBuffer` objects rather than `byte[]` objects.

Unfortunately, the JSON converter rehydrates objects with just `byte[]`, which means that Debezium's `VerifyRecords` logic still cannot rely upon `Struct.equals` for comparison and instead needs custom logic.
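The underlying equality problem can be illustrated with a few lines of plain Java; this sketch is only an illustration of why wrapping binary values in `ByteBuffer` gives the expected comparison behavior:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Objects;

// Two equal byte arrays are not equal via Object.equals (identity comparison,
// which is what a generic equals check falls back to), while ByteBuffer
// implements content-based equality. Wrapping JDBC binary values in a
// ByteBuffer therefore makes value comparison behave as expected.
public class BinaryEqualitySketch {
    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        System.out.println(Objects.equals(a, b));                           // false: identity comparison
        System.out.println(Arrays.equals(a, b));                            // true: content comparison
        System.out.println(ByteBuffer.wrap(a).equals(ByteBuffer.wrap(b)));  // true: content comparison
    }
}
```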
2016-06-07 17:53:07 -05:00
Randall Hauch
4f02efc788 Merge pull request #55 from rhauch/dbz-37
DBZ-37 Added support for MySQL GTIDs
2016-06-07 12:47:43 -05:00
Randall Hauch
f48d48e114 DBZ-37 Added integration test with MySQL GTIDs
Added a Maven profile to the MySQL connector component with a Docker image that runs MySQL with GTIDs enabled. The same integration tests can be run with it using `-Pgtid-mysql` or `-Dgtid-mysql` in the Maven build.

When the MySQL connector starts up, it now queries the MySQL server to detect whether GTIDs are enabled, and if they are it also verifies that any GTID sets from the most recently recorded offset are still available in the MySQL server (similar to how it already does this for binlog filenames). If the server does not have the correct coordinates/GTIDs, the connector fails with a useful error message.

This commit also tests and adjusts the `GtidSet` class to better deal with comparisons of GTID sets for proper ordering.

It also changes the connector to output MySQL's timestamp for each event using _second_ precision rather than artificially inflating it to _millisecond_ precision. To clarify the difference, this change renames the field in the event's `source` structure that records the MySQL timestamp from `ts` to `ts_sec`. Similarly, the envelope's field that records the time at which the connector processed each record was renamed from `ts` to `ts_ms`.
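As a small illustration of the two timestamp fields (the field names come from the description above; the surrounding code is a made-up sketch, not the connector's implementation):

```java
// Sketch of the timestamp convention: the event's source block carries MySQL's
// own timestamp in seconds (ts_sec), while the envelope records the connector's
// processing time in milliseconds (ts_ms).
public class EventTimestampsSketch {
    public static void main(String[] args) {
        long binlogEventTimestampMs = 1465315311000L;   // example timestamp delivered by the binlog client

        long tsSec = binlogEventTimestampMs / 1000;     // source.ts_sec: MySQL's second-precision time
        long tsMs  = System.currentTimeMillis();        // envelope.ts_ms: when the connector processed the event

        System.out.println("source.ts_sec = " + tsSec);
        System.out.println("envelope.ts_ms = " + tsMs);
    }
}
```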

All unit and integration tests pass with the default profile and with the new GTID-enabled profile.
2016-06-07 12:01:51 -05:00
Randall Hauch
a276d983f5 DBZ-37 Changed several constants related to MySQL offsets. This does not affect the offsets themselves. 2016-06-04 16:32:26 -05:00
Randall Hauch
e91aac5b18 DBZ-37 DatabaseHistory can now use custom logic to compare offsets
DatabaseHistory stores the DDL changes with the offset describing the position in the source where those DDL statements were found. When a connector restarts at a specific offset (supplied by Kafka Connect), connectors such as the MySQL connector reconstruct the database schemas by having DatabaseHistory load the history starting from the beginning and stopping at (or just before) the connector's starting offset. This change allows connectors to supply a custom comparison function.

To support GTIDs, the MySQL connector needed to store additional information in the offsets. This means the logic needed to compare offsets with and without GTIDs is non-trivial and unique to the MySQL connector. This commit adds a custom comparison function for offsets.

Per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-failover.html), slaves are always expected to start with the same set of GTIDs as the master, so no matter which server the MySQL connector follows, it should always have the complete set of GTIDs seen by that server. Therefore:

* Two offsets with GTID sets can be compared using only the GTID sets.
* Any offset with a GTID set is always assumed to be newer than an offset without, since it is assumed once GTIDs are enabled they will remain enabled. (Otherwise, the connector likely needs to be restarted with a snapshot and tied to a specific master or slave with no failover.)
* Two offsets without GTIDs are compared using the binlog coordinates (filename, position, and row number).
* An offset that is identical to another except for being in snapshot mode is considered earlier than the one without the snapshot flag. This is because snapshot mode begins by recording the position of the snapshot, and once complete the offset is recorded without the snapshot flag.
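A simplified sketch of these ordering rules follows, using a plain map as the offset and treating a GTID set as an opaque string. Everything here is hypothetical: the real comparison also parses and compares the GTID sets themselves, while this only illustrates the decision structure.

```java
import java.util.Map;

// Hypothetical sketch of the ordering rules listed above.
public class OffsetOrderSketch {

    /** Returns true if offset {@code a} is at or before offset {@code b}. */
    static boolean isAtOrBefore(Map<String, Object> a, Map<String, Object> b) {
        String gtidsA = (String) a.get("gtids");
        String gtidsB = (String) b.get("gtids");

        if (gtidsA != null && gtidsB != null) {
            return gtidsB.contains(gtidsA);            // rule 1: compare only the GTID sets (details elided)
        }
        if (gtidsA == null && gtidsB != null) {
            return true;                               // rule 2: an offset without GTIDs is assumed older
        }
        if (gtidsA != null) {
            return false;                              // rule 2, mirrored
        }
        // Rule 3: neither offset has GTIDs, so fall back to binlog coordinates.
        int byFile = ((String) a.get("file")).compareTo((String) b.get("file"));
        if (byFile != 0) return byFile < 0;
        long byPos = ((Number) a.get("pos")).longValue() - ((Number) b.get("pos")).longValue();
        if (byPos != 0) return byPos < 0;
        long byRow = ((Number) a.getOrDefault("row", 0)).longValue()
                   - ((Number) b.getOrDefault("row", 0)).longValue();
        if (byRow != 0) return byRow < 0;
        // Rule 4: identical coordinates, but an offset still in snapshot mode sorts earlier.
        boolean snapshotA = Boolean.TRUE.equals(a.get("snapshot"));
        boolean snapshotB = Boolean.TRUE.equals(b.get("snapshot"));
        return snapshotA || !snapshotB;
    }
}
```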
2016-06-04 16:20:26 -05:00
Randall Hauch
8fd89dacbf DBZ-37 Corrected JavaDoc 2016-06-02 19:06:13 -05:00
Randall Hauch
655aac7d4f DBZ-37 Added support for MySQL GTIDs
The BinlogClient library our MySQL connector uses already has support for GTIDs. This change makes use of that and adds the GTIDs from the server to the offsets created by the connector and used upon restarts.
2016-06-02 18:30:26 -05:00
Randall Hauch
096ea24000 DBZ-37 Upgraded the BinlogClient library from 0.2.4 to 0.3.1, which is the latest 2016-06-02 17:08:46 -05:00
Randall Hauch
40663cb595 Merge pull request #54 from rhauch/dbz-64
DBZ-64 Added Avro Converter to record verification utilities
2016-06-02 17:00:11 -05:00
Randall Hauch
264a9041df DBZ-64 Added Avro Converter to record verification utilities
The `VerifyRecord` utility class has methods that will verify a `SourceRecord`, and is used in many of our integration tests to check whether records are constructed in a valid manner. The utility already checks whether the records can be serialized and deserialized using the JSON converter (provided with Kafka Connect); this change also checks with the Avro Converter (which produces much smaller records and is more suitable for production).

Note that version 3.0.0 of the Confluent Avro Converter is required; version 2.1.0-alpha1 could not properly handle complex Schema objects with optional fields (see https://github.com/confluentinc/schema-registry/pull/280).

Also, the names of the Kafka Connect schemas used in MySQL source records have changed:

* The record's envelope Schema used to be "<serverName>.<database>.<table>" but is now "<serverName>.<database>.<table>.Envelope".
* The Schema for record keys used to be named "<database>.<table>/pk", but the '/' character is not valid within an Avro name, so it has been changed to "<serverName>.<database>.<table>.Key".
* The Schema for record values used to be named "<database>.<table>", but to better fit with the other Schema names it has been changed to "<serverName>.<database>.<table>.Value".

Thus, all of the Schemas for a single database table have the same Avro namespace "<serverName>.<database>.<table>" (or "<topicName>") with Avro schema names of "Envelope", "Key", and "Value".
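For illustration, here is a sketch of this naming convention using Kafka Connect's `SchemaBuilder`; the server, database, table, and column names below are invented for the example.

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

// Sketch: all three schemas for one table share the "<serverName>.<database>.<table>"
// namespace and differ only in the trailing "Key", "Value", or "Envelope" name.
public class SchemaNamesSketch {
    public static void main(String[] args) {
        String namespace = "myserver.inventory.customers";   // "<serverName>.<database>.<table>"

        Schema keySchema = SchemaBuilder.struct()
                .name(namespace + ".Key")
                .field("id", Schema.INT32_SCHEMA)
                .build();

        Schema valueSchema = SchemaBuilder.struct()
                .name(namespace + ".Value")
                .field("id", Schema.INT32_SCHEMA)
                .field("email", Schema.OPTIONAL_STRING_SCHEMA)
                .optional()
                .build();

        Schema envelopeSchema = SchemaBuilder.struct()
                .name(namespace + ".Envelope")
                .field("before", valueSchema)
                .field("after", valueSchema)
                .build();

        System.out.println(keySchema.name());       // myserver.inventory.customers.Key
        System.out.println(valueSchema.name());     // myserver.inventory.customers.Value
        System.out.println(envelopeSchema.name());  // myserver.inventory.customers.Envelope
    }
}
```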

All unit and integration tests pass.
2016-06-02 16:54:21 -05:00
Randall Hauch
a25d380214 Merge pull request #53 from rhauch/dbz-58
DBZ-58 Added MDC logging contexts to connector
2016-06-02 14:52:41 -05:00
Randall Hauch
46c0ce9882 DBZ-58 Added MDC logging contexts to connector
Changed the MySQL connector to make use of MDC logging contexts, which allow thread-specific parameters that can be written out on every log line by simply changing the logging configuration (e.g., Log4J configuration file).

We adopt a convention for all Debezium connectors with the following MDC properties:

* `dbz.connectorType` - the type of connector, which would be a single well-known value for each connector (e.g., "MySQL" for the MySQL connector)
* `dbz.connectorName` - the name of the connector, which for the MySQL connector is simply the value of the `server.name` property (e.g., the logical name for the MySQL server/cluster). Unfortunately, Kafka Connect does not give us its name for the connector.
* `dbz.connectorContext` - the name of the thread, which is "main" for the thread running the connector; the MySQL connector uses "snapshot" for the thread started by the snapshot reader, and "binlog" for the thread started by the binlog reader.

Different logging frameworks have their own way of using MDC properties. In a Log4J configuration, for example, simply use `%X{name}` in the logger's layout, where "name" is one of the properties listed above (or another MDC property).
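A minimal sketch of this convention using SLF4J's `MDC` API (the logger setup and property values are illustrative only):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

// Each connector thread sets the dbz.* MDC properties before doing work and
// clears them afterwards, so a layout pattern such as "%X{dbz.connectorContext}"
// can show them on every log line.
public class MdcContextSketch {
    private static final Logger LOGGER = LoggerFactory.getLogger(MdcContextSketch.class);

    public static void main(String[] args) {
        MDC.put("dbz.connectorType", "MySQL");
        MDC.put("dbz.connectorName", "products");   // value of the server.name property
        MDC.put("dbz.connectorContext", "binlog");  // e.g. "main", "snapshot", or "binlog"
        try {
            LOGGER.info("Reading the binlog ...");   // MDC values appear via the layout pattern
        } finally {
            MDC.remove("dbz.connectorType");
            MDC.remove("dbz.connectorName");
            MDC.remove("dbz.connectorContext");
        }
    }
}
```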
2016-06-02 14:05:06 -05:00
Randall Hauch
1eb1ccfa9d Merge pull request #52 from rhauch/dbz-31
DBZ-31 MySQL connector can now start with a consistent snapshot
2016-06-02 11:14:52 -05:00
Randall Hauch
aca863c225 DBZ-31 Write MySQL schema changes to topic by default 2016-06-02 11:00:04 -05:00
Randall Hauch
58a5d8c033 DBZ-31 Added support for possibly performing snapshot upon startup
Refactored the MySQL connector to break out the logic of reading the binlog into a separate class, added a similar class to read a full snapshot, and then updated the MySQL connector task class to use both. Added several test cases and updated the existing tests.
2016-06-01 21:40:53 -05:00
Randall Hauch
e6c0ff5e4d DBZ-31 Refactored the MySQL Connector
Several of the MySQL connector classes were fairly large and complicated, and to prepare for upcoming changes/enhancements these larger classes were refactored to pull out units of functionality. Currently all unit tests pass with these changes, with additional unit tests for these new components.
2016-05-26 15:58:58 -05:00
Randall Hauch
24e99fb28f DBZ-31 DDL parser now supports '#' as comment line prefix 2016-05-26 15:40:50 -05:00
Randall Hauch
048a8839ad Merge pull request #51 from DataPipelineInc/FixAvroSchemaParseExceptionIllegalcharacterInServerId
DBZ-63 Fix POM dependency management.
2016-05-25 09:51:05 -05:00
David Chen
339f03859c DBZ-63 Fix POM dependency management.
Thanks for the reminder from https://issues.jboss.org/browse/DBZ-63?focusedCommentId=13242595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13242595
2016-05-25 15:21:45 +01:00
Randall Hauch
c1a098552a Merge pull request #50 from DataPipelineInc/FixAvroSchemaParseExceptionIllegalcharacterInServerId
DBZ-63 Rename "server-id" to "server_id" to fix org.apache.avro.SchemaParseE…
2016-05-25 08:44:05 -05:00
David Chen
b1a71318df DBZ-63 Rename "server-id" to "server_id" to fix org.apache.avro.SchemaParseException: Illegal character in: server-id 2016-05-25 14:33:20 +01:00
Randall Hauch
57e6c73a7a Merge pull request #49 from rhauch/dbz-55
DBZ-55 Corrected filtering of DDL statements based upon affected database
2016-05-23 19:45:38 -05:00
Randall Hauch
dc5a379764 DBZ-55 Corrected filtering of DDL statements based upon affected database
Previously, the DDL statements were being filtered and recorded based upon the name of the database that appeared in the binlog. However, that database name is actually the name of the database to which the client submitting the operation is connected, and is not necessarily the database _affected_ by the operation (e.g., when an operation includes a fully-qualified table name not in the connected-to database).

With these changes, the table/database affected by the DDL statements is now used to filter the recording of the statements. The order of the DDL statements is still maintained, but since each DDL statement can apply to a separate database, the DDL statements are batched (in the same original order) based upon the affected database. For example, two statements affecting "db1" will get batched together into one schema change record, followed by one statement affecting "db2" as a second schema change record, followed by another statement affecting "db1" as a third schema change record.

Meanwhile, this change does not affect how the database history records the changes: it still records them as submitted using a single record for each separate binlog event/position. This is much safer as each binlog event (with specific position) is written atomically to the history stream. Also, since the database history stream is what the connector uses upon recovery, the database history records are now written _after_ any schema change records to ensure that, upon recovery after failure, no schema change records are lost (and instead have at-least-once delivery guarantees).
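As an illustration of the batching rule, here is a simplified sketch (hypothetical types and names, not the connector's actual classes) that groups consecutive DDL statements by the affected database while preserving their original order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: consecutive DDL statements that affect the same database go into one
// batch (one schema change record); a new batch starts whenever the affected
// database changes, so the original statement order is preserved across batches.
public class DdlBatchingSketch {

    record Ddl(String affectedDatabase, String statement) {}

    static List<Map.Entry<String, List<String>>> batchByDatabase(List<Ddl> statements) {
        List<Map.Entry<String, List<String>>> batches = new ArrayList<>();
        for (Ddl ddl : statements) {
            boolean startNewBatch = batches.isEmpty()
                    || !batches.get(batches.size() - 1).getKey().equals(ddl.affectedDatabase());
            if (startNewBatch) {
                batches.add(new java.util.AbstractMap.SimpleEntry<>(ddl.affectedDatabase(), new ArrayList<>()));
            }
            batches.get(batches.size() - 1).getValue().add(ddl.statement());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Ddl> ddl = List.of(
                new Ddl("db1", "CREATE TABLE db1.a (id INT)"),
                new Ddl("db1", "ALTER TABLE db1.a ADD COLUMN name VARCHAR(32)"),
                new Ddl("db2", "CREATE TABLE db2.b (id INT)"),
                new Ddl("db1", "DROP TABLE db1.a"));

        // Expect three batches: [db1: 2 statements], [db2: 1 statement], [db1: 1 statement]
        batchByDatabase(ddl).forEach(batch ->
                System.out.println(batch.getKey() + " -> " + batch.getValue()));
    }
}
```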
2016-05-23 11:01:27 -05:00
Randall Hauch
4840650c41 Merge pull request #48 from rhauch/dbz-45
DBZ-45 Confirmed and tested support for 'before' and 'after' states in UPDATE events
2016-05-20 12:19:35 -05:00
Randall Hauch
bb40875b2b DBZ-45 Confirmed and tested support for 'before' and 'after' states in UPDATE events
Added integration test logic to verify that UPDATE events include both 'before' and 'after' states (previously added as part of DBZ-52), to verify that altering a table does not generate events for the rows in that table, and to verify that the 'before' and 'after' states (read from the binlog) are always defined in terms of the _current_ table schema. In other words, no special logic is needed to handle a 'before' state that has different columns than those defined in the current table's definition.
2016-05-20 12:06:06 -05:00