Commit Graph

529 Commits

Author SHA1 Message Date
Gunnar Morling
7a7d43c237 DBZ-516 Returning control also from BlockingReader 2018-01-18 14:13:58 +01:00
Gunnar Morling
0c4190c493 DBZ-516 Using Duration instead of long in a few more places 2018-01-18 14:13:58 +01:00
Jiri Pechanec
24bdcaf059 DBZ-516 Return control to Connect periodically 2018-01-18 14:13:58 +01:00
Gunnar Morling
1d967a6d47 DBZ-392 Propagating value of tableIdCaseInsensitive 2018-01-17 11:21:02 +01:00
Gunnar Morling
4d1d016acb DBZ-392 Making table map and set dedicated classes just with the needed API 2018-01-17 11:21:02 +01:00
Jiri Pechanec
9f0f0fe5d4 DBZ-392 Support for MySQL running on lowercase filesystems 2018-01-17 11:21:02 +01:00
Jiri Pechanec
446b102ad9 DBZ-534 Using correct datattype in list lookup 2018-01-12 12:30:44 +01:00
Jiri Pechanec
0b269a6e41 DBZ-538 Improve invalid DDL statement reporting 2018-01-12 12:26:30 +01:00
Jiri Pechanec
374999ee45 DBZ-411 ROLLBACK TO savepoint is parseable 2018-01-11 09:19:11 +01:00
Jiri Pechanec
3d7cc6a7e3 DBZ-530 Multiple PARTITION defintions supported 2018-01-08 11:58:09 +01:00
Jiri Pechanec
42991d260f DBZ-524 RESTRICT in DROP COLUMN 2018-01-02 09:26:07 +01:00
Peter Goransson
5f3902d240 DBZ-522 Adding null check when encounting null TIME fields during snapshot 2018-01-02 07:00:03 +01:00
Jenkins user
6bb34b42f9 [maven-release-plugin] prepare for next development iteration 2017-12-20 07:15:12 +00:00
Jenkins user
16dcd4c980 [maven-release-plugin] prepare release v0.7.1 2017-12-20 07:15:12 +00:00
Jenkins user
5e09932cb9 [maven-release-plugin] prepare for next development iteration 2017-12-15 05:10:23 +00:00
Jenkins user
6c1d61e03b [maven-release-plugin] prepare release v0.7.0 2017-12-15 05:10:23 +00:00
Jiri Pechanec
196f6b3571 DBZ-406 Fixing test, adding warning for disabled rollback 2017-12-13 14:12:27 +01:00
Gunnar Morling
06e72308d9 DBZ-406 Misc improvements;
* Making isReplayingEventsBeyondBufferCapacity() a "query method" only
* Using constant for buffer default size
* Misc. typo and doc fixes
* Removing unused variable
2017-12-13 14:08:11 +01:00
Jiri Pechanec
22dca4e498 DBZ-406 Buffering is disabled by default 2017-12-13 14:08:11 +01:00
Jiri Pechanec
5a5a85d920 DBZ-406 Support for look-ahead buffer with unlimited tx size
Also incorporates DBZ-405
2017-12-13 14:08:11 +01:00
Gunnar Morling
c88a521a52 DBZ-349 Avoiding need for map with overrides in config context 2017-12-13 12:34:30 +01:00
Andras Istvan Nagy
cc7459f4fb DBZ-349 removed jackson databind dependency originally introduced for DBZ-349 2017-12-13 12:34:30 +01:00
Andras Istvan Nagy
05a232d435 DBZ-349 refactoring to use a comma-separated list of tables in the snapshot.select.statement.overrides property and a separate config property for each override 2017-12-13 12:34:30 +01:00
Attila Szucs
c0d1c256ad DBZ-349 fix integration test 2017-12-13 12:34:30 +01:00
Andras Istvan Nagy
631c518d8e DBZ-349 Better support for large append-only tables by making the snapshotting process restartable 2017-12-13 12:34:30 +01:00
Jiri Pechanec
68f0a96a08 DBZ-461 Changed default bigint conversion to long 2017-12-02 21:04:29 +01:00
Gunnar Morling
db61ad2980 DBZ-491 Asserting offsets as per the given time and timezone (instead of now) 2017-12-01 10:55:43 +01:00
Gunnar Morling
79331608ae DBZ-342 Adding regression test for existing "adaptive" time handling mode 2017-11-30 09:07:30 +01:00
rkerner
c7ac481c43 [DBZ-342] fix broken MySQL data type "TIME" handling 2017-11-29 20:34:12 +01:00
Peter Goransson
f4d6af9429 DBZ-491 Normalizing date to UTC before asserting 2017-11-29 08:56:13 +01:00
Gunnar Morling
08310f9ea5 DBZ-483 Removing two unused methods from MySqlSchema 2017-11-24 17:04:01 +01:00
Jiri Pechanec
20a2cdfdea DBZ-476 Doubled quotes are parsed as escaped 2017-11-23 14:51:51 +01:00
Gunnar Morling
cc2afb3419 DBZ-473 Preventing unbounded polling in case connector is stopped during snapshotting 2017-11-21 11:24:05 +01:00
Jiri Pechanec
a1e0553b4d DBZ-475 Support for RENAME USER statement 2017-11-21 08:43:44 +01:00
Gunnar Morling
fe9222a633 DBZ-474 Adding Attila Szucs to COPYRIGHT.txt 2017-11-20 17:39:51 +01:00
Attila Szucs
07da3c8718 DBZ-474 Fix failure when COLLATE comes after NOT NULL 2017-11-20 17:39:47 +01:00
Gunnar Morling
6f9086d608 DBZ-217 Typo fix 2017-11-14 18:01:26 +01:00
Gunnar Morling
951abb5c62 DBZ-217 Making behavior for dealing with event deserialization failures configurable 2017-11-14 18:01:26 +01:00
Gunnar Morling
f705ae4ac6 DBZ-217 Handling exceptions ocurring during binlog event deserialization;
* Catching the exception and wrapping it into a pseudo INCIDENT event
* Logging the exception and the current SourceInfo state
2017-11-14 18:01:26 +01:00
Jiri Pechanec
55a252521f DBZ-447 Remove double initialization 2017-11-14 15:12:04 +01:00
Gunnar Morling
79a2e4a6c2 DBZ-396 Formatting and typo fix;
Also adding Peter to COPYRIGHT.txt
2017-11-14 11:37:28 +01:00
Peter Goransson
ba7478ffd7 DBZ-396 Adding BlockingReader to avoid high CPU utilization in
"snapshot_only" mode;

No binlog readers will be set up in this mode, causing the idle snapshot
reader to be polled with high frequency after it has done its work.
Therefore a blocking reader is added in that case, which prevents the
polling.
2017-11-14 11:32:27 +01:00
Jiri Pechanec
28258ece3f DBZ-260 Timestamp correctly handles timezones 2017-11-14 10:38:32 +01:00
Gunnar Morling
c4a29f6fb7 DBZ-437 Some clean-up 2017-11-14 09:40:54 +01:00
Jiri Pechanec
4d253d2987 DBZ-437 String tokens and SQL words are handled separately in procedure parsing 2017-11-14 09:40:54 +01:00
Jiri Pechanec
1611159a85 DBZ-415 Can parse embedded IFs, can parse IF with ( as first char 2017-11-14 09:00:58 +01:00
Jiri Pechanec
6db4e9b599 DBZ-425 Fix ALTER column for changing default value 2017-11-13 22:51:59 +01:00
Gunnar Morling
2744f54e9d DBZ-438 Renaming BufferedBlockingConsumer#flush() to close() to make clear it's onyl meant to be called once at the end 2017-11-13 14:19:39 +01:00
Jiri Pechanec
c905c7a558 DBZ-439 Support for PRECISION keyword 2017-11-13 13:58:53 +01:00
Gunnar Morling
5fbe742be8 DBZ-285 Specifying scope of dependencies in the individual POMs for the sake of comprehensibility 2017-11-10 16:48:32 +01:00
Jiri Pechanec
039566e717 DBZ-401 Log binlog reader position in case of exception 2017-11-03 09:12:09 +01:00
Jiri Pechanec
c1370556ff DBZ-428 TEXT datatype can have an optional length 2017-11-03 08:37:00 +01:00
Jiri Pechanec
e841cbd609 DBZ-429 Fixed signed default value handling 2017-11-03 06:13:05 +01:00
Jiri Pechanec
1fe753717f DBZ-412 Definition for column named column can be changed 2017-11-01 16:32:44 +01:00
Jiri Pechanec
decf822c65 DBZ-419 PK constraint name is optional 2017-10-30 17:31:08 +01:00
Gunnar Morling
ce16ef5eb3 DBZ-395 Reducing log level from info to debug 2017-10-26 11:31:55 +02:00
Jiri Pechanec
f93e1e9bcd DBZ-395 Make temp table regex stricter 2017-10-26 11:11:22 +02:00
Jiri Pechanec
1c6e652c71 DBZ-395 Filter out DROP TEMPORARY TABLE statements 2017-10-26 11:11:22 +02:00
Gunnar Morling
f86feaf9bd DBZ-408 Allowing for columns named "column" (escaped) in ALTER TABLE ADD/DROP... without COLUMN word 2017-10-26 10:20:25 +02:00
Gunnar Morling
1d7f41af26 DBZ-408 Consuming COLUMN token only for ALTER TABLE statements, so to support columns named "column" (escaped) in CREATE TABLE statements 2017-10-26 10:20:25 +02:00
Gunnar Morling
a38c4254f1 DBZ-402 Removing redundant call to initialize() 2017-10-19 21:43:35 +02:00
Randall Hauch
c71f6d2752 DBZ-402 Reader now has ability to initialize resource prior to starting
Since some Reader implementations are combinations of other Readers, it is sometimes necessary for Reader implementations to initialize resources up front and not wait until they are called. Thus, a new `initialize()` method is added to Reader that will, for the combination readers, initialize all of the readers right away.

This makes it easier to correctly register JMX metrics, for example, up front once all of the readers have been configured for use, rather than to do so upon construction when it is not clear that a particular Reader will be used after it is constructed.
2017-10-19 21:43:35 +02:00
Gunnar Morling
6a8e08db5d DBZ-363 Indentation fix; adding Ben Williams to copyright.txt 2017-10-18 10:31:00 +02:00
Ben Williams
a3b4fedd5f DBZ-363 Add support for BIGINT UNSIGNED handling for MySQL 2017-10-18 10:20:03 +02:00
Jiri Pechanec
875c3ac87a DBZ-336 Test for PAGE_CHECKSUM 2017-10-13 19:08:27 +02:00
Moira Tagle
4bcec0587a Fix inaccurate comment on MySQLSchema.tables method 2017-10-03 05:23:53 +02:00
Gunnar Morling
9958faee7b DBZ-378 Preventing NPE in MySqlConnectorTask#close() 2017-09-26 12:52:25 +02:00
Jenkins user
75937711fa [maven-release-plugin] prepare for next development iteration 2017-09-21 04:42:02 +00:00
Jenkins user
a89b9332e4 [maven-release-plugin] prepare release v0.6.0 2017-09-21 04:42:02 +00:00
LiuHanlin
4543c001b1 DBZ-359 Add support for dec and fixed for mysql ddl parser.
According to Mysql document, https://dev.mysql.com/doc/refman/5.7/en/numeric-types.html
keywords DEC and FIXED are synonyms for DECIMAL.
2017-09-19 09:12:09 +02:00
Gunnar Morling
99f39038bb DBZ-347 Formatting 2017-09-18 17:04:14 +02:00
Jiri Pechanec
38f8364647 DBZ-347 Simplified exception rethrowing 2017-09-18 17:02:21 +02:00
Jiri Pechanec
61ccefcc5c DBZ-347 Configurable behaviour for unknown DDL, failing by default 2017-09-18 17:02:21 +02:00
Jiri Pechanec
1c5326e731 DBZ-304 MySQL test conversion to dynamic database creation 2017-09-18 16:23:55 +02:00
Jiri Pechanec
7b9c29efe6 DBZ-210 Lock tables before starting tx in snapshot 2017-09-15 12:20:35 +02:00
Jiri Pechanec
81edc3d7ee DBZ-333 Filter database.history props out of JDBC URL 2017-09-15 11:45:53 +02:00
Jiri Pechanec
16671152f9 DBZ-346 Correctly parse MySQL merge tables 2017-09-15 10:03:44 +02:00
Jiri Pechanec
6fad4c62aa Added log statements for error events 2017-08-31 06:37:33 +02:00
Jiri Pechanec
8cb681f106 DBZ-219 Stop BinlogReaderClient background jobs 2017-08-29 14:55:04 +02:00
Jenkins user
214696ef0c [maven-release-plugin] prepare for next development iteration 2017-08-17 11:51:05 +00:00
Jenkins user
c867e6fea6 [maven-release-plugin] prepare release v0.5.2 2017-08-17 11:51:05 +00:00
Satyajit Vegesna
b4a0ac0c35 DBZ-324 Support for PAGE_CHECKSUM=1 2017-08-17 08:56:53 +02:00
Gunnar Morling
4d90fb5154 DBZ-284 Removing some trailing whitespace 2017-06-13 17:10:29 +02:00
Omar Al-Safi
107a3a8b56 DBZ-284 Fix NPE when inserting a NULL in a POINT column 2017-06-13 15:33:37 +02:00
Gunnar Morling
a8d1817c22 [maven-release-plugin] prepare for next development iteration 2017-06-09 16:14:31 +00:00
Gunnar Morling
3f512aace7 [maven-release-plugin] prepare release v0.5.1 2017-06-09 16:14:31 +00:00
Gunnar Morling
46bc2ab38b DBZ-228 Avoiding some auto-boxing; extracting some additions to initialization time 2017-06-09 13:53:45 +02:00
Omar Al-Safi
730767f6c5 DBZ-228 Fix formatting issue 2017-06-09 13:53:45 +02:00
Omar Al-Safi
451bd95a75 DBZ-228 Add UNSIGNED BIGINT support 2017-06-09 13:53:45 +02:00
Omar Al-Safi
5d86ae592f DBZ-228 Fix MySQL connector unsigned integer types 2017-06-09 11:52:14 +02:00
Randall Hauch
9581140c36 DBZ-216 MySQL connector should ignore DELETE FROM statements
Parse and ignore any `DELETE` statements that might be seen in the binlog.

Theoretically, the binlog of a properly-configured MySQL server with row-level binlog enabled should never see these statements. However, users on Amazon RDS run into this quite frequently, and we should just handle and ignore them.
2017-06-01 20:04:45 +02:00
Gunnar Morling
68fc0966a2 Fixing wrong display name 2017-06-01 14:52:33 +02:00
Gunnar Morling
2a22f3d4b0 DBZ-254 Right-padding values for fixed-length BINARY columns with 0x00 (zero byte) characters for MySQL;
Also fixing JDBC types for binary data types for MySQL.
2017-05-30 15:04:38 +02:00
Gunnar Morling
559fb1f600 DBZ-262 Passing enum constants to configuration in one more test 2017-05-30 11:18:56 +02:00
Gunnar Morling
f17ced3b4a DBZ-262 Letting configuration enum types implement EnumeratedValue 2017-05-23 21:26:22 +02:00
Gunnar Morling
c6674f8a4a DBZ-253 Adding integration test 2017-05-23 21:26:01 +02:00
Gunnar Morling
1903b7dfe2 DBZ-253 Avoiding exception when receiving DDL events with table maintenance statements 2017-05-23 21:26:01 +02:00
Gunnar Morling
40cb3e2530 DBZ-215 Validating configuration to prevent that database history topic and schema change topic collide 2017-05-22 08:21:22 +02:00
Randall Hauch
02e655fa53 DBZ-198 Add another DDL parser test case
Added an additional test that is unable to reproduce the problem reported on April 4.
2017-05-19 15:50:03 +02:00
Gunnar Morling
03ca03c90c DBZ-232 Removing unused constant 2017-05-19 09:28:56 +02:00
Randall Hauch
787959c4d0 DBZ-232 Removed the database and table recommenders
It’s not clear how valuable these recommenders actually are. First, it’s not clear about the expected semantics: can the user use values that don’t appear in the recommended values? Second, the recommenders that return large numbers of values can be slow and can result in very large REST API responses.

Debezium was using recommenders to return the database and table/collection names, but these lists can be very large for large databases. Rather than cap the number of recommended values and have the recommender return a subset of all potential values, we will instead remove the recommenders altogether.
2017-05-19 09:24:07 +02:00
Randall Hauch
8f69836c9e DBZ-232 Changed the database and table recommenders to return no more than 100 values 2017-05-19 09:24:07 +02:00
Gunnar Morling
37529f27b1 DBZ-243 Removing superfluous default value implementation for server name as the filed is mandatory 2017-05-19 09:14:53 +02:00
Randall Hauch
c10c1204f3 Merge pull request #230 from jpechane/retries
DBZ-251 Prepopulate readbinlog_test database
2017-05-18 16:58:53 -05:00
Randall Hauch
5fb878137e Merge pull request #223 from gunnarmorling/DBZ-243
DBZ-243 Don't mention default value for "database.server.name"
2017-05-18 12:09:39 -05:00
Jiri Pechanec
e79268ec90 Prepopulate readbinlog_test database 2017-05-12 07:20:53 +02:00
Randall Hauch
afef753d36 DBZ-242 Corrected MySQL filters to handle built-in tables
When the `table.ignore.built` is set to `false`, the MySQL connector included all of the ~250 system tables yet the database and table filters were never applied to these system tables and therefore all would be captured. With this change, the database filters and table filters now apply to the system tables should they not be ignored.

Note that this does change the behavior of the connector when `table.ignore.built` is set to `false`. However, it is unlikely that anyone is seriously capturing all of the system table changes, so correcting the behavior is the preferred solution.
2017-05-11 11:57:08 +02:00
Gunnar Morling
3f55398bab DBZ-243 Don't mention default value for "database.server.name" for MySQL connector 2017-05-08 20:24:28 +02:00
Gunnar Morling
baa8119df8 Typo fix 2017-05-08 17:40:19 +02:00
Gunnar Morling
1074de4efa DBZ-222 Some more clean-up 2017-05-04 09:25:25 +02:00
Gunnar Morling
6eea4c9717 DBZ-222 Some typo fixes 2017-05-04 08:53:51 +02:00
Omar Al-Safi
3d92011277 DBZ-222 Typo cleanup and removed unnecessary assertion 2017-05-04 08:53:34 +02:00
Gunnar Morling
5630b61be6 DBZ-222 dependency clean-up 2017-05-04 08:53:05 +02:00
Omar Al-Safi
2a17748a95 DBZ-222 Readded readbinlog_test.product and readbinlog_test.purchased to MySqlConnectorIT 2017-05-04 08:53:05 +02:00
Omar Al-Safi
791545c5f4 DBZ-222 Added support for MySQL POINT type 2017-05-04 08:53:05 +02:00
Jiri Pechanec
51a1c7cd69 DBZ-229 - check all privilege records 2017-04-26 15:25:59 +02:00
Randall Hauch
709cd8f3fe [maven-release-plugin] prepare for next development iteration 2017-03-27 11:28:12 -05:00
Randall Hauch
2bc3d45954 [maven-release-plugin] prepare release v0.5.0 2017-03-27 11:28:11 -05:00
Randall Hauch
25adc3f642 Merge pull request #209 from rhauch/dbz-205
DBZ-205 Corrected MySQL connector to handle 2-digit years
2017-03-27 11:10:04 -05:00
Randall Hauch
81f62b6961 DBZ-205 Corrected MySQL connector to handle 2-digit years
MySQL has special handling of 2-digit years that it deems are ambiguous, such as the year value `17` that is actually treated as `2017`. Apparently the 2-digit values are stored in MySQL and the interpretation is performed when the data is extracted, so therefore the connector needs to also perform this adjustment of the year values. This commit uses the JDK’s `TemporalAdjuster` interface and passes this down to the requisite temporal-related datatype handling code. The MySQL connector then provides its own `TemporalAdjuster` implementation that adjusts the year values via the excellend JDK `Temporal` methods.

A row in one of the MySQL test databases was changed to use a 2-digit year of `16` while the test method still checks that the year is still 2016`, verifying that the year value is properly adjusted.
2017-03-27 10:58:21 -05:00
Randall Hauch
74fbb4a140 DBZ-198 Corrected MySQL DDL parser
The parser now handles `BEGIN…END` blocks better by properly handling `IF()` functions that are not `IF…THEN…END IF` control blocks, and `CASE … WHEN … END CASE` control blocks.
2017-03-24 13:12:46 -05:00
Sanjay Kr Singh
ac4575a13c update mysql-dbz-198.ddl with 3 new test cases for PROCEDURE and 2 test cases for FUNCTION
From L49 - one DDL test case for PROCEDURE ,
From L278 - one DDL test case for FUNCTION ,
From L433 - one more DDL test case for PROCEDURE with CURSOR,
From L713 -  ONE more DDL TEST FOR PROCEDURE,
From L755 - ONE more DDL TEST CASE FOR FUNCTION
2017-03-24 12:29:17 +05:30
Randall Hauch
430d756062 [maven-release-plugin] prepare for next development iteration 2017-03-17 15:41:58 -05:00
Randall Hauch
536cbf6300 [maven-release-plugin] prepare release v0.4.1 2017-03-17 15:41:57 -05:00
Randall Hauch
2086b779b1 DBZ-204 Added test case but unable replicate error
Added a test case that uses the MySQL DDL parser to parse similar DDL statements to those reported in the issue, but these are properly handled with the current state of the `master` branch.
2017-03-17 14:15:25 -05:00
Randall Hauch
e5ee3847dd DBZ-195 Added tests to try to replicate a reported issue
Added a table and inserted rows that tries to replicate the problem reported in DBZ-195, but the test was unable to replicate the problem. In fact, this really is no different than existing tests. Changed the log messages so that if/when this happens again it will be possible to know which row was problematic.
2017-03-17 10:47:32 -05:00
Randall Hauch
ddbc1e07aa DBZ-197 Corrected MySQL connector to handle invalid enum values
MySQL represents an invalid enum literal in the binlog events as an empty string or an value of `0`. Now, when the connector comes across such a value in the binlog, it will instead use an empty string for the enum literal.
2017-03-16 13:55:24 -05:00
Randall Hauch
cf5391482a DBZ-198 Improved MySQL DDL parser to better handle blocks
The MySQL parser now properly handles control blocks such as `BEGIN…END`, `IF…END IF`, `REPEAT…END REPEAT`, and `LOOP…END LOOP`, even in cases where the block is preceded by and terminated by a label.
2017-03-16 13:32:21 -05:00
Randall Hauch
06d9bcb092 Merge pull request #199 from dasl-/daylightsavings
DBZ-202 Fix daylight savings test issues
2017-03-15 15:41:41 -05:00
dleibovic
782ccc66e5 fix daylight savings test issues 2017-03-15 15:44:03 -04:00
rich
06a25f8fd9 modify the query used to fetch the table list so that it includes tables only(no views) 2017-03-15 13:56:52 -04:00
Randall Hauch
b48ccce4b5 DBZ-200 Corrected MySQL DDL parser to better handle column definitions
Apparently not all reserved words must be quoted when using them as colum names, so refactored MySQL’s DDL parser to better handle a variety of unquoted colum names that are reserved words.
2017-03-08 12:12:27 -06:00
Josh Stanfield
c794416684 update to allow savepoint in mysql replication stream 2017-03-08 09:28:10 -07:00
dleibovic
9fd2afc8b4 Increase the default mysql init timeout to 60 seconds for slower computers. Also paramterize it so that users can pass a custom value via 'mvn clean install -Dmysql.init.timeout=80000' for example 2017-02-23 13:34:14 -05:00
rich
9aa49736c8 DBZ-140 when locking individual tables, use a single statement with all the table names instead of issuing a statement per table which causes a MySQL error 2017-02-16 15:45:29 -05:00
Randall Hauch
043a2d2d92 DBZ-194 Improved MySQL connector’s built-in table filtering
The MySQL connector’s built-in table filter now just filters out all tables within the known built-in databases, and does not check the names of the tables. Thus, the connector should no longer filter out tables in other databases that happen to have the same names as the tables in the built-in databases.
2017-02-14 09:23:39 -06:00
Randall Hauch
af94fa8759 DBZ-193 MySQL DDL parser handles FULLTEXT index
Corrected the MySQL DDL parser to correctly handle `FULLTEXT` indexes within a `CREATE TABLE` statement. The parser was incorrectly using `canConsume(…)` with a list of options instead of `canConsumeAnyOf(…)`.
2017-02-10 15:49:20 -06:00
Randall Hauch
9a4a177004 DBZ-188 Corrected JavaDoc 2017-02-10 15:39:22 -06:00
dleibovic
aa50bfe71a DBZ-188: Allow a debezium mysql connector to filter production of DML events into kafka by the mysql UUID of the event
With GTIDs enabled, each transaction in the binlog contains a GTID event, which gives us access to the GTID of the transaction. The GTID has the following format: source_id:transaction_id, where source_id is the UUID of the mysql server the transaction was written to.

I propose to allow a debezium instance to be configured with a UUID pattern to check against before producing DML events into Kafka. Debezium would produce a DML event into kafka if and only if the UUID in the event's GTID matches the pattern with which debezium was configured.

The configuration for the UUID patterns will make use of the existing gtid.source.includes and gtid.source.excludes options. The DML event filtering will only be performed if the new option gtid.source.filter.dml.events is true.
2017-02-10 14:14:10 -05:00
Randall Hauch
d2986710a5 DBZ-188 More efficient GTID source filters for MySQL Connector
Changed the GTID source filters in the MySQL connector to be far more efficient when the filters specify literal UUIDs rather than regex patterns. In these cases, the predicate just checks whether a supplied value is in a hash set, and no regular expression patterns are used.

The GTID source filters can still be a combination of UUID literals and regular expressions, and the predicate will use the best implementation for each. For example, if the filters include all UUID literals, then regular expressions will never be used.
2017-02-10 11:34:24 -06:00
Randall Hauch
8c60c29883 [maven-release-plugin] prepare for next development iteration 2017-02-07 14:22:12 -06:00
Randall Hauch
20134286e9 [maven-release-plugin] prepare release v0.4.0 2017-02-07 14:22:11 -06:00
Randall Hauch
403fee1375 DBZ-185 MySQL’s database history now filters GTID sources
Corrects how the MySQL connector reloads database history to take into account the included and excluded GTID sources. This only affects a connector configured to capture changes from _multiple_ MySQL database servers when GTID sources are explicitly excluded or included.
2017-02-07 11:21:22 -06:00
Randall Hauch
bb0800ca3a DBZ-140 Improved locking logic to support RDS
Improved the MySQL connector's logic to better handle Amazon RDS that does not allow giving user `SUPER` privileges. As before, the connector starts a transaction and attempts to get a global read lock via `FLUSH TABLES WITH READ LOCK` to prevent writes to the database so that the binlog position can be accurately read _and_ the table schemas can be read without interference from other clients. Once that is done, the connector releases the global read lock and continues in the same transaction to read all table rows. This means that our snapshot is consistent, but we maintain the global read lock for a very short period of time.

Amazon's RDS and Aurora are hosted MySQL instances that do not allow users to have the `SUPER` privilege, which means the user cannot get a global read lock. In this case, the connector detects this error, continues to read the database and table names (without any lock), and _then_ uses `FLUSH TABLES <tableName> WITH READ LOCK` on each table that satisfies the filters to prevent changes from other clients. The connector then reads the table schemas, reads _all_ table rows, commits the transaction, and _finally_ releases the table locks.

Therefore, there are two very different behaviors/requirements when the user can't obtain a global read lock because of lack of privilege, like on RDS:

# The RDS user that the connector makes use of must also have the `LOCK TABLES` privilege; without it the connector will fail during the snapshot.
# The connector must hold the table read locks _until it has completed reading all of the tables_, since release the table locks using `UNLOCK TABLES` would prematurely commit our transaction and prevent us from getting a consistent snapshot. From the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/flush.html):
> `UNLOCK TABLES` implicitly commits any active transaction only if any tables currently have been locked with `LOCK TABLES`. The commit does not occur for `UNLOCK TABLES` following `FLUSH TABLES WITH READ LOCK` because the latter statement does not acquire table locks.
2017-02-06 13:56:55 -06:00
Randall Hauch
5490842449 Merge pull request #175 from rhauch/dbz-176
DBZ-176 Corrected MySQL DDL parser to support creating triggers with definers
2017-02-02 13:59:01 -06:00
Randall Hauch
74e5ba6448 DBZ-176 Corrected MySQL DDL parser to support creating triggers with definers
The MySQL DDL parser was not correclty handling `DEFINER` clauses within `CREATE TRIGGER` or `CREATE EVENT` statements. Support for `DEFINER` clauses was recently added for the various forms of `CREATE PROCEDURE`, `CREATE FUNCTION` and `CREATE VIEW` statements. These are the only kinds of statements that have the definer attribute, per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/stored-programs-security.html).
2017-02-02 12:44:28 -06:00
Randall Hauch
32a88fdc6f DBZ-184 Added database and table name to change event metadata 2017-02-02 12:09:53 -06:00
Randall Hauch
6230cab90e Merge pull request #173 from rhauch/dbz-113
DBZ-113 Added MySQL threads to the event’s source metadata
2017-02-02 12:00:19 -06:00
Randall Hauch
fe17b246af DBZ-113 Added MySQL threads to the event’s source metadata
Changed the events’ `source` structure to optionally contain the identifier of the MySQL thread where appropriate. The thread is included on each `BEGIN` binlog event, so these are captured and added to all of the associated change events produced for that transaction.
2017-02-02 11:53:32 -06:00
Randall Hauch
f2a65d03df DBZ-174 Added support for new binlog events
MySQL recently added additional binlog events, and this commit adds support to handle these new events by ignoring them.
2017-02-01 15:26:28 -06:00
Horia Chiorean
031c4a1552 DBZ-183 Fixes the BinlogReader's handling of TIMESTAMP columns to correctly account for timezones 2017-01-25 16:39:36 +02:00
Randall Hauch
a73f85a80f Merge pull request #162 from rareddy/DBZ-177
DBZ-177: Providing an alternative way to create JDBC connection based …
2017-01-13 13:37:38 -06:00
Ramesh Reddy
a9aace3480 DBZ-177: Providing an alternative way to create JDBC connection based on the configured JDBC driver class name and supplied classloader. The loading/creating the JDBC connections is not reliable when driver libraries in a different classloader than the DriverManager. 2017-01-13 12:58:14 -06:00
Horia Chiorean
a300d3e1cf DBZ-3 Changes the configuration of the Docker Maven plugin to only use alias naming when necessary and moves the PG connector ahead of the Mongo connector in the build 2016-12-27 14:44:33 +02:00
Horia Chiorean
23e3f59fa1 DBZ-3 Implements a connector for streaming changes from a Postgres database
The version of the DB server required for this to work is at least 9.4
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver the protobuf code for now
* adds support for debugging Surefire and Failsafe
2016-12-27 14:44:32 +02:00
Randall Hauch
e60839e76b DBZ-164 Improved MySQL snapshot reader logic
Added more logic to the snapshot reader to better handle errors when reading the list of table names in each database. Now, any errors with a single database (e.g., some of the not-quite-a-database names described in the JIRA issue) will cause the snapshot reader to simply skip that database name and continue on (with proper logging).

This change also quotes all of the database and table names when used in SQL statements.
2016-12-20 22:03:46 -06:00
Randall Hauch
fd7e152852 Merge pull request #142 from rhauch/dbz-151
DBZ-151 Added new integration test framework
2016-12-20 17:53:16 -06:00
Randall Hauch
ab1140ef70 Merge pull request #155 from rhauch/dbz-169
DBZ-169 MySQL connector support for ON UPDATE clauses
2016-12-20 17:48:06 -06:00
Randall Hauch
fe44380d4c Merge pull request #154 from rhauch/dbz-168
DBZ-168 MySQL connector ignores XA binlog events
2016-12-20 17:47:57 -06:00
Randall Hauch
a9a84cb6aa DBZ-152 Enabled MySQL connector to skip table count checks during snapshot
Change the MySQL connector’s `min.row.count.to.stream.results` configuration property to accept a value of 0, which signifies that all `SELECT COUNT(*) FROM tableA` queries should be skipped and instead all results should be streamed.
2016-12-20 17:40:57 -06:00
Randall Hauch
046702d959 DBZ-169 MySQL connector support for ON UPDATE clauses
Corrected the MySQL DDL parser to support `ON UPDATE NOW()` clauses in addition to `ON UPDATE CURRENT_TIMESTAMP`.
2016-12-20 16:19:18 -06:00
Randall Hauch
09f87cf190 DBZ-168 MySQL connector ignores XA binlog events
MySQL 5.7.7 introduced new behavior for handling XA events in the binlog. See the [MySQL documentation|http://dev.mysql.com/doc/refman/5.7/en/xa-restrictions.html] for details. This PR changes the binlog reader so that `XA …` statements appearing in the binlog are ignored altogether.
2016-12-20 15:32:44 -06:00
Randall Hauch
5dceb05f69 DBZ-151 Additional changes to improve test framework and MySQL integration tests 2016-12-20 10:58:56 -06:00
Randall Hauch
08e32a4a8b DBZ-151 Added multiple integration test modules to test various MySQL versions and configurations.
These new modules run during the '-Passembly' profile and use the new integration test framework that compares all
output produced by a connector to expected results that were previously recorded and verified. These integration test modules
can be run manually with a simple build of those modules or their parent; only the top-level 'integration-tests' module is run
during the assembly profile during builds of the entire codebase.
2016-12-20 09:18:10 -06:00
Randall Hauch
0851d8280c DBZ-166 Corrected shutdown logic of MySQL connector
The MySQL connector uses several threads, so previously upon connector shutdown these threads were simply cancelled. This is fine for the binlog reader (which can stop at any moment), but is a poor approach for the snapshot as we didn’t always properly release the database resources and also didn’t complete the writing of the DDL history.

With this change, the snapshot reader stops in a very controlled manner, basically by having the 10-step snapshot procedure frequently check whether the reader is to continue working, and to completely avoid thread interruption altogether. And, the snapshot procedure will always clean up its database resources (locks, transactions, etc.), even if the procedure is stopped before completion.

This change also refactors how the snapshot and binlog reader are managed. This is no longer done in the MySqlConnectorTask class (which is busy enough), but rather the logic has been encapsulated in a new `ChainedReader` that makes use of a new `Reader` interface. This makes testing of `ChainedReader` easier, and ensure that `ChainedReader` relies only upon the primary methods of `Reader` rather than upon `AbstractReader`. `ChainedReader` handles multiple readers generically, and ensures that when stopped the readers are all handled correctly and completely process all records, yet avoid accidentally starting a subsequent reader(s) when stopping the previous reader.
2016-12-15 10:55:18 -06:00
Randall Hauch
e3e66bf960 DBZ-161 Corrected MySQL connector logic when no GTIDs are used
Corrected the logic of the MySQL connector when getting the server’s GTID set. Previously, this logic failed if GTIDs are not used.
2016-12-08 08:09:52 -06:00
Dennis Persson
acd7bd8fa5 DBZ-142 Handle national character set columns in DDL parser 2016-12-07 07:38:30 +01:00
Randall Hauch
c762a221b7 DBZ-162 Corrected DDL parsing of MySQL functions
The MySQL DDL parser was not properly consuming function declarations. For functions, the parser consumes the entire statement without handline the various expressions within the function declaration, but the parser was not properly finding the end of the statement and instead was continuing to try to consume values beyond the end of the statement.

Specifically, when the parser consumes a `BEGIN`, it looks for a corresponding `END`. However, if it encountered an `END IF`, the `IF` plus any remaining tokens were left on the token stream and unprocessed. This confused the parser, which keep looking for statements and ultimately ended with a `No more content` error.

This case was replicated in integration tests, and the code fixed to properly find the end of the statements.
2016-12-06 17:34:52 -06:00
Randall Hauch
c72242eeb0 Merge pull request #145 from sherafpm/bugfix/DBZ-160
DBZ-160 - Issue while parsing create table script with ENUM type and default value 'b'
2016-12-06 14:21:23 -06:00
Randall Hauch
eedc4fba00 DBZ-163 Corrected assembly profile in build
The Travis-CI builds run the Maven build using the `assembly` profile, and this has been failing quite a bit lately.

The first problem appears to be that the Travis-CI environment recently changed to have port 3306 taken, which means that our build fails to start any Docker containers for MySQL that attempt to use this port. A simple fix is to use different ports for the assembly build.

However, trying to change the port numbers for some of the profiles caused a lot of problems, and to correct these required refactoring how the properties are set. The Docker Maven plugin is now configured with separate properties that are set once (depending upon the profile) to determine the port assignments of the various Docker containers. The Failsafe plugin executions then use these Maven properties when setting the system variables (e.g., `database.host`) needed in the integration tests. This appears to have worked, but it still is a bit fragile. For example, the assembly profile defines several Failsafe executions, and during this profile these should be the only executions run; however, if not all the properties are set properly, the build seems to also run the default Failsafe execution in addition to the other `assembly` profile executions. (I think properties can’t only be defined in the execution, but need to also be defined in the Failsafe configuration.)

The “alternative” MySQL Docker images were removed, since they basically should not provide any different behavior than the `mysql/mysql-server` images we normally used. The extra containers required a lot more resources to run and dramatically increased the complexity of the build.

A few other trivial changes were made.
2016-12-05 16:37:59 -06:00
Randall Hauch
2b2bf693d7 DBZ-163 Changed Travis-CI build to skip the install dependencies step 2016-12-02 15:43:57 -06:00
Sherafudheen PM
ee52219736 DBZ-160 - Issue while parsing create table script with ENUM type and default value 'b' 2016-12-02 17:42:44 +05:30
Randall Hauch
0bf3b4c9f3 DBZ-157 Upgraded Docker Maven plugin
Upgraded the Docker Maven plugin to 0.18.1, which required changing our use of the `docker.image` to `docker.filter` (per the [changes in 0.17.1](https://github.com/fabric8io/docker-maven-plugin/blob/master/doc/changelog.md)).
2016-11-22 09:23:07 -06:00
Randall Hauch
a82ae5691b Reduce the log verbosity of the MySQL tests 2016-11-14 13:41:10 -06:00
Randall Hauch
d80bc1bfd7 DBZ-153 MySQL connector supports enum and set values with parentheses
Changed the MySQL connector to support ENUM and SET literals with parentheses.
2016-11-14 12:22:08 -06:00
Randall Hauch
8a52cda0dc DBZ-150 Changed the order of events when a row's key is changed. 2016-11-09 14:42:43 -06:00
Randall Hauch
b0ded5f383 DBZ-147 Added ability to treat MySQL DECIMAL as double
By default the MySQL connector handles `DECIMAL` and `NUMERIC` columns using `java.math.BigDecimal` values and describing them using the `org.apache.kafka.connect.data.Decimal` schema type, which serializes the values to a binary form.

This change adds a configuration option that will keep the default behavior, but will instead allow handling `DECIMAL` adn `NUMERIC` values as Java `double` and a schema type of `FLOAT64`.
2016-11-09 11:27:09 -06:00
Randall Hauch
ea5f7983c7 DBZ-144 Corrected MySQL connector restart
Added tests to verify whether the connector is properly restarting in the binlog when previously the connector failed or stopped in the middle of a transaction. The tests showed that the connector is not able to properly start when using or not using GTIDs, since restarting from an arbitrary binlog event causes problems since the TABLE_MAP events for the affected tables are skipped.

The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.

This has an unforunate side effect since the offsets capture a transaction was completed only when it generates a source record for the subsequent transaction. This is because the connector generates source records (with their offsets) for the binlog events in the transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn’t be a problem, and will only slow restarts for very large transactions.
2016-11-09 08:11:41 -06:00
Randall Hauch
0d2acfd0a6 DBZ-149 Corrected type of BINARY column
The MySQL connector (or rather the DDL parser used in the connector) improperly assumed a `CHAR` JDBC type (and Avro schema `STRING` type) for MySQL columns of type `BINARY`. This corrects the error.
2016-11-08 17:41:01 -06:00
Randall Hauch
7656dce985 DBZ-148 Corrected timestamp check in test case to account for DST 2016-11-08 15:37:03 -06:00
Randall Hauch
43d2bf14cf DBZ-143 Minor improvements and correction of JavaDoc. 2016-11-04 09:02:44 -05:00
Randall Hauch
207315e5df DBZ-146 Improved error handling of MySQL Connector
Improved the error handling of the MySQL connector to ensure that we’re always stopping the connector when we have a problem handling a binlog event or if we have problems starting.
2016-11-03 16:55:59 -05:00
Chris Riccomini
c195fc1f4c DBZ-143 Support multi-channel MySQL failover
Make Debezium merge its GTID set with the GTID set on the server that
it's connecting to. This allows Debezium to consume from a MySQL server
that might have a different set of channels (upstream masters), provided
that the server has the data that Debezium needs.
2016-11-03 16:47:42 -05:00
Randall Hauch
f970899a6d DBZ-133 Minor changes to JavaDoc 2016-10-25 11:12:55 -05:00
Prannoy Mittal
fa66abdcc3 DBZ-133 is for enabling schema only snapshot mode.
Snapshot Reader will have a dataInclude flag, which will determine whether initial data in whitelisted database and tables have to
    read or not. In schema only mode, will not read inital data, will capture only database table schema
    Added unit test for validating checks that initial data is not copied
2016-10-25 11:04:26 -05:00
Randall Hauch
094f9a4925 DBZ-139 Corrected binlog timestamp handling
MySQL records the timestamp with second precision in binlog events, but the library we use multiplies by 1000 to return the padded value in milliseconds (even though the value still has second precision). The BinlogReader converts this back to seconds, so the SourceInfo should not also be dividing by 1000.
2016-10-20 09:31:02 -05:00
Randall Hauch
25b8055642 DBZ-134 Enabled JMX metrics for MySQL connector
Added an MXBean for the MySQL connector that captures various metrics while reading the binlog.
2016-10-19 16:48:11 -05:00
Randall Hauch
4a62b09ead DBZ-126 Added support for MySQL JSON type
Adds support for MySQL 5.7's `JSON` type, which is capable of holding JSON objects, JSON arrays, and scalar values. The Debezium MySQL connector represents `JSON` values as string with a `io.debezium.data.Json` semantic type (which is basically a string schema that has a special name to denote the semantics), and the _contents_ of that string will be the JSON representation of the object, array, or scalar value.
2016-10-18 17:32:55 -05:00
Randall Hauch
2f5772712a DBZ-129 Fix for GTID updates
Workaround for https://github.com/shyiko/mysql-binlog-connector-java/issues/122.
2016-10-18 14:32:06 -05:00
Randall Hauch
7387654bfa DBZ-129 Additional improvements for MySQL connector GTID-based startup
Added more integration tests to verify the behavior of the MySQL connector when it is (re)starting using GTIDs.
2016-10-18 14:30:10 -05:00
Randall Hauch
305c4c5ac6 DBZ-129 MySQL connector can now use subset of GTID set when reconnecting to MySQL
When a connector is originally connected to a MySQL server, it will record the GTID set that identifies the position in the binlog. When all of the interesting transactions originate on a different server (i.e., the server we're listening to is a replica), the server we're listening to will still include some transactions in the binlog (e.g., for the information schema, performance, or other internal databases), and so the GTID set will include a GTID range for our server. If we stop the connector and want to point it to a different MySQL server, asking MySQL to position the binlog using the complete GTID set (including the GTID range for our old replica) will cause an error, since the new server does not have any GTID ranges from the old replica. Therefore, the connector needs to be able to exclude some GTID ranges that originated on the original replica, using the `server_uuid` property of the replica server.

This change adds two configuration properties: `gtid.source.includes` and `gtid.source.excludes`. Both are optional, but at most only one of these can be used. These properties contain a comma-separated list of GTID sources (i.e., the `server_uuid` value for the server where the transaction originated) or regular expressions matching GTID sources, and upon connector startup the connector uses the list to filter the previously-recorded GTID set against the available GTID set in the current MySQL server. By including specific GTID sources, an administrator can control the subset of GTID ranges that govern the binlog position.

These properties will not be useful in some topologies, especially when the MySQL server from which the binlog is being read is the originating server for some of the transactions. However, these properties may be very useful in any topology where the connector is _only_ reading from replicas, so that the connector can be switched to another replica at any time. In some cases it may be easier to exclude all of the replicas' `server_uuid` values, while in other cases it may be easier to include all of the `server_uuid` values where transactions can originate.
2016-10-18 14:29:58 -05:00
Horia Chiorean
1a99f5bbc7 DBZ-135 Fixes the parsing of line separators by GtidSet (#118) 2016-10-13 10:18:33 -05:00
Randall Hauch
d955ed2e4b DBZ-132 Cleanup of code (#117)
Additional cleanup of changes made for DBZ-132.
2016-10-11 15:36:07 -05:00
Prannoy Mittal
301d60411f Using debezium String Library to get join to list of strings 2016-10-12 00:53:36 +05:30
Prannoy Mittal
a36700e51b Enum and Set were assumed to single character.
Updated MysqlParser to return list of String for allowed enum and set values
And also added code fix to get a enum value at a particular index and for set option too.
Used debezium string utility to join list of string into deliminator seperated String.
Updating old test cases as per required to handle list of strings.
2016-10-12 00:41:08 +05:30
Willie
fda76c875e DBZ-115 Add support to recognize older row_event formats 2016-10-08 11:42:12 -07:00
Randall Hauch
99a86ad289 Merge pull request #112 from rhauch/dbz-123
DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
2016-10-07 17:16:37 -05:00
Randall Hauch
beb47dd2de DBZ-131 Improved logging while reading binlog
When the MySQL connector is reading the binlog, it outputs INFO log messages reporting status at an exponentially-increasing rate, starting at every 5 seconds and doubling until a max period of 1 hour. This output is useful when the connector starts to know that it is working, but thereafter the usefulness decreases. Once an hour is probably acceptable output.

This is not intended to replace the capturing of metrics, but is merely an aid to easily tell via the logs whether the connector continues to work.

Also improved the log message when the binlog reader stops to capture the total number of events recorded by Kafka Connect and the last recorded offset.
2016-10-07 17:10:01 -05:00
Randall Hauch
50eb4094ac DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
The DDL parser now properly handles bit-set literals, and several minor case-sensitivity bugs dealing with other escaped literals.
2016-10-06 13:25:38 -05:00
Randall Hauch
64bab3b3cf DBZ-104 Added test to verify behavior of CREATE TABLE LIKE expressions with and without snapshot 2016-09-23 12:11:38 -05:00
Randall Hauch
dc03335049 DBZ-128 Additional fix to MySQL compatibility message. 2016-09-23 11:03:40 -05:00
Randall Hauch
7654321cfd DBZ-128 Improved checking of MySQL status and configuration
Added logic to verify that MySQL's row-level binlog is enabled, and whether it is likely that when snapshots are not performed that the binlog is likely to have been purged. Some situations will result in an error, while others are logged as warnings.
2016-09-22 17:06:14 -05:00
Randall Hauch
730603976d Merge pull request #107 from rhauch/dbz-123
DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
2016-09-21 15:22:00 -05:00
Randall Hauch
bcf60940db DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
Corrected how the MySQL connector is treating columns of type `BIT(n)`, where _n_ is the number of bits in the value. When  `n=1`, the resulting values are booleans; when `n>1`, the resulting values are little endian `byte[]` that have the minimum number of bytes to hold the `n` bits.
2016-09-21 15:04:20 -05:00
Randall Hauch
9aae6c62d9 DBZ-124 Eliminated the JMX "already registered" warning in the MySQL connector
The `KafkaDatabaseHistory` was always creating a new producer whenever its `start()` method was called, even if it were called more than once. And, the `MySqlSchema` was calling `start()` twice, resulting in multiple producers being created and registered with JMX. Both issues were fixed.

Also, UUIDs were being used as the name of the JMX MBean for the producer, unless the `database.history.consumer.client.id` and `database.history.producer.client.id` properties were being explicitly set. Now, the MySQL connector will by default set the `client.id` property on both the database history's Kafka consumer and producer to `{connectorName}-dbhistory`. Of course, the `database.history.consumer.client.id` and `database.history.producer.client.id` properties can still be set to define the name of the producer and consumer.
2016-09-21 10:05:15 -05:00
Randall Hauch
54b737edc1 DBZ-114 MySQL connector now handles "zero-value" dates and timestamps
MySQL supports "zero-value" dates and timestamps, but these cannot be represented as valid dates or timestamps using the Java types. For example, the zero-value `0000-00-00` for a date has what Java considers to be an invalid month and day-of-the-month.

This commit changes how the MySQL connector handles these values to not throw exceptions. When columns allow nulls, such values will be treated as nulls; when columns do not allow null values, these values will be converted to a "zero-value" for the corresponding Java representation (e.g., the epoch day or timestamp). A new test case verifies the behaviors.
2016-09-21 09:23:12 -05:00
Akshath
8a1a9c3542 Changed server.id to support Long instead of Int 2016-09-06 15:09:05 -07:00
Randall Hauch
de1edce895 DBZ-116 Improved logging when MySQL connector is reading binlog
The MySQL connector now outputs an INFO log message whenever its task's `poll()` method returns a non-empty list of `SourceRecord` objects, where the message includes the number of records and the offset of the last record.
2016-09-06 11:31:54 -05:00
Randall Hauch
330a27ce52 Merge pull request #97 from rhauch/dbz-102
DBZ-102 MySQL connector support for column charsets
2016-08-29 15:12:24 -05:00
Randall Hauch
cc8f45309a Merge pull request #98 from rhauch/dbz-112
DBZ-112 Corrected the logic of setting the MySQL driver's SSL-related system properties
2016-08-29 15:00:34 -05:00
Randall Hauch
5cef237aac DBZ-111 Corrected GTID set comparison logic of the MySQL connector
The MySQL connector was improperly comparing the GTID set required by the connector to the GTID set of the MySQL instance. In particular, when the GTID set of the MySQL server contained a newline character, the comparison logic failed. (This should have been fixed as part of DBZ-107.)
2016-08-29 14:53:21 -05:00
Randall Hauch
0861518788 DBZ-112 Corrected the logic of setting the MySQL driver's SSL-related system properties 2016-08-29 14:27:43 -05:00
Randall Hauch
a46a427b57 DBZ-102 Added MySQL integration test that verifies character encodings
Added a table with data to one of the MySQL databases used in the integration tests. It verifies that the UTF-8 data stored in the table is able to be handled properly when obtaining a snapshot and reading the binlog.
2016-08-29 13:42:10 -05:00
Randall Hauch
cc94bbc697 DBZ-102 MySQL connector now processes character sets
The MySQL binlog events contain the binary representation of string-like values as encoded per the column's character set. Properly decoding these into Java strings requires capturing the column, table, and database character set when parsing the DDL statements.

Unfortunately, MySQL DDL allows columns (at the time the columns are created or modified) to inherit the default character set for the table, or if that is not defined the default character set for the database, or if that is not defined the character set for the server. So, in addition to modifying the MySQL DDL parser to support capturing the character set name for each column, it also had to be changed to know what these default character set names are.

The default character sets are all available via MySQL server/session/local variables. Although strictly speaking the character set variables cannot be set globally, MySQL DDL does allow session and local variables to be set with `SET` statements. Therefore, this commit enhances the MySQL DDL parser to parse `SET` statements and to track the various global, session, and local variables as seen by the DDL parser. Upon connector startup, a subset of server variables (related to character sets and collations) are read from the database via JDBC and used to initialize the DDL parser via `SET` methods.

In addition to initializing the DDL parser with the system variables related to character sets and collation, it is important to also capture the server and database default character sets in the database history so that the correct character sets are used for columns even when the default character sets have changed on the database and/or the server. Therefore, upon startup or snapshot the MySQL connector records in the database history a `SET` statement for the `character_set_server` and `collation_server` system variables so that, upon a later restart, the history's DDL statements can be re-parsed with the correct default server and database character sets. Also, when the MySQL connector reloads the database history (upon startup), the recorded default server character set is compared with the MySQL instance's current server character set, and if they are different the current character set is recorded with a new `SET` statement.

These extra steps ensure that the connector use the correct character set for each column, even when the connector restarts and reloads the database history captured by a previous version of the connector. IOW, the MySQL connector can be safely upgraded, and the new version will correctly start using the columns' character sets to decode the string-like values.
2016-08-29 12:19:24 -05:00
Randall Hauch
257e81c540 DBZ-102 MySQL in-memory models of tables capture column character sets
The DDL parser and in-memory models of the relational schemas were changed to capture the character set for each column whose type is a string (e.g., `CHAR`, `VARCHAR`, etc.). This required handling `SET` statements used to change the system variables that hold the names of the default character set for the server and for each database. So, even if a column does not explicitly define the character set, the column's actual character set is identified from the table's character set, which might default to the current database's character set, which if not set defaults to the system character set.

These changes merely affect how MySQL DDL is parsed and the in-memory relational schema representation to accommodate the character set at various levels. It does not change the behavior of the MySQL connector; that will be done in a subsequent commit.

All tests pass with these changes, including quite a few additional tests for the new functionality.
2016-08-29 11:50:51 -05:00
Randall Hauch
93d0fae02b DBZ-109 Captured MySQL error code and SQLSTATE code in exceptions
The binlog reader and JDBC operations might throw exceptions with this information, so in these cases the connector now captures the error code and SQLSTATE code from the exception and includes them in the message.
2016-08-25 08:11:50 -05:00
Randall Hauch
638b459484 DBZ-108 Removed the TimeZoneAdapter and test, which is no longer used 2016-08-24 16:31:35 -05:00
Randall Hauch
4de56fd657 Merge pull request #94 from hchiorean/DZB-header-fix
Fixes the DBZ header required by checkstyle
2016-08-24 14:28:43 -05:00
Randall Hauch
ce2b2db80c DBZ-99 Added support for MySQL connector to connect securely to MySQL
Changed the MySQL connector to have several new configuration properties for setting up the SSL key store and trust store (which can be used in place of System or JDK properties) used for MySQL secure connections, and another property to specify what kind of SSL connection be used.

Modified several integration tests to ensure all MySQL connections are made with `useSSL=false`.
2016-08-24 13:27:35 -05:00
Horia Chiorean
2732d26ff0 Fixes the DBZ header required by checkstyle
This commit removes an extra space character from the first blank line of the header
2016-08-24 13:41:15 +03:00
Randall Hauch
40318f87a3 Merge pull request #92 from rhauch/dbz-107
DBZ-107 MySQL Connector should tolerate newlines in GTID sets read during snapshot
2016-08-23 17:45:58 -05:00
Randall Hauch
3051e3b2d7 DBZ-107 MySQL Connector should tolerate newlines in GTID sets read during snapshot 2016-08-23 17:37:48 -05:00
Randall Hauch
448d514c81 DBZ-106 Corrected the MySQL DDL parser to properly handled quoted keywords as column names. 2016-08-23 17:03:53 -05:00
Randall Hauch
e86fb83459 [maven-release-plugin] prepare for next development iteration 2016-08-16 09:56:47 -05:00
Randall Hauch
ccdb0a1a63 [maven-release-plugin] prepare release v0.3.0 2016-08-16 09:56:47 -05:00
Randall Hauch
83b2d2eab6 DBZ-92 Added a few log statements upon MySQL connector task startup 2016-08-15 21:57:09 -05:00
Randall Hauch
b1277ed196 DBZ-94 Corrections to mark the end of the snapshot 2016-08-15 21:56:20 -05:00
Randall Hauch
2fea8c8ee0 DBZ-100 Corrected JavaDoc and renamed methods to be more explicit 2016-08-15 12:37:39 -05:00
Randall Hauch
d8a5d2b50f DBZ-100 Corrected MySQL connector's use of ENUM and SET values
The ENUM and SET values read from the binlog contain the indexes of the options that are included in the value, but this doesn't compared with the string values returned by MySQL and JDBC that contain the comma-separated options. With this change, the values read from the binlog will also be comma-separated strings.
2016-08-15 12:11:35 -05:00
Randall Hauch
629542458e DBZ-91 Added option to force use Kafka Connect temporal types. 2016-08-11 10:48:07 -05:00
Randall Hauch
31641fb43e DBZ-91 Changed how temporal values are treated in MySQL connector
Rewrote how the MySQL connector converts temporal values to use schemas with names that identify the semantic
type of temporal value, and customized how the MySQL binlog client library creates Java object values from the
raw binlog events.

Several new "semantic" schema types were defined:

* `io.debezium.time.Year` represents a year number as an INT32 value (e.g., 2016, -345, etc.).
* `io.debezium.time.Date` represents a date by storing the epoch seconds (that is, the number of seconds past the epoch) as an INT64 value.
* `io.debezium.time.Time` represents a time by storing the milliseconds past midnight as an INT32 value.
* `io.debezium.time.MicroTime` represents a time by storing the microsconds past midnight as an INT32 value.
* `io.debezium.time.NanoTime` represents a time by storing the nanoseconds past midnight as an INT32 value.
* `io.debezium.time.Timestamp` represents a date and time (without timezone information) by storing the milliseconds past epoch as an INT64 value.
* `io.debezium.time.MicroTimestamp` represents a date and time (without timezone information) by storing the microseconds past epoch as an INT64 value.
* `io.debezium.time.NanoTimestamp` represents a date and time (without timezone information) by storing the nanoseconds past epoch as an INT64 value.
* `io.debezium.time.ZonedTime` represents a time with timezone and optional fractions of a second (but no date) by storing the ISO8601 form as a STRING value (e.g., `10:15:30+01:00`)
* `io.debezium.time.ZonedTimestamp` represents a date and time with timezone and optional fractions of a second by storing the ISO8601 form as a STRING value (e.g., `2011-12-03T10:15:30.030431+01:00`)

This range of semantic types allows for a far more accurate representation in the events of the temporal values stored within the database. The MySQL connector chooses the semantic type based upon the precision of the MySQL type (e.g., `TIMESTAMP(6)` will be represented with `io.debezium.time.MicroTimestamp`, whereas `TIMESTAMP(3)` will be represented with `io.debezium.time.Timestamp`). This ensures that the events do not lose precision and that the semantics of the database column values are retained in the events even though the values are represented with primitive values.

Obviously these Kafka Connect schema representations are different and more precise than the built-in `org.apache.kafka.connect.data.Date`, `org.apache.kafka.connect.data.Time`, and `org.apache.kafka.connect.data.Timestamp` logical types provided by Kafka Connect and used by the MySQL connector in all 0.2.x and 0.1.x versions. Migration to the new MySQL connector should be possible, although consumers may still need to know about these types to properly handle temporal values and the correct precision (i.e., consumers can just assume all date INT64 values represent milliseconds).

The MySQL binlog client library converted the raw binary event information to JDBC types using a local Calendar instance, which obviously incorporates the local timezone and cannot retain more than millisecond precision. This change extends the library's deserializers to instead use the Java 8 `javax.time` classes and to retain the exact semantics of the database values and to not lose any precisions (since the `javax.time` classes have nanosecond precision).

The same logic is also used to convert the JDBC values obtained during a snapshot from the MySQL Connect/J JDBC driver. The latter has a few quirks, such as not returning any fractional seconds for `TIME` columns, even though `java.sql.Time` can store up to milliseconds.

Most of the logic of the conversions of values and mapping to Kafka Connect schemas is handled in the new `JdbcValueConverters`, which was extracted from the existing `TableSchemaBuilder`. The MySQL connector reuses and actually extends the `JdbcValueConverters` class with its own `MySqlValueConverters` class that also adds support for MySQL-specific types such as `YEAR`. Other connectors whose values are based on JDBC types should be able to reuse and/or extend the `JdbcValueConverters` class.

Integration tests that deal with temporal types were modified to use proper expected values and comparisons.
2016-08-10 15:51:07 -05:00
Randall Hauch
774f105670 Merge pull request #85 from hchiorean/DBZ-95
DBZ-95 Adds support for `null` binlog filename in certain cases
2016-08-10 15:15:24 -05:00
Horia Chiorean
616e7dea72 DBZ-95 Adds support for null binlog filename in certain cases 2016-08-09 20:03:43 +03:00
Horia Chiorean
008263ea00 DBZ-92, DBZ-97 Makes logging more verbose and changes the snapshot reader to produce separate events for each DDL change 2016-08-09 19:04:51 +03:00
Horia Chiorean
ab24f013d1 DBZ-96 Removes some asserts on tables created by another test case 2016-08-08 14:25:38 +03:00
Randall Hauch
2ae26819af DBZ-94 Added support for copying very large tables during snapshot
By default the MySQL JDBC driver will put the entire result set into memory, which obviously doesn't work for tables of even moderate sizes. This change adds support for streaming rows in result sets when the tables have more than a configurable number of rows (defaults to 1,000).

This posed a problem for how we were previously finding the last row in the last table; the MySQL driver does not support `ResultSet.isLast()` on result sets that are streamed. Instead, this commit wraps the consumer to which the snapshot reader writes all source records, with a consumer that buffers the last record. When the snapshot completes, the offset is updated (denoting the end of the snapshot) and set on the last buffered record before that record is flushed to the normal consumer. This should add minimal overhead while simplifying the logic to ensure the last source record has the updated offset.

This also improves the log output of the snapshot process.
2016-08-04 16:06:50 -05:00
Horia Chiorean
bb1b7d5734 DBZ-92 Adds more logging information during MySQL snapshot recreation 2016-08-03 16:54:17 +03:00
Randall Hauch
b9e9f0fdf9 Corrected build to look for updated output of alt-mysql container
The mysql:5.7 docker image changed its output to be more like mysql/mysql-server:5.7, and this broke our build because of what our build is looking for while waiting to for the server to completely intialize. Simply changing the pattern corrects the problem.
2016-08-03 08:24:13 -05:00
Randall Hauch
6894e9c30d Merge pull request #78 from hchiorean/mysql-tests-fix
Fixes some more tests around date handling in the MySQL connector
2016-08-02 20:12:04 -05:00
Randall Hauch
8cb39eacf0 Reverted back to 0.3.0-SNAPSHOT, since the 0.3 candidate release was not acceptable. 2016-08-01 12:25:58 -05:00
Horia Chiorean
eaf295fbf0 Fixes some more tests around date handling in the MySQL connector 2016-07-29 08:57:47 +03:00
Randall Hauch
517272278d [maven-release-plugin] prepare for next development iteration 2016-07-25 17:50:31 -05:00
Randall Hauch
b89296e646 [maven-release-plugin] prepare release v0.3.0 2016-07-25 17:50:31 -05:00
Randall Hauch
e3a00e1992 DBZ-87 Added support for SIGNED in all numeric types in MySQL 2016-07-25 16:07:56 -05:00
Randall Hauch
447acb797d DBZ-62 Upgraded to Kafka and Kafka Connect 0.10.0.0
Upgraded from Kafka 0.9.0.1 to Kafka 0.10.0. The only required change was to override the `Connector.config()` method, which returns `null` or a `ConfigDef` instance that contains detailed metadata for each of the configuration fields, including supporting recommended values and marking fields as not visible (e.g., if they don't make sense given other configuration field values). This can be used by user interfaces to data-drive the configuration of a connector. Also, the default validation logic of the Connector implementations uses a `Validator` that is pretty restrictive in its functionality.

Debezium already had a fairly decent and simple `Configuration` framework. After several attempts to try and merge these concepts, reconciling the two validation mechanisms was very complicated and involved a lot of changes. It was easier to simply continue Debezium-specific validation and to override the `Connector.validate(...)` method to use Debezium's `Configuration`-based validation. Connector-based validation logic includes determining recommended values, so Debezium's `Field` class (used to define each configuration property) was enhanced with a new `Recommender` class that is similar to Kafka's.

Additional integration tests were added to verify that the `ConfigDef` result is acceptable and that the new connector validation logic works as expected, including getting recommended values for some fields (e.g., database names, table/collection names) from MySQL and MongoDB by connecting and dynamically reading the values. This was done in a way that remains backward compatible with the regular expression formats of these fields, but in a user interface that uses the `ConfigDef` mechanism the user can simply select the databases and table/collection identifiers.
2016-07-25 14:21:31 -05:00
Randall Hauch
30777e3345 DBZ-85 Added test case and made correction to temporal values
Added an integration test case to diagnose the loss of the fractional seconds from MySQL temporal values. The problem appears to be a bug in the MySQL Binary Log Connector library that we used, and this bug was reported as https://github.com/shyiko/mysql-binlog-connector-java/issues/103. That was fixed in version 0.3.2 of the library, which Stanley was kind enough to release for us.

During testing, though, several issues were discovered in how temporal values are handled and converted from the MySQL events, through the MySQL Binary Log client library, and through the Debezium MySQL connector to conform with Kafka Connect's various temporal logical schema types. Most of the issues involved converting most of the temporal values from local time zone (which is how they are created by the MySQL Binary Log client) into UTC (which is how Kafka Connect expects them). Really, java.util.Date doesn't have time zone information and instead tracks the number of milliseconds past epoch, but the conversion of normal timestamp information to the milliseconds past epoch in UTC depends on the time zone in which that conversion happens.
2016-07-20 17:07:56 -05:00
Randall Hauch
a5f4d0bf31 DBZ-87 Changed mapping of MySQL TINYINT and SMALLINT columns from INT32 to INT16
The MySQL connector now maps TINYINT and SMALLINT columns to INT16 (rather than INT32) because INT16 is smaller and yet still large enough for all TINYINT and SMALLINT values. Note that the range of TINYINT values is either -128 to 127 for signed or 0 to 255 for unsigned, and thus INT8 is not an acceptable choice since it can only handle values in the range 0 to 255. Additionally, the JDBC Specification also suggests the proper Java type for SQL-99's TINYINT is short, which maps to Kafka Connect's INT16.

This change will be backward compatible, although the generated Kafka Connect schema will be different than in previous versions. This shouldn't cause a problem, since clients should expect to handle schema changes, and this schema change does comply with Avro schema evolution rules.
2016-07-19 11:11:05 -05:00
Randall Hauch
04eef2da5c DBZ-84 Tried to replicate error with MySQL TINYINT columns
Tried unsuccessfully to replicate the problem reported in DBZ-84 with a new regression integration test.
2016-07-19 10:58:28 -05:00
Randall Hauch
ed1b494fdf DBZ-86 Cleaned up logging and error message in MySQL connector 2016-07-15 16:37:47 -05:00
Randall Hauch
a88bcb9ae7 DBZ-86 Generated Kafka Schema names will now also be valid Avro fullnames 2016-07-15 16:29:52 -05:00