Commit Graph

1223 Commits

Author SHA1 Message Date
Randall Hauch
afef753d36 DBZ-242 Corrected MySQL filters to handle built-in tables
When `table.ignore.builtin` is set to `false`, the MySQL connector included all of the ~250 system tables, yet the database and table filters were never applied to these system tables, so all of them would be captured. With this change, the database and table filters now apply to the system tables should they not be ignored.

Note that this does change the behavior of the connector when `table.ignore.builtin` is set to `false`. However, it is unlikely that anyone is seriously capturing all of the system table changes, so correcting the behavior is the preferred solution.
2017-05-11 11:57:08 +02:00
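
A minimal sketch of the corrected filter composition described above, using hypothetical predicate names (not Debezium's actual API):

```java
import java.util.function.Predicate;

public class TableFilterSketch {
    // Hypothetical predicates standing in for the connector's configured filters.
    static Predicate<String> databaseFilter = db -> !db.equals("ignored_db");
    static Predicate<String> builtInDatabase = db ->
            db.equals("mysql") || db.equals("performance_schema")
            || db.equals("information_schema") || db.equals("sys");

    // Before the fix: built-in databases bypassed the filters entirely when not ignored.
    // After the fix: the same database filter applies to built-in databases, too.
    static boolean capture(String db, boolean ignoreBuiltIn) {
        if (builtInDatabase.test(db)) {
            return !ignoreBuiltIn && databaseFilter.test(db);
        }
        return databaseFilter.test(db);
    }

    public static void main(String[] args) {
        System.out.println(capture("mysql", true));  // false: built-ins ignored
        System.out.println(capture("mysql", false)); // filters now apply to system tables
    }
}
```
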
Gunnar Morling
3f55398bab DBZ-243 Don't mention default value for "database.server.name" for MySQL connector 2017-05-08 20:24:28 +02:00
Gunnar Morling
baa8119df8 Typo fix 2017-05-08 17:40:19 +02:00
Gunnar Morling
1074de4efa DBZ-222 Some more clean-up 2017-05-04 09:25:25 +02:00
Gunnar Morling
6eea4c9717 DBZ-222 Some typo fixes 2017-05-04 08:53:51 +02:00
Omar Al-Safi
3d92011277 DBZ-222 Typo cleanup and removed unnecessary assertion 2017-05-04 08:53:34 +02:00
Gunnar Morling
5630b61be6 DBZ-222 dependency clean-up 2017-05-04 08:53:05 +02:00
Omar Al-Safi
2a17748a95 DBZ-222 Readded readbinlog_test.product and readbinlog_test.purchased to MySqlConnectorIT 2017-05-04 08:53:05 +02:00
Omar Al-Safi
791545c5f4 DBZ-222 Added support for MySQL POINT type 2017-05-04 08:53:05 +02:00
Jiri Pechanec
51a1c7cd69 DBZ-229 - check all privilege records 2017-04-26 15:25:59 +02:00
Randall Hauch
709cd8f3fe [maven-release-plugin] prepare for next development iteration 2017-03-27 11:28:12 -05:00
Randall Hauch
2bc3d45954 [maven-release-plugin] prepare release v0.5.0 2017-03-27 11:28:11 -05:00
Randall Hauch
25adc3f642 Merge pull request #209 from rhauch/dbz-205
DBZ-205 Corrected MySQL connector to handle 2-digit years
2017-03-27 11:10:04 -05:00
Randall Hauch
81f62b6961 DBZ-205 Corrected MySQL connector to handle 2-digit years
MySQL has special handling of 2-digit years that it deems ambiguous, such as the year value `17` that is actually treated as `2017`. Apparently the 2-digit values are stored in MySQL and the interpretation is performed when the data is extracted, so the connector needs to perform this same adjustment of the year values. This commit uses the JDK’s `TemporalAdjuster` interface and passes it down to the requisite temporal-related datatype handling code. The MySQL connector then provides its own `TemporalAdjuster` implementation that adjusts the year values via the excellent JDK `Temporal` methods.

A row in one of the MySQL test databases was changed to use a 2-digit year of `16`, while the test method still checks that the year is `2016`, verifying that the year value is properly adjusted.
2017-03-27 10:58:21 -05:00
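
MySQL's documented rule treats 2-digit years 00-69 as 2000-2069 and 70-99 as 1970-1999; a minimal sketch of such a `TemporalAdjuster`, assuming that rule (not the connector's exact implementation):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.Temporal;
import java.time.temporal.TemporalAdjuster;

public class TwoDigitYearAdjuster implements TemporalAdjuster {
    @Override
    public Temporal adjustInto(Temporal temporal) {
        int year = temporal.get(ChronoField.YEAR);
        if (year >= 0 && year <= 69) {
            return temporal.with(ChronoField.YEAR, year + 2000); // 16 -> 2016
        }
        if (year >= 70 && year <= 99) {
            return temporal.with(ChronoField.YEAR, year + 1900); // 76 -> 1976
        }
        return temporal; // 4-digit years pass through unchanged
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(16, 11, 4).with(new TwoDigitYearAdjuster());
        System.out.println(date); // 2016-11-04
    }
}
```
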
Randall Hauch
74fbb4a140 DBZ-198 Corrected MySQL DDL parser
The parser now handles `BEGIN…END` blocks better, properly distinguishing `IF()` functions from `IF…THEN…END IF` control blocks and supporting `CASE … WHEN … END CASE` control blocks.
2017-03-24 13:12:46 -05:00
Sanjay Kr Singh
ac4575a13c update mysql-dbz-198.ddl with 3 new test cases for PROCEDURE and 2 test cases for FUNCTION
From L49 - one DDL test case for PROCEDURE,
From L278 - one DDL test case for FUNCTION,
From L433 - one more DDL test case for PROCEDURE with CURSOR,
From L713 - one more DDL test case for PROCEDURE,
From L755 - one more DDL test case for FUNCTION
2017-03-24 12:29:17 +05:30
Randall Hauch
430d756062 [maven-release-plugin] prepare for next development iteration 2017-03-17 15:41:58 -05:00
Randall Hauch
536cbf6300 [maven-release-plugin] prepare release v0.4.1 2017-03-17 15:41:57 -05:00
Randall Hauch
2086b779b1 DBZ-204 Added test case but unable to replicate error
Added a test case that uses the MySQL DDL parser to parse DDL statements similar to those reported in the issue, but these are properly handled with the current state of the `master` branch.
2017-03-17 14:15:25 -05:00
Randall Hauch
e5ee3847dd DBZ-195 Added tests to try to replicate a reported issue
Added a table and inserted rows that try to replicate the problem reported in DBZ-195, but the test was unable to reproduce it. In fact, this really is no different from existing tests. Changed the log messages so that if/when this happens again it will be possible to know which row was problematic.
2017-03-17 10:47:32 -05:00
Randall Hauch
ddbc1e07aa DBZ-197 Corrected MySQL connector to handle invalid enum values
MySQL represents an invalid enum literal in the binlog events as an empty string or a value of `0`. Now, when the connector comes across such a value in the binlog, it will instead use an empty string for the enum literal.
2017-03-16 13:55:24 -05:00
Randall Hauch
cf5391482a DBZ-198 Improved MySQL DDL parser to better handle blocks
The MySQL parser now properly handles control blocks such as `BEGIN…END`, `IF…END IF`, `REPEAT…END REPEAT`, and `LOOP…END LOOP`, even in cases where the block is preceded by and terminated by a label.
2017-03-16 13:32:21 -05:00
Randall Hauch
06d9bcb092 Merge pull request #199 from dasl-/daylightsavings
DBZ-202 Fix daylight savings test issues
2017-03-15 15:41:41 -05:00
dleibovic
782ccc66e5 fix daylight savings test issues 2017-03-15 15:44:03 -04:00
rich
06a25f8fd9 modify the query used to fetch the table list so that it includes tables only (no views) 2017-03-15 13:56:52 -04:00
Randall Hauch
b48ccce4b5 DBZ-200 Corrected MySQL DDL parser to better handle column definitions
Apparently not all reserved words must be quoted when used as column names, so refactored MySQL’s DDL parser to better handle a variety of unquoted column names that are reserved words.
2017-03-08 12:12:27 -06:00
Josh Stanfield
c794416684 update to allow savepoint in mysql replication stream 2017-03-08 09:28:10 -07:00
dleibovic
9fd2afc8b4 Increase the default MySQL init timeout to 60 seconds for slower computers. Also parameterize it so that users can pass a custom value via 'mvn clean install -Dmysql.init.timeout=80000', for example 2017-02-23 13:34:14 -05:00
rich
9aa49736c8 DBZ-140 when locking individual tables, use a single statement with all the table names instead of issuing a statement per table, which causes a MySQL error 2017-02-16 15:45:29 -05:00
Randall Hauch
043a2d2d92 DBZ-194 Improved MySQL connector’s built-in table filtering
The MySQL connector’s built-in table filter now just filters out all tables within the known built-in databases, and does not check the names of the tables. Thus, the connector should no longer filter out tables in other databases that happen to have the same names as the tables in the built-in databases.
2017-02-14 09:23:39 -06:00
Randall Hauch
af94fa8759 DBZ-193 MySQL DDL parser handles FULLTEXT index
Corrected the MySQL DDL parser to correctly handle `FULLTEXT` indexes within a `CREATE TABLE` statement. The parser was incorrectly using `canConsume(…)` with a list of options instead of `canConsumeAnyOf(…)`.
2017-02-10 15:49:20 -06:00
Randall Hauch
9a4a177004 DBZ-188 Corrected JavaDoc 2017-02-10 15:39:22 -06:00
dleibovic
aa50bfe71a DBZ-188: Allow a Debezium MySQL connector to filter production of DML events into Kafka by the MySQL UUID of the event
With GTIDs enabled, each transaction in the binlog contains a GTID event, which gives us access to the GTID of the transaction. The GTID has the following format: `source_id:transaction_id`, where `source_id` is the UUID of the MySQL server the transaction was written to.

I propose to allow a Debezium instance to be configured with a UUID pattern to check against before producing DML events into Kafka. Debezium would produce a DML event into Kafka if and only if the UUID in the event's GTID matches the pattern with which Debezium was configured.

The configuration for the UUID patterns will make use of the existing `gtid.source.includes` and `gtid.source.excludes` options. The DML event filtering will only be performed if the new option `gtid.source.filter.dml.events` is true.
2017-02-10 14:14:10 -05:00
Randall Hauch
d2986710a5 DBZ-188 More efficient GTID source filters for MySQL Connector
Changed the GTID source filters in the MySQL connector to be far more efficient when the filters specify literal UUIDs rather than regex patterns. In these cases, the predicate just checks whether a supplied value is in a hash set, and no regular expression patterns are used.

The GTID source filters can still be a combination of UUID literals and regular expressions, and the predicate will use the best implementation for each. For example, if the filters include all UUID literals, then regular expressions will never be used.
2017-02-10 11:34:24 -06:00
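
A sketch of the two-tier predicate described above (literal UUIDs go into a hash set, everything else is compiled as a regex); names and structure are illustrative, not the connector's actual code:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class GtidSourceFilterSketch {
    private static final Pattern UUID_LITERAL = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    static Predicate<String> includes(List<String> sources) {
        Set<String> literals = new HashSet<>();
        Predicate<String> result = s -> literals.contains(s); // O(1) check for literal UUIDs
        for (String source : sources) {
            if (UUID_LITERAL.matcher(source).matches()) {
                literals.add(source);                          // literal: no regex needed
            } else {
                Pattern regex = Pattern.compile(source);       // fall back to regex matching
                result = result.or(s -> regex.matcher(s).matches());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Predicate<String> filter = includes(List.of(
                "123e4567-e89b-12d3-a456-426614174000", "abc.*"));
        System.out.println(filter.test("123e4567-e89b-12d3-a456-426614174000")); // true
        System.out.println(filter.test("abcdef"));                               // true
    }
}
```
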
Randall Hauch
8c60c29883 [maven-release-plugin] prepare for next development iteration 2017-02-07 14:22:12 -06:00
Randall Hauch
20134286e9 [maven-release-plugin] prepare release v0.4.0 2017-02-07 14:22:11 -06:00
Randall Hauch
403fee1375 DBZ-185 MySQL’s database history now filters GTID sources
Corrects how the MySQL connector reloads database history to take into account the included and excluded GTID sources. This only affects a connector configured to capture changes from _multiple_ MySQL database servers when GTID sources are explicitly excluded or included.
2017-02-07 11:21:22 -06:00
Randall Hauch
bb0800ca3a DBZ-140 Improved locking logic to support RDS
Improved the MySQL connector's logic to better handle Amazon RDS, which does not allow granting the user `SUPER` privileges. As before, the connector starts a transaction and attempts to get a global read lock via `FLUSH TABLES WITH READ LOCK` to prevent writes to the database so that the binlog position can be accurately read _and_ the table schemas can be read without interference from other clients. Once that is done, the connector releases the global read lock and continues in the same transaction to read all table rows. This means that our snapshot is consistent, but we maintain the global read lock for a very short period of time.

Amazon's RDS and Aurora are hosted MySQL instances that do not allow users to have the `SUPER` privilege, which means the user cannot get a global read lock. In this case, the connector detects this error, continues to read the database and table names (without any lock), and _then_ uses `FLUSH TABLES <tableName> WITH READ LOCK` on each table that satisfies the filters to prevent changes from other clients. The connector then reads the table schemas, reads _all_ table rows, commits the transaction, and _finally_ releases the table locks.

Therefore, there are two very different behaviors/requirements when the user can't obtain a global read lock because of lack of privilege, as on RDS (see the sketch after this entry):

1. The RDS user that the connector makes use of must also have the `LOCK TABLES` privilege; without it the connector will fail during the snapshot.
2. The connector must hold the table read locks _until it has completed reading all of the tables_, since releasing the table locks using `UNLOCK TABLES` would prematurely commit our transaction and prevent us from getting a consistent snapshot. From the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/flush.html):
> `UNLOCK TABLES` implicitly commits any active transaction only if any tables currently have been locked with `LOCK TABLES`. The commit does not occur for `UNLOCK TABLES` following `FLUSH TABLES WITH READ LOCK` because the latter statement does not acquire table locks.
2017-02-06 13:56:55 -06:00
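
A simplified JDBC sketch of the fallback described above; the SQL statements follow the commit text, but the method shape and error handling are illustrative:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class SnapshotLockSketch {
    static void snapshot(Connection conn, List<String> tables) throws SQLException {
        conn.setAutoCommit(false);
        boolean tableLocks = false;
        try (Statement stmt = conn.createStatement()) {
            try {
                stmt.execute("FLUSH TABLES WITH READ LOCK"); // needs elevated privileges
                // ... read binlog position and table schemas here ...
                stmt.execute("UNLOCK TABLES"); // safe: FTWRL took no table locks, no commit
            } catch (SQLException e) {
                // e.g., Amazon RDS: no global read lock allowed, so lock the filtered
                // tables individually, in a single statement (per DBZ-140 follow-up).
                stmt.execute("FLUSH TABLES " + String.join(", ", tables) + " WITH READ LOCK");
                tableLocks = true;
            }
            // ... read all rows of all tables within the same transaction ...
            conn.commit(); // consistent snapshot complete
            if (tableLocks) {
                stmt.execute("UNLOCK TABLES"); // only now release the table locks
            }
        }
    }
}
```
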
Randall Hauch
5490842449 Merge pull request #175 from rhauch/dbz-176
DBZ-176 Corrected MySQL DDL parser to support creating triggers with definers
2017-02-02 13:59:01 -06:00
Randall Hauch
74e5ba6448 DBZ-176 Corrected MySQL DDL parser to support creating triggers with definers
The MySQL DDL parser was not correctly handling `DEFINER` clauses within `CREATE TRIGGER` or `CREATE EVENT` statements. Support for `DEFINER` clauses was recently added for the various forms of `CREATE PROCEDURE`, `CREATE FUNCTION` and `CREATE VIEW` statements. These are the only kinds of statements that have the definer attribute, per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/stored-programs-security.html).
2017-02-02 12:44:28 -06:00
Randall Hauch
32a88fdc6f DBZ-184 Added database and table name to change event metadata 2017-02-02 12:09:53 -06:00
Randall Hauch
6230cab90e Merge pull request #173 from rhauch/dbz-113
DBZ-113 Added MySQL threads to the event’s source metadata
2017-02-02 12:00:19 -06:00
Randall Hauch
fe17b246af DBZ-113 Added MySQL threads to the event’s source metadata
Changed the events’ `source` structure to optionally contain the identifier of the MySQL thread where appropriate. The thread is included on each `BEGIN` binlog event, so these are captured and added to all of the associated change events produced for that transaction.
2017-02-02 11:53:32 -06:00
Randall Hauch
f2a65d03df DBZ-174 Added support for new binlog events
MySQL recently added additional binlog events, and this commit adds support to handle these new events by ignoring them.
2017-02-01 15:26:28 -06:00
Horia Chiorean
031c4a1552 DBZ-183 Fixes the BinlogReader's handling of TIMESTAMP columns to correctly account for timezones 2017-01-25 16:39:36 +02:00
Randall Hauch
a73f85a80f Merge pull request #162 from rareddy/DBZ-177
DBZ-177: Providing an alternative way to create JDBC connection based …
2017-01-13 13:37:38 -06:00
Ramesh Reddy
a9aace3480 DBZ-177: Providing an alternative way to create JDBC connections based on the configured JDBC driver class name and supplied classloader. Loading/creating JDBC connections is not reliable when the driver libraries are in a different classloader than the `DriverManager`. 2017-01-13 12:58:14 -06:00
Horia Chiorean
a300d3e1cf DBZ-3 Changes the configuration of the Docker Maven plugin to only use alias naming when necessary and moves the PG connector ahead of the Mongo connector in the build 2016-12-27 14:44:33 +02:00
Horia Chiorean
23e3f59fa1 DBZ-3 Implements a connector for streaming changes from a Postgres database
The version of the DB server required for this to work is at least 9.4
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver and the protobuf code, for now
* adds support for debugging Surefire and Failsafe
2016-12-27 14:44:32 +02:00
Randall Hauch
e60839e76b DBZ-164 Improved MySQL snapshot reader logic
Added more logic to the snapshot reader to better handle errors when reading the list of table names in each database. Now, any errors with a single database (e.g., some of the not-quite-a-database names described in the JIRA issue) will cause the snapshot reader to simply skip that database name and continue on (with proper logging).

This change also quotes all of the database and table names when used in SQL statements.
2016-12-20 22:03:46 -06:00
Randall Hauch
fd7e152852 Merge pull request #142 from rhauch/dbz-151
DBZ-151 Added new integration test framework
2016-12-20 17:53:16 -06:00
Randall Hauch
ab1140ef70 Merge pull request #155 from rhauch/dbz-169
DBZ-169 MySQL connector support for ON UPDATE clauses
2016-12-20 17:48:06 -06:00
Randall Hauch
fe44380d4c Merge pull request #154 from rhauch/dbz-168
DBZ-168 MySQL connector ignores XA binlog events
2016-12-20 17:47:57 -06:00
Randall Hauch
a9a84cb6aa DBZ-152 Enabled MySQL connector to skip table count checks during snapshot
Changed the MySQL connector’s `min.row.count.to.stream.results` configuration property to accept a value of 0, which signifies that all `SELECT COUNT(*) FROM tableA` queries should be skipped and instead all results should be streamed.
2016-12-20 17:40:57 -06:00
Randall Hauch
046702d959 DBZ-169 MySQL connector support for ON UPDATE clauses
Corrected the MySQL DDL parser to support `ON UPDATE NOW()` clauses in addition to `ON UPDATE CURRENT_TIMESTAMP`.
2016-12-20 16:19:18 -06:00
Randall Hauch
09f87cf190 DBZ-168 MySQL connector ignores XA binlog events
MySQL 5.7.7 introduced new behavior for handling XA events in the binlog. See the [MySQL documentation](http://dev.mysql.com/doc/refman/5.7/en/xa-restrictions.html) for details. This PR changes the binlog reader so that `XA …` statements appearing in the binlog are ignored altogether.
2016-12-20 15:32:44 -06:00
Randall Hauch
5dceb05f69 DBZ-151 Additional changes to improve test framework and MySQL integration tests 2016-12-20 10:58:56 -06:00
Randall Hauch
08e32a4a8b DBZ-151 Added multiple integration test modules to test various MySQL versions and configurations.
These new modules run during the '-Passembly' profile and use the new integration test framework that compares all
output produced by a connector to expected results that were previously recorded and verified. These integration test modules
can be run manually with a simple build of those modules or their parent; only the top-level 'integration-tests' module is run
during the assembly profile during builds of the entire codebase.
2016-12-20 09:18:10 -06:00
Randall Hauch
0851d8280c DBZ-166 Corrected shutdown logic of MySQL connector
The MySQL connector uses several threads, so previously upon connector shutdown these threads were simply cancelled. This is fine for the binlog reader (which can stop at any moment), but is a poor approach for the snapshot as we didn’t always properly release the database resources and also didn’t complete the writing of the DDL history.

With this change, the snapshot reader stops in a very controlled manner, basically by having the 10-step snapshot procedure frequently check whether the reader is to continue working, and by avoiding thread interruption altogether. And, the snapshot procedure will always clean up its database resources (locks, transactions, etc.), even if the procedure is stopped before completion.

This change also refactors how the snapshot and binlog readers are managed. This is no longer done in the MySqlConnectorTask class (which is busy enough); rather, the logic has been encapsulated in a new `ChainedReader` that makes use of a new `Reader` interface. This makes testing of `ChainedReader` easier and ensures that `ChainedReader` relies only upon the primary methods of `Reader` rather than upon `AbstractReader`. `ChainedReader` handles multiple readers generically, and ensures that when stopped the readers are all handled correctly and completely process all records, yet avoids accidentally starting subsequent readers when stopping the previous one.
2016-12-15 10:55:18 -06:00
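
A guess at the minimal shape of the `Reader` contract and chaining rule described above; this is a sketch under assumed names, not the actual Debezium interface:

```java
import java.util.List;

// A guess at the minimal contract ChainedReader relies on.
interface Reader {
    void start();                               // begin producing records
    void stop();                                // request a controlled stop; no interruption
    List<?> poll() throws InterruptedException; // drain remaining records after stop()
}

class ChainedReaderSketch {
    private final List<Reader> readers;
    private int current = 0;

    ChainedReaderSketch(List<Reader> readers) {
        this.readers = readers;
    }

    // When one reader finishes, it must be fully drained before the next starts,
    // and the next reader must never start if the chain is shutting down.
    void onReaderCompleted(boolean stopping) {
        if (!stopping && ++current < readers.size()) {
            readers.get(current).start();
        }
    }
}
```
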
Randall Hauch
e3e66bf960 DBZ-161 Corrected MySQL connector logic when no GTIDs are used
Corrected the logic of the MySQL connector when getting the server’s GTID set. Previously, this logic failed if GTIDs were not used.
2016-12-08 08:09:52 -06:00
Dennis Persson
acd7bd8fa5 DBZ-142 Handle national character set columns in DDL parser 2016-12-07 07:38:30 +01:00
Randall Hauch
c762a221b7 DBZ-162 Corrected DDL parsing of MySQL functions
The MySQL DDL parser was not properly consuming function declarations. For functions, the parser consumes the entire statement without handling the various expressions within the function declaration, but the parser was not properly finding the end of the statement and instead continued trying to consume values beyond the end of the statement.

Specifically, when the parser consumes a `BEGIN`, it looks for a corresponding `END`. However, if it encountered an `END IF`, the `IF` plus any remaining tokens were left on the token stream unprocessed. This confused the parser, which kept looking for statements and ultimately failed with a `No more content` error.

This case was replicated in integration tests, and the code was fixed to properly find the end of the statements.
2016-12-06 17:34:52 -06:00
Randall Hauch
c72242eeb0 Merge pull request #145 from sherafpm/bugfix/DBZ-160
DBZ-160 - Issue while parsing create table script with ENUM type and default value 'b'
2016-12-06 14:21:23 -06:00
Randall Hauch
eedc4fba00 DBZ-163 Corrected assembly profile in build
The Travis-CI builds run the Maven build using the `assembly` profile, and this has been failing quite a bit lately.

The first problem appears to be that the Travis-CI environment recently changed to have port 3306 taken, which means that our build fails to start any Docker containers for MySQL that attempt to use this port. A simple fix is to use different ports for the assembly build.

However, trying to change the port numbers for some of the profiles caused a lot of problems, and to correct these required refactoring how the properties are set. The Docker Maven plugin is now configured with separate properties that are set once (depending upon the profile) to determine the port assignments of the various Docker containers. The Failsafe plugin executions then use these Maven properties when setting the system variables (e.g., `database.host`) needed in the integration tests. This appears to have worked, but it still is a bit fragile. For example, the assembly profile defines several Failsafe executions, and during this profile these should be the only executions run; however, if not all the properties are set properly, the build seems to also run the default Failsafe execution in addition to the other `assembly` profile executions. (I think properties can’t only be defined in the execution, but need to also be defined in the Failsafe configuration.)

The “alternative” MySQL Docker images were removed, since they basically should not provide any different behavior than the `mysql/mysql-server` images we normally used. The extra containers required a lot more resources to run and dramatically increased the complexity of the build.

A few other trivial changes were made.
2016-12-05 16:37:59 -06:00
Randall Hauch
2b2bf693d7 DBZ-163 Changed Travis-CI build to skip the install dependencies step 2016-12-02 15:43:57 -06:00
Sherafudheen PM
ee52219736 DBZ-160 - Issue while parsing create table script with ENUM type and default value 'b' 2016-12-02 17:42:44 +05:30
Randall Hauch
0bf3b4c9f3 DBZ-157 Upgraded Docker Maven plugin
Upgraded the Docker Maven plugin to 0.18.1, which required changing our use of the `docker.image` to `docker.filter` (per the [changes in 0.17.1](https://github.com/fabric8io/docker-maven-plugin/blob/master/doc/changelog.md)).
2016-11-22 09:23:07 -06:00
Randall Hauch
a82ae5691b Reduce the log verbosity of the MySQL tests 2016-11-14 13:41:10 -06:00
Randall Hauch
d80bc1bfd7 DBZ-153 MySQL connector supports enum and set values with parentheses
Changed the MySQL connector to support ENUM and SET literals with parentheses.
2016-11-14 12:22:08 -06:00
Randall Hauch
8a52cda0dc DBZ-150 Changed the order of events when a row's key is changed. 2016-11-09 14:42:43 -06:00
Randall Hauch
b0ded5f383 DBZ-147 Added ability to treat MySQL DECIMAL as double
By default the MySQL connector handles `DECIMAL` and `NUMERIC` columns using `java.math.BigDecimal` values and describing them using the `org.apache.kafka.connect.data.Decimal` schema type, which serializes the values to a binary form.

This change adds a configuration option that keeps the default behavior, but can instead handle `DECIMAL` and `NUMERIC` values as Java `double` with a schema type of `FLOAT64`.
2016-11-09 11:27:09 -06:00
Randall Hauch
ea5f7983c7 DBZ-144 Corrected MySQL connector restart
Added tests to verify whether the connector properly restarts in the binlog when it previously failed or stopped in the middle of a transaction. The tests showed that the connector was not able to restart properly whether or not GTIDs were used, because restarting from an arbitrary binlog event causes problems: the TABLE_MAP events for the affected tables are skipped.

The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.

This has an unfortunate side effect, since the offsets record that a transaction has completed only when the connector generates a source record for the subsequent transaction. This is because the connector generates source records (with their offsets) for the binlog events in the transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn’t be a problem, and will only slow restarts for very large transactions.
2016-11-09 08:11:41 -06:00
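
A sketch of the offset shape this implies, with hypothetical key names: the binlog coordinates of the transaction start, plus the count of already-processed events to skip on restart:

```java
import java.util.HashMap;
import java.util.Map;

public class BinlogOffsetSketch {
    // Offsets point at the START of the current transaction, plus how many
    // events within it were already produced before the connector stopped.
    static Map<String, Object> offset(String file, long pos, long eventsToSkip) {
        Map<String, Object> offset = new HashMap<>();
        offset.put("file", file);          // binlog file at transaction start (hypothetical key)
        offset.put("pos", pos);            // binlog position at transaction start
        offset.put("event", eventsToSkip); // events already processed; re-read and skip these
        return offset;
    }

    public static void main(String[] args) {
        // On restart, the reader rewinds to the transaction start and silently
        // skips the first N events it has already emitted.
        System.out.println(offset("mysql-bin.000003", 154L, 42L));
    }
}
```
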
Randall Hauch
0d2acfd0a6 DBZ-149 Corrected type of BINARY column
The MySQL connector (or rather the DDL parser used in the connector) improperly assumed a `CHAR` JDBC type (and Avro schema `STRING` type) for MySQL columns of type `BINARY`. This corrects the error.
2016-11-08 17:41:01 -06:00
Randall Hauch
7656dce985 DBZ-148 Corrected timestamp check in test case to account for DST 2016-11-08 15:37:03 -06:00
Randall Hauch
43d2bf14cf DBZ-143 Minor improvements and correction of JavaDoc. 2016-11-04 09:02:44 -05:00
Randall Hauch
207315e5df DBZ-146 Improved error handling of MySQL Connector
Improved the error handling of the MySQL connector to ensure that we’re always stopping the connector when we have a problem handling a binlog event or if we have problems starting.
2016-11-03 16:55:59 -05:00
Chris Riccomini
c195fc1f4c DBZ-143 Support multi-channel MySQL failover
Make Debezium merge its GTID set with the GTID set on the server that
it's connecting to. This allows Debezium to consume from a MySQL server
that might have a different set of channels (upstream masters), provided
that the server has the data that Debezium needs.
2016-11-03 16:47:42 -05:00
Randall Hauch
f970899a6d DBZ-133 Minor changes to JavaDoc 2016-10-25 11:12:55 -05:00
Prannoy Mittal
fa66abdcc3 DBZ-133 Enable schema-only snapshot mode.
The snapshot reader now has a dataInclude flag, which determines whether the initial data in the whitelisted databases and tables is read. In schema-only mode, the connector does not read the initial data and captures only the database and table schemas.
Added a unit test validating that the initial data is not copied
2016-10-25 11:04:26 -05:00
Randall Hauch
094f9a4925 DBZ-139 Corrected binlog timestamp handling
MySQL records the timestamp with second precision in binlog events, but the library we use multiplies by 1000 to return the padded value in milliseconds (even though the value still has second precision). The BinlogReader converts this back to seconds, so the SourceInfo should not also be dividing by 1000.
2016-10-20 09:31:02 -05:00
Randall Hauch
25b8055642 DBZ-134 Enabled JMX metrics for MySQL connector
Added an MXBean for the MySQL connector that captures various metrics while reading the binlog.
2016-10-19 16:48:11 -05:00
Randall Hauch
4a62b09ead DBZ-126 Added support for MySQL JSON type
Adds support for MySQL 5.7's `JSON` type, which is capable of holding JSON objects, JSON arrays, and scalar values. The Debezium MySQL connector represents `JSON` values as strings with an `io.debezium.data.Json` semantic type (which is basically a string schema that has a special name to denote the semantics), and the _contents_ of that string will be the JSON representation of the object, array, or scalar value.
2016-10-18 17:32:55 -05:00
Randall Hauch
2f5772712a DBZ-129 Fix for GTID updates
Workaround for https://github.com/shyiko/mysql-binlog-connector-java/issues/122.
2016-10-18 14:32:06 -05:00
Randall Hauch
7387654bfa DBZ-129 Additional improvements for MySQL connector GTID-based startup
Added more integration tests to verify the behavior of the MySQL connector when it is (re)starting using GTIDs.
2016-10-18 14:30:10 -05:00
Randall Hauch
305c4c5ac6 DBZ-129 MySQL connector can now use subset of GTID set when reconnecting to MySQL
When a connector is originally connected to a MySQL server, it will record the GTID set that identifies the position in the binlog. When all of the interesting transactions originate on a different server (i.e., the server we're listening to is a replica), the server we're listening to will still include some transactions in the binlog (e.g., for the information schema, performance, or other internal databases), and so the GTID set will include a GTID range for our server. If we stop the connector and want to point it to a different MySQL server, asking MySQL to position the binlog using the complete GTID set (including the GTID range for our old replica) will cause an error, since the new server does not have any GTID ranges from the old replica. Therefore, the connector needs to be able to exclude some GTID ranges that originated on the original replica, using the `server_uuid` property of the replica server.

This change adds two configuration properties: `gtid.source.includes` and `gtid.source.excludes`. Both are optional, but at most only one of these can be used. These properties contain a comma-separated list of GTID sources (i.e., the `server_uuid` value for the server where the transaction originated) or regular expressions matching GTID sources, and upon connector startup the connector uses the list to filter the previously-recorded GTID set against the available GTID set in the current MySQL server. By including specific GTID sources, an administrator can control the subset of GTID ranges that govern the binlog position.

These properties will not be useful in some topologies, especially when the MySQL server from which the binlog is being read is the originating server for some of the transactions. However, these properties may be very useful in any topology where the connector is _only_ reading from replicas, so that the connector can be switched to another replica at any time. In some cases it may be easier to exclude all of the replicas' `server_uuid` values, while in other cases it may be easier to include all of the `server_uuid` values where transactions can originate.
2016-10-18 14:29:58 -05:00
Horia Chiorean
1a99f5bbc7 DBZ-135 Fixes the parsing of line separators by GtidSet (#118) 2016-10-13 10:18:33 -05:00
Randall Hauch
d955ed2e4b DBZ-132 Cleanup of code (#117)
Additional cleanup of changes made for DBZ-132.
2016-10-11 15:36:07 -05:00
Prannoy Mittal
301d60411f Use the Debezium Strings utility to join a list of strings 2016-10-12 00:53:36 +05:30
Prannoy Mittal
a36700e51b ENUM and SET values were assumed to be single characters.
Updated the MySQL parser to return a list of Strings for the allowed ENUM and SET values,
and added a fix to get an ENUM value at a particular index, and likewise for SET options.
Used the Debezium string utility to join the list of strings into a delimiter-separated String.
Updated old test cases as required to handle lists of strings.
2016-10-12 00:41:08 +05:30
Willie
fda76c875e DBZ-115 Add support to recognize older row_event formats 2016-10-08 11:42:12 -07:00
Randall Hauch
99a86ad289 Merge pull request #112 from rhauch/dbz-123
DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
2016-10-07 17:16:37 -05:00
Randall Hauch
beb47dd2de DBZ-131 Improved logging while reading binlog
When the MySQL connector is reading the binlog, it outputs INFO log messages reporting status at exponentially-increasing intervals, starting at every 5 seconds and doubling up to a maximum period of 1 hour. This output is useful when the connector starts, to confirm that it is working, but thereafter its usefulness decreases. Once an hour is probably acceptable output.

This is not intended to replace the capturing of metrics, but is merely an aid to easily tell via the logs whether the connector continues to work.

Also improved the log message when the binlog reader stops to capture the total number of events recorded by Kafka Connect and the last recorded offset.
2016-10-07 17:10:01 -05:00
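
The doubling schedule is straightforward to sketch (a toy illustration, not the connector's code):

```java
import java.time.Duration;

public class LogIntervalSketch {
    public static void main(String[] args) {
        Duration period = Duration.ofSeconds(5);
        Duration max = Duration.ofHours(1);
        // 5s, 10s, 20s, ... until the period reaches one hour, then hourly forever.
        for (int i = 0; i < 12; i++) {
            System.out.println("log status, next update in " + period);
            period = period.multipliedBy(2);
            if (period.compareTo(max) > 0) {
                period = max;
            }
        }
    }
}
```
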
Randall Hauch
50eb4094ac DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
The DDL parser now properly handles bit-set literals, and several minor case-sensitivity bugs dealing with other escaped literals.
2016-10-06 13:25:38 -05:00
Randall Hauch
64bab3b3cf DBZ-104 Added test to verify behavior of CREATE TABLE LIKE expressions with and without snapshot 2016-09-23 12:11:38 -05:00
Randall Hauch
dc03335049 DBZ-128 Additional fix to MySQL compatibility message. 2016-09-23 11:03:40 -05:00
Randall Hauch
7654321cfd DBZ-128 Improved checking of MySQL status and configuration
Added logic to verify that MySQL's row-level binlog is enabled, and whether, when snapshots are not performed, the binlog is likely to have been purged. Some situations will result in an error, while others are logged as warnings.
2016-09-22 17:06:14 -05:00
Randall Hauch
730603976d Merge pull request #107 from rhauch/dbz-123
DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
2016-09-21 15:22:00 -05:00
Randall Hauch
bcf60940db DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
Corrected how the MySQL connector treats columns of type `BIT(n)`, where _n_ is the number of bits in the value. When `n=1`, the resulting values are booleans; when `n>1`, the resulting values are little-endian `byte[]` that have the minimum number of bytes to hold the `n` bits.
2016-09-21 15:04:20 -05:00
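
A sketch of the stated conversion rule (illustrative, not the connector's code): one bit becomes a boolean, more bits become the smallest little-endian byte array that holds them:

```java
public class BitConversionSketch {
    static Object convertBits(long raw, int n) {
        if (n == 1) {
            return raw != 0;                      // BIT(1) -> boolean
        }
        int numBytes = (n + 7) / 8;               // minimum bytes to hold n bits
        byte[] result = new byte[numBytes];
        for (int i = 0; i < numBytes; i++) {
            result[i] = (byte) (raw >>> (8 * i)); // little-endian: least significant byte first
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convertBits(1L, 1));          // true
        byte[] bits = (byte[]) convertBits(0x0102L, 10); // BIT(10) -> 2 bytes
        System.out.printf("[0x%02x, 0x%02x]%n", bits[0], bits[1]); // [0x02, 0x01]
    }
}
```
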
Randall Hauch
9aae6c62d9 DBZ-124 Eliminated the JMX "already registered" warning in the MySQL connector
The `KafkaDatabaseHistory` was always creating a new producer whenever its `start()` method was called, even if it were called more than once. And, the `MySqlSchema` was calling `start()` twice, resulting in multiple producers being created and registered with JMX. Both issues were fixed.

Also, UUIDs were being used as the name of the JMX MBean for the producer, unless the `database.history.consumer.client.id` and `database.history.producer.client.id` properties were being explicitly set. Now, the MySQL connector will by default set the `client.id` property on both the database history's Kafka consumer and producer to `{connectorName}-dbhistory`. Of course, the `database.history.consumer.client.id` and `database.history.producer.client.id` properties can still be set to define the name of the producer and consumer.
2016-09-21 10:05:15 -05:00
Randall Hauch
54b737edc1 DBZ-114 MySQL connector now handles "zero-value" dates and timestamps
MySQL supports "zero-value" dates and timestamps, but these cannot be represented as valid dates or timestamps using the Java types. For example, the zero-value `0000-00-00` for a date has what Java considers to be an invalid month and day-of-the-month.

This commit changes how the MySQL connector handles these values to not throw exceptions. When columns allow nulls, such values will be treated as nulls; when columns do not allow null values, these values will be converted to a "zero-value" for the corresponding Java representation (e.g., the epoch day or timestamp). A new test case verifies the behaviors.
2016-09-21 09:23:12 -05:00
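
A sketch of the date case described above, assuming the raw value arrives as the literal string MySQL stores; names are hypothetical:

```java
import java.time.LocalDate;

public class ZeroDateSketch {
    // MySQL's zero-value date has month 0 and day 0, which java.time rejects.
    static LocalDate convertDate(String raw, boolean columnIsOptional) {
        if ("0000-00-00".equals(raw)) {
            // Nullable column: use null; otherwise fall back to the epoch day.
            return columnIsOptional ? null : LocalDate.ofEpochDay(0);
        }
        return LocalDate.parse(raw);
    }

    public static void main(String[] args) {
        System.out.println(convertDate("0000-00-00", true));  // null
        System.out.println(convertDate("0000-00-00", false)); // 1970-01-01
        System.out.println(convertDate("2016-09-21", false)); // 2016-09-21
    }
}
```
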
Akshath
8a1a9c3542 Changed server.id to support Long instead of Int 2016-09-06 15:09:05 -07:00
Randall Hauch
de1edce895 DBZ-116 Improved logging when MySQL connector is reading binlog
The MySQL connector now outputs an INFO log message whenever its task's `poll()` method returns a non-empty list of `SourceRecord` objects, where the message includes the number of records and the offset of the last record.
2016-09-06 11:31:54 -05:00
Randall Hauch
330a27ce52 Merge pull request #97 from rhauch/dbz-102
DBZ-102 MySQL connector support for column charsets
2016-08-29 15:12:24 -05:00
Randall Hauch
cc8f45309a Merge pull request #98 from rhauch/dbz-112
DBZ-112 Corrected the logic of setting the MySQL driver's SSL-related system properties
2016-08-29 15:00:34 -05:00
Randall Hauch
5cef237aac DBZ-111 Corrected GTID set comparison logic of the MySQL connector
The MySQL connector was improperly comparing the GTID set required by the connector to the GTID set of the MySQL instance. In particular, when the GTID set of the MySQL server contained a newline character, the comparison logic failed. (This should have been fixed as part of DBZ-107.)
2016-08-29 14:53:21 -05:00
Randall Hauch
0861518788 DBZ-112 Corrected the logic of setting the MySQL driver's SSL-related system properties 2016-08-29 14:27:43 -05:00
Randall Hauch
a46a427b57 DBZ-102 Added MySQL integration test that verifies character encodings
Added a table with data to one of the MySQL databases used in the integration tests. It verifies that the UTF-8 data stored in the table is able to be handled properly when obtaining a snapshot and reading the binlog.
2016-08-29 13:42:10 -05:00
Randall Hauch
cc94bbc697 DBZ-102 MySQL connector now processes character sets
The MySQL binlog events contain the binary representation of string-like values as encoded per the column's character set. Properly decoding these into Java strings requires capturing the column, table, and database character set when parsing the DDL statements.

Unfortunately, MySQL DDL allows columns (at the time the columns are created or modified) to inherit the default character set for the table, or if that is not defined the default character set for the database, or if that is not defined the character set for the server. So, in addition to modifying the MySQL DDL parser to support capturing the character set name for each column, it also had to be changed to know what these default character set names are.

The default character sets are all available via MySQL server/session/local variables. Although strictly speaking the character set variables cannot be set globally, MySQL DDL does allow session and local variables to be set with `SET` statements. Therefore, this commit enhances the MySQL DDL parser to parse `SET` statements and to track the various global, session, and local variables as seen by the DDL parser. Upon connector startup, a subset of server variables (related to character sets and collations) are read from the database via JDBC and used to initialize the DDL parser via `SET` methods.

In addition to initializing the DDL parser with the system variables related to character sets and collation, it is important to also capture the server and database default character sets in the database history so that the correct character sets are used for columns even when the default character sets have changed on the database and/or the server. Therefore, upon startup or snapshot the MySQL connector records in the database history a `SET` statement for the `character_set_server` and `collation_server` system variables so that, upon a later restart, the history's DDL statements can be re-parsed with the correct default server and database character sets. Also, when the MySQL connector reloads the database history (upon startup), the recorded default server character set is compared with the MySQL instance's current server character set, and if they are different the current character set is recorded with a new `SET` statement.

These extra steps ensure that the connector uses the correct character set for each column, even when the connector restarts and reloads the database history captured by a previous version of the connector. In other words, the MySQL connector can be safely upgraded, and the new version will correctly start using the columns' character sets to decode the string-like values.
2016-08-29 12:19:24 -05:00
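
The inheritance chain the parser must model can be sketched as a simple resolution order (column, then table default, then database default, then server default); names are illustrative:

```java
import java.util.Optional;

public class CharsetResolutionSketch {
    static String resolveCharset(Optional<String> column, Optional<String> tableDefault,
                                 Optional<String> databaseDefault, String serverDefault) {
        // Each level inherits from the next when no explicit charset was declared.
        return column.orElseGet(() ->
               tableDefault.orElseGet(() ->
               databaseDefault.orElse(serverDefault)));
    }

    public static void main(String[] args) {
        // Column with no explicit charset in a table that declared utf8mb4:
        System.out.println(resolveCharset(Optional.empty(), Optional.of("utf8mb4"),
                Optional.empty(), "latin1")); // utf8mb4
        // Nothing declared anywhere: the server default wins.
        System.out.println(resolveCharset(Optional.empty(), Optional.empty(),
                Optional.empty(), "latin1")); // latin1
    }
}
```
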
Randall Hauch
257e81c540 DBZ-102 MySQL in-memory models of tables capture column character sets
The DDL parser and in-memory models of the relational schemas were changed to capture the character set for each column whose type is a string (e.g., `CHAR`, `VARCHAR`, etc.). This required handling `SET` statements used to change the system variables that hold the names of the default character set for the server and for each database. So, even if a column does not explicitly define the character set, the column's actual character set is identified from the table's character set, which might default to the current database's character set, which if not set defaults to the system character set.

These changes merely affect how MySQL DDL is parsed and the in-memory relational schema representation to accommodate the character set at various levels. It does not change the behavior of the MySQL connector; that will be done in a subsequent commit.

All tests pass with these changes, including quite a few additional tests for the new functionality.
2016-08-29 11:50:51 -05:00
Randall Hauch
93d0fae02b DBZ-109 Captured MySQL error code and SQLSTATE code in exceptions
The binlog reader and JDBC operations might throw exceptions with this information, so in these cases the connector now captures the error code and SQLSTATE code from the exception and includes them in the message.
2016-08-25 08:11:50 -05:00
Randall Hauch
638b459484 DBZ-108 Removed the TimeZoneAdapter and test, which is no longer used 2016-08-24 16:31:35 -05:00
Randall Hauch
4de56fd657 Merge pull request #94 from hchiorean/DZB-header-fix
Fixes the DBZ header required by checkstyle
2016-08-24 14:28:43 -05:00
Randall Hauch
ce2b2db80c DBZ-99 Added support for MySQL connector to connect securely to MySQL
Changed the MySQL connector to have several new configuration properties for setting up the SSL key store and trust store (which can be used in place of System or JDK properties) used for MySQL secure connections, and another property to specify what kind of SSL connection is to be used.

Modified several integration tests to ensure all MySQL connections are made with `useSSL=false`.
2016-08-24 13:27:35 -05:00
Horia Chiorean
2732d26ff0 Fixes the DBZ header required by checkstyle
This commit removes an extra space character from the first blank line of the header
2016-08-24 13:41:15 +03:00
Randall Hauch
40318f87a3 Merge pull request #92 from rhauch/dbz-107
DBZ-107 MySQL Connector should tolerate newlines in GTID sets read during snapshot
2016-08-23 17:45:58 -05:00
Randall Hauch
3051e3b2d7 DBZ-107 MySQL Connector should tolerate newlines in GTID sets read during snapshot 2016-08-23 17:37:48 -05:00
Randall Hauch
448d514c81 DBZ-106 Corrected the MySQL DDL parser to properly handled quoted keywords as column names. 2016-08-23 17:03:53 -05:00
Randall Hauch
e86fb83459 [maven-release-plugin] prepare for next development iteration 2016-08-16 09:56:47 -05:00
Randall Hauch
ccdb0a1a63 [maven-release-plugin] prepare release v0.3.0 2016-08-16 09:56:47 -05:00
Randall Hauch
83b2d2eab6 DBZ-92 Added a few log statements upon MySQL connector task startup 2016-08-15 21:57:09 -05:00
Randall Hauch
b1277ed196 DBZ-94 Corrections to mark the end of the snapshot 2016-08-15 21:56:20 -05:00
Randall Hauch
2fea8c8ee0 DBZ-100 Corrected JavaDoc and renamed methods to be more explicit 2016-08-15 12:37:39 -05:00
Randall Hauch
d8a5d2b50f DBZ-100 Corrected MySQL connector's use of ENUM and SET values
The ENUM and SET values read from the binlog contain the indexes of the options that are included in the value, but this doesn't match the string values returned by MySQL and JDBC, which contain the comma-separated options. With this change, the values read from the binlog will also be comma-separated strings.
2016-08-15 12:11:35 -05:00
Randall Hauch
629542458e DBZ-91 Added option to force use Kafka Connect temporal types. 2016-08-11 10:48:07 -05:00
Randall Hauch
31641fb43e DBZ-91 Changed how temporal values are treated in MySQL connector
Rewrote how the MySQL connector converts temporal values to use schemas with names that identify the semantic
type of temporal value, and customized how the MySQL binlog client library creates Java object values from the
raw binlog events.

Several new "semantic" schema types were defined:

* `io.debezium.time.Year` represents a year number as an INT32 value (e.g., 2016, -345, etc.).
* `io.debezium.time.Date` represents a date by storing the epoch seconds (that is, the number of seconds past the epoch) as an INT64 value.
* `io.debezium.time.Time` represents a time by storing the milliseconds past midnight as an INT32 value.
* `io.debezium.time.MicroTime` represents a time by storing the microseconds past midnight as an INT32 value.
* `io.debezium.time.NanoTime` represents a time by storing the nanoseconds past midnight as an INT32 value.
* `io.debezium.time.Timestamp` represents a date and time (without timezone information) by storing the milliseconds past epoch as an INT64 value.
* `io.debezium.time.MicroTimestamp` represents a date and time (without timezone information) by storing the microseconds past epoch as an INT64 value.
* `io.debezium.time.NanoTimestamp` represents a date and time (without timezone information) by storing the nanoseconds past epoch as an INT64 value.
* `io.debezium.time.ZonedTime` represents a time with timezone and optional fractions of a second (but no date) by storing the ISO8601 form as a STRING value (e.g., `10:15:30+01:00`)
* `io.debezium.time.ZonedTimestamp` represents a date and time with timezone and optional fractions of a second by storing the ISO8601 form as a STRING value (e.g., `2011-12-03T10:15:30.030431+01:00`)

This range of semantic types allows for a far more accurate representation in the events of the temporal values stored within the database. The MySQL connector chooses the semantic type based upon the precision of the MySQL type (e.g., `TIMESTAMP(6)` will be represented with `io.debezium.time.MicroTimestamp`, whereas `TIMESTAMP(3)` will be represented with `io.debezium.time.Timestamp`). This ensures that the events do not lose precision and that the semantics of the database column values are retained in the events even though the values are represented with primitive values.

Obviously these Kafka Connect schema representations are different and more precise than the built-in `org.apache.kafka.connect.data.Date`, `org.apache.kafka.connect.data.Time`, and `org.apache.kafka.connect.data.Timestamp` logical types provided by Kafka Connect and used by the MySQL connector in all 0.2.x and 0.1.x versions. Migration to the new MySQL connector should be possible, although consumers may still need to know about these types to properly handle temporal values and the correct precision (i.e., consumers can no longer just assume all date INT64 values represent milliseconds).

The MySQL binlog client library converted the raw binary event information to JDBC types using a local Calendar instance, which obviously incorporates the local timezone and cannot retain more than millisecond precision. This change extends the library's deserializers to instead use the Java 8 `java.time` classes and to retain the exact semantics of the database values and to not lose any precision (since the `java.time` classes have nanosecond precision).

The same logic is also used to convert the JDBC values obtained during a snapshot from the MySQL Connect/J JDBC driver. The latter has a few quirks, such as not returning any fractional seconds for `TIME` columns, even though `java.sql.Time` can store up to milliseconds.

Most of the logic of the conversions of values and mapping to Kafka Connect schemas is handled in the new `JdbcValueConverters`, which was extracted from the existing `TableSchemaBuilder`. The MySQL connector reuses and actually extends the `JdbcValueConverters` class with its own `MySqlValueConverters` class that also adds support for MySQL-specific types such as `YEAR`. Other connectors whose values are based on JDBC types should be able to reuse and/or extend the `JdbcValueConverters` class.

Integration tests that deal with temporal types were modified to use proper expected values and comparisons.
2016-08-10 15:51:07 -05:00
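
The semantic types listed above are just named Kafka Connect schemas over primitives; a minimal example using the public `SchemaBuilder` API, with a name taken from the list:

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;

public class SemanticTypeSketch {
    public static void main(String[] args) {
        // An INT64 whose name tells consumers the value is microseconds past epoch.
        Schema microTimestamp = SchemaBuilder.int64()
                .name("io.debezium.time.MicroTimestamp")
                .version(1)
                .build();
        System.out.println(microTimestamp.name() + " -> " + microTimestamp.type());
    }
}
```
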
Randall Hauch
774f105670 Merge pull request #85 from hchiorean/DBZ-95
DBZ-95 Adds support for `null` binlog filename in certain cases
2016-08-10 15:15:24 -05:00
Horia Chiorean
616e7dea72 DBZ-95 Adds support for null binlog filename in certain cases 2016-08-09 20:03:43 +03:00
Horia Chiorean
008263ea00 DBZ-92, DBZ-97 Makes logging more verbose and changes the snapshot reader to produce separate events for each DDL change 2016-08-09 19:04:51 +03:00
Horia Chiorean
ab24f013d1 DBZ-96 Removes some asserts on tables created by another test case 2016-08-08 14:25:38 +03:00
Randall Hauch
2ae26819af DBZ-94 Added support for copying very large tables during snapshot
By default the MySQL JDBC driver will put the entire result set into memory, which obviously doesn't work for tables of even moderate sizes. This change adds support for streaming rows in result sets when the tables have more than a configurable number of rows (defaults to 1,000).

This posed a problem for how we were previously finding the last row in the last table; the MySQL driver does not support `ResultSet.isLast()` on result sets that are streamed. Instead, this commit wraps the consumer to which the snapshot reader writes all source records, with a consumer that buffers the last record. When the snapshot completes, the offset is updated (denoting the end of the snapshot) and set on the last buffered record before that record is flushed to the normal consumer. This should add minimal overhead while simplifying the logic to ensure the last source record has the updated offset.

This also improves the log output of the snapshot process.
2016-08-04 16:06:50 -05:00
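
With MySQL Connector/J, result-set streaming is requested through the fetch size; a minimal sketch of the idiom (table name and record handling are illustrative):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingSelectSketch {
    static void streamRows(Connection conn, String table) throws SQLException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // MySQL Connector/J streams rows one at a time with this special fetch size,
            // instead of materializing the whole result set in memory.
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
                while (rs.next()) {
                    // ... produce a source record for the row ...
                }
            }
        }
    }
}
```
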
Horia Chiorean
bb1b7d5734 DBZ-92 Adds more logging information during MySQL snapshot recreation 2016-08-03 16:54:17 +03:00
Randall Hauch
b9e9f0fdf9 Corrected build to look for updated output of alt-mysql container
The mysql:5.7 docker image changed its output to be more like mysql/mysql-server:5.7, and this broke our build because of what our build looks for while waiting for the server to completely initialize. Simply changing the pattern corrects the problem.
2016-08-03 08:24:13 -05:00
Randall Hauch
6894e9c30d Merge pull request #78 from hchiorean/mysql-tests-fix
Fixes some more tests around date handling in the MySQL connector
2016-08-02 20:12:04 -05:00
Randall Hauch
8cb39eacf0 Reverted back to 0.3.0-SNAPSHOT, since the 0.3 candidate release was not acceptable. 2016-08-01 12:25:58 -05:00
Horia Chiorean
eaf295fbf0 Fixes some more tests around date handling in the MySQL connector 2016-07-29 08:57:47 +03:00
Randall Hauch
517272278d [maven-release-plugin] prepare for next development iteration 2016-07-25 17:50:31 -05:00
Randall Hauch
b89296e646 [maven-release-plugin] prepare release v0.3.0 2016-07-25 17:50:31 -05:00
Randall Hauch
e3a00e1992 DBZ-87 Added support for SIGNED in all numeric types in MySQL 2016-07-25 16:07:56 -05:00
Randall Hauch
447acb797d DBZ-62 Upgraded to Kafka and Kafka Connect 0.10.0.0
Upgraded from Kafka 0.9.0.1 to Kafka 0.10.0. The only required change was to override the `Connector.config()` method, which returns `null` or a `ConfigDef` instance that contains detailed metadata for each of the configuration fields, including supporting recommended values and marking fields as not visible (e.g., if they don't make sense given other configuration field values). This can be used by user interfaces to data-drive the configuration of a connector. Also, the default validation logic of the Connector implementations uses a `Validator` that is pretty restrictive in its functionality.

Debezium already had a fairly decent and simple `Configuration` framework. After several attempts to try and merge these concepts, reconciling the two validation mechanisms was very complicated and involved a lot of changes. It was easier to simply continue Debezium-specific validation and to override the `Connector.validate(...)` method to use Debezium's `Configuration`-based validation. Connector-based validation logic includes determining recommended values, so Debezium's `Field` class (used to define each configuration property) was enhanced with a new `Recommender` class that is similar to Kafka's.

Additional integration tests were added to verify that the `ConfigDef` result is acceptable and that the new connector validation logic works as expected, including getting recommended values for some fields (e.g., database names, table/collection names) from MySQL and MongoDB by connecting and dynamically reading the values. This was done in a way that remains backward compatible with the regular expression formats of these fields, but in a user interface that uses the `ConfigDef` mechanism the user can simply select the databases and table/collection identifiers.
2016-07-25 14:21:31 -05:00
Randall Hauch
30777e3345 DBZ-85 Added test case and made correction to temporal values
Added an integration test case to diagnose the loss of the fractional seconds from MySQL temporal values. The problem appears to be a bug in the MySQL Binary Log Connector library that we used, and this bug was reported as https://github.com/shyiko/mysql-binlog-connector-java/issues/103. That was fixed in version 0.3.2 of the library, which Stanley was kind enough to release for us.

During testing, though, several issues were discovered in how temporal values are handled and converted from the MySQL events, through the MySQL Binary Log client library, and through the Debezium MySQL connector to conform with Kafka Connect's various temporal logical schema types. Most of the issues involved converting most of the temporal values from local time zone (which is how they are created by the MySQL Binary Log client) into UTC (which is how Kafka Connect expects them). Really, java.util.Date doesn't have time zone information and instead tracks the number of milliseconds past epoch, but the conversion of normal timestamp information to the milliseconds past epoch in UTC depends on the time zone in which that conversion happens.
2016-07-20 17:07:56 -05:00
Randall Hauch
a5f4d0bf31 DBZ-87 Changed mapping of MySQL TINYINT and SMALLINT columns from INT32 to INT16
The MySQL connector now maps TINYINT and SMALLINT columns to INT16 (rather than INT32) because INT16 is smaller and yet still large enough for all TINYINT and SMALLINT values. Note that the range of TINYINT values is either -128 to 127 for signed or 0 to 255 for unsigned, and thus INT8 is not an acceptable choice since it can only handle values in the range -128 to 127 and cannot represent unsigned TINYINT values up to 255. Additionally, the JDBC Specification also suggests the proper Java type for SQL-99's TINYINT is short, which maps to Kafka Connect's INT16.

This change will be backward compatible, although the generated Kafka Connect schema will be different than in previous versions. This shouldn't cause a problem, since clients should expect to handle schema changes, and this schema change does comply with Avro schema evolution rules.
2016-07-19 11:11:05 -05:00
Randall Hauch
04eef2da5c DBZ-84 Tried to replicate error with MySQL TINYINT columns
Tried unsuccessfully to replicate the problem reported in DBZ-84 with a new regression integration test.
2016-07-19 10:58:28 -05:00
Randall Hauch
ed1b494fdf DBZ-86 Cleaned up logging and error message in MySQL connector 2016-07-15 16:37:47 -05:00
Randall Hauch
a88bcb9ae7 DBZ-86 Generated Kafka Schema names will now also be valid Avro fullnames 2016-07-15 16:29:52 -05:00
Randall Hauch
85c9d4e5fe Merge pull request #35 from rhauch/dbz-2
DBZ-2 Created Maven module with a MongoDB connector
2016-07-14 14:11:35 -05:00
Randall Hauch
12e7cfb8d3 DBZ-2 Created initial Maven module with a MongoDB connector
Added a new `debezium-connector-mongodb` module that defines a MongoDB connector. The MongoDB connector can capture and record the changes within a MongoDB replica set, or when seeded with addresses of the configuration server of a MongoDB sharded cluster, the connector captures the changes from each replica set used as a shard. In the latter case, the connector even discovers the addition or removal of shards.

The connector monitors each replica set using multiple tasks and, if needed, separate threads within each task. When a replica set is being monitored for the first time, the connector will perform an "initial sync" of that replica set's databases and collections. Once the initial sync has completed, the connector will then begin tailing the oplog of the replica set, starting at the exact point in time at which it started the initial sync. This is equivalent to how MongoDB replication works.

The connector always uses the replica set's primary node to tail the oplog. If the replica set undergoes an election and a different node becomes primary, the connector will immediately stop tailing the oplog, connect to the new primary, and start tailing the oplog using the new primary node. Likewise, if the connector experiences any problems communicating with the replica set members, it will try to reconnect (using exponential backoff so as to not overwhelm the replica set) and continue tailing the oplog from where it last left off. In this way the connector is able to dynamically adjust to changes in replica set membership and to automatically handle communication failures.

The MongoDB oplog contains limited information; in particular, the events describing updates and deletes do not actually have the before or after state of the documents. Instead, the oplog events are all idempotent, so updates contain the effective changes that were made during an update, and deletes merely contain the deleted document identifier. Consequently, the connector is limited in the information it includes in its output events. Create and read events do contain the initial state, but update events contain only the changes (rather than the before and/or after states of the document) and delete events do not have the before state of the deleted document. All connector events, however, do contain the local system timestamp at which the event was processed and _source_ information detailing the origins of the event, including the replica set name, the MongoDB transaction timestamp of the event, and the transaction identifier, among other things.

It is possible for MongoDB to lose commits in specific failure situations. For example, if the primary applies a change and records it in its oplog before it crashes unexpectedly, the secondary nodes may not have had a chance to read those changes from the primary's oplog before the crash. If one such secondary is then elected as primary, its oplog is missing the last changes that the old primary had recorded. In these cases where MongoDB loses changes recorded in a primary's oplog, the MongoDB connector may or may not capture those lost changes.
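
A rough sketch of the oplog-tailing approach described above, using the MongoDB Java driver (the `local` database and `oplog.rs` collection names are standard MongoDB; the connection string and `lastSeenTs` offset are hypothetical placeholders):

    import com.mongodb.CursorType;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import org.bson.BsonTimestamp;
    import org.bson.Document;

    public class OplogTailSketch {
        public static void main(String[] args) {
            MongoClient client = MongoClients.create("mongodb://primary-host:27017");
            MongoCollection<Document> oplog =
                    client.getDatabase("local").getCollection("oplog.rs");
            BsonTimestamp lastSeenTs = new BsonTimestamp(0, 0); // last recorded offset (placeholder)

            // Tail the oplog with a tailable-await cursor, starting just after the
            // last processed transaction timestamp.
            for (Document event : oplog.find(Filters.gt("ts", lastSeenTs))
                                       .cursorType(CursorType.TailableAwait)) {
                System.out.println(event.toJson()); // hand off to event production logic
            }
        }
    }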
2016-07-14 13:02:36 -05:00
Randall Hauch
cc68a1beb7 DBZ-83 Correctly handle MySQL REFERENCES clause 2016-06-27 13:02:57 -05:00
Randall Hauch
f0d67143bd DBZ-82 Changed snapshot query to support pre-5.6.5 versions of MySQL 2016-06-27 09:23:12 -05:00
Randall Hauch
1c7aabf14f Changed MySQL file comment format to use standard prefix 2016-06-22 18:19:50 -05:00
Randall Hauch
a589d9ea84 DBZ-79 Changed public methods in GtidSet to reflect the MySQL Binary Log Connector's class
Removed several of the `GtidSet` convenience methods that are not in the [improved](https://github.com/shyiko/mysql-binlog-connector-java/pull/100) `com.github.shyiko.mysql.binlog.GtidSet` class. Getting these out of our API will make it easier to reuse the improved `com.github.shyiko.mysql.binlog.GtidSet` class.
2016-06-16 10:04:02 -05:00
Randall Hauch
d9cca5d254 DBZ-77 Corrected completion of offset snapshot mode
The snapshot mode within the offsets is now marked as complete with the last source record produced during the snapshot. This is the only sure way to update the offset.

Note that the `source` field shows the snapshot is in effect for _all_ records produced during the snapshot, including the very last one. This distinction w/r/t the offset was made possible due to recent changes for DBZ-73.

Previously, when the snapshot reader finished generating records, it attempted to record an empty DDL statement. However, since this statement had no net effect on the schemas, no source record was produced and thus the offset's snapshot mode was never changed. Consequently, if the connector were stopped immediately after the snapshot completed but before other events could be read or produced, upon restart the connector would perform another snapshot.
2016-06-15 12:01:16 -05:00
Randall Hauch
ed27faa5f6 DBZ-73 Added unit tests to verify behavior of SourceInfo 2016-06-15 11:51:42 -05:00
Randall Hauch
49322dc9c1 DBZ-73, DBZ-76 Corrected how binlog coordinates are recorded and put into change events
Fixes two issues with how the binlog coordinates are handled.

The first, DBZ-73, fixes how the offsets record the _next_ binlog coordinates, which is fine for single-row events but can result in dropped events should Kafka Connect flush the offset of some but not all of the rows before Kafka Connect crashes. Upon restart, the offset contains the binlog coordinates for the _next_ event, so any unflushed rows from the previous event will be lost.

With this fix, the offset used with all but the last row (in the binlog event) has the binlog coordinates of the current event, with the event row number set to be the next row that needs to be processed. The offset for the last row will have the binlog coordinates of the next event.

The second issue, DBZ-76, is somewhat related: the `source` field of the change events had the binlog coordinates of the _next_ event. The fix involves putting the binlog coordinates for the _current_ event into the `source` field.

Both of these issues are related, so a single fix addresses both. Essentially, the `SourceInfo` now records both the previous and next positions and the previous and next row numbers. The offset is created with parameters that specify the row number and the total number of rows, so this method correctly adjusts the binlog coordinates of the offset. The `struct` method produces the value for the `source` field, and it always uses the previous position and previous row number, which reflect the change event in which it is used.
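
Expressed as a simplified sketch (hypothetical names, not the actual `SourceInfo` implementation), the rule for the offset recorded with each row is:

    import java.util.HashMap;
    import java.util.Map;

    public class RowOffsetSketch {
        // Offset recorded with row 'rowIndex' of a binlog event containing 'totalRows' rows.
        static Map<String, Object> offsetForRow(String file, long currentPos, long nextPos,
                                                int rowIndex, int totalRows) {
            Map<String, Object> offset = new HashMap<>();
            offset.put("file", file);
            if (rowIndex < totalRows - 1) {
                // A restart here must re-read this event, resuming at the next unprocessed row.
                offset.put("pos", currentPos);
                offset.put("row", rowIndex + 1);
            } else {
                // Last row of the event: safe to resume at the next event.
                offset.put("pos", nextPos);
                offset.put("row", 0);
            }
            return offset;
        }
    }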
2016-06-14 17:43:58 -05:00
Randall Hauch
270150bcad DBZ-72 Corrected the naming of the Schemas for the keys and values 2016-06-09 21:30:29 -05:00
Randall Hauch
0f3ed9f50f DBZ-71 Corrected MySQL connector plugin archives and upgraded MySQL JDBC driver from 5.1.38 to 5.1.39 (the latest) 2016-06-09 21:15:34 -05:00
Randall Hauch
6749518f66 [maven-release-plugin] prepare for next development iteration 2016-06-08 13:00:50 -05:00
Randall Hauch
d5bbb116ed [maven-release-plugin] prepare release v0.2.0 2016-06-08 13:00:50 -05:00
Randall Hauch
ff49ba1742 DBZ-37 Renamed MySQL Docker images used in integration tests 2016-06-08 11:45:35 -05:00
Randall Hauch
d63a2e17a0 DBZ-37 Added documentation of various profiles to the MySQL module's README 2016-06-08 11:19:03 -05:00
Randall Hauch
3c7882ee9d DBZ-37 Run integration tests against MySQL and MySQL w/ GTIDs
Changed the build so that the `assembly` profile runs the MySQL integration tests three times, once against each of the three MySQL configurations:

# MySQL server w/o GTIDs
# MySQL server w/ GTIDs
# The Docker team's MySQL server image w/o GTIDs

The normal profiles are still available:

# The default profile runs the integration tests once against MySQL server w/o GTIDs
# `gtid-mysql` runs the integration tests against MySQL server w/ GTIDs
# `alt-mysql` runs the integration tests against the Docker team's MySQL server image w/o GTIDs
# `skip-integration-tests` (or `-DskipITs`) skips the integration tests altogether
2016-06-08 11:03:03 -05:00
Randall Hauch
cf26a5c4e0 Removed duplicate versions in POMs 2016-06-08 09:46:05 -05:00
Randall Hauch
a143871abd DBZ-61 Improved MySQL connector's handling of binary values
Binary values read from the MySQL binlog may include strings, in which case they need to be converted to binary values.

Interestingly, work on this uncovered [KAFKA-3803](https://issues.apache.org/jira/browse/KAFKA-3803), whereby Kafka Connect's `Struct.equals` method does not properly handle comparisons of `byte[]` values. While researching the problem and considering a patch, it was discovered that the Kafka Connect codebase and the Avro converter both use `ByteBuffer` objects rather than `byte[]`. Consequently, the Debezium code that converts JDBC values to Kafka Connect values was changed to return `ByteBuffer` objects rather than `byte[]` objects.

Unfortunately, the JSON converter rehydrates objects with just `byte[]`, so Debezium's `VerifyRecord` logic still cannot rely upon `Struct.equals` for comparison and instead needs custom logic.
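
A minimal sketch of the normalization (the ISO-8859-1 charset for string-encoded binary values is an assumption here, not something this commit specifies):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class BinaryValueSketch {
        // Normalize binlog binary values to ByteBuffer, matching what the Kafka Connect
        // runtime and the Avro converter use internally.
        static ByteBuffer toByteBuffer(Object value) {
            if (value instanceof ByteBuffer) return (ByteBuffer) value;
            if (value instanceof byte[]) return ByteBuffer.wrap((byte[]) value);
            if (value instanceof String) { // assumed charset for string-encoded binary values
                return ByteBuffer.wrap(((String) value).getBytes(StandardCharsets.ISO_8859_1));
            }
            throw new IllegalArgumentException("Unexpected binary representation: " + value);
        }
    }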
2016-06-07 17:53:07 -05:00
Randall Hauch
f48d48e114 DBZ-37 Added integration test with MySQL GTIDs
Added a Maven profile to the MySQL connector component with a Docker image that runs MySQL with GTIDs enabled. The same integration tests can be run with it using `-Pgtid-mysql` or `-Dgtid-mysql` in the Maven build.

When the MySQL connector starts up, it now queries the MySQL server to detect whether GTIDs are enabled, and if they are it will also verify that any GTID sets from the most recently recorded offset are still available in the MySQL server (similarly to how it was already doing this for binlog filenames). If the server does not have the correct coordinates/GTIDs, the connector fails with a useful error message.

This commit also tests and adjusts the `GtidSet` class to better deal with comparisons of GTID sets for proper ordering.

It also changes the connector to output MySQL's timestamp for each event using _second_ precision rather than an artificial _millisecond_ precision. To clarify the difference, this change renames the field in the event's `source` structure that records the MySQL timestamp from `ts` to `ts_sec`. Similarly, the envelope's field that records the time that the connector processed each record was renamed from `ts` to `ts_ms`.

All unit and integration tests pass with the default profile and with the new GTID-enabled profile.
2016-06-07 12:01:51 -05:00
Randall Hauch
a276d983f5 DBZ-37 Changed several constants related to MySQL offsets. This does not affect the offsets themselves. 2016-06-04 16:32:26 -05:00
Randall Hauch
e91aac5b18 DBZ-37 DatabaseHistory can now use custom logic to compare offsets
DatabaseHistory stores the DDL changes with the offset describing the position in the source where those DDL statements were found. When a connector restarts at a specific offset (supplied by Kafka Connect), connectors such as the MySQL connector reconstruct the database schemas by having DatabaseHistory load the history starting from the beginning and stopping at (or just before) the connector's starting offset. This change allows connectors to supply a custom comparison function.

To support GTIDs, the MySQL connector needed to store additional information in the offsets. This means the logic needed to compare offsets with and without GTIDs is non-trivial and unique to the MySQL connector. This commit adds a custom comparison function for offsets.

Per [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-failover.html), slaves are always expected to start with the same set of GTIDs as the master, so no matter which server the MySQL connector follows, it should always see the complete set of GTIDs seen by that server. Therefore:

* Two offsets with GTID sets can be compared using only the GTID sets.
* Any offset with a GTID set is always assumed to be newer than an offset without, since it is assumed once GTIDs are enabled they will remain enabled. (Otherwise, the connector likely needs to be restarted with a snapshot and tied to a specific master or slave with no failover.)
* Two offsets without GTIDs are compared using the binlog coordinates (filename, position, and row number).
* An offset that is identical to another except for being in snapshot mode is considered earlier than the one without the snapshot flag. This is because snapshot mode begins by recording the position of the snapshot, and once complete the offset is recorded without the snapshot flag.
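
These rules can be sketched as a comparator (hypothetical field names; a GTID set is simplified here to a set of transaction identifiers, which the real `GtidSet` class models more precisely):

    import java.util.Set;

    public class OffsetComparatorSketch {
        static class Offset {
            Set<String> gtidSet; // null when GTIDs are not in use
            String file; long pos; int row; boolean snapshot;
        }

        // Returns true if 'a' is at or before 'b', following the rules above.
        static boolean isAtOrBefore(Offset a, Offset b) {
            if (a.gtidSet != null && b.gtidSet != null) {
                return b.gtidSet.containsAll(a.gtidSet);  // compare GTID sets only
            }
            if (a.gtidSet != null) return false;          // offsets with GTIDs are newer
            if (b.gtidSet != null) return true;
            int cmp = a.file.compareTo(b.file);           // else binlog coordinates
            if (cmp != 0) return cmp < 0;
            if (a.pos != b.pos) return a.pos < b.pos;
            if (a.row != b.row) return a.row < b.row;
            return a.snapshot || !b.snapshot;             // snapshot flag sorts earlier
        }
    }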
2016-06-04 16:20:26 -05:00
Randall Hauch
8fd89dacbf DBZ-37 Corrected JavaDoc 2016-06-02 19:06:13 -05:00
Randall Hauch
655aac7d4f DBZ-37 Added support for MySQL GTIDs
The BinlogClient library our MySQL connector uses already has support for GTIDs. This change makes use of that and adds the GTIDs from the server to the offsets created by the connector and used upon restarts.
2016-06-02 18:30:26 -05:00
Randall Hauch
264a9041df DBZ-64 Added Avro Converter to record verification utilities
The `VerifyRecord` utility class has methods that will verify a `SourceRecord`, and is used in many of our integration tests to check whether records are constructed in a valid manner. The utility already checks whether the records can be serialized and deserialized using the JSON converter (provided with Kafka Connect); this change also checks with the Avro Converter (which produces much smaller records and is more suitable for production).

Note that version 3.0.0 of the Confluent Avro Converter is required; version 2.1.0-alpha1 could not properly handle complex Schema objects with optional fields (see https://github.com/confluentinc/schema-registry/pull/280).

Also, the names of the Kafka Connect schemas used in MySQL source records have changed.

# The record's envelope Schema used to be "<serverName>.<database>.<table>" but is now "<serverName>.<database>.<table>.Envelope".
# The Schema for record keys used to be named "<database>.<table>/pk", but the '/' character is not valid within an Avro name, so it has been changed to "<serverName>.<database>.<table>.Key".
# The Schema for record values used to be named "<database>.<table>", but to better fit with the other Schema names it has been changed to "<serverName>.<database>.<table>.Value".

Thus, all of the Schemas for a single database table have the same Avro namespace "<serverName>.<database>.<table>" (or "<topicName>") with Avro schema names of "Envelope", "Key", and "Value".
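
For example, a sketch of the naming rule (not the actual code):

    public class SchemaNameSketch {
        static String envelopeName(String serverName, String database, String table) {
            String namespace = serverName + "." + database + "." + table;
            return namespace + ".Envelope"; // keys and values use ".Key" and ".Value"
        }
    }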

All unit and integration tests pass.
2016-06-02 16:54:21 -05:00
Randall Hauch
46c0ce9882 DBZ-58 Added MDC logging contexts to connector
Changed the MySQL connector to make use of MDC logging contexts, which allow thread-specific parameters that can be written out on every log line by simply changing the logging configuration (e.g., Log4J configuration file).

We adopt a convention for all Debezium connectors with the following MDC properties:

* `dbz.connectorType` - the type of connector, which would be a single well-known value for each connector (e.g., "MySQL" for the MySQL connector)
* `dbz.connectorName` - the name of the connector, which for the MySQL connector is simply the value of the `server.name` property (e.g., the logical name for the MySQL server/cluster). Unfortunately, Kafka Connect does not give us its name for the connector.
* `dbz.connectorContext` - the name of the thread, which is "main" for the thread running the connector; the MySQL connector uses "snapshot" for the thread started by the snapshot reader, and "binlog" for the thread started by the binlog reader.

Different logging frameworks have their own way of using MDC properties. In a Log4J configuration, for example, simply use `%X{name}` in the logger's layout, where "name" is one of the properties listed above (or another MDC property).
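
A minimal sketch of the convention using SLF4J's MDC API (the property names match the list above; the surrounding method is hypothetical):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    public class MdcSketch {
        private static final Logger LOGGER = LoggerFactory.getLogger(MdcSketch.class);

        static void runBinlogReader(String serverName) {
            MDC.put("dbz.connectorType", "MySQL");
            MDC.put("dbz.connectorName", serverName);
            MDC.put("dbz.connectorContext", "binlog");
            try {
                LOGGER.info("Reading binlog events"); // %X{dbz.connectorContext} renders "binlog"
            } finally {
                MDC.clear(); // avoid leaking the context to reused threads
            }
        }
    }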
2016-06-02 14:05:06 -05:00
Randall Hauch
aca863c225 DBZ-31 Write MySQL schema changes to topic by default 2016-06-02 11:00:04 -05:00
Randall Hauch
58a5d8c033 DBZ-31 Added support for possibly performing snapshot upon startup
Refactored the MySQL connector to break out the logic of reading the binlog into a separate class, added a similar class to read a full snapshot, and then updated the MySQL connector task class to use both. Added several test cases and updated the existing tests.
2016-06-01 21:40:53 -05:00
Randall Hauch
e6c0ff5e4d DBZ-31 Refactored the MySQL Connector
Several of the MySQL connector classes were fairly large and complicated, and to prepare for upcoming changes/enhancements these larger classes were refactored to pull out units of functionality. Currently all unit tests pass with these changes, with additional unit tests for these new components.
2016-05-26 15:58:58 -05:00
David Chen
339f03859c DBZ-63 Fix POM dependency management.
Thanks for the reminder from https://issues.jboss.org/browse/DBZ-63?focusedCommentId=13242595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13242595
2016-05-25 15:21:45 +01:00
David Chen
b1a71318df DBZ-63 Rename "server-id" to "server_id" to fix org.apache.avro.SchemaParseException: Illegal character in: server-id 2016-05-25 14:33:20 +01:00
Randall Hauch
dc5a379764 DBZ-55 Corrected filtering of DDL statements based upon affected database
Previously, the DDL statements were being filtered and recorded based upon the name of the database that appeared in the binlog. However, that database name is actually the name of the database to which the client submitting the operation is connected, and is not necessarily the database _affected_ by the operation (e.g., when an operation includes a fully-qualified table name not in the connected-to database).

With these changes, the table/database affected by the DDL statements is now being used to filter the recording of the statements. The order of the DDL statements is still maintained, but since each DDL statement can apply to a separate database the DDL statements are batched (in the same original order) based upon the affected database. For example, two statements affecting "db1" will get batched together into one schema change record, followed by one statement affecting "db2" as a second schema change record, followed by another statement affecting "db1" as a third schema record.

Meanwhile, this change does not affect how the database history records the changes: it still records them as submitted using a single record for each separate binlog event/position. This is much safer as each binlog event (with specific position) is written atomically to the history stream. Also, since the database history stream is what the connector uses upon recovery, the database history records are now written _after_ any schema change records to ensure that, upon recovery after failure, no schema change records are lost (and instead have at-least-once delivery guarantees).
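
A sketch of the batching rule (hypothetical types; the real implementation differs): consecutive statements are grouped while the affected database stays the same, preserving the original order:

    import java.util.ArrayList;
    import java.util.List;

    public class DdlBatchSketch {
        static class Ddl {
            final String affectedDatabase;
            final String statement;
            Ddl(String affectedDatabase, String statement) {
                this.affectedDatabase = affectedDatabase;
                this.statement = statement;
            }
        }

        // Each resulting batch corresponds to one schema change record.
        static List<List<Ddl>> batchByAffectedDatabase(List<Ddl> statements) {
            List<List<Ddl>> batches = new ArrayList<>();
            String currentDb = null;
            for (Ddl ddl : statements) {
                if (batches.isEmpty() || !ddl.affectedDatabase.equals(currentDb)) {
                    batches.add(new ArrayList<>()); // database changed: start a new batch
                    currentDb = ddl.affectedDatabase;
                }
                batches.get(batches.size() - 1).add(ddl);
            }
            return batches;
        }
    }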
2016-05-23 11:01:27 -05:00
Randall Hauch
bb40875b2b DBZ-45 Confirmed and tested support for 'before' and 'after' states in UPDATE events
Added integration test logic to verify that UPDATE events include both 'before' and 'after' states (previously added as part of DBZ-52), to verify that altering a table does not generate events for the rows in that table, and that the 'before' and 'after' states (read from the binlog) are always defined in terms of the _current_ table schema. IOW, no special logic is needed to handle a 'before' state that has different columns than defined in the current table's definition.
2016-05-20 12:06:06 -05:00
Randall Hauch
b6ca57c6c1 DBZ-45 Added proper support for FIRST and AFTER clauses in an ALTER TABLE column definition
Added support for properly handling an ALTER TABLE statement that adds columns AFTER another existing column.
2016-05-20 12:05:53 -05:00
Randall Hauch
47a93b3ae1 DBZ-60 Added MySQL server ID and timestamp to event's source info
Added to the Debezium event message's `source` information the MySQL server ID for the cluster process that recorded the event and the MySQL timestamp at which the event was recorded.
2016-05-20 09:32:35 -05:00
Randall Hauch
b3167cd264 DBZ-57 Added unit test to confirm that DDL parser supports multi-column primary keys 2016-05-20 08:33:39 -05:00
Randall Hauch
c20b49a8fc DBZ-57 Added support for the shortened CHARSET alias for CHARACTER SET in MySQL DDL statements
Added explicit support for handling `CHARSET` as an alias for `CHARACTER SET` in both tables and columns. `CREATE DATABASE` and `ALTER DATABASE` statements can also specify character sets, but the DDL parser consumes them without explicitly parsing them, so no modification was needed there. Several unit tests were added to confirm the behavior.
2016-05-20 08:23:50 -05:00
Randall Hauch
e06f5c596c DBZ-43 Added explicit checking and validation of Schemas and Structs in integration tests 2016-05-19 17:06:22 -05:00
Randall Hauch
07315f2b4b DBZ-43 Changed form of schema change topic to use schemas 2016-05-19 16:54:22 -05:00
Randall Hauch
c0b7114424 DBZ-52 Added top-level container structure to all messages
The new envelope Struct contains fields for the local time at which the connector processed the event, the kind of operation (e.g., read, insert, update, or delete), the state of the record before and after the change, and the information about the event source. The latter two items are connector-specific. The timestamp is merely the time using the connector's process clock, and no guarantees are provided about accuracy, monotonicity, or relationship to the original source event.

The envelope structure is now used as the value for each event message in the MySQL connector; the keys of the event messages remain unchanged. Note that to facilitate Kafka log compaction (which requires a null value), a delete event containing the envelope with details about the deletion is followed by a "tombstone" event that contains the same key but a null value.

An example of a message value with this new envelope is as follows:

{
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "before"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "after"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "string",
          "optional" : false,
          "field" : "server"
        }, {
          "type" : "string",
          "optional" : false,
          "field" : "file"
        }, {
          "type" : "int64",
          "optional" : false,
          "field" : "pos"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "row"
        } ],
        "optional" : false,
        "name" : "io.debezium.connector.mysql.Source",
        "field" : "source"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "op"
      }, {
        "type" : "int64",
        "optional" : true,
        "field" : "ts"
      } ],
      "optional" : false,
      "name" : "kafka-connect-2.connector_test.orders",
      "version" : 1
    },
    "payload" : {
      "before" : null,
      "after" : {
        "order_date" : 16852,
        "purchaser" : 1003,
        "quantity" : 1,
        "product_id" : 107
      },
      "source" : {
        "server" : "kafka-connect-2",
        "file" : "mysql-bin.000002",
        "pos" : 2887680,
        "row" : 4
      },
      "op" : "c",
      "ts" : 1463437199134
    }
}

Notice how the Schema is significantly larger, since it must describe all of the envelope's fields even when those fields are not used. In this case, the event signifies that a record was created as the 4th row of a single event recorded in the binlog.
2016-05-19 12:40:16 -05:00
Randall Hauch
e6710a5300 DBZ-44 Generate a tombstone for old key when a row's key is changed
When a row is updated in the database and the primary/unique key for that table is changed, the MySQL connector continues to generate an update event with the new key and new value, but now also generates a tombstone event for the old key. This ensures that when a Kafka topic is compacted, all prior events with the old key will (eventually) be removed. It also ensures that consumers see that the row represented by the old key has been removed.
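
Conceptually (a sketch with hypothetical helper names, not the connector's code), the extra tombstone is just a second record that reuses the old key with a null value:

    import java.util.Map;

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.Struct;
    import org.apache.kafka.connect.source.SourceRecord;

    public class KeyChangeSketch {
        // Emitted after the update record (which carries the new key), so that log
        // compaction eventually removes records stored under the old key.
        static SourceRecord tombstoneForOldKey(Map<String, ?> partition, Map<String, ?> offset,
                                               String topic, Schema keySchema, Struct oldKey) {
            return new SourceRecord(partition, offset, topic, keySchema, oldKey, null, null);
        }
    }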
2016-05-13 17:43:29 -05:00
Randall Hauch
97d5caa2db DBZ-49 MySQL DDL parser is more tolerant of REFERENCE clauses in CREATE TABLE statements
MySQL 5.6 with the MyISAM engine creates the `help_relation` system table using a CREATE TABLE statement whose column REFERENCE clauses do not list the columns of the referenced table. MySQL 5.7 with the InnoDB engine does not include the REFERENCE clauses at all.

Because Debezium's MySQL DDL parser is meant only to understand the statements recorded in the binlog, it does not have to validate the statements and therefore the DDL parser can be a bit more lenient by not requiring the list of columns in a REFERENCE clause in a CREATE TABLE statement's column definitions.

This commit also adds several unit tests that validate all of the DDL statements used by MySQL 5.6 and 5.7 during startup (in the configurations used in our integration tests).
2016-05-13 09:32:47 -05:00
Randall Hauch
b1e6eb1028 DBZ-29 Refactored ColumnMappers and enabled ColumnMapper impls to add parameters to the Kafka Connect Schema. 2016-05-12 12:26:04 -05:00
Randall Hauch
18995abfbd Merge pull request #38 from rhauch/dbz-29
DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns
2016-05-12 08:27:15 -05:00
Randall Hauch
ff9d0fc240 DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns
Changed the MySQL connector to use comma-separated lists of regular expressions for the database
and table whitelist/blacklists. Literals are still accepted and will match fully-qualified table names,
although the '.' character used as a delimiter is also a special character in regular expressions and
therefore may need to be escaped with a double backslash ('\\') to more carefully match fully-qualified
table names.

Added several new configuration properties for the MySQL connector that instruct it to hide,
truncate, and/or mask certain columns. The properties' values are all lists of regular expressions
or literal fully-qualified column names. For example, the following configuration property:

    column.blacklist=server.users.picture,server.users.other

will cause the connector to leave out of the change event messages for the `server.users` table those
fields that correspond to the `picture` and `other` columns. This capability can be used to prevent
dissemination of sensitive information in the change event stream.

An alternative to blacklisting is masking. The following configuration property:

    column.mask.with.10.chars=server\\.users\\.(\\w*email)

will cause the connector to mask in the change event messages for the `server.users` table
all values for columns whose name ends in `email`. The values will be replaced in this case with
a constant string of 10 asterisk ('*') characters, even when the email value is null.
This capability can also be used to prevent dissemination of sensitive information in the change event
stream.

Another option is to truncate string values for specific columns. The following configuration
property:

    column.truncate.to.120.chars=server[.]users[.](description|biography)

will cause the connector to truncate to at most 120 characters the values of the `description` and
`biography` columns in the change event messages for the `server.users` table. Although this example
used a limit of 120 characters, any positive length can be specified; separate properties should
be used when different lengths are required. Note how the '.' delimiter in the fully-qualified names
is escaped since that same character is a special character in regular expressions. This capability
can be used to reduce the size of change event messages.
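
A sketch of the masking and truncation behaviors described above (illustrative functions, not the actual ColumnMapper implementations):

    import java.util.Arrays;

    public class ColumnValueSketch {
        // Masking replaces any value (including null) with a fixed-length string of '*'.
        static String mask(int length) {
            char[] chars = new char[length];
            Arrays.fill(chars, '*');
            return new String(chars);
        }

        // Truncation keeps at most 'maxLength' characters of the original string.
        static String truncate(String value, int maxLength) {
            if (value == null || value.length() <= maxLength) return value;
            return value.substring(0, maxLength);
        }
    }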
2016-05-11 15:57:06 -05:00
Christian Posta
8b736ef654 DBZ-48 Cannot parse COMMIT and flush statements 2016-05-05 15:36:24 -07:00
Christian Posta
ab2cdce279 DBZ-42 inherit from mysql images and add the custom config and startup scripts useful for integration testing 2016-04-26 08:49:27 -07:00
Randall Hauch
1fcb4b02cf DBZ-38 Changed DROP VIEW and TABLE to include single-table statements in events
Drop table/view statements that involve more than one table generate one event for each table/view. Previously, each of those events contained the original multi-table/view statement. Now, each event has a statement that applies to only that table (generated from the original with all the same clauses).
2016-04-12 18:18:13 -05:00
Randall Hauch
b1e428c986 DBZ-38 Adjusted how events are generated for RENAME TO statements
The previous change did not correctly capture the statements for a `RENAME TO` that renamed multiple tables, so fixed the code so that it generates a single `RENAME TO` for each table rename.
2016-04-12 17:58:07 -05:00
David Chen
eeff81b65d MySqlDdlParser should support "RENAME TABLE blue_table TO red_table, orange_table TO green_table, black_table TO white_table;" form. (#1) 2016-04-12 17:40:00 -05:00
Randall Hauch
5b30568650 DBZ-38 Changed the listening framework of the DDL parser
Refactored the mechanism by which components can listen to the activities of a DDL parser. The new approach
should be significantly more flexible for additional types of DDL events while making it easier to maintain
backward compatibility. It also will enable passing event-specific information on each DDL event.
2016-04-12 11:00:02 -05:00
Randall Hauch
137b9f6d4d DBZ-38 Changed the DDL parser framework to notify listeners as statements are applied. 2016-04-11 15:16:04 -05:00
Randall Hauch
8f5487b2c0 [maven-release-plugin] prepare for next development iteration 2016-03-17 16:28:40 -05:00
Randall Hauch
c2b8ac50ae [maven-release-plugin] prepare release v0.1.0 2016-03-17 16:28:40 -05:00
Randall Hauch
43f79aad5e Added missing version element to modules 2016-03-17 16:14:17 -05:00
Randall Hauch
b5945a24ec DBZ-32 Corrected assembly dependencies 2016-03-17 15:58:27 -05:00
Randall Hauch
0da37c8aee DBZ-15 Fixed a problem when handling deleted rows, since the generated record should not have a value schema when the value is null. 2016-03-17 12:34:53 -05:00
Randall Hauch
5a002dbf62 DBZ-15 Cached converters are now dropped upon log rotation. 2016-03-17 11:03:28 -05:00
Randall Hauch
91d200df51 DBZ-15 Removed some of the unnecessary JARs from the MySQL connector plugin kit 2016-03-17 11:03:27 -05:00
Randall Hauch
4998325de7 DBZ-30 Changed the MySQL connector to include all columns in the record value 2016-03-04 10:51:14 -06:00
Randall Hauch
235fa12ead DBZ-28 Fix formatting 2016-03-04 09:52:02 -06:00
Randall Hauch
2d99cb264c DBZ-28 Prevent the MySQL connector from sending a record with a null key and null value
There is no point in sending a record that contains a null key and null value. While this may not be likely for insert or update cases (since at least the value should not be null), it is possible when a row is deleted (meaning the record value will be null) but the table has no primary/unique key (meaning the record key will be null).
2016-03-04 09:51:32 -06:00
Randall Hauch
64d0e0b458 DBZ-28 Corrected MySQL connector's behavior for representing deletes
Corrects a bug where a deleted row was written to Kafka in the same way as an insert, making them indistinguishable. Now, a deleted row is written with the row's primary/unique key as the record key and a null record value. Note that if the row has no primary/unique key, no record is written to Kafka.
2016-03-04 09:48:52 -06:00
Randall Hauch
60d3307597 DBZ-26 Corrected how table info is recovered from the database history. 2016-03-03 15:27:39 -06:00
Randall Hauch
9034e26d1e DBZ-26 Corrected the embedded connector framework to enable stopping. Also improved logging statements. 2016-03-03 15:27:11 -06:00
Randall Hauch
5ba6da702d DBZ-26 Corrected the MySQL connector to not fail when it comes across an unknown table
Depending upon how the source MySQL database is configured, its binlog might not contain all the history for the database. In particular, it might not have the CREATE TABLE statements for some tables, which are then "unknown" to the connector. When the connector reads the binlog and comes across a change event for a row in one of those "unknown" tables, it previously resulted in an NPE. With this change, the condition results in a warning message in the log, and all subsequent change events on that table will be skipped.
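
In outline (a sketch with hypothetical names):

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class UnknownTableSketch {
        private static final Logger LOGGER = LoggerFactory.getLogger(UnknownTableSketch.class);

        // Returns true if the event should be processed; unknown tables are logged
        // and skipped instead of triggering a NullPointerException.
        static boolean shouldProcess(Object tableSchema, String tableId) {
            if (tableSchema == null) {
                LOGGER.warn("Skipping change event for unknown table {}; its CREATE TABLE "
                        + "statement was not seen in the binlog history", tableId);
                return false;
            }
            return true;
        }
    }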
2016-03-03 12:10:11 -06:00
Randall Hauch
5a32da3fac DBZ-26 Corrected how SourceInfo recovers persistent representations to better handle various converter capabilities 2016-03-03 12:06:33 -06:00
Chris Riccomini
b4d51c8946 DBZ-25 Properly set build version instead of build properties file path for MySQL Module 2016-03-02 16:41:57 -08:00
Randall Hauch
42e531dbe9 DBZ-23 Simplified MySQL Connector's use of Docker plugin 2016-02-25 10:24:39 -06:00
Randall Hauch
7d4a996406 DBZ-23 Docker image created by the module no longer is tagged 2016-02-25 09:43:11 -06:00
Randall Hauch
50e28d72a6 DBZ-17 Added plugin distribution ZIP that can be used for other Kafka Connector plugin modules 2016-02-23 13:23:36 -06:00
Randall Hauch
1d46e59048 DBZ-17 Minor changes to the POMs 2016-02-18 13:58:29 -06:00
Randall Hauch
0102f620a9 DBZ-13 Changed Maven build to attach JavaDoc JARs to each module
Modified the 'docs' profile to build and attach JavaDoc JARs for each module's source and test source artifacts. The profile will be automatically used when releasing.
2016-02-17 11:14:50 -06:00
Randall Hauch
dab0440612 DBZ-14 Corrected the 'alt-mysql' Maven profile so that it can be used with any of the other Maven commands. 2016-02-16 16:37:30 -06:00
Christian Posta
c730685a01 add option to run without integration tests 2016-02-15 16:26:32 -07:00
Randall Hauch
73f3c9836b DBZ-1 Completed integration testing and debugging of the MySQL connector 2016-02-15 14:46:12 -06:00
Randall Hauch
1a59f9b07c DBZ-11 Build can skip long-running unit and integration tests 2016-02-04 15:35:27 -06:00
Randall Hauch
c501f8486f DBZ-9 Added MySQL whitelist and blacklists on tables and databases. 2016-02-04 07:56:13 -06:00
Randall Hauch
37d6a5e7da DBZ-1 Expanded documentation and improved EmbeddedConnector framework
Changed the EmbeddedConnector framework to initialize all major components via configuration properties rather than through the public builder. This increases the size of the configurations, but it simplifies what embedding applications must do to obtain an EmbeddedConnector instance.

The DatabaseHistory framework was also changed to be configurable in similar ways to the OffsetBackingStore. Essentially, connectors that want to use it (like the MySqlConnector) will describe it as part of the connector's configuration, allowing more flexibility in which DatabaseHistory implementation is used and how it is configured whether in Kafka Connector or as part of the EmbeddedConnector.

Added a README.md to `debezium-embedded` to provide documentation and sample code showing how to use the EmbeddedConnector.
2016-02-03 14:11:53 -06:00
Randall Hauch
0e58dba9d6 DBZ-1 Renamed the connector modules and packages 2016-02-02 16:58:48 -06:00