Commit Graph

356 Commits

Author SHA1 Message Date
Gunnar Morling
b99bdf7fdc DBZ-543 Removing a few unused methods 2018-01-15 14:37:46 +01:00
Gunnar Morling
5c88431c07 DBZ-494 Making tests more lenient towards specific List implementations;
also fixing a few typos.
2018-01-15 10:40:50 +01:00
Tom Bentley
5b839d3665 DBZ-540 Adding KafkaCluster#zkPort()
If you start a cluster (e.g. in a test) without specifying a port
you get a random port. Sometimes you might want to connect to the
embedded zookeeper instance (for instance, to make an assertion about
a znode). To do this you need to know the port number. So let's expose it.
2018-01-12 18:25:41 +01:00
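A minimal usage sketch for the new accessor; the fluent KafkaCluster fixture calls (addBrokers, startup) are assumed from the surrounding test utilities, and only zkPort() comes from this commit:

    import io.debezium.kafka.KafkaCluster;
    import org.apache.zookeeper.ZooKeeper;

    // Inside a test: start a cluster on random ports, then discover the ZK port
    // so an assertion can be made against a znode (fixture API assumed).
    KafkaCluster cluster = new KafkaCluster().addBrokers(1).startup();
    int zkPort = cluster.zkPort();
    ZooKeeper zk = new ZooKeeper("localhost:" + zkPort, 30_000, event -> {});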
Jiri Pechanec
0b269a6e41 DBZ-538 Improve invalid DDL statement reporting 2018-01-12 12:26:30 +01:00
Jiri Pechanec
2a7377a833 DBZ-455 Use valueOf instead of constructors 2018-01-05 02:32:37 +01:00
Jenkins user
6bb34b42f9 [maven-release-plugin] prepare for next development iteration 2017-12-20 07:15:12 +00:00
Jenkins user
16dcd4c980 [maven-release-plugin] prepare release v0.7.1 2017-12-20 07:15:12 +00:00
Jenkins user
5e09932cb9 [maven-release-plugin] prepare for next development iteration 2017-12-15 05:10:23 +00:00
Jenkins user
6c1d61e03b [maven-release-plugin] prepare release v0.7.0 2017-12-15 05:10:23 +00:00
Gunnar Morling
4e8cedd094 DBZ-379 Postgres connector minimizes use of JDBC metadata 2017-12-13 12:20:37 +01:00
Jiri Pechanec
5ae236241b DBZ-469 Fixed a backslash regex in testcase 2017-12-11 16:58:23 +01:00
Gunnar Morling
8e99dc3abd DBZ-469 Adding one more test case 2017-12-11 16:58:23 +01:00
Gunnar Morling
1c55e41941 DBZ-469 Renaming listOfRegex() to setOfRegex();
* Simplifying test of that method
* Adding test to ensure correctness of default DDL pattern
2017-12-11 16:58:23 +01:00
Jiri Pechanec
86d9e109fc DBZ-469 Filter out RDS heartbeat INSERT statements 2017-12-11 16:58:23 +01:00
rkerner
c7ac481c43 [DBZ-342] fix broken MySQL data type "TIME" handling 2017-11-29 20:34:12 +01:00
Gunnar Morling
a55227aa83 DBZ-464 Don't stop after reaching max retry count, but raise an exception instead;
Also increasing default value, as the connector can't start its work without fully recovering the history
2017-11-28 08:47:27 +01:00
Gunnar Morling
6537d904ce DBZ-464 Reading until end offset of history topic 2017-11-28 08:47:27 +01:00
Gunnar Morling
bc2d0e5956 DBZ-464 Removing unused method parameters from AbstractDatabaseHistory#recoverRecords() 2017-11-28 08:47:27 +01:00
Jiri Pechanec
20a2cdfdea DBZ-476 Doubled quotes are parsed as escaped 2017-11-23 14:51:51 +01:00
Jiri Pechanec
57e7f84163 DBZ-479 Forced fsync slows down tests 2017-11-22 14:34:59 +01:00
Gunnar Morling
2b3276be1d DBZ-478 Correctly handling null value converters;
* using simple for loop for the sake of easier debugging
* log info about unsupported column type on WARN rather than TRACE level
2017-11-22 09:43:10 +01:00
David Szabo
1c07ff4775 DBZ-466 Remove hardcoding of schema version number, leaving it empty instead 2017-11-16 17:43:14 +01:00
Jiri Pechanec
4d253d2987 DBZ-437 String tokens and SQL words are handled separately in procedure parsing 2017-11-14 09:40:54 +01:00
Jiri Pechanec
5e7714eb00 DBZ-456 Reduce test execution time 2017-11-14 06:47:32 +01:00
Henryk Konsek
be40502cc7 DBZ-323 Cluster and server properties should be added into config, not set as default ones 2017-11-13 16:35:57 +01:00
Gunnar Morling
2744f54e9d DBZ-438 Renaming BufferedBlockingConsumer#flush() to close() to make clear it's only meant to be called once at the end 2017-11-13 14:19:39 +01:00
Gunnar Morling
2b62943e4f DBZ-438 Avoiding issues due to concurrent usage of BufferedBlockingConsumer 2017-11-13 14:19:39 +01:00
Gunnar Morling
5fbe742be8 DBZ-285 Specifying scope of dependencies in the individual POMs for the sake of comprehensibility 2017-11-10 16:48:32 +01:00
Gunnar Morling
580647b226 DBZ-285 Making more dependencies "provided" 2017-11-10 16:33:02 +01:00
Ewen Cheslack-Postava
8826669b43 DBZ-285: Use provided or test dependencies for Connect and Kafka dependencies 2017-11-04 12:01:24 -07:00
Jiri Pechanec
a6bd883857 DBZ-432 Rebased to Kafka 1.0.0 2017-11-03 11:06:18 +01:00
Jiri Pechanec
e841cbd609 DBZ-429 Fixed signed default value handling 2017-11-03 06:13:05 +01:00
Gunnar Morling
3f410c20e5 DBZ-422 Using existing constants BigDecimal.ONE and ZERO;
Also using valueOf() methods to benefit from internal caching for most numeric types
2017-11-02 12:23:06 +01:00
pmx
3c91ad47ec DBZ-422 Extracting numeric constants and boolean conversions into NumberConversions class 2017-11-02 12:12:54 +01:00
pmx
c3ce7c571a DBZ-422 Making sure convertSmallInt() always returns a short 2017-11-02 12:12:54 +01:00
Jiri Pechanec
f93e1e9bcd DBZ-395 Make temp table regex stricter 2017-10-26 11:11:22 +02:00
Jiri Pechanec
1c6e652c71 DBZ-395 Filter out DROP TEMPORARY TABLE statements 2017-10-26 11:11:22 +02:00
Ben Williams
a3b4fedd5f DBZ-363 Add support for BIGINT UNSIGNED handling for MySQL 2017-10-18 10:20:03 +02:00
Gunnar Morling
73189892b3 DBZ-258 Misc. improvements while reviewing the change;
* Removing superfluous config option
* Making loggers static
* JavaDoc fixes
* Extracting hexStringToByteArray() to helper and adding test
* Removing superfluous super() invocations
2017-10-18 09:21:22 +02:00
Jiri Pechanec
e47b4cb81c DBZ-258 Changes after first review 2017-10-18 09:21:22 +02:00
Jiri Pechanec
0bc8129961 DBZ-258 Support for wal2json plugin 2017-10-18 09:21:22 +02:00
Jenkins user
75937711fa [maven-release-plugin] prepare for next development iteration 2017-09-21 04:42:02 +00:00
Jenkins user
a89b9332e4 [maven-release-plugin] prepare release v0.6.0 2017-09-21 04:42:02 +00:00
Gunnar Morling
7f20aca03d DBZ-226 Minor wording and formatting improvements 2017-09-20 15:11:13 +02:00
Jiri Pechanec
0afb687303 DBZ-226 SMT to change CDC to simple record 2017-09-20 12:05:12 +02:00
Jiri Pechanec
ba3d7d762b DBZ-318 Support for a Decimal with variable scale schema 2017-09-19 08:39:07 +02:00
Gunnar Morling
99f39038bb DBZ-347 Formatting 2017-09-18 17:04:14 +02:00
Jiri Pechanec
61ccefcc5c DBZ-347 Configurable behaviour for unknown DDL, failing by default 2017-09-18 17:02:21 +02:00
Jiri Pechanec
f7736cd0ab DBZ-341 Ignore malformed messages coming from database history 2017-09-18 17:02:21 +02:00
Jiri Pechanec
01980cf29b DBZ-341 Ignore malformed messages coming from database history 2017-09-15 11:52:18 +02:00
Jiri Pechanec
b53b4f67d2 DBZ-319 Views are now ignored in snapshots 2017-09-15 11:37:38 +02:00
Jiri Pechanec
e4bc6670c8 DBZ-305 Rebase build process against Kafka 0.11 2017-08-17 18:51:01 +02:00
Jenkins user
214696ef0c [maven-release-plugin] prepare for next development iteration 2017-08-17 11:51:05 +00:00
Jenkins user
c867e6fea6 [maven-release-plugin] prepare release v0.5.2 2017-08-17 11:51:05 +00:00
Gunnar Morling
1665026c8a DBZ-298 Adding support for quoted identifiers for schemas, tables and
columns; expanding test
2017-08-17 09:08:17 +02:00
Gunnar Morling
564d90dc09 DBZ-298 Formatting 2017-08-17 09:03:16 +02:00
Gunnar Morling
a4dc7620fe DBZ-327 Simplifying validation implementation a bit via Strings#isNullOrEmpty() 2017-08-16 14:37:01 +02:00
Mario Mueller
2a2f911e74 DBZ-327 Fix broken SMT validation and add some tests for the
configuration combinations
2017-08-16 11:57:48 +02:00
Emrul Islam
bb79b2f60f DBZ-297 Initial support for some Postgres array types (one-dimensional only) 2017-07-27 10:04:13 +02:00
Gunnar Morling
79fbc028a8 DBZ-311 Precompiling and simplifying some regular expressions 2017-07-25 21:43:26 +02:00
Gunnar Morling
825530256e DBZ-311 Removing trailing whitespace 2017-07-25 21:43:26 +02:00
Gunnar Morling
a8d1817c22 [maven-release-plugin] prepare for next development iteration 2017-06-09 16:14:31 +00:00
Gunnar Morling
3f512aace7 [maven-release-plugin] prepare release v0.5.1 2017-06-09 16:14:31 +00:00
Gunnar Morling
2a22f3d4b0 DBZ-254 Right-padding values for fixed-length BINARY columns with 0x00 (zero byte) characters for MySQL;
Also fixing JDBC types for binary data types for MySQL.
2017-05-30 15:04:38 +02:00
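A sketch of the padding rule this describes; the helper name is illustrative rather than the connector's actual method:

    // Right-pad a BINARY(columnLength) value with 0x00 bytes.
    static byte[] padBinary(byte[] value, int columnLength) {
        if (value.length >= columnLength) {
            return value;
        }
        byte[] padded = new byte[columnLength]; // Java arrays are zero-filled
        System.arraycopy(value, 0, padded, 0, value.length);
        return padded;
    }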
Tom Bentley
96eda35da5 Respect any existing auto.create.topics.enable when creating a server 2017-05-23 16:58:12 +01:00
dleibovic
a683629519 Change ByLogicalTableRouter logging from info to debug
This can be a bit spammy at the info level
2017-05-19 12:09:40 -04:00
Randall Hauch
02e655fa53 DBZ-198 Add another DDL parser test case
Added an additional test that is unable to reproduce the problem reported on April 4.
2017-05-19 15:50:03 +02:00
Jiri Pechanec
ec915c559c [DBZ-246] Correctly handle enumerated value type 2017-05-09 11:24:28 +02:00
Gunnar Morling
1158833524 DBZ-238 Removing some trailing whitespace 2017-05-05 21:59:06 +02:00
Gunnar Morling
64958d11db DBZ-238 Moving EnumeratedValue to its own file 2017-05-05 21:55:46 +02:00
Brendan Maguire
928c9cb5f0 DBZ-238: Fix Postgres database.sslmode = "require" bug
* Also added EnumeratedValue interface for Configuration so getValue is used instead of toString when getting the config value
2017-05-05 21:55:45 +02:00
Gunnar Morling
1074de4efa DBZ-222 Some more clean-up 2017-05-04 09:25:25 +02:00
Omar Al-Safi
ea092f1baf DBZ-222 Add null to Point comment 2017-05-04 08:53:46 +02:00
Gunnar Morling
5630b61be6 DBZ-222 dependency clean-up 2017-05-04 08:53:05 +02:00
Omar Al-Safi
791545c5f4 DBZ-222 Added support for MySQL POINT type 2017-05-04 08:53:05 +02:00
Randall Hauch
d42736e065 DBZ-121 Corrected JavaDoc and reorganized imports 2017-04-04 16:53:35 -05:00
dleibovic
8e304c4cba rename fields and simplify descriptions 2017-04-03 11:35:33 -04:00
dleibovic
fa157b0159 make the name of the physical table identifier field added to the key schema configurable 2017-03-31 13:09:32 -04:00
dleibovic
d7acda04c4 DBZ-121: routing to custom topic names via Kafka Connect Transformations. 2017-03-30 15:02:59 -04:00
Randall Hauch
709cd8f3fe [maven-release-plugin] prepare for next development iteration 2017-03-27 11:28:12 -05:00
Randall Hauch
2bc3d45954 [maven-release-plugin] prepare release v0.5.0 2017-03-27 11:28:11 -05:00
Randall Hauch
81f62b6961 DBZ-205 Corrected MySQL connector to handle 2-digit years
MySQL has special handling of 2-digit years that it deems are ambiguous, such as the year value `17` that is actually treated as `2017`. Apparently the 2-digit values are stored in MySQL and the interpretation is performed when the data is extracted, so the connector needs to also perform this adjustment of the year values. This commit uses the JDK’s `TemporalAdjuster` interface and passes this down to the requisite temporal-related datatype handling code. The MySQL connector then provides its own `TemporalAdjuster` implementation that adjusts the year values via the excellent JDK `Temporal` methods.

A row in one of the MySQL test databases was changed to use a 2-digit year of `16` while the test method still checks that the year is `2016`, verifying that the year value is properly adjusted.
2017-03-27 10:58:21 -05:00
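A sketch of what such a `TemporalAdjuster` can look like, applying MySQL's documented 2-digit-year rule (00-69 map to 2000-2069, 70-99 to 1970-1999); this is illustrative, not the connector's exact implementation:

    import java.time.temporal.ChronoField;
    import java.time.temporal.TemporalAdjuster;

    // Adjust ambiguous 2-digit years per MySQL's rule; other years pass through.
    TemporalAdjuster twoDigitYearAdjuster = temporal -> {
        int year = temporal.get(ChronoField.YEAR);
        if (year >= 0 && year <= 69) {
            return temporal.with(ChronoField.YEAR, year + 2000);
        }
        if (year >= 70 && year <= 99) {
            return temporal.with(ChronoField.YEAR, year + 1900);
        }
        return temporal;
    };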
Randall Hauch
613e6b6340 Fixed compilation warnings 2017-03-27 10:37:15 -05:00
Randall Hauch
7a72ed6ae6 Merge pull request #202 from don41382/upgrade-kafka-version-to-0.10.2.0
DBZ-203 Upgrade kafka version from 0.10.1.1 to 0.10.2.0
2017-03-17 17:21:48 -05:00
Randall Hauch
430d756062 [maven-release-plugin] prepare for next development iteration 2017-03-17 15:41:58 -05:00
Randall Hauch
536cbf6300 [maven-release-plugin] prepare release v0.4.1 2017-03-17 15:41:57 -05:00
Randall Hauch
e5ee3847dd DBZ-195 Added tests to try to replicate a reported issue
Added a table and inserted rows that try to replicate the problem reported in DBZ-195, but the test was unable to replicate the problem. In fact, this really is no different than existing tests. Changed the log messages so that if/when this happens again it will be possible to know which row was problematic.
2017-03-17 10:47:32 -05:00
Felix Eckhardt
5d414c521a upgraded kafka version from 0.10.1.1 to 0.10.2.0 2017-03-17 11:36:15 +11:00
Randall Hauch
cf5391482a DBZ-198 Improved MySQL DDL parser to better handle blocks
The MySQL parser now properly handles control blocks such as `BEGIN…END`, `IF…END IF`, `REPEAT…END REPEAT`, and `LOOP…END LOOP`, even in cases where the block is preceded by and terminated by a label.
2017-03-16 13:32:21 -05:00
Randall Hauch
b48ccce4b5 DBZ-200 Corrected MySQL DDL parser to better handle column definitions
Apparently not all reserved words must be quoted when used as column names, so MySQL’s DDL parser was refactored to better handle a variety of unquoted column names that are reserved words.
2017-03-08 12:12:27 -06:00
Randall Hauch
d2986710a5 DBZ-188 More efficient GTID source filters for MySQL Connector
Changed the GTID source filters in the MySQL connector to be far more efficient when the filters specify literal UUIDs rather than regex patterns. In these cases, the predicate just checks whether a supplied value is in a hash set, and no regular expression patterns are used.

The GTID source filters can still be a combination of UUID literals and regular expressions, and the predicate will use the best implementation for each. For example, if the filters include all UUID literals, then regular expressions will never be used.
2017-02-10 11:34:24 -06:00
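A sketch of the two-tier predicate this describes: literal UUIDs go into a hash set for constant-time membership checks, and regular expressions are consulted only when patterns are actually configured (method and parameter names illustrative):

    import java.util.List;
    import java.util.Set;
    import java.util.function.Predicate;
    import java.util.regex.Pattern;

    static Predicate<String> gtidSourceFilter(Set<String> uuidLiterals, List<Pattern> patterns) {
        Predicate<String> byLiteral = uuidLiterals::contains;  // O(1) lookup
        if (patterns.isEmpty()) {
            return byLiteral;  // all-literal filters never touch a regex
        }
        return byLiteral.or(uuid -> patterns.stream().anyMatch(p -> p.matcher(uuid).matches()));
    }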
Randall Hauch
8c60c29883 [maven-release-plugin] prepare for next development iteration 2017-02-07 14:22:12 -06:00
Randall Hauch
20134286e9 [maven-release-plugin] prepare release v0.4.0 2017-02-07 14:22:11 -06:00
Randall Hauch
74e5ba6448 DBZ-176 Corrected MySQL DDL parser to support creating triggers with definers
The MySQL DDL parser was not correctly handling `DEFINER` clauses within `CREATE TRIGGER` or `CREATE EVENT` statements. Support for `DEFINER` clauses was recently added for the various forms of `CREATE PROCEDURE`, `CREATE FUNCTION` and `CREATE VIEW` statements. These are the only kinds of statements that have the definer attribute, per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/stored-programs-security.html).
2017-02-02 12:44:28 -06:00
Randall Hauch
972cfbe2c4 DBZ-173 Additional fixes to KafkaDatabaseHistory class for Kafka 0.10.1.0
The KafkaDatabaseHistory class was not behaving well in tests using my local development environment. When restoring from the persisted Kafka topic, the class would set up a Kafka consumer and see repeated messages. It is unclear whether the repeats were due to our test environment and very short poll timeouts. Regardless, the restore logic was refactored to track offsets so as to only process messages once.
2017-02-01 14:47:41 -06:00
Horia Chiorean
d035c4bc8d DBZ-173 Changes the MySQL ITs to not use TZ information for expected dates and fixes the character set for parsing test files 2017-01-27 14:53:10 +02:00
Horia Chiorean
a2154d3d32 DBZ-173 Changes the MySQL ITs to use the database.hostname system property instead of always hardcoding 'localhost' 2017-01-27 09:19:57 +02:00
Horia Chiorean
7dfdef3558 DBZ-173 Upgrades the Kafka artifact versions to 0.10.1.1 2017-01-27 09:19:57 +02:00
Horia Chiorean
031c4a1552 DBZ-183 Fixes the BinlogReader's handling of TIMESTAMP columns to correctly account for timezones 2017-01-25 16:39:36 +02:00
Randall Hauch
f0db8d1b1f DBZ-179 Corrected JavaDoc in PostgreSQL connector
Corrected the JavaDoc and removed trailing spaces in the PostgreSQL connector code.
2017-01-20 11:51:17 -06:00
Randall Hauch
a73f85a80f Merge pull request #162 from rareddy/DBZ-177
DBZ-177: Providing an alternative way to create JDBC connection based …
2017-01-13 13:37:38 -06:00
Ramesh Reddy
a9aace3480 DBZ-177: Providing an alternative way to create a JDBC connection based on the configured JDBC driver class name and a supplied classloader. Loading/creating JDBC connections is not reliable when the driver libraries are in a different classloader than the DriverManager. 2017-01-13 12:58:14 -06:00
Horia Chiorean
ae85656851 DBZ-3 Fixes topic naming to include the name of the server 2016-12-30 09:42:48 +02:00
Horia Chiorean
737614a555 DBZ-3 Implements a connector for streaming changes from a Postgres database
The version of the DB server required for this to work is at least 9.4. To be able to stream logical changes, the code relies on enhancements to the JDBC driver which are not yet public. Therefore, the current codebase includes the sources for the JDBC driver.
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver and the protobuf code for now
* adds support for debugging Surefire and Failsafe
2016-12-27 14:44:32 +02:00
Horia Chiorean
23e3f59fa1 DBZ-3 Implements a connector for streaming changes from a Postgres database
The version of the DB server required for this to work is at least 9.4
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver and the protobuf code for now
* adds support for debugging Surefire and Failsafe
2016-12-27 14:44:32 +02:00
Horia Chiorean
8e14f150db DBZ-3 Adds the structure for a Postgres connector which uses a Debezium Postgres docker image that has the decoderbufs plugin enabled to read WAL changes 2016-12-27 14:44:29 +02:00
Randall Hauch
5dceb05f69 DBZ-151 Additional changes to improve test framework and MySQL integration tests 2016-12-20 10:58:56 -06:00
Randall Hauch
a3bece4472 DBZ-151 Added new integration test framework for easily comparing output of connectors to expected results. 2016-12-20 09:18:09 -06:00
Randall Hauch
0851d8280c DBZ-166 Corrected shutdown logic of MySQL connector
The MySQL connector uses several threads, so previously upon connector shutdown these threads were simply cancelled. This is fine for the binlog reader (which can stop at any moment), but is a poor approach for the snapshot as we didn’t always properly release the database resources and also didn’t complete the writing of the DDL history.

With this change, the snapshot reader stops in a very controlled manner, basically by having the 10-step snapshot procedure frequently check whether the reader is to continue working, and by avoiding thread interruption altogether. And, the snapshot procedure will always clean up its database resources (locks, transactions, etc.), even if the procedure is stopped before completion.

This change also refactors how the snapshot and binlog reader are managed. This is no longer done in the MySqlConnectorTask class (which is busy enough), but rather the logic has been encapsulated in a new `ChainedReader` that makes use of a new `Reader` interface. This makes testing of `ChainedReader` easier, and ensures that `ChainedReader` relies only upon the primary methods of `Reader` rather than upon `AbstractReader`. `ChainedReader` handles multiple readers generically, and ensures that when stopped the readers are all handled correctly and completely process all records, yet avoids accidentally starting subsequent readers when stopping the previous one.
2016-12-15 10:55:18 -06:00
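A much-simplified sketch of the `Reader` contract and the chaining behavior described above; the real classes carry more lifecycle state, so treat this as an outline under those assumptions:

    import java.util.Iterator;
    import java.util.List;
    import org.apache.kafka.connect.source.SourceRecord;

    interface Reader {
        void start();
        void stop();  // request a controlled stop; no thread interruption
        List<SourceRecord> poll() throws InterruptedException;  // null when finished
    }

    final class ChainedReader implements Reader {
        private final Iterator<Reader> readers;
        private Reader current;

        ChainedReader(List<Reader> readers) {
            this.readers = readers.iterator();
        }

        public void start() {
            if (readers.hasNext()) {
                (current = readers.next()).start();
            }
        }

        public void stop() {
            if (current != null) current.stop();  // drain records, don't interrupt
        }

        public List<SourceRecord> poll() throws InterruptedException {
            while (current != null) {
                List<SourceRecord> batch = current.poll();
                if (batch != null) return batch;  // current reader still producing
                current.stop();                   // finished: advance to the next reader
                current = readers.hasNext() ? readers.next() : null;
                if (current != null) current.start();
            }
            return null;  // all readers exhausted
        }
    }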
Randall Hauch
c762a221b7 DBZ-162 Corrected DDL parsing of MySQL functions
The MySQL DDL parser was not properly consuming function declarations. For functions, the parser consumes the entire statement without handling the various expressions within the function declaration, but the parser was not properly finding the end of the statement and instead was continuing to try to consume values beyond the end of the statement.

Specifically, when the parser consumes a `BEGIN`, it looks for a corresponding `END`. However, if it encountered an `END IF`, the `IF` plus any remaining tokens were left on the token stream and unprocessed. This confused the parser, which kept looking for statements and ultimately ended with a `No more content` error.

This case was replicated in integration tests, and the code fixed to properly find the end of the statements.
2016-12-06 17:34:52 -06:00
Sherafudheen PM
ee52219736 DBZ-160 - Issue while parsing create table script with ENUM type and default value 'b' 2016-12-02 17:42:44 +05:30
Randall Hauch
d80bc1bfd7 DBZ-153 MySQL connector supports enum and set values with parentheses
Changed the MySQL connector to support ENUM and SET literals with parentheses.
2016-11-14 12:22:08 -06:00
Randall Hauch
b0ded5f383 DBZ-147 Added ability to treat MySQL DECIMAL as double
By default the MySQL connector handles `DECIMAL` and `NUMERIC` columns using `java.math.BigDecimal` values and describing them using the `org.apache.kafka.connect.data.Decimal` schema type, which serializes the values to a binary form.

This change adds a configuration option that keeps the default behavior but can instead be set to handle `DECIMAL` and `NUMERIC` values as Java `double` with a schema type of `FLOAT64`.
2016-11-09 11:27:09 -06:00
Randall Hauch
ea5f7983c7 DBZ-144 Corrected MySQL connector restart
Added tests to verify whether the connector properly restarts in the binlog when it previously failed or stopped in the middle of a transaction. The tests showed that the connector was not able to restart properly, whether or not GTIDs were used: restarting from an arbitrary binlog event causes problems because the TABLE_MAP events for the affected tables are skipped.

The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.

This has an unfortunate side effect: the offsets record that a transaction has completed only when the connector generates a source record for the subsequent transaction. This is because the connector generates source records (with their offsets) for the binlog events in the transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn’t be a problem, and will only slow restarts for very large transactions.
2016-11-09 08:11:41 -06:00
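A sketch of the offset shape this implies, pairing the binlog coordinates of the transaction start with a count of already-emitted events to skip after a restart; the key names are illustrative, not necessarily the connector's exact offset keys:

    import java.util.HashMap;
    import java.util.Map;

    Map<String, Object> offset = new HashMap<>();
    offset.put("file", "mysql-bin.000003");  // binlog file at transaction start
    offset.put("pos", 154L);                 // binlog position at transaction start
    offset.put("event", 42L);                // events already processed; skip on restart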
Randall Hauch
207315e5df DBZ-146 Improved error handling of MySQL Connector
Improved the error handling of the MySQL connector to ensure that we’re always stopping the connector when we have a problem handling a binlog event or if we have problems starting.
2016-11-03 16:55:59 -05:00
Randall Hauch
99a86ad289 Merge pull request #112 from rhauch/dbz-123
DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
2016-10-07 17:16:37 -05:00
Randall Hauch
332de18384 Corrected headers 2016-10-07 17:16:27 -05:00
Randall Hauch
beb47dd2de DBZ-131 Improved logging while reading binlog
When the MySQL connector is reading the binlog, it outputs INFO log messages reporting status at an exponentially-increasing interval, starting at every 5 seconds and doubling until a max period of 1 hour. This output is useful when the connector starts to know that it is working, but thereafter the usefulness decreases. Output once an hour is probably acceptable.

This is not intended to replace the capturing of metrics, but is merely an aid to easily tell via the logs whether the connector continues to work.

Also improved the log message when the binlog reader stops to capture the total number of events recorded by Kafka Connect and the last recorded offset.
2016-10-07 17:10:01 -05:00
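A sketch of the exponentially-decaying status logging described above: report after 5 seconds, then double the interval until it caps at one hour (class and method names illustrative):

    import java.time.Duration;

    final class PeriodicStatusLogger {
        private static final long MAX_PERIOD = Duration.ofHours(1).toMillis();
        private long periodMillis = Duration.ofSeconds(5).toMillis();
        private long nextLogTime = System.currentTimeMillis() + periodMillis;

        boolean shouldLog() {
            long now = System.currentTimeMillis();
            if (now < nextLogTime) {
                return false;
            }
            periodMillis = Math.min(periodMillis * 2, MAX_PERIOD);  // double, capped
            nextLogTime = now + periodMillis;
            return true;
        }
    }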
Randall Hauch
50eb4094ac DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
The DDL parser now properly handles bit-set literals, and several minor case-sensitivity bugs dealing with other escaped literals.
2016-10-06 13:25:38 -05:00
Randall Hauch
730603976d Merge pull request #107 from rhauch/dbz-123
DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
2016-09-21 15:22:00 -05:00
Randall Hauch
3e2d953b1a Merge pull request #103 from rhauch/dbz-122
DBZ-122 Prevent logging of password configuration property values
2016-09-21 15:15:02 -05:00
Randall Hauch
bcf60940db DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
Corrected how the MySQL connector is treating columns of type `BIT(n)`, where _n_ is the number of bits in the value. When `n=1`, the resulting values are booleans; when `n>1`, the resulting values are little-endian `byte[]` that have the minimum number of bytes to hold the `n` bits.
2016-09-21 15:04:20 -05:00
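A sketch of that mapping, assuming the binlog supplies the raw little-endian bytes; the helper is illustrative, not the connector's actual converter:

    // BIT(1) becomes a boolean; BIT(n>1) becomes a little-endian byte[]
    // with the minimum ceil(n/8) bytes needed to hold the n bits.
    static Object convertBits(byte[] littleEndianBits, int n) {
        if (n == 1) {
            return littleEndianBits.length > 0 && (littleEndianBits[0] & 0x01) != 0;
        }
        int numBytes = (n + 7) / 8;
        byte[] result = new byte[numBytes];
        System.arraycopy(littleEndianBits, 0, result, 0,
                         Math.min(numBytes, littleEndianBits.length));
        return result;
    }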
Randall Hauch
9aae6c62d9 DBZ-124 Eliminated the JMX "already registered" warning in the MySQL connector
The `KafkaDatabaseHistory` was always creating a new producer whenever its `start()` method was called, even if it were called more than once. And, the `MySqlSchema` was calling `start()` twice, resulting in multiple producers being created and registered with JMX. Both issues were fixed.

Also, UUIDs were being used as the name of the JMX MBean for the producer, unless the `database.history.consumer.client.id` and `database.history.producer.client.id` properties were being explicitly set. Now, the MySQL connector will by default set the `client.id` property on both the database history's Kafka consumer and producer to `{connectorName}-dbhistory`. Of course, the `database.history.consumer.client.id` and `database.history.producer.client.id` properties can still be set to define the name of the producer and consumer.
2016-09-21 10:05:15 -05:00
Randall Hauch
54b737edc1 DBZ-114 MySQL connector now handles "zero-value" dates and timestamps
MySQL supports "zero-value" dates and timestamps, but these cannot be represented as valid dates or timestamps using the Java types. For example, the zero-value `0000-00-00` for a date has what Java considers to be an invalid month and day-of-the-month.

This commit changes how the MySQL connector handles these values to not throw exceptions. When columns allow nulls, such values will be treated as nulls; when columns do not allow null values, these values will be converted to a "zero-value" for the corresponding Java representation (e.g., the epoch day or timestamp). A new test case verifies the behaviors.
2016-09-21 09:23:12 -05:00
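A sketch of the rule for dates, with the column model reduced to a single flag (helper illustrative):

    import java.time.LocalDate;

    // MySQL's zero-value 0000-00-00 has no valid java.time representation
    // (month and day are 0), so substitute null or the epoch "zero".
    static LocalDate handleZeroDate(boolean columnIsOptional) {
        return columnIsOptional ? null : LocalDate.ofEpochDay(0);  // 1970-01-01
    }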
Randall Hauch
40c1398a95 DBZ-122 Prevent logging of password configuration property values
Anytime we `toString()` a `Configuration`, any values for password properties should be masked. A password property is defined to be a property whose key ends in "password" in a case-insensitive manner.
2016-09-15 15:20:55 -05:00
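A sketch of the masking rule, written outside the actual Configuration class (helper illustrative):

    import java.util.Map;
    import java.util.stream.Collectors;

    // Mask any value whose key ends in "password", case-insensitively.
    static String toMaskedString(Map<String, String> config) {
        return config.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + (e.getKey().toLowerCase().endsWith("password") ? "********" : e.getValue()))
                .collect(Collectors.joining(", ", "{", "}"));
    }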
Randall Hauch
330a27ce52 Merge pull request #97 from rhauch/dbz-102
DBZ-102 MySQL connector support for column charsets
2016-08-29 15:12:24 -05:00
Randall Hauch
cc94bbc697 DBZ-102 MySQL connector now processes character sets
The MySQL binlog events contain the binary representation of string-like values as encoded per the column's character set. Properly decoding these into Java strings requires capturing the column, table, and database character set when parsing the DDL statements.

Unfortunately, MySQL DDL allows columns (at the time the columns are created or modified) to inherit the default character set for the table, or if that is not defined the default character set for the database, or if that is not defined the character set for the server. So, in addition to modifying the MySQL DDL parser to support capturing the character set name for each column, it also had to be changed to know what these default character set names are.

The default character sets are all available via MySQL server/session/local variables. Although strictly speaking the character set variables cannot be set globally, MySQL DDL does allow session and local variables to be set with `SET` statements. Therefore, this commit enhances the MySQL DDL parser to parse `SET` statements and to track the various global, session, and local variables as seen by the DDL parser. Upon connector startup, a subset of server variables (related to character sets and collations) are read from the database via JDBC and used to initialize the DDL parser via `SET` methods.

In addition to initializing the DDL parser with the system variables related to character sets and collation, it is important to also capture the server and database default character sets in the database history so that the correct character sets are used for columns even when the default character sets have changed on the database and/or the server. Therefore, upon startup or snapshot the MySQL connector records in the database history a `SET` statement for the `character_set_server` and `collation_server` system variables so that, upon a later restart, the history's DDL statements can be re-parsed with the correct default server and database character sets. Also, when the MySQL connector reloads the database history (upon startup), the recorded default server character set is compared with the MySQL instance's current server character set, and if they are different the current character set is recorded with a new `SET` statement.

These extra steps ensure that the connector uses the correct character set for each column, even when the connector restarts and reloads the database history captured by a previous version of the connector. In other words, the MySQL connector can be safely upgraded, and the new version will correctly start using the columns' character sets to decode the string-like values.
2016-08-29 12:19:24 -05:00
Randall Hauch
257e81c540 DBZ-102 MySQL in-memory models of tables capture column character sets
The DDL parser and in-memory models of the relational schemas were changed to capture the character set for each column whose type is a string (e.g., `CHAR`, `VARCHAR`, etc.). This required handling `SET` statements used to change the system variables that hold the names of the default character set for the server and for each database. So, even if a column does not explicitly define the character set, the column's actual character set is identified from the table's character set, which might default to the current database's character set, which if not set defaults to the system character set.

These changes merely affect how MySQL DDL is parsed and the in-memory relational schema representation to accommodate the character set at various levels. It does not change the behavior of the MySQL connector; that will be done in a subsequent commit.

All tests pass with these changes, including quite a few additional tests for the new functionality.
2016-08-29 11:50:51 -05:00
Randall Hauch
638b459484 DBZ-108 Removed the TimeZoneAdapter and test, which is no longer used 2016-08-24 16:31:35 -05:00
Randall Hauch
4de56fd657 Merge pull request #94 from hchiorean/DZB-header-fix
Fixes the DBZ header required by checkstyle
2016-08-24 14:28:43 -05:00
Randall Hauch
ce2b2db80c DBZ-99 Added support for MySQL connector to connect securely to MySQL
Changed the MySQL connector to have several new configuration properties for setting up the SSL key store and trust store (which can be used in place of System or JDK properties) used for MySQL secure connections, and another property to specify what kind of SSL connection is to be used.

Modified several integration tests to ensure all MySQL connections are made with `useSSL=false`.
2016-08-24 13:27:35 -05:00
Horia Chiorean
2732d26ff0 Fixes the DBZ header required by checkstyle
This commit removes an extra space character from the first blank line of the header
2016-08-24 13:41:15 +03:00
Randall Hauch
448d514c81 DBZ-106 Corrected the MySQL DDL parser to properly handle quoted keywords as column names. 2016-08-23 17:03:53 -05:00
Randall Hauch
e86fb83459 [maven-release-plugin] prepare for next development iteration 2016-08-16 09:56:47 -05:00
Randall Hauch
ccdb0a1a63 [maven-release-plugin] prepare release v0.3.0 2016-08-16 09:56:47 -05:00
Randall Hauch
918a523f12 DBZ-100 Changed the MongoDB connector to use a new JSON semantic type
Added a semantic type for JSON strings, and used it in the MongoDB connector.
2016-08-15 12:11:35 -05:00
Randall Hauch
db49f0b17b DBZ-100 Removed unused IsoTimestamp and IsoTime semantic types 2016-08-15 12:11:35 -05:00
Randall Hauch
d8a5d2b50f DBZ-100 Corrected MySQL connector's use of ENUM and SET values
The ENUM and SET values read from the binlog contain the indexes of the options that are included in the value, but this doesn't match the comma-separated option strings returned by MySQL and JDBC. With this change, the values read from the binlog will also be comma-separated strings.
2016-08-15 12:11:35 -05:00
Randall Hauch
6b591fc9b0 DBZ-91 Added a unit test for temporal conversions
Also removed a non-unit-test test.
2016-08-15 10:29:16 -05:00
Randall Hauch
ba553c91e8 DBZ-91 Changed MicroTime to use INT64
There are more microseconds per day than can be represented with INT32, so this was changed to INT64.
2016-08-11 12:09:24 -05:00
Randall Hauch
19fc95fe08 DBZ-91 Simplified the temporal conversion functions to use primitives. 2016-08-11 10:48:38 -05:00
Randall Hauch
629542458e DBZ-91 Added option to force use Kafka Connect temporal types. 2016-08-11 10:48:07 -05:00
Randall Hauch
31641fb43e DBZ-91 Changed how temporal values are treated in MySQL connector
Rewrote how the MySQL connector converts temporal values to use schemas with names that identify the semantic
type of temporal value, and customized how the MySQL binlog client library creates Java object values from the
raw binlog events.

Several new "semantic" schema types were defined:

* `io.debezium.time.Year` represents a year number as an INT32 value (e.g., 2016, -345, etc.).
* `io.debezium.time.Date` represents a date by storing the epoch day (that is, the number of days past the epoch) as an INT32 value.
* `io.debezium.time.Time` represents a time by storing the milliseconds past midnight as an INT32 value.
* `io.debezium.time.MicroTime` represents a time by storing the microseconds past midnight as an INT32 value.
* `io.debezium.time.NanoTime` represents a time by storing the nanoseconds past midnight as an INT32 value.
* `io.debezium.time.Timestamp` represents a date and time (without timezone information) by storing the milliseconds past epoch as an INT64 value.
* `io.debezium.time.MicroTimestamp` represents a date and time (without timezone information) by storing the microseconds past epoch as an INT64 value.
* `io.debezium.time.NanoTimestamp` represents a date and time (without timezone information) by storing the nanoseconds past epoch as an INT64 value.
* `io.debezium.time.ZonedTime` represents a time with timezone and optional fractions of a second (but no date) by storing the ISO8601 form as a STRING value (e.g., `10:15:30+01:00`)
* `io.debezium.time.ZonedTimestamp` represents a date and time with timezone and optional fractions of a second by storing the ISO8601 form as a STRING value (e.g., `2011-12-03T10:15:30.030431+01:00`)

This range of semantic types allows for a far more accurate representation in the events of the temporal values stored within the database. The MySQL connector chooses the semantic type based upon the precision of the MySQL type (e.g., `TIMESTAMP(6)` will be represented with `io.debezium.time.MicroTimestamp`, whereas `TIMESTAMP(3)` will be represented with `io.debezium.time.Timestamp`). This ensures that the events do not lose precision and that the semantics of the database column values are retained in the events even though the values are represented with primitive values.

Obviously these Kafka Connect schema representations are different and more precise than the built-in `org.apache.kafka.connect.data.Date`, `org.apache.kafka.connect.data.Time`, and `org.apache.kafka.connect.data.Timestamp` logical types provided by Kafka Connect and used by the MySQL connector in all 0.2.x and 0.1.x versions. Migration to the new MySQL connector should be possible, although consumers may still need to know about these types to properly handle temporal values and the correct precision (i.e., consumers can no longer just assume all date INT64 values represent milliseconds).

The MySQL binlog client library converted the raw binary event information to JDBC types using a local Calendar instance, which obviously incorporates the local timezone and cannot retain more than millisecond precision. This change extends the library's deserializers to instead use the Java 8 `java.time` classes and to retain the exact semantics of the database values and to not lose any precision (since the `java.time` classes have nanosecond precision).

The same logic is also used to convert the JDBC values obtained during a snapshot from the MySQL Connect/J JDBC driver. The latter has a few quirks, such as not returning any fractional seconds for `TIME` columns, even though `java.sql.Time` can store up to milliseconds.

Most of the logic of the conversions of values and mapping to Kafka Connect schemas is handled in the new `JdbcValueConverters`, which was extracted from the existing `TableSchemaBuilder`. The MySQL connector reuses and actually extends the `JdbcValueConverters` class with its own `MySqlValueConverters` class that also adds support for MySQL-specific types such as `YEAR`. Other connectors whose values are based on JDBC types should be able to reuse and/or extend the `JdbcValueConverters` class.

Integration tests that deal with temporal types were modified to use proper expected values and comparisons.
2016-08-10 15:51:07 -05:00
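Two of the conversions implied by the semantic types above, written as standalone arithmetic rather than the connector's actual `JdbcValueConverters`/`MySqlValueConverters` code:

    import java.time.LocalDateTime;
    import java.time.LocalTime;
    import java.time.ZoneOffset;

    // io.debezium.time.MicroTime: microseconds past midnight.
    static long toMicroTime(LocalTime time) {
        return time.toNanoOfDay() / 1_000;
    }

    // io.debezium.time.MicroTimestamp: microseconds past the epoch (no timezone).
    static long toMicroTimestamp(LocalDateTime ts) {
        return ts.toEpochSecond(ZoneOffset.UTC) * 1_000_000 + ts.getNano() / 1_000;
    }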
Horia Chiorean
ab24f013d1 DBZ-96 Removes some asserts on tables created by another test case 2016-08-08 14:25:38 +03:00
Randall Hauch
2ae26819af DBZ-94 Added support for copying very large tables during snapshot
By default the MySQL JDBC driver will put the entire result set into memory, which obviously doesn't work for tables of even moderate sizes. This change adds support for streaming rows in result sets when the tables have more than a configurable number of rows (defaults to 1,000).

This posed a problem for how we were previously finding the last row in the last table; the MySQL driver does not support `ResultSet.isLast()` on result sets that are streamed. Instead, this commit wraps the consumer to which the snapshot reader writes all source records, with a consumer that buffers the last record. When the snapshot completes, the offset is updated (denoting the end of the snapshot) and set on the last buffered record before that record is flushed to the normal consumer. This should add minimal overhead while simplifying the logic to ensure the last source record has the updated offset.

This also improves the log output of the snapshot process.
2016-08-04 16:06:50 -05:00
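The streaming behavior described above is typically enabled in MySQL Connector/J with a forward-only, read-only statement and a fetch size of Integer.MIN_VALUE; a sketch of that standard trick (not necessarily the connector's exact code):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // The MySQL driver streams rows instead of buffering the whole result set.
    static ResultSet streamingQuery(Connection conn, String sql) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);  // row-streaming signal for Connector/J
        return stmt.executeQuery(sql);
    }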
Horia Chiorean
bb1b7d5734 DBZ-92 Adds more logging information during MySQL snapshot recreation 2016-08-03 16:54:17 +03:00
Randall Hauch
8cb39eacf0 Reverted back to 0.3.0-SNAPSHOT, since the 0.3 candidate release was not acceptable. 2016-08-01 12:25:58 -05:00
Horia Chiorean
a6dddaed92 Fixes a couple of test related issues for debezium-core
* fixes a java.sql.Date conversion test to take into account zone offsets
* makes sure the ZK DB is closed during testing, otherwise file handles may leak and cause test failures
2016-07-26 14:17:31 +03:00
Randall Hauch
517272278d [maven-release-plugin] prepare for next development iteration 2016-07-25 17:50:31 -05:00
Randall Hauch
b89296e646 [maven-release-plugin] prepare release v0.3.0 2016-07-25 17:50:31 -05:00
Randall Hauch
a8fa33e44b DBZ-85 Corrected log statements to be debug 2016-07-25 16:59:46 -05:00
Randall Hauch
447acb797d DBZ-62 Upgraded to Kafka and Kafka Connect 0.10.0.0
Upgraded from Kafka 0.9.0.1 to Kafka 0.10.0. The only required change was to override the `Connector.config()` method, which returns `null` or a `ConfigDef` instance that contains detailed metadata for each of the configuration fields, including supporting recommended values and marking fields as not visible (e.g., if they don't make sense given other configuration field values). This can be used by user interfaces to data-drive the configuration of a connector. Also, the default validation logic of the Connector implementations uses a `Validator` that is pretty restrictive in its functionality.

Debezium already had a fairly decent and simple `Configuration` framework. After several attempts to try and merge these concepts, reconciling the two validation mechanisms was very complicated and involved a lot of changes. It was easier to simply continue Debezium-specific validation and to override the `Connector.validate(...)` method to use Debezium's `Configuration`-based validation. Connector-based validation logic includes determining recommended values, so Debezium's `Field` class (used to define each configuration property) was enhanced with a new `Recommender` class that is similar to Kafka's.

Additional integration tests were added to verify that the `ConfigDef` result is acceptable and that the new connector validation logic works as expected, including getting recommended values for some fields (e.g., database names, table/collection names) from MySQL and MongoDB by connecting and dynamically reading the values. This was done in a way that remains backward compatible with the regular expression formats of these fields, but in a user interface that uses the `ConfigDef` mechanism the user can simply select the databases and table/collection identifiers.
2016-07-25 14:21:31 -05:00
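A sketch of the `ConfigDef` metadata the overridden `Connector.config()` can return, with field names chosen for illustration:

    import org.apache.kafka.common.config.ConfigDef;

    ConfigDef configDef = new ConfigDef()
            .define("database.hostname", ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                    "Resolvable hostname or IP address of the database server")
            .define("database.password", ConfigDef.Type.PASSWORD, ConfigDef.Importance.HIGH,
                    "Password used when connecting to the database");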
Randall Hauch
30777e3345 DBZ-85 Added test case and made correction to temporal values
Added an integration test case to diagnose the loss of the fractional seconds from MySQL temporal values. The problem appears to be a bug in the MySQL Binary Log Connector library that we used, and this bug was reported as https://github.com/shyiko/mysql-binlog-connector-java/issues/103. That was fixed in version 0.3.2 of the library, which Stanley was kind enough to release for us.

During testing, though, several issues were discovered in how temporal values are handled and converted from the MySQL events, through the MySQL Binary Log client library, and through the Debezium MySQL connector to conform with Kafka Connect's various temporal logical schema types. Most of the issues involved converting the temporal values from the local time zone (which is how they are created by the MySQL Binary Log client) into UTC (which is how Kafka Connect expects them). Really, java.util.Date doesn't have time zone information and instead tracks the number of milliseconds past epoch, but the conversion of normal timestamp information to the milliseconds past epoch in UTC depends on the time zone in which that conversion happens.
2016-07-20 17:07:56 -05:00
Randall Hauch
a5f4d0bf31 DBZ-87 Changed mapping of MySQL TINYINT and SMALLINT columns from INT32 to INT16
The MySQL connector now maps TINYINT and SMALLINT columns to INT16 (rather than INT32) because INT16 is smaller and yet still large enough for all TINYINT and SMALLINT values. Note that the range of TINYINT values is either -128 to 127 for signed or 0 to 255 for unsigned, and thus INT8 is not an acceptable choice since it can only handle values in the range -128 to 127. Additionally, the JDBC Specification also suggests the proper Java type for SQL-99's TINYINT is short, which maps to Kafka Connect's INT16.

This change will be backward compatible, although the generated Kafka Connect schema will be different than in previous versions. This shouldn't cause a problem, since clients should expect to handle schema changes, and this schema change does comply with Avro schema evolution rules.
2016-07-19 11:11:05 -05:00
Randall Hauch
04eef2da5c DBZ-84 Tried to replicate error with MySQL TINYINT columns
Tried unsuccessfully to replicate the problem reported in DBZ-84 with a new regression integration test.
2016-07-19 10:58:28 -05:00
Randall Hauch
a88bcb9ae7 DBZ-86 Generated Kafka Schema names will now also be valid Avro fullnames 2016-07-15 16:29:52 -05:00
Randall Hauch
12e7cfb8d3 DBZ-2 Created initial Maven module with a MongoDB connector
Added a new `debezium-connector-mongodb` module that defines a MongoDB connector. The MongoDB connector can capture and record the changes within a MongoDB replica set, or when seeded with addresses of the configuration server of a MongoDB sharded cluster, the connector captures the changes from each replica set used as a shard. In the latter case, the connector even discovers the addition or removal of shards.

The connector monitors each replica set using multiple tasks and, if needed, separate threads within each task. When a replica set is being monitored for the first time, the connector will perform an "initial sync" of that replica set's databases and collections. Once the initial sync has completed, the connector will then begin tailing the oplog of the replica set, starting at the exact point in time at which it started the initial sync. This is equivalent to how MongoDB replication works.

The connector always uses the replica set's primary node to tail the oplog. If the replica set undergoes an election and a different node becomes primary, the connector will immediately stop tailing the oplog, connect to the new primary, and start tailing the oplog using the new primary node. Likewise, if the connector experiences any problems communicating with the replica set members, it will try to reconnect (using exponential backoff so as to not overwhelm the replica set) and continue tailing the oplog from where it last left off. In this way the connector is able to dynamically adjust to changes in replica set membership and to automatically handle communication failures.

The MongoDB oplog contains limited information, and in particular the events describing updates and deletes do not actually have the before or after state of the documents. Instead, the oplog events are all idempotent, so updates contain the effective changes that were made during an update, and deletes merely contain the deleted document identifier. Consequently, the connector is limited in the information it includes in its output events. Create and read events do contain the initial state, but update events contain only the changes (rather than the before and/or after states of the document) and delete events do not have the before state of the deleted document. All connector events, however, do contain the local system timestamp at which the event was processed and _source_ information detailing the origins of the event, including the replica set name, the MongoDB transaction timestamp of the event, and the transaction identifier, among other things.

It is possible for MongoDB to lose commits in specific failure situations. For example, if the primary applies a change and records it in its oplog before it then crashes unexpectedly, the secondary nodes may not have had a chance to read those changes from the primary's oplog before the primary crashed. If one such secondary is then elected as primary, its oplog is missing the last changes that the old primary had recorded. In these cases where MongoDB loses changes recorded in a primary's oplog, the MongoDB connector may or may not capture these lost changes.
2016-07-14 13:02:36 -05:00
Randall Hauch
6749518f66 [maven-release-plugin] prepare for next development iteration 2016-06-08 13:00:50 -05:00
Randall Hauch
d5bbb116ed [maven-release-plugin] prepare release v0.2.0 2016-06-08 13:00:50 -05:00
Randall Hauch
cf26a5c4e0 Removed duplicate versions in POMs 2016-06-08 09:46:05 -05:00
Randall Hauch
a143871abd DBZ-61 Improved MySQL connector's handling of binary values
Binary values read from the MySQL binlog may include strings, in which case they need to be converted to binary values.

Interestingly, work on this uncovered [KAFKA-3803](https://issues.apache.org/jira/browse/KAFKA-3803) whereby Kafka Connect's `Struct.equals` method does not properly handle comparing `byte[]` values. Upon researching the problem and potentially supplying a patch, it was discovered that the Kafka Connect codebase and the Avro converter all use `ByteBuffer` objects rather than `byte[]`. Consequently, the Debezium code that converts JDBC values to Kafka Connect values was changed to return `ByteBuffer` objects rather than `byte[]` objects.

Unfortunately, the JSON converter rehydrates objects with just `byte[]`, so that still means that Debezium's `VerifyRecords` logic cannot rely upon `Struct.equals` for comparison, and instead needs custom logic.
2016-06-07 17:53:07 -05:00
Randall Hauch
f48d48e114 DBZ-37 Added integration test with MySQL GTIDs
Added a Maven profile to the MySQL connector component with a Docker image that runs MySQL with GTIDs enabled. The same integration tests can be run with it using `-Pgtid-mysql` or `-Dgtid-mysql` in the Maven build.

When the MySQL connector starts up, it now queries the MySQL server to detect whether GTIDs are enabled, and if they are it will also verify that any GTID sets from the most recently recorded offset are still available in the MySQL server (similarly to how it was already doing this for binlog filenames). If the server does not have the correct coordinates/GTIDs, the connector fails with a useful error message.

This commit also tests and adjusts the `GtidSet` class to better deal with comparisons of GTID sets for proper ordering.

It also changes the connector to output MySQL's timestamp for each event using _second_ precision rather than artificially in _millisecond_ precision. To clarify the difference, this change renames the field in the event's `source` structure that records the MySQL timestamp from `ts` to `ts_sec`. Similarly, the envelope's field that records the time that the connector processed each record was renamed from `ts` to `ts_ms`.

All unit and integration tests pass with the default profile and with the new GTID-enabled profile.
2016-06-07 12:01:51 -05:00
Randall Hauch
e91aac5b18 DBZ-37 DatabaseHistory can now use custom logic to compare offsets
DatabaseHistory stores the DDL changes with the offset describing the position in the source where those DDL statements were found. When a connector restarts at a specific offset (supplied by Kafka Connect), connectors such as the MySQL connector reconstruct the database schemas by having DatabaseHistory load the history starting from the beginning and stopping at (or just before) the connector's starting offset. This change allows connectors to supply a custom comparison function.

To support GTIDs, the MySQL connector needed to store additional information in the offsets. This means the logic needed to compare offsets with and without GTIDs is non-trivial and unique to the MySQL connector. This commit adds a custom comparison function for offsets.

Per [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-failover.html), slaves are always expected to start with the same set of GTIDs as the master, so no matter which server the MySQL connector follows, it should always have the complete set of GTIDs seen by that server. Therefore:

* Two offsets with GTID sets can be compared using only the GTID sets.
* Any offset with a GTID set is always assumed to be newer than an offset without, since it is assumed once GTIDs are enabled they will remain enabled. (Otherwise, the connector likely needs to be restarted with a snapshot and tied to a specific master or slave with no failover.)
* Two offsets without GTIDs are compared using the binlog coordinates (filename, position, and row number).
* An offset that is identical to another except for being in snapshot mode is considered earlier than the one without the snapshot flag. This is because snapshot mode begins by recording the position of the snapshot, and once complete the offset is recorded without the snapshot flag.
2016-06-04 16:20:26 -05:00
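A sketch of the comparison rules listed above, with the snapshot rule omitted for brevity; the offset keys and the `GtidSet` containment helper are assumptions about the connector's internals, not verbatim code:

    import java.util.Map;
    import io.debezium.connector.mysql.GtidSet;

    // Returns true if offset a is at or before offset b.
    static boolean isAtOrBefore(Map<String, Object> a, Map<String, Object> b) {
        String gtidsA = (String) a.get("gtids");
        String gtidsB = (String) b.get("gtids");
        if (gtidsA != null && gtidsB != null) {
            return new GtidSet(gtidsA).isContainedWithin(new GtidSet(gtidsB));
        }
        if (gtidsA != null || gtidsB != null) {
            return gtidsB != null;  // the offset WITH a GTID set is assumed newer
        }
        // Neither has GTIDs: compare binlog coordinates.
        int byFile = ((String) a.get("file")).compareTo((String) b.get("file"));
        if (byFile != 0) return byFile < 0;
        long posA = ((Number) a.get("pos")).longValue();
        long posB = ((Number) b.get("pos")).longValue();
        if (posA != posB) return posA < posB;
        return ((Number) a.getOrDefault("row", 0)).longValue()
                <= ((Number) b.getOrDefault("row", 0)).longValue();
    }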
Randall Hauch
655aac7d4f DBZ-37 Added support for MySQL GTIDs
The BinlogClient library our MySQL connector uses already has support for GTIDs. This change makes use of that and adds the GTIDs from the server to the offsets created by the connector and used upon restarts.
2016-06-02 18:30:26 -05:00
Randall Hauch
264a9041df DBZ-64 Added Avro Converter to record verification utilities
The `VerifyRecord` utility class has methods that will verify a `SourceRecord`, and is used in many of our integration tests to check whether records are constructed in a valid manner. The utility already checks whether the records can be serialized and deserialized using the JSON converter (provided with Kafka Connect); this change also checks with the Avro Converter (which produces much smaller records and is more suitable for production).

Note that version 3.0.0 of the Confluent Avro Converter is required; version 2.1.0-alpha1 could not properly handle complex Schema objects with optional fields (see https://github.com/confluentinc/schema-registry/pull/280).

Also, the names of the Kafka Connect schemas used in MySQL source records have changed.

* The record's envelope Schema used to be "<serverName>.<database>.<table>" but is now "<serverName>.<database>.<table>.Envelope".
* The Schema for record keys used to be named "<database>.<table>/pk", but the '/' character is not valid within an Avro name, and has been changed to "<serverName>.<database>.<table>.Key".
* The Schema for record values used to be named "<database>.<table>", but to better fit with the other Schema names it has been changed to "<serverName>.<database>.<table>.Value".

Thus, all of the Schemas for a single database table have the same Avro namespace "<serverName>.<database>.<table>" (or "<topicName>") with Avro schema names of "Envelope", "Key", and "Value".

All unit and integration tests pass.
2016-06-02 16:54:21 -05:00
Randall Hauch
46c0ce9882 DBZ-58 Added MDC logging contexts to connector
Changed the MySQL connector to make use of MDC logging contexts, which allow thread-specific parameters that can be written out on every log line by simply changing the logging configuration (e.g., Log4J configuration file).

We adopt a convention for all Debezium connectors with the following MDC properties:

* `dbz.connectorType` - the type of connector, which would be a single well-known value for each connector (e.g., "MySQL" for the MySQL connector)
* `dbz.connectorName` - the name of the connector, which for the MySQL connector is simply the value of the `server.name` property (e.g., the logical name for the MySQL server/cluster). Unfortunately, Kafka Connect does not give us its name for the connector.
* `dbz.connectorContext` - the name of the thread, which is "main" for the thread running the connector; the MySQL connector uses "snapshot" for the thread started by the snapshot reader, and "binlog" for the thread started by the binlog reader.

Different logging frameworks have their own way of using MDC properties. In a Log4J configuration, for example, simply use `%X{name}` in the logger's layout, where "name" is one of the properties listed above (or another MDC property).
2016-06-02 14:05:06 -05:00
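A sketch of the convention using SLF4J's MDC directly; the connector itself wraps this in helper code, so treat the keys as the contract and the rest as illustration:

    import org.slf4j.MDC;

    MDC.put("dbz.connectorType", "MySQL");
    MDC.put("dbz.connectorName", serverName);   // the connector's server.name value
    MDC.put("dbz.connectorContext", "binlog");  // "main", "snapshot", or "binlog"
    try {
        // ... this thread's work; a Log4J layout can emit these via %X{dbz.connectorContext}
    } finally {
        MDC.remove("dbz.connectorType");
        MDC.remove("dbz.connectorName");
        MDC.remove("dbz.connectorContext");
    }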
Randall Hauch
58a5d8c033 DBZ-31 Added support for possibly performing snapshot upon startup
Refactored the MySQL connector to break out the logic of reading the binlog into a separate class, added a similar class to read a full snapshot, and then updated the MySQL connector task class to use both. Added several test cases and updated the existing tests.
2016-06-01 21:40:53 -05:00
Randall Hauch
e6c0ff5e4d DBZ-31 Refactored the MySQL Connector
Several of the MySQL connector classes were fairly large and complicated, and to prepare for upcoming changes/enhancements these larger classes were refactored to pull out units of functionality. Currently all unit tests pass with these changes, with additional unit tests for these new components.
2016-05-26 15:58:58 -05:00
Randall Hauch
24e99fb28f DBZ-31 DDL parser now supports '#' as comment line prefix 2016-05-26 15:40:50 -05:00
Randall Hauch
dc5a379764 DBZ-55 Corrected filtering of DDL statements based upon affected database
Previously, the DDL statements were being filtered and recorded based upon the name of the database that appeared in the binlog. However, that database name is actually the name of the database to which the client submitting the operation is connected, and is not necessarily the database _affected_ by the operation (e.g., when an operation includes a fully-qualified table name not in the connected-to database).

With these changes, the table/database affected by the DDL statements is now being used to filter the recording of the statements. The order of the DDL statements is still maintained, but since each DDL statement can apply to a separate database the DDL statements are batched (in the same original order) based upon the affected database. For example, two statements affecting "db1" will get batched together into one schema change record, followed by one statement affecting "db2" as a second schema change record, followed by another statement affecting "db1" as a third schema change record.

Meanwhile, this change does not affect how the database history records the changes: it still records them as submitted using a single record for each separate binlog event/position. This is much safer as each binlog event (with specific position) is written atomically to the history stream. Also, since the database history stream is what the connector uses upon recovery, the database history records are now written _after_ any schema change records to ensure that, upon recovery after failure, no schema change records are lost (and instead have at-least-once delivery guarantees).
2016-05-23 11:01:27 -05:00
Randall Hauch
07315f2b4b DBZ-43 Changed form of schema change topic to use schemas 2016-05-19 16:54:22 -05:00
Randall Hauch
c0b7114424 DBZ-52 Added top-level container structure to all messages
The new envelope Struct contains fields for the local time at which the connector processed the event, the kind of operation (e.g., read, insert, update, or delete), the state of the record before and after the change, and information about the event source. The latter two items are connector-specific. The timestamp merely reflects the connector's process clock, and no guarantees are provided about accuracy, monotonicity, or relationship to the original source event.

The envelope structure is now used as the value for each event message in the MySQL connector; the keys of the event messages remain unchanged. Note that to facilitate Kafka log compaction (which requires a null value), a delete event containing the envelope with details about the deletion is followed by a "tombstone" event that contains the same key but a null value.

An example of a message value with this new envelope is as follows:

{
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "before"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "after"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "string",
          "optional" : false,
          "field" : "server"
        }, {
          "type" : "string",
          "optional" : false,
          "field" : "file"
        }, {
          "type" : "int64",
          "optional" : false,
          "field" : "pos"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "row"
        } ],
        "optional" : false,
        "name" : "io.debezium.connector.mysql.Source",
        "field" : "source"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "op"
      }, {
        "type" : "int64",
        "optional" : true,
        "field" : "ts"
      } ],
      "optional" : false,
      "name" : "kafka-connect-2.connector_test.orders",
      "version" : 1
    },
    "payload" : {
      "before" : null,
      "after" : {
        "order_date" : 16852,
        "purchaser" : 1003,
        "quantity" : 1,
        "product_id" : 107
      },
      "source" : {
        "server" : "kafka-connect-2",
        "file" : "mysql-bin.000002",
        "pos" : 2887680,
        "row" : 4
      },
      "op" : "c",
      "ts" : 1463437199134
    }
}

Notice how the Schema is significantly larger, since it must describe all of the envelope's fields even when those fields are not used. In this case, the message signifies that a record was created as the 4th row of a single event recorded in the binlog.
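For illustration only, a sketch of how the envelope and its 'source' block from the example above might be assembled with Kafka Connect's SchemaBuilder and Struct API; this mirrors the example rather than the connector's actual code, and the class and method names are hypothetical:

    import org.apache.kafka.connect.data.Date;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;

    public class EnvelopeSchemaSketch {
        static Schema envelopeSchema() {
            // The (optional) row schema shared by the 'before' and 'after' fields:
            Schema row = SchemaBuilder.struct()
                    .name("connector_test.orders")
                    .optional()
                    .field("order_date", Date.SCHEMA)
                    .field("purchaser", Schema.INT32_SCHEMA)
                    .field("quantity", Schema.INT32_SCHEMA)
                    .field("product_id", Schema.INT32_SCHEMA)
                    .build();
            // The connector-specific 'source' block:
            Schema source = SchemaBuilder.struct()
                    .name("io.debezium.connector.mysql.Source")
                    .field("server", Schema.STRING_SCHEMA)
                    .field("file", Schema.STRING_SCHEMA)
                    .field("pos", Schema.INT64_SCHEMA)
                    .field("row", Schema.INT32_SCHEMA)
                    .build();
            // The envelope itself, named "<serverName>.<database>.<table>":
            return SchemaBuilder.struct()
                    .name("kafka-connect-2.connector_test.orders")
                    .version(1)
                    .field("before", row)
                    .field("after", row)
                    .field("source", source)
                    .field("op", Schema.STRING_SCHEMA)
                    .field("ts", Schema.OPTIONAL_INT64_SCHEMA)
                    .build();
        }
    }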
2016-05-19 12:40:16 -05:00
Randall Hauch
6d56a8f3d0 DBZ-50 Added parameters for truncated length and when the field is masked. 2016-05-12 16:31:33 -05:00
Randall Hauch
b1e6eb1028 DBZ-29 Refactored ColumnMappers and enabled ColumnMapper impls to add parameters to the Kafka Connect Schema. 2016-05-12 12:26:04 -05:00
Randall Hauch
18995abfbd Merge pull request #38 from rhauch/dbz-29
DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns
2016-05-12 08:27:15 -05:00
Randall Hauch
ff9d0fc240 DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns
Changed the MySQL connector to use comma-separated lists of regular expressions for the database
and table whitelist/blacklists. Literals are still accepted and will match fully-qualified table names,
although the '.' character used as a delimiter is also a special character in regular expressions and
therefore may need to be escaped with a double backslash ('\\') when a literal match is intended.

Added several new configuration properties for the MySQL connector that instruct it to hide,
truncate, and/or mask certain columns. The properties' values are all lists of regular expressions
or literal fully-qualified column names. For example, the following configuration property:

    column.blacklist=server.users.picture,server.users.other

will cause the connector to leave out of the change event messages for the `server.users` table those
fields that correspond to the `picture` and `other` columns. This capability can be used to prevent
dissemination of sensitive information in the change event stream.

An alternative to blacklisting is masking. The following configuration property:

    column.mask.with.10.chars=server\\.users\\.(\\w*email)

will cause the connector to mask, in the change event messages for the `server.users` table, all
values for columns whose names end in `email`. In this case the values will be replaced with a
constant string of 10 asterisk ('*') characters, even when the email value is null.
This capability can also be used to prevent dissemination of sensitive information in the change event
stream.

Another option is to truncate string values for specific columns. The following configuration
property:

    column.truncate.to.120.chars=server[.]users[.](description|biography)

will cause the connector to truncate to at most 120 characters the values of the `description` and
`biography` columns in the change event messages for the `server.users` table. Although this example
used a limit of 120 characters, any positive length can be specified; separate properties should
be used when different lengths are required. Note how the '.' delimiter in the fully-qualified names
is escaped since that same character is a special character in regular expressions. This capability
can be used to reduce the size of change event messages.
2016-05-11 15:57:06 -05:00
Christian Posta
8b736ef654 DBZ-48 Cannot parse COMMIT and flush statements 2016-05-05 15:36:24 -07:00
Randall Hauch
1fcb4b02cf DBZ-38 Changed DROP VIEW and TABLE to include single-table statements in events
Drop table/view statements that involve more than one table generate one event for each table/view. Previously, each of those events contained the original multi-table/view statement. Now, each event has a statement that applies only to that table (generated from the original with all the same clauses).
2016-04-12 18:18:13 -05:00
Randall Hauch
b1e428c986 DBZ-38 Adjusted how events are generated for RENAME TO statements
The previous change did not correctly capture the statements for a `RENAME TO` that renamed multiple tables, so the code was fixed to generate a single `RENAME TO` statement for each table rename.
2016-04-12 17:58:07 -05:00
Randall Hauch
5b30568650 DBZ-38 Changed the listening framework of the DDL parser
Refactored the mechanism by which components can listen to the activities of a DDL parser. The new approach
should be significantly more flexible for additional types of DDL events while making it easier to maintain
backward compatibility. It also will enable passing event-specific information on each DDL event.
2016-04-12 11:00:02 -05:00
Randall Hauch
137b9f6d4d DBZ-38 Changed the DDL parser framework to notify listeners as statements are applied. 2016-04-11 15:16:04 -05:00
Randall Hauch
8f5487b2c0 [maven-release-plugin] prepare for next development iteration 2016-03-17 16:28:40 -05:00
Randall Hauch
c2b8ac50ae [maven-release-plugin] prepare release v0.1.0 2016-03-17 16:28:40 -05:00
Randall Hauch
43f79aad5e Added missing version element to modules 2016-03-17 16:14:17 -05:00
Randall Hauch
5a002dbf62 DBZ-15 Cached converters are now dropped upon log rotation. 2016-03-17 11:03:28 -05:00
Randall Hauch
4998325de7 DBZ-30 Changed the MySQL connector to include all columns in the record value 2016-03-04 10:51:14 -06:00
Randall Hauch
9034e26d1e DBZ-26 Corrected the embedded connector framework to enable stopping. Also improved logging statements. 2016-03-03 15:27:11 -06:00
Randall Hauch
1d46e59048 DBZ-17 Minor changes to the POMs 2016-02-18 13:58:29 -06:00
Randall Hauch
2e5dfd837b DBZ-13 Minor code changes to eliminate JavaDoc warnings 2016-02-17 11:15:21 -06:00
Randall Hauch
73f3c9836b DBZ-1 Completed integration testing and debugging of the MySQL connector 2016-02-15 14:46:12 -06:00
Randall Hauch
1a59f9b07c DBZ-11 Build can skip long-running unit and integration tests 2016-02-04 15:35:27 -06:00
Randall Hauch
54b822bb72 DBZ-10 Added small utility so unit tests can run an embedded Kafka cluster within the same process.
This utility is only suitable for unit tests and therefore is defined in the test JAR of the `debezium-core` module. It certainly should never be used for production purposes.
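A hedged usage sketch follows; the fluent method names approximate the utility's API and may differ in detail, and the test class is hypothetical:

    import java.io.IOException;
    import io.debezium.kafka.KafkaCluster;

    public class EmbeddedKafkaTestSketch {
        public void runTest() throws IOException {
            // Start a single-broker cluster; without explicit ports, random ports are used.
            KafkaCluster kafka = new KafkaCluster()
                    .deleteDataPriorToStartup(true)
                    .addBrokers(1)
                    .startup();
            try {
                kafka.createTopics("test-topic");
                // ... produce to and consume from the embedded cluster in the test ...
            } finally {
                kafka.shutdown();
            }
        }
    }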
2016-02-04 15:18:27 -06:00
Randall Hauch
c501f8486f DBZ-9 Added MySQL whitelist and blacklists on tables and databases. 2016-02-04 07:56:13 -06:00
Randall Hauch
37d6a5e7da DBZ-1 Expanded documentation and improved EmbeddedConnector framework
Changed the EmbeddedConnector framework to initialize all major components via configuration properties rather than through the public builder. This increases the size of the configurations, but it simplifies what embedding applications must do to obtain an EmbeddedConnector instance.

The DatabaseHistory framework was also changed to be configurable in ways similar to the OffsetBackingStore. Essentially, connectors that want to use it (like the MySqlConnector) will describe it as part of the connector's configuration, allowing more flexibility in which DatabaseHistory implementation is used and how it is configured, whether in Kafka Connect or as part of the EmbeddedConnector.

Added a README.md to `debezium-embedded` to provide documentation and sample code showing how to use the EmbeddedConnector.
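As a rough sketch of this configuration-driven style (the property names below follow the patterns described here, but are illustrative rather than a definitive reference):

    import java.util.Properties;

    public class EmbeddedConfigSketch {
        static Properties connectorConfig() {
            Properties config = new Properties();
            config.setProperty("name", "my-mysql-connector");
            config.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
            // Offset storage is chosen and configured via properties rather than a builder:
            config.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
            config.setProperty("offset.storage.file.filename", "/tmp/offsets.dat");
            // The DatabaseHistory implementation is likewise part of the connector's configuration:
            config.setProperty("database.history", "io.debezium.relational.history.FileDatabaseHistory");
            config.setProperty("database.history.file.filename", "/tmp/dbhistory.dat");
            return config;
        }
    }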
2016-02-03 14:11:53 -06:00
Randall Hauch
2da5b37f76 DBZ-1 Added support for recording and recovering database schema
Adds a small framework for recording the DDL operations on the schema state (e.g., Tables) as they are read and applied from the log, and when restarting the connector task to recover the accumulated schema state. Where and how the DDL operations are recorded is an abstraction called `DatabaseHistory`, with three options: in-memory (primarily for testing purposes), file-based (for embedded cases and perhaps standalone Kafka Connect uses), and Kafka (for normal Kafka Connect deployments).

The `DatabaseHistory` interface methods take several parameters that are used to construct a `SourceRecord`. The `SourceRecord` type was not used, however, since that would result in this interface (and potential extension mechanism) having a dependency on and exposing the Kafka API. Instead, the more general parameters are used to keep the API simple.

The `FileDatabaseHistory` and `MemoryDatabaseHistory` implementations are both fairly simple, but the `FileDatabaseHistory` relies upon representing each recorded change as a JSON document. This is simple, is easily written to files, allows for recovery of data from the raw file, etc. Although this was done initially using Jackson, the code to read and write the JSON documents required a lot of boilerplate. Instead, the `Document` framework developed during Debezium's very early prototype stages was brought back. It provides a very usable API for working with documents, including the ability to compare documents semantically (e.g., numeric values are compared by numeric value rather than by their string representations) and with or without regard to field order.

The `KafkaDatabaseHistory` is a bit more complicated, since it uses a Kafka broker to record all database schema changes on a single topic with a single partition, and then upon restart uses that topic to recover the history. This implementation also records the changes as JSON documents, keeping it simple and independent of the Kafka Connect converters.
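For orientation, a hedged sketch of the shape the `DatabaseHistory` abstraction described above might take; the names and signatures here are illustrative approximations, not the actual interface:

    import java.util.Map;

    // Approximation only: 'Tables' and 'DdlParser' are placeholders for the
    // schema-state and DDL parser types mentioned above.
    public interface DatabaseHistorySketch {
        interface Tables { }
        interface DdlParser { }

        // Record one DDL change together with the source partition and offset
        // information that would otherwise go into a SourceRecord:
        void record(Map<String, ?> source, Map<String, ?> position, String databaseName, String ddl);

        // Replay every recorded change upon restart to rebuild the accumulated schema state:
        void recover(Map<String, ?> source, Map<String, ?> position, Tables schema, DdlParser parser);
    }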
2016-02-02 14:27:14 -06:00
Randall Hauch
6796fe32be DBZ-1 Added the initial stages of a MySQL source connector
The connector is in a basic working state, although it is not well tested yet and upon restart does not recover the schema state from the previous run.
2016-01-29 10:12:28 -06:00
Randall Hauch
d9090ed67b DBZ-4 Removed unused files, most of which were originally copied from the ModeShape codebase. 2016-01-27 08:37:23 -06:00
Randall Hauch
4c538d4e54 DBZ-4 Changed copyright statement in source code headers and adjusted checkstyle rules. 2016-01-27 08:12:01 -06:00
Randall Hauch
eff1f665fa Updated checkstyle rule for headers, and corrected several incorrect headers. 2016-01-25 18:59:25 -06:00
Randall Hauch
a0a8953d2a Updated the copyright dates per new approach. 2016-01-25 18:33:08 -06:00