Commit Graph

206 Commits

Author SHA1 Message Date
Jiri Pechanec
b53b4f67d2 DBZ-319 Views are now ignored in snapshots 2017-09-15 11:37:38 +02:00
Jiri Pechanec
e4bc6670c8 DBZ-305 Rebase build process against Kafka 0.11 2017-08-17 18:51:01 +02:00
Jenkins user
214696ef0c [maven-release-plugin] prepare for next development iteration 2017-08-17 11:51:05 +00:00
Jenkins user
c867e6fea6 [maven-release-plugin] prepare release v0.5.2 2017-08-17 11:51:05 +00:00
Gunnar Morling
1665026c8a DBZ-298 Adding support for quoted identifiers for schemas, tables and
columns; expanding test
2017-08-17 09:08:17 +02:00
Gunnar Morling
564d90dc09 DBZ-298 Formatting 2017-08-17 09:03:16 +02:00
Gunnar Morling
a4dc7620fe DBZ-327 Simplifying validation implementation a bit via Strings#isNullOrEmpty() 2017-08-16 14:37:01 +02:00
Mario Mueller
2a2f911e74 DBZ-327 Fix broken SMT validation and add some tests for the
configuration combinations
2017-08-16 11:57:48 +02:00
Emrul Islam
bb79b2f60f DBZ-297 Initial support for some Postgres array types (one-dimensional only) 2017-07-27 10:04:13 +02:00
Gunnar Morling
79fbc028a8 DBZ-311 Precompiling and simplifying some regular expressions 2017-07-25 21:43:26 +02:00
Gunnar Morling
825530256e DBZ-311 Removing trailing whitespace 2017-07-25 21:43:26 +02:00
Gunnar Morling
a8d1817c22 [maven-release-plugin] prepare for next development iteration 2017-06-09 16:14:31 +00:00
Gunnar Morling
3f512aace7 [maven-release-plugin] prepare release v0.5.1 2017-06-09 16:14:31 +00:00
Gunnar Morling
2a22f3d4b0 DBZ-254 Right-padding values for fixed-length BINARY columns with 0x00 (zero byte) characters for MySQL;
Also fixing JDBC types for binary data types for MySQL.
2017-05-30 15:04:38 +02:00
Tom Bentley
96eda35da5 Respect any existing auto.create.topics.enable when creating a server 2017-05-23 16:58:12 +01:00
dleibovic
a683629519 Change ByLogicalTableRouter logging from info to debug
This can be a bit spammy in the info level
2017-05-19 12:09:40 -04:00
Randall Hauch
02e655fa53 DBZ-198 Add another DDL parser test case
Added an additional test that is unable to reproduce the problem reported on April 4.
2017-05-19 15:50:03 +02:00
Jiri Pechanec
ec915c559c [DBZ-246] Correctly handle enumeratedvale type 2017-05-09 11:24:28 +02:00
Gunnar Morling
1158833524 DBZ-238 Removing some trailing whitespace 2017-05-05 21:59:06 +02:00
Gunnar Morling
64958d11db DBZ-238 Moving EnumeratedValue to its own file 2017-05-05 21:55:46 +02:00
Brendan Maguire
928c9cb5f0 DBZ-238: Fix Postgres database.sslmode = "require" bug
* Also added EnumeratedValue interface for Configuration so getValue is used instead of toString when getting the config value
2017-05-05 21:55:45 +02:00
Gunnar Morling
1074de4efa DBZ-222 Some more clean-up 2017-05-04 09:25:25 +02:00
Omar Al-Safi
ea092f1baf DBZ-222 Add null to Point comment 2017-05-04 08:53:46 +02:00
Gunnar Morling
5630b61be6 DBZ-222 dependency clean-up 2017-05-04 08:53:05 +02:00
Omar Al-Safi
791545c5f4 DBZ-222 Added support for MySQL POINT type 2017-05-04 08:53:05 +02:00
Randall Hauch
d42736e065 DBZ-121 Corrected JavaDoc and reorganized imports 2017-04-04 16:53:35 -05:00
dleibovic
8e304c4cba rename fields and simplify descriptions 2017-04-03 11:35:33 -04:00
dleibovic
fa157b0159 make the name of the physical table identifier field added to the key schema configurable 2017-03-31 13:09:32 -04:00
dleibovic
d7acda04c4 DBZ-121: routing to custom topic names via Kafka Connect Transformations. 2017-03-30 15:02:59 -04:00
Randall Hauch
709cd8f3fe [maven-release-plugin] prepare for next development iteration 2017-03-27 11:28:12 -05:00
Randall Hauch
2bc3d45954 [maven-release-plugin] prepare release v0.5.0 2017-03-27 11:28:11 -05:00
Randall Hauch
81f62b6961 DBZ-205 Corrected MySQL connector to handle 2-digit years
MySQL has special handling of 2-digit years that it deems are ambiguous, such as the year value `17` that is actually treated as `2017`. Apparently the 2-digit values are stored in MySQL and the interpretation is performed when the data is extracted, so therefore the connector needs to also perform this adjustment of the year values. This commit uses the JDK’s `TemporalAdjuster` interface and passes this down to the requisite temporal-related datatype handling code. The MySQL connector then provides its own `TemporalAdjuster` implementation that adjusts the year values via the excellend JDK `Temporal` methods.

A row in one of the MySQL test databases was changed to use a 2-digit year of `16` while the test method still checks that the year is still 2016`, verifying that the year value is properly adjusted.
2017-03-27 10:58:21 -05:00
Randall Hauch
613e6b6340 Fixed compilation warnings 2017-03-27 10:37:15 -05:00
Randall Hauch
7a72ed6ae6 Merge pull request #202 from don41382/upgrade-kafka-version-to-0.10.2.0
DBZ-203 Upgrade kafka version from 0.10.1.1 to 0.10.2.0
2017-03-17 17:21:48 -05:00
Randall Hauch
430d756062 [maven-release-plugin] prepare for next development iteration 2017-03-17 15:41:58 -05:00
Randall Hauch
536cbf6300 [maven-release-plugin] prepare release v0.4.1 2017-03-17 15:41:57 -05:00
Randall Hauch
e5ee3847dd DBZ-195 Added tests to try to replicate a reported issue
Added a table and inserted rows that tries to replicate the problem reported in DBZ-195, but the test was unable to replicate the problem. In fact, this really is no different than existing tests. Changed the log messages so that if/when this happens again it will be possible to know which row was problematic.
2017-03-17 10:47:32 -05:00
Felix Eckhardt
5d414c521a upgraded kafka version from 0.10.1.1 to 0.10.2.0 2017-03-17 11:36:15 +11:00
Randall Hauch
cf5391482a DBZ-198 Improved MySQL DDL parser to better handle blocks
The MySQL parser now properly handles control blocks such as `BEGIN…END`, `IF…END IF`, `REPEAT…END REPEAT`, and `LOOP…END LOOP`, even in cases where the block is preceded by and terminated by a label.
2017-03-16 13:32:21 -05:00
Randall Hauch
b48ccce4b5 DBZ-200 Corrected MySQL DDL parser to better handle column definitions
Apparently not all reserved words must be quoted when using them as colum names, so refactored MySQL’s DDL parser to better handle a variety of unquoted colum names that are reserved words.
2017-03-08 12:12:27 -06:00
Randall Hauch
d2986710a5 DBZ-188 More efficient GTID source filters for MySQL Connector
Changed the GTID source filters in the MySQL connector to be far more efficient when the filters specify literal UUIDs rather than regex patterns. In these cases, the predicate just checks whether a supplied value is in a hash set, and no regular expression patterns are used.

The GTID source filters can still be a combination of UUID literals and regular expressions, and the predicate will use the best implementation for each. For example, if the filters include all UUID literals, then regular expressions will never be used.
2017-02-10 11:34:24 -06:00
Randall Hauch
8c60c29883 [maven-release-plugin] prepare for next development iteration 2017-02-07 14:22:12 -06:00
Randall Hauch
20134286e9 [maven-release-plugin] prepare release v0.4.0 2017-02-07 14:22:11 -06:00
Randall Hauch
74e5ba6448 DBZ-176 Corrected MySQL DDL parser to support creating triggers with definers
The MySQL DDL parser was not correclty handling `DEFINER` clauses within `CREATE TRIGGER` or `CREATE EVENT` statements. Support for `DEFINER` clauses was recently added for the various forms of `CREATE PROCEDURE`, `CREATE FUNCTION` and `CREATE VIEW` statements. These are the only kinds of statements that have the definer attribute, per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/stored-programs-security.html).
2017-02-02 12:44:28 -06:00
Randall Hauch
972cfbe2c4 DBZ-173 Additional fixes to KafkaDatabaseHistory class for Kafka 0.10.1.0
The KafkaDatabaseHistory class was not behaving well in tests using my local development environment. When restoring from the persisted Kafka topic, the class would set up a Kafka consumer and see repeated messages. It is unclear whether the repeats were due to our test environment and very short poll timeouts. Regardless, the restore logic was refactored to track offsets so as to only process messages once.
2017-02-01 14:47:41 -06:00
Horia Chiorean
d035c4bc8d DBZ-173 Changes the MySQL ITs to not use TZ information for expected dates and fixes the character set for parsing test files 2017-01-27 14:53:10 +02:00
Horia Chiorean
a2154d3d32 DBZ-173 Changes the MySQL ITs to use the database.hostname system property instead of always hardcoding 'localhost' 2017-01-27 09:19:57 +02:00
Horia Chiorean
7dfdef3558 DBZ-173 Upgrades the Kafka artifact versions to 0.10.1.1 2017-01-27 09:19:57 +02:00
Horia Chiorean
031c4a1552 DBZ-183 Fixes the BinlogReader's handling of TIMESTAMP columns to correctly account for timezones 2017-01-25 16:39:36 +02:00
Randall Hauch
f0db8d1b1f DBZ-179 Corrected JavaDoc in PostgreSQL connector
Corrected the JavaDoc and removed trailing spaces in the PostgreSQL connector code.
2017-01-20 11:51:17 -06:00
Randall Hauch
a73f85a80f Merge pull request #162 from rareddy/DBZ-177
DBZ-177: Providing an alternative way to create JDBC connection based …
2017-01-13 13:37:38 -06:00
Ramesh Reddy
a9aace3480 DBZ-177: Providing an alternative way to create JDBC connection based on the configured JDBC driver class name and supplied classloader. The loading/creating the JDBC connections is not reliable when driver libraries in a different classloader than the DriverManager. 2017-01-13 12:58:14 -06:00
Horia Chiorean
ae85656851 DBZ-3 Fixes topic naming to include the name of the server 2016-12-30 09:42:48 +02:00
Horia Chiorean
737614a555 DBZ-3 Implements a connector for streaming changes from a Postgres database
The version of the DB server required for this to work is at least 9.4. To be able to stream logical changes, the code relies on enhancements to the JDBC driver which are not yet public. Therefore, the current codebase includes the sources for the JDBC driver.
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver the protobuf code for now
* adds support for debugging Surefire and Failsafe
2016-12-27 14:44:32 +02:00
Horia Chiorean
23e3f59fa1 DBZ-3 Implements a connector for streaming changes from a Postgres database
The version of the DB server required for this to work is at least 9.4
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver the protobuf code for now
* adds support for debugging Surefire and Failsafe
2016-12-27 14:44:32 +02:00
Horia Chiorean
8e14f150db DBZ-3 Adds the structure for a Postgres connector which uses a Debezium Postgres docker image that has the decoderbufs plugin enabled to read WAL changes 2016-12-27 14:44:29 +02:00
Randall Hauch
5dceb05f69 DBZ-151 Additional changes to improve test framework and MySQL integration tests 2016-12-20 10:58:56 -06:00
Randall Hauch
a3bece4472 DBZ-151 Added new integration test framework for easily comparing output of connectors to expected results. 2016-12-20 09:18:09 -06:00
Randall Hauch
0851d8280c DBZ-166 Corrected shutdown logic of MySQL connector
The MySQL connector uses several threads, so previously upon connector shutdown these threads were simply cancelled. This is fine for the binlog reader (which can stop at any moment), but is a poor approach for the snapshot as we didn’t always properly release the database resources and also didn’t complete the writing of the DDL history.

With this change, the snapshot reader stops in a very controlled manner, basically by having the 10-step snapshot procedure frequently check whether the reader is to continue working, and to completely avoid thread interruption altogether. And, the snapshot procedure will always clean up its database resources (locks, transactions, etc.), even if the procedure is stopped before completion.

This change also refactors how the snapshot and binlog reader are managed. This is no longer done in the MySqlConnectorTask class (which is busy enough), but rather the logic has been encapsulated in a new `ChainedReader` that makes use of a new `Reader` interface. This makes testing of `ChainedReader` easier, and ensure that `ChainedReader` relies only upon the primary methods of `Reader` rather than upon `AbstractReader`. `ChainedReader` handles multiple readers generically, and ensures that when stopped the readers are all handled correctly and completely process all records, yet avoid accidentally starting a subsequent reader(s) when stopping the previous reader.
2016-12-15 10:55:18 -06:00
Randall Hauch
c762a221b7 DBZ-162 Corrected DDL parsing of MySQL functions
The MySQL DDL parser was not properly consuming function declarations. For functions, the parser consumes the entire statement without handline the various expressions within the function declaration, but the parser was not properly finding the end of the statement and instead was continuing to try to consume values beyond the end of the statement.

Specifically, when the parser consumes a `BEGIN`, it looks for a corresponding `END`. However, if it encountered an `END IF`, the `IF` plus any remaining tokens were left on the token stream and unprocessed. This confused the parser, which keep looking for statements and ultimately ended with a `No more content` error.

This case was replicated in integration tests, and the code fixed to properly find the end of the statements.
2016-12-06 17:34:52 -06:00
Sherafudheen PM
ee52219736 DBZ-160 - Issue while parsing create table script with ENUM type and default value 'b' 2016-12-02 17:42:44 +05:30
Randall Hauch
d80bc1bfd7 DBZ-153 MySQL connector supports enum and set values with parentheses
Changed the MySQL connector to support ENUM and SET literals with parentheses.
2016-11-14 12:22:08 -06:00
Randall Hauch
b0ded5f383 DBZ-147 Added ability to treat MySQL DECIMAL as double
By default the MySQL connector handles `DECIMAL` and `NUMERIC` columns using `java.math.BigDecimal` values and describing them using the `org.apache.kafka.connect.data.Decimal` schema type, which serializes the values to a binary form.

This change adds a configuration option that will keep the default behavior, but will instead allow handling `DECIMAL` adn `NUMERIC` values as Java `double` and a schema type of `FLOAT64`.
2016-11-09 11:27:09 -06:00
Randall Hauch
ea5f7983c7 DBZ-144 Corrected MySQL connector restart
Added tests to verify whether the connector is properly restarting in the binlog when previously the connector failed or stopped in the middle of a transaction. The tests showed that the connector is not able to properly start when using or not using GTIDs, since restarting from an arbitrary binlog event causes problems since the TABLE_MAP events for the affected tables are skipped.

The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.

This has an unforunate side effect since the offsets capture a transaction was completed only when it generates a source record for the subsequent transaction. This is because the connector generates source records (with their offsets) for the binlog events in the transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn’t be a problem, and will only slow restarts for very large transactions.
2016-11-09 08:11:41 -06:00
Randall Hauch
207315e5df DBZ-146 Improved error handling of MySQL Connector
Improved the error handling of the MySQL connector to ensure that we’re always stopping the connector when we have a problem handling a binlog event or if we have problems starting.
2016-11-03 16:55:59 -05:00
Randall Hauch
99a86ad289 Merge pull request #112 from rhauch/dbz-123
DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
2016-10-07 17:16:37 -05:00
Randall Hauch
332de18384 Corrected headers 2016-10-07 17:16:27 -05:00
Randall Hauch
beb47dd2de DBZ-131 Improved logging while reading binlog
When the MySQL connector is reading the binlog, it outputs INFO log messages reporting status at an exponentially-increasing rate, starting at every 5 seconds and doubling until a max period of 1 hour. This output is useful when the connector starts to know that it is working, but thereafter the usefulness decreases. Once an hour is probably acceptable output.

This is not intended to replace the capturing of metrics, but is merely an aid to easily tell via the logs whether the connector continues to work.

Also improved the log message when the binlog reader stops to capture the total number of events recorded by Kafka Connect and the last recorded offset.
2016-10-07 17:10:01 -05:00
Randall Hauch
50eb4094ac DBZ-123 Corrected the MySQL DDL parser to properly handle bit-set literals
The DDL parser now properly handles bit-set literals, and several minor case-sensitivity bugs dealing with other escaped literals.
2016-10-06 13:25:38 -05:00
Randall Hauch
730603976d Merge pull request #107 from rhauch/dbz-123
DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
2016-09-21 15:22:00 -05:00
Randall Hauch
3e2d953b1a Merge pull request #103 from rhauch/dbz-122
DBZ-122 Prevent logging of password configuration property values
2016-09-21 15:15:02 -05:00
Randall Hauch
bcf60940db DBZ-123 Corrected MySQL Connector's support for BIT(n) columns
Corrected how the MySQL connector is treating columns of type `BIT(n)`, where _n_ is the number of bits in the value. When  `n=1`, the resulting values are booleans; when `n>1`, the resulting values are little endian `byte[]` that have the minimum number of bytes to hold the `n` bits.
2016-09-21 15:04:20 -05:00
Randall Hauch
9aae6c62d9 DBZ-124 Eliminated the JMX "already registered" warning in the MySQL connector
The `KafkaDatabaseHistory` was always creating a new producer whenever its `start()` method was called, even if it were called more than once. And, the `MySqlSchema` was calling `start()` twice, resulting in multiple producers being created and registered with JMX. Both issues were fixed.

Also, UUIDs were being used as the name of the JMX MBean for the producer, unless the `database.history.consumer.client.id` and `database.history.producer.client.id` properties were being explicitly set. Now, the MySQL connector will by default set the `client.id` property on both the database history's Kafka consumer and producer to `{connectorName}-dbhistory`. Of course, the `database.history.consumer.client.id` and `database.history.producer.client.id` properties can still be set to define the name of the producer and consumer.
2016-09-21 10:05:15 -05:00
Randall Hauch
54b737edc1 DBZ-114 MySQL connector now handles "zero-value" dates and timestamps
MySQL supports "zero-value" dates and timestamps, but these cannot be represented as valid dates or timestamps using the Java types. For example, the zero-value `0000-00-00` for a date has what Java considers to be an invalid month and day-of-the-month.

This commit changes how the MySQL connector handles these values to not throw exceptions. When columns allow nulls, such values will be treated as nulls; when columns do not allow null values, these values will be converted to a "zero-value" for the corresponding Java representation (e.g., the epoch day or timestamp). A new test case verifies the behaviors.
2016-09-21 09:23:12 -05:00
Randall Hauch
40c1398a95 DBZ-122 Prevent logging of password configuration property values
Anytime we `toString()` a `Configuration`, any values for password properties should be masked. A password property is defined to be a property whose key ends in "password" in a case-insensitive manner.
2016-09-15 15:20:55 -05:00
Randall Hauch
330a27ce52 Merge pull request #97 from rhauch/dbz-102
DBZ-102 MySQL connector support for column charsets
2016-08-29 15:12:24 -05:00
Randall Hauch
cc94bbc697 DBZ-102 MySQL connector now processes character sets
The MySQL binlog events contain the binary representation of string-like values as encoded per the column's character set. Properly decoding these into Java strings requires capturing the column, table, and database character set when parsing the DDL statements.

Unfortunately, MySQL DDL allows columns (at the time the columns are created or modified) to inherit the default character set for the table, or if that is not defined the default character set for the database, or if that is not defined the character set for the server. So, in addition to modifying the MySQL DDL parser to support capturing the character set name for each column, it also had to be changed to know what these default character set names are.

The default character sets are all available via MySQL server/session/local variables. Although strictly speaking the character set variables cannot be set globally, MySQL DDL does allow session and local variables to be set with `SET` statements. Therefore, this commit enhances the MySQL DDL parser to parse `SET` statements and to track the various global, session, and local variables as seen by the DDL parser. Upon connector startup, a subset of server variables (related to character sets and collations) are read from the database via JDBC and used to initialize the DDL parser via `SET` methods.

In addition to initializing the DDL parser with the system variables related to character sets and collation, it is important to also capture the server and database default character sets in the database history so that the correct character sets are used for columns even when the default character sets have changed on the database and/or the server. Therefore, upon startup or snapshot the MySQL connector records in the database history a `SET` statement for the `character_set_server` and `collation_server` system variables so that, upon a later restart, the history's DDL statements can be re-parsed with the correct default server and database character sets. Also, when the MySQL connector reloads the database history (upon startup), the recorded default server character set is compared with the MySQL instance's current server character set, and if they are different the current character set is recorded with a new `SET` statement.

These extra steps ensure that the connector use the correct character set for each column, even when the connector restarts and reloads the database history captured by a previous version of the connector. IOW, the MySQL connector can be safely upgraded, and the new version will correctly start using the columns' character sets to decode the string-like values.
2016-08-29 12:19:24 -05:00
Randall Hauch
257e81c540 DBZ-102 MySQL in-memory models of tables capture column character sets
The DDL parser and in-memory models of the relational schemas were changed to capture the character set for each column whose type is a string (e.g., `CHAR`, `VARCHAR`, etc.). This required handling `SET` statements used to change the system variables that hold the names of the default character set for the server and for each database. So, even if a column does not explicitly define the character set, the column's actual character set is identified from the table's character set, which might default to the current database's character set, which if not set defaults to the system character set.

These changes merely affect how MySQL DDL is parsed and the in-memory relational schema representation to accommodate the character set at various levels. It does not change the behavior of the MySQL connector; that will be done in a subsequent commit.

All tests pass with these changes, including quite a few additional tests for the new functionality.
2016-08-29 11:50:51 -05:00
Randall Hauch
638b459484 DBZ-108 Removed the TimeZoneAdapter and test, which is no longer used 2016-08-24 16:31:35 -05:00
Randall Hauch
4de56fd657 Merge pull request #94 from hchiorean/DZB-header-fix
Fixes the DBZ header required by checkstyle
2016-08-24 14:28:43 -05:00
Randall Hauch
ce2b2db80c DBZ-99 Added support for MySQL connector to connect securely to MySQL
Changed the MySQL connector to have several new configuration properties for setting up the SSL key store and trust store (which can be used in place of System or JDK properties) used for MySQL secure connections, and another property to specify what kind of SSL connection be used.

Modified several integration tests to ensure all MySQL connections are made with `useSSL=false`.
2016-08-24 13:27:35 -05:00
Horia Chiorean
2732d26ff0 Fixes the DBZ header required by checkstyle
This commit removes an extra space character from the first blank line of the header
2016-08-24 13:41:15 +03:00
Randall Hauch
448d514c81 DBZ-106 Corrected the MySQL DDL parser to properly handled quoted keywords as column names. 2016-08-23 17:03:53 -05:00
Randall Hauch
e86fb83459 [maven-release-plugin] prepare for next development iteration 2016-08-16 09:56:47 -05:00
Randall Hauch
ccdb0a1a63 [maven-release-plugin] prepare release v0.3.0 2016-08-16 09:56:47 -05:00
Randall Hauch
918a523f12 DBZ-100 Changed the MongoDB connector to use a new JSON semantic type
Added a semantic type for JSON strings, and used it in the MongoDB connector.
2016-08-15 12:11:35 -05:00
Randall Hauch
db49f0b17b DBZ-100 Removed unused IsoTimestamp and IsoTime semantic types 2016-08-15 12:11:35 -05:00
Randall Hauch
d8a5d2b50f DBZ-100 Corrected MySQL connector's use of ENUM and SET values
The ENUM and SET values read from the binlog contain the indexes of the options that are included in the value, but this doesn't compared with the string values returned by MySQL and JDBC that contain the comma-separated options. With this change, the values read from the binlog will also be comma-separated strings.
2016-08-15 12:11:35 -05:00
Randall Hauch
6b591fc9b0 DBZ-91 Added a unit test for temporal conversions
Also removed a non-unit-test test.
2016-08-15 10:29:16 -05:00
Randall Hauch
ba553c91e8 DBZ-91 Changed MicroTime to use INT64
There are more microseconds per day than can be represented with INT32, so this was changed to INT64.
2016-08-11 12:09:24 -05:00
Randall Hauch
19fc95fe08 DBZ-91 Simplified the temporal conversion functions to use primitives. 2016-08-11 10:48:38 -05:00
Randall Hauch
629542458e DBZ-91 Added option to force use Kafka Connect temporal types. 2016-08-11 10:48:07 -05:00
Randall Hauch
31641fb43e DBZ-91 Changed how temporal values are treated in MySQL connector
Rewrote how the MySQL connector converts temporal values to use schemas with names that identify the semantic
type of temporal value, and customized how the MySQL binlog client library creates Java object values from the
raw binlog events.

Several new "semantic" schema types were defined:

* `io.debezium.time.Year` represents a year number as an INT32 value (e.g., 2016, -345, etc.).
* `io.debezium.time.Date` represents a date by storing the epoch seconds (that is, the number of seconds past the epoch) as an INT64 value.
* `io.debezium.time.Time` represents a time by storing the milliseconds past midnight as an INT32 value.
* `io.debezium.time.MicroTime` represents a time by storing the microsconds past midnight as an INT32 value.
* `io.debezium.time.NanoTime` represents a time by storing the nanoseconds past midnight as an INT32 value.
* `io.debezium.time.Timestamp` represents a date and time (without timezone information) by storing the milliseconds past epoch as an INT64 value.
* `io.debezium.time.MicroTimestamp` represents a date and time (without timezone information) by storing the microseconds past epoch as an INT64 value.
* `io.debezium.time.NanoTimestamp` represents a date and time (without timezone information) by storing the nanoseconds past epoch as an INT64 value.
* `io.debezium.time.ZonedTime` represents a time with timezone and optional fractions of a second (but no date) by storing the ISO8601 form as a STRING value (e.g., `10:15:30+01:00`)
* `io.debezium.time.ZonedTimestamp` represents a date and time with timezone and optional fractions of a second by storing the ISO8601 form as a STRING value (e.g., `2011-12-03T10:15:30.030431+01:00`)

This range of semantic types allows for a far more accurate representation in the events of the temporal values stored within the database. The MySQL connector chooses the semantic type based upon the precision of the MySQL type (e.g., `TIMESTAMP(6)` will be represented with `io.debezium.time.MicroTimestamp`, whereas `TIMESTAMP(3)` will be represented with `io.debezium.time.Timestamp`). This ensures that the events do not lose precision and that the semantics of the database column values are retained in the events even though the values are represented with primitive values.

Obviously these Kafka Connect schema representations are different and more precise than the built-in `org.apache.kafka.connect.data.Date`, `org.apache.kafka.connect.data.Time`, and `org.apache.kafka.connect.data.Timestamp` logical types provided by Kafka Connect and used by the MySQL connector in all 0.2.x and 0.1.x versions. Migration to the new MySQL connector should be possible, although consumers may still need to know about these types to properly handle temporal values and the correct precision (i.e., consumers can just assume all date INT64 values represent milliseconds).

The MySQL binlog client library converted the raw binary event information to JDBC types using a local Calendar instance, which obviously incorporates the local timezone and cannot retain more than millisecond precision. This change extends the library's deserializers to instead use the Java 8 `javax.time` classes and to retain the exact semantics of the database values and to not lose any precisions (since the `javax.time` classes have nanosecond precision).

The same logic is also used to convert the JDBC values obtained during a snapshot from the MySQL Connect/J JDBC driver. The latter has a few quirks, such as not returning any fractional seconds for `TIME` columns, even though `java.sql.Time` can store up to milliseconds.

Most of the logic of the conversions of values and mapping to Kafka Connect schemas is handled in the new `JdbcValueConverters`, which was extracted from the existing `TableSchemaBuilder`. The MySQL connector reuses and actually extends the `JdbcValueConverters` class with its own `MySqlValueConverters` class that also adds support for MySQL-specific types such as `YEAR`. Other connectors whose values are based on JDBC types should be able to reuse and/or extend the `JdbcValueConverters` class.

Integration tests that deal with temporal types were modified to use proper expected values and comparisons.
2016-08-10 15:51:07 -05:00
Horia Chiorean
ab24f013d1 DBZ-96 Removes some asserts on tables created by another test case 2016-08-08 14:25:38 +03:00
Randall Hauch
2ae26819af DBZ-94 Added support for copying very large tables during snapshot
By default the MySQL JDBC driver will put the entire result set into memory, which obviously doesn't work for tables of even moderate sizes. This change adds support for streaming rows in result sets when the tables have more than a configurable number of rows (defaults to 1,000).

This posed a problem for how we were previously finding the last row in the last table; the MySQL driver does not support `ResultSet.isLast()` on result sets that are streamed. Instead, this commit wraps the consumer to which the snapshot reader writes all source records, with a consumer that buffers the last record. When the snapshot completes, the offset is updated (denoting the end of the snapshot) and set on the last buffered record before that record is flushed to the normal consumer. This should add minimal overhead while simplifying the logic to ensure the last source record has the updated offset.

This also improves the log output of the snapshot process.
2016-08-04 16:06:50 -05:00
Horia Chiorean
bb1b7d5734 DBZ-92 Adds more logging information during MySQL snapshot recreation 2016-08-03 16:54:17 +03:00
Randall Hauch
8cb39eacf0 Reverted back to 0.3.0-SNAPSHOT, since the 0.3 candidate release was not acceptable. 2016-08-01 12:25:58 -05:00
Horia Chiorean
a6dddaed92 Fixes a couple of test related issues for debezium-core
* fixes a java.sql.Date conversion test to take into account zone offsets
* makes sure the ZK DB is closed during testing, otherwise file handles may leak and cause test failures
2016-07-26 14:17:31 +03:00
Randall Hauch
517272278d [maven-release-plugin] prepare for next development iteration 2016-07-25 17:50:31 -05:00
Randall Hauch
b89296e646 [maven-release-plugin] prepare release v0.3.0 2016-07-25 17:50:31 -05:00