tet123

Author	SHA1	Message	Date
Randall Hauch	774f105670	Merge pull request #85 from hchiorean/DBZ-95 DBZ-95 Adds support for `null` binlog filename in certain cases	2016-08-10 15:15:24 -05:00
Horia Chiorean	616e7dea72	DBZ-95 Adds support for `null` binlog filename in certain cases	2016-08-09 20:03:43 +03:00
Horia Chiorean	008263ea00	DBZ-92, DBZ-97 Makes logging more verbose and changes the snapshot reader to produce separate events for each DDL change	2016-08-09 19:04:51 +03:00
Horia Chiorean	ab24f013d1	DBZ-96 Removes some asserts on tables created by another test case	2016-08-08 14:25:38 +03:00
Randall Hauch	2ae26819af	DBZ-94 Added support for copying very large tables during snapshot By default the MySQL JDBC driver will put the entire result set into memory, which obviously doesn't work for tables of even moderate sizes. This change adds support for streaming rows in result sets when the tables have more than a configurable number of rows (defaults to 1,000). This posed a problem for how we were previously finding the last row in the last table; the MySQL driver does not support `ResultSet.isLast()` on result sets that are streamed. Instead, this commit wraps the consumer to which the snapshot reader writes all source records, with a consumer that buffers the last record. When the snapshot completes, the offset is updated (denoting the end of the snapshot) and set on the last buffered record before that record is flushed to the normal consumer. This should add minimal overhead while simplifying the logic to ensure the last source record has the updated offset. This also improves the log output of the snapshot process.	2016-08-04 16:06:50 -05:00
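The "buffer the last record" approach described in DBZ-94 can be sketched roughly as follows. This is a minimal illustration, not Debezium's actual code: `LastRecordBuffer`, `flush`, and the string records are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/** Hypothetical wrapper: forwards records downstream but always holds back the most recent one. */
final class LastRecordBuffer<T> implements Consumer<T> {
    private final Consumer<T> delegate;
    private T last; // the single record currently held back

    LastRecordBuffer(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(T record) {
        // Release the previously buffered record, then hold the new one.
        if (last != null) {
            delegate.accept(last);
        }
        last = record;
    }

    /** Applies a final adjustment (e.g., marking the snapshot complete) and forwards the held record. */
    void flush(UnaryOperator<T> finalizer) {
        if (last != null) {
            delegate.accept(finalizer.apply(last));
            last = null;
        }
    }
}

final class LastRecordBufferDemo {
    static List<String> run() {
        List<String> out = new ArrayList<>();
        LastRecordBuffer<String> buffer = new LastRecordBuffer<>(out::add);
        buffer.accept("row-1");
        buffer.accept("row-2");
        buffer.accept("row-3");
        // Only the last record receives the "snapshot complete" marker.
        buffer.flush(r -> r + " [snapshot complete]");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because only one record is ever held back, the wrapper works with streamed result sets without needing `ResultSet.isLast()`.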
Horia Chiorean	bb1b7d5734	DBZ-92 Adds more logging information during MySQL snapshot recreation	2016-08-03 16:54:17 +03:00
Randall Hauch	b9e9f0fdf9	Corrected build to look for updated output of alt-mysql container The mysql:5.7 docker image changed its output to be more like mysql/mysql-server:5.7, and this broke our build because of what our build is looking for while waiting for the server to completely initialize. Simply changing the pattern corrects the problem.	2016-08-03 08:24:13 -05:00
Randall Hauch	6894e9c30d	Merge pull request #78 from hchiorean/mysql-tests-fix Fixes some more tests around date handling in the MySQL connector	2016-08-02 20:12:04 -05:00
Randall Hauch	8cb39eacf0	Reverted back to 0.3.0-SNAPSHOT, since the 0.3 candidate release was not acceptable.	2016-08-01 12:25:58 -05:00
Horia Chiorean	eaf295fbf0	Fixes some more tests around date handling in the MySQL connector	2016-07-29 08:57:47 +03:00
Randall Hauch	517272278d	[maven-release-plugin] prepare for next development iteration	2016-07-25 17:50:31 -05:00
Randall Hauch	b89296e646	[maven-release-plugin] prepare release v0.3.0	2016-07-25 17:50:31 -05:00
Randall Hauch	e3a00e1992	DBZ-87 Added support for SIGNED in all numeric types in MySQL	2016-07-25 16:07:56 -05:00
Randall Hauch	447acb797d	DBZ-62 Upgraded to Kafka and Kafka Connect 0.10.0.0 Upgraded from Kafka 0.9.0.1 to Kafka 0.10.0. The only required change was to override the `Connector.config()` method, which returns `null` or a `ConfigDef` instance that contains detailed metadata for each of the configuration fields, including supporting recommended values and marking fields as not visible (e.g., if they don't make sense given other configuration field values). This can be used by user interfaces to data-drive the configuration of a connector. Also, the default validation logic of the Connector implementations uses a `Validator` that is pretty restrictive in its functionality. Debezium already had a fairly decent and simple `Configuration` framework. After several attempts to try and merge these concepts, reconciling the two validation mechanisms was very complicated and involved a lot of changes. It was easier to simply continue Debezium-specific validation and to override the `Connector.validate(...)` method to use Debezium's `Configuration`-based validation. Connector-based validation logic includes determining recommended values, so Debezium's `Field` class (used to define each configuration property) was enhanced with a new `Recommender` class that is similar to Kafka's. Additional integration tests were added to verify that the `ConfigDef` result is acceptable and that the new connector validation logic works as expected, including getting recommended values for some fields (e.g., database names, table/collection names) from MySQL and MongoDB by connecting and dynamically reading the values. This was done in a way that remains backward compatible with the regular expression formats of these fields, but in a user interface that uses the `ConfigDef` mechanism the user can simply select the databases and table/collection identifiers.	2016-07-25 14:21:31 -05:00
Randall Hauch	30777e3345	DBZ-85 Added test case and made correction to temporal values Added an integration test case to diagnose the loss of the fractional seconds from MySQL temporal values. The problem appears to be a bug in the MySQL Binary Log Connector library that we used, and this bug was reported as https://github.com/shyiko/mysql-binlog-connector-java/issues/103. That was fixed in version 0.3.2 of the library, which Stanley was kind enough to release for us. During testing, though, several issues were discovered in how temporal values are handled and converted from the MySQL events, through the MySQL Binary Log client library, and through the Debezium MySQL connector to conform with Kafka Connect's various temporal logical schema types. Most of the issues involved converting most of the temporal values from local time zone (which is how they are created by the MySQL Binary Log client) into UTC (which is how Kafka Connect expects them). Really, java.util.Date doesn't have time zone information and instead tracks the number of milliseconds past epoch, but the conversion of normal timestamp information to the milliseconds past epoch in UTC depends on the time zone in which that conversion happens.	2016-07-20 17:07:56 -05:00
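The time zone dependence mentioned in DBZ-85 can be illustrated with `java.time`. This is only a sketch of the general effect, not the connector's conversion code; `EpochMillisDemo` and `toEpochMillis` are hypothetical names, and the -05:00 offset is just an example zone.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class EpochMillisDemo {
    /** Interprets a zone-less timestamp in the given zone and returns milliseconds past epoch. */
    static long toEpochMillis(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2016, 7, 20, 17, 7, 56);
        long asUtc = toEpochMillis(local, ZoneOffset.UTC);
        long asMinusFive = toEpochMillis(local, ZoneOffset.ofHours(-5));
        // The same wall-clock value yields different epoch milliseconds in different zones.
        System.out.println(asMinusFive - asUtc);
    }
}
```

The same wall-clock fields map to instants five hours apart, which is why the zone in which the conversion happens matters.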
Randall Hauch	a5f4d0bf31	DBZ-87 Changed mapping of MySQL TINYINT and SMALLINT columns from INT32 to INT16 The MySQL connector now maps TINYINT and SMALLINT columns to INT16 (rather than INT32) because INT16 is smaller and yet still large enough for all TINYINT and SMALLINT values. Note that the range of TINYINT values is either -128 to 127 for signed or 0 to 255 for unsigned, and thus INT8 is not an acceptable choice since it can only handle values in the range -128 to 127 and therefore cannot represent unsigned values above 127. Additionally, the JDBC Specification also suggests the proper Java type for SQL-99's TINYINT is short, which maps to Kafka Connect's INT16. This change will be backward compatible, although the generated Kafka Connect schema will be different than in previous versions. This shouldn't cause a problem, since clients should expect to handle schema changes, and this schema change does comply with Avro schema evolution rules.	2016-07-19 11:11:05 -05:00
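The TINYINT range argument in DBZ-87 can be checked with a few lines of Java, assuming (as in Kafka Connect) that INT8 corresponds to a Java `byte` and INT16 to a Java `short`. The class and method names are hypothetical.

```java
final class TinyIntRangeDemo {
    /** True if every value in [min, max] fits in a Java byte (Kafka Connect INT8). */
    static boolean fitsInInt8(int min, int max) {
        return min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE;
    }

    /** True if every value in [min, max] fits in a Java short (Kafka Connect INT16). */
    static boolean fitsInInt16(int min, int max) {
        return min >= Short.MIN_VALUE && max <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("signed TINYINT in INT8:    " + fitsInInt8(-128, 127));
        System.out.println("unsigned TINYINT in INT8:  " + fitsInInt8(0, 255));
        System.out.println("unsigned TINYINT in INT16: " + fitsInInt16(0, 255));
    }
}
```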
Randall Hauch	04eef2da5c	DBZ-84 Tried to replicate error with MySQL TINYINT columns Tried unsuccessfully to replicate the problem reported in DBZ-84 with a new regression integration test.	2016-07-19 10:58:28 -05:00
Randall Hauch	ed1b494fdf	DBZ-86 Cleaned up logging and error message in MySQL connector	2016-07-15 16:37:47 -05:00
Randall Hauch	a88bcb9ae7	DBZ-86 Generated Kafka Schema names will now also be valid Avro fullnames	2016-07-15 16:29:52 -05:00
Randall Hauch	85c9d4e5fe	Merge pull request #35 from rhauch/dbz-2 DBZ-2 Created Maven module with a MongoDB connector	2016-07-14 14:11:35 -05:00
Randall Hauch	12e7cfb8d3	DBZ-2 Created initial Maven module with a MongoDB connector Added a new `debezium-connector-mongodb` module that defines a MongoDB connector. The MongoDB connector can capture and record the changes within a MongoDB replica set, or when seeded with addresses of the configuration server of a MongoDB sharded cluster, the connector captures the changes from each replica set used as a shard. In the latter case, the connector even discovers the addition of or removal of shards. The connector monitors each replica set using multiple tasks and, if needed, separate threads within each task. When a replica set is being monitored for the first time, the connector will perform an "initial sync" of that replica set's databases and collections. Once the initial sync has completed, the connector will then begin tailing the oplog of the replica set, starting at the exact point in time at which it started the initial sync. This is equivalent to how MongoDB replication works. The connector always uses the replica set's primary node to tail the oplog. If the replica set undergoes an election and a different node becomes primary, the connector will immediately stop tailing the oplog, connect to the new primary, and start tailing the oplog using the new primary node. Likewise, if the connector experiences any problems communicating with the replica set members, it will try to reconnect (using exponential backoff so as to not overwhelm the replica set) and continue tailing the oplog from where it last left off. In this way the connector is able to dynamically adjust to changes in replica set membership and to automatically handle communication failures. The MongoDB oplog contains limited information, and in particular the events describing updates and deletes do not actually have the before or after state of the documents. 
Instead, the oplog events are all idempotent, so updates contain the effective changes that were made during an update, and deletes merely contain the deleted document identifier. Consequently, the connector is limited in the information it includes in its output events. Create and read events do contain the initial state, but update events contain only the changes (rather than the before and/or after states of the document) and delete events do not have the before state of the deleted document. All connector events, however, do contain the local system timestamp at which the event was processed and _source_ information detailing the origins of the event, including the replica set name, the MongoDB transaction timestamp of the event, and the transaction identifier among other things. It is possible for MongoDB to lose commits in specific failure situations. For example, if the primary applies a change and records it in its oplog before it then crashes unexpectedly, the secondary nodes may not have had a chance to read those changes from the primary's oplog before the primary crashed. If one such secondary is then elected as primary, its oplog is missing the last changes that the old primary had recorded. In these cases where MongoDB loses changes recorded in a primary's oplog, it is possible that the MongoDB connector may or may not capture these lost changes.	2016-07-14 13:02:36 -05:00
Randall Hauch	cc68a1beb7	DBZ-83 Correctly handle MySQL REFERENCES clause	2016-06-27 13:02:57 -05:00
Randall Hauch	f0d67143bd	DBZ-82 Changed snapshot query to support pre-5.6.5 versions of MySQL	2016-06-27 09:23:12 -05:00
Randall Hauch	1c7aabf14f	Changed MySQL file comment format to use standard prefix	2016-06-22 18:19:50 -05:00
Randall Hauch	a589d9ea84	DBZ-79 Changed public methods in GtidSet to reflect the MySQL Binary Log Connector's class Removed several of the `GtidSet` convenience methods that are not in the [improved](https://github.com/shyiko/mysql-binlog-connector-java/pull/100) `com.github.shyiko.mysql.binlog.GtidSet` class. Getting these out of our API will make it easier to reuse the improved `com.github.shyiko.mysql.binlog.GtidSet` class.	2016-06-16 10:04:02 -05:00
Randall Hauch	d9cca5d254	DBZ-77 Corrected completion of offset snapshot mode The snapshot mode within the offsets now are marked as complete with the last source record produced during the snapshot. This is the only sure way to update the offset. Note that the `source` field shows the snapshot is in effect for _all_ records produced during the snapshot, including the very last one. This distinction w/r/t the offset was made possible due to recent changes for DBZ-73. Previously, when the snapshot reader completed all generation of records, it then attempted to record an empty DDL statement. However, since this statement had no net effect on the schemas, no source record was produced and thus the offset's snapshot mode was never changed. Consequently, if the connector were stopped immediately after the snapshot completed but before other events could be read or produced, upon restart the connector would perform another snapshot.	2016-06-15 12:01:16 -05:00
Randall Hauch	ed27faa5f6	DBZ-73 Added unit tests to verify behavior of SourceInfo	2016-06-15 11:51:42 -05:00
Randall Hauch	49322dc9c1	DBZ-73, DBZ-76 Corrected how binlog coordinates are recorded and put into change events Fixes two issues with how the binlog coordinates are handled. The first, DBZ-73, fixes how the offsets are recording the _next_ binlog coordinates within the offsets, which is fine for single-row events but which can result in dropped events should Kafka Connect flush the offset of some but not all of the rows before Kafka Connect crashes. Upon restart, the offset contains the binlog coordinates for the _next_ event, so any of the last rows from the previous events will be lost. With this fix, the offset used with all but the last row (in the binlog event) has the binlog coordinates of the current event, with the event row number set to be the next row that needs to be processed. The offset for the last row will have the binlog coordinates of the next event. The second issue, DBZ-76, is somewhat related: the `source` field of the change events has the binlog coordinates of the _next_ event. The fix involves putting the binlog coordinates for the _current_ event into the `source` field. Both of these issues are related and influenced a fix that could address both problems. Essentially, the `SourceInfo` is now recording the previous and next position, and the next and previous row numbers. The offset is created with parameters that specify the row number and the total number of rows, so this method correctly adjusts the binlog coordinates of the offset. The `struct` field produces the value for the `source` field, and it is always using the previous position and previous row number that reflect the change event in which it is used.	2016-06-14 17:43:58 -05:00
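The offset rule described for DBZ-73 can be sketched as below. This is an illustration under assumptions, not the `SourceInfo` implementation: the `Offset` type and `offsetForRow` method are hypothetical, rows are numbered from 0, and the row number recorded with the last row's offset is assumed to be 0.

```java
final class BinlogOffsetDemo {
    /** Hypothetical value type: a binlog position plus the row to resume from within that event. */
    static final class Offset {
        final long position;
        final int row;

        Offset(long position, int row) {
            this.position = position;
            this.row = row;
        }
    }

    /**
     * Offset to record with row {@code rowNumber} (0-based) of an event containing {@code totalRows} rows.
     * All but the last row point at the current event and the next unprocessed row;
     * the last row points at the next event (row 0 is an assumption of this sketch).
     */
    static Offset offsetForRow(long currentEventPos, long nextEventPos, int rowNumber, int totalRows) {
        if (rowNumber + 1 < totalRows) {
            return new Offset(currentEventPos, rowNumber + 1);
        }
        return new Offset(nextEventPos, 0);
    }

    public static void main(String[] args) {
        for (int row = 0; row < 3; row++) {
            Offset o = offsetForRow(100L, 250L, row, 3);
            System.out.println("row " + row + " -> pos=" + o.position + ", row=" + o.row);
        }
    }
}
```

If the process stops after flushing the offset of the first row, a restart resumes at the same event with row 1, so the remaining rows are not dropped.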
Randall Hauch	270150bcad	DBZ-72 Corrected the naming of the Schemas for the keys and values	2016-06-09 21:30:29 -05:00
Randall Hauch	0f3ed9f50f	DBZ-71 Corrected MySQL connector plugin archives and upgraded MySQL JDBC driver from 5.1.38 to 5.1.39 (the latest)	2016-06-09 21:15:34 -05:00
Randall Hauch	6749518f66	[maven-release-plugin] prepare for next development iteration	2016-06-08 13:00:50 -05:00
Randall Hauch	d5bbb116ed	[maven-release-plugin] prepare release v0.2.0	2016-06-08 13:00:50 -05:00
Randall Hauch	ff49ba1742	DBZ-37 Renamed MySQL Docker images used in integration tests	2016-06-08 11:45:35 -05:00
Randall Hauch	d63a2e17a0	DBZ-37 Added documentation of various profiles to the MySQL module's README	2016-06-08 11:19:03 -05:00
Randall Hauch	3c7882ee9d	DBZ-37 Run integration tests against MySQL and MySQL w/ GTIDs Changed the build so that the `assembly` profile runs the MySQL integration tests three times, once against each of the three MySQL configurations: # MySQL server w/o GTIDs # MySQL server w/ GTIDs # The Docker team's MySQL server image w/o GTIDs The normal profiles are still available: # The default profile runs the integration tests once against MySQL server w/o GTIDs # `gtid-mysql` runs the integration tests against MySQL server w/ GTIDs # `alt-mysql` runs the integration tests against the Docker team's MySQL server image w/o GTIDs # `skip-integration-tests` (or `-DskipITs`) skips the integration tests altogether	2016-06-08 11:03:03 -05:00
Randall Hauch	cf26a5c4e0	Removed duplicate versions in POMs	2016-06-08 09:46:05 -05:00
Randall Hauch	a143871abd	DBZ-61 Improved MySQL connector's handling of binary values Binary values read from the MySQL binlog may include strings, in which case they need to be converted to binary values. Interestingly, work on this uncovered [KAFKA-3803](https://issues.apache.org/jira/browse/KAFKA-3803) whereby Kafka Connect's `Struct.equals` method does not properly handle comparing `byte[]` values. Upon researching the problem and potentially supplying a patch, it was discovered that the Kafka Connect codebase and the Avro converter all use `ByteBuffer` objects rather than `byte[]`. Consequently, the Debezium code that converts JDBC values to Kafka Connect values was changed to return `ByteBuffer` objects rather than `byte[]` objects. Unfortunately, the JSON converter rehydrates objects with just `byte[]`, so that still means that Debezium's `VerifyRecords` logic cannot rely upon `Struct.equals` for comparison, and instead needs custom logic.	2016-06-07 17:53:07 -05:00
Randall Hauch	f48d48e114	DBZ-37 Added integration test with MySQL GTIDs Added a Maven profile to the MySQL connector component with a Docker image that runs MySQL with GTIDs enabled. The same integration tests can be run with it using `-Pgtid-mysql` or `-Dgtid-mysql` in the Maven build. When the MySQL connector starts up, it now queries the MySQL server to detect whether GTIDs are enabled, and if they are it will also verify that any GTID sets from the most recently recorded offset are still available in the MySQL server (similarly to how it was already doing this for binlog filenames). If the server does not have the correct coordinates/GTIDs, the connector fails with a useful error message. This commit also tests and adjusts the `GtidSet` class to better deal with comparisons of GTID sets for proper ordering. It also changes the connector to output MySQL's timestamp for each event using _second_ precision rather than artificially in _millisecond_ precision. To clarify the difference, this change renames the field in the event's `source` structure that records the MySQL timestamp from `ts` to `ts_sec`. Similarly, the envelope's field that records the time that the connector processed each record was renamed from `ts` to `ts_ms`. All unit and integration tests pass with the default profile and with the new GTID-enabled profile.	2016-06-07 12:01:51 -05:00
Randall Hauch	a276d983f5	DBZ-37 Changed several constants related to MySQL offsets. This does not affect the offsets themselves.	2016-06-04 16:32:26 -05:00
Randall Hauch	e91aac5b18	DBZ-37 DatabaseHistory can now use custom logic to compare offsets DatabaseHistory stores the DDL changes with the offset describing the position in the source where those DDL statements were found. When a connector restarts at a specific offset (supplied by Kafka Connect), connectors such as the MySQL connector reconstruct the database schemas by having DatabaseHistory load the history starting from the beginning and stopping at (or just before) the connector's starting offset. This change allows connectors to supply a custom comparison function. To support GTIDs, the MySQL connector needed to store additional information in the offsets. This means the logic needed to compare offsets with and without GTIDs is non-trivial and unique to the MySQL connector. This commit adds a custom comparison function for offsets. Per [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-failover.html), slaves are always expected to start with the same set of GTIDs as the master, so no matter which the MySQL connector follows it should always have the complete set of GTIDs seen by that server. Therefore: * Two offsets with GTID sets can be compared using only the GTID sets. * Any offset with a GTID set is always assumed to be newer than an offset without, since it is assumed once GTIDs are enabled they will remain enabled. (Otherwise, the connector likely needs to be restarted with a snapshot and tied to a specific master or slave with no failover.) * Two offsets without GTIDs are compared using the binlog coordinates (filename, position, and row number). * An offset that is identical to another except for being in snapshot mode is considered earlier than without the snapshot. This is because snapshot mode begins by recording the position of the snapshot, and once complete the offset is recorded without the snapshot flag.	2016-06-04 16:20:26 -05:00
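The four comparison rules listed above might be sketched as follows. This is a simplified illustration, not the connector's comparison function: the `Offset` type is hypothetical, a GTID set is modeled as a plain `Set<String>` (with "at or before" approximated by a subset check), and binlog filenames are assumed to be zero-padded so that string comparison orders them correctly.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class OffsetComparisonDemo {
    /** Hypothetical offset: gtids is null when the offset was recorded without GTIDs. */
    static final class Offset {
        final Set<String> gtids;
        final String file;
        final long pos;
        final int row;
        final boolean snapshot;

        Offset(Set<String> gtids, String file, long pos, int row, boolean snapshot) {
            this.gtids = gtids;
            this.file = file;
            this.pos = pos;
            this.row = row;
            this.snapshot = snapshot;
        }
    }

    /** Returns true if {@code recorded} is at or before {@code desired}. */
    static boolean isAtOrBefore(Offset recorded, Offset desired) {
        if (recorded.gtids != null && desired.gtids != null) {
            // Rule 1: compare using only the GTID sets (simplified to a subset check here).
            return desired.gtids.containsAll(recorded.gtids);
        }
        if (recorded.gtids != null) {
            return false; // Rule 2: an offset with GTIDs is newer than one without.
        }
        if (desired.gtids != null) {
            return true; // Rule 2, mirrored.
        }
        // Rule 3: compare binlog coordinates (filename, position, row number).
        int c = recorded.file.compareTo(desired.file);
        if (c != 0) return c < 0;
        c = Long.compare(recorded.pos, desired.pos);
        if (c != 0) return c < 0;
        c = Integer.compare(recorded.row, desired.row);
        if (c != 0) return c < 0;
        // Rule 4: with identical coordinates, the snapshot offset is the earlier one.
        return recorded.snapshot || !desired.snapshot;
    }

    public static void main(String[] args) {
        Offset snap = new Offset(null, "mysql-bin.000002", 154L, 0, true);
        Offset done = new Offset(null, "mysql-bin.000002", 154L, 0, false);
        Offset withGtids = new Offset(new HashSet<>(Arrays.asList("uuid-a:1-5")), "mysql-bin.000003", 4L, 0, false);
        System.out.println(isAtOrBefore(snap, done));
        System.out.println(isAtOrBefore(done, snap));
        System.out.println(isAtOrBefore(done, withGtids));
    }
}
```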
Randall Hauch	8fd89dacbf	DBZ-37 Corrected JavaDoc	2016-06-02 19:06:13 -05:00
Randall Hauch	655aac7d4f	DBZ-37 Added support for MySQL GTIDs The BinlogClient library our MySQL connector uses already has support for GTIDs. This change makes use of that and adds the GTIDs from the server to the offsets created by the connector and used upon restarts.	2016-06-02 18:30:26 -05:00
Randall Hauch	264a9041df	DBZ-64 Added Avro Converter to record verification utilities The `VerifyRecord` utility class has methods that will verify a `SourceRecord`, and is used in many of our integration tests to check whether records are constructed in a valid manner. The utility already checks whether the records can be serialized and deserialized using the JSON converter (provided with Kafka Connect); this change also checks with the Avro Converter (which produces much smaller records and is more suitable for production). Note that version 3.0.0 of the Confluent Avro Converter is required; version 2.1.0-alpha1 could not properly handle complex Schema objects with optional fields (see https://github.com/confluentinc/schema-registry/pull/280). Also, the names of the Kafka Connect schemas used in MySQL source records have changed. # The record's envelope Schema used to be "<serverName>.<database>.<table>" but is now "<serverName>.<database>.<table>.Envelope". # The Schema for record keys used to be named "<database>.<table>/pk", but the '/' character is not valid within an Avro name, and has been changed to "<serverName>.<database>.<table>.Key". # The Schema for record values used to be named "<database>.<table>", but to better fit with the other Schema names it has been changed to "<serverName>.<database>.<table>.Value". Thus, all of the Schemas for a single database table have the same Avro namespace "<serverName>.<database>.<table>" (or "<topicName>") with Avro schema names of "Envelope", "Key", and "Value". All unit and integration tests pass.	2016-06-02 16:54:21 -05:00
Randall Hauch	46c0ce9882	DBZ-58 Added MDC logging contexts to connector Changed the MySQL connector to make use of MDC logging contexts, which allow thread-specific parameters that can be written out on every log line by simply changing the logging configuration (e.g., Log4J configuration file). We adopt a convention for all Debezium connectors with the following MDC properties: * `dbz.connectorType` - the type of connector, which would be a single well-known value for each connector (e.g., "MySQL" for the MySQL connector) * `dbz.connectorName` - the name of the connector, which for the MySQL connector is simply the value of the `server.name` property (e.g., the logical name for the MySQL server/cluster). Unfortunately, Kafka Connect does not give us its name for the connector. * `dbz.connectorContext` - the name of the thread, which is "main" for the thread running the connector; the MySQL connector uses "snapshot" for the thread started by the snapshot reader, and "binlog" for the thread started by the binlog reader. Different logging frameworks have their own way of using MDC properties. In a Log4J configuration, for example, simply use `%X{name}` in the logger's layout, where "name" is one of the properties listed above (or another MDC property).	2016-06-02 14:05:06 -05:00
Randall Hauch	aca863c225	DBZ-31 Write MySQL schema changes to topic by default	2016-06-02 11:00:04 -05:00
Randall Hauch	58a5d8c033	DBZ-31 Added support for possibly performing snapshot upon startup Refactored the MySQL connector to break out the logic of reading the binlog into a separate class, added a similar class to read a full snapshot, and then updated the MySQL connector task class to use both. Added several test cases and updated the existing tests.	2016-06-01 21:40:53 -05:00
Randall Hauch	e6c0ff5e4d	DBZ-31 Refactored the MySQL Connector Several of the MySQL connector classes were fairly large and complicated, and to prepare for upcoming changes/enhancements these larger classes were refactored to pull out units of functionality. Currently all unit tests pass with these changes, with additional unit tests for these new components.	2016-05-26 15:58:58 -05:00
David Chen	339f03859c	DBZ-63 Fix POM dependency management. Thanks for the reminder in https://issues.jboss.org/browse/DBZ-63?focusedCommentId=13242595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13242595	2016-05-25 15:21:45 +01:00
David Chen	b1a71318df	DBZ-63 Rename "server-id" to "server_id" to fix org.apache.avro.SchemaParseException: Illegal character in: server-id	2016-05-25 14:33:20 +01:00
Randall Hauch	dc5a379764	DBZ-55 Corrected filtering of DDL statements based upon affected database Previously, the DDL statements were being filtered and recorded based upon the name of the database that appeared in the binlog. However, that database name is actually the name of the database to which the client submitting the operation is connected, and is not necessarily the database _affected_ by the operation (e.g., when an operation includes a fully-qualified table name not in the connected-to database). With these changes, the table/database affected by the DDL statements is now being used to filter the recording of the statements. The order of the DDL statements is still maintained, but since each DDL statement can apply to a separate database the DDL statements are batched (in the same original order) based upon the affected database. For example, two statements affecting "db1" will get batched together into one schema change record, followed by one statement affecting "db2" as a second schema change record, followed by another statement affecting "db1" as a third schema record. Meanwhile, this change does not affect how the database history records the changes: it still records them as submitted using a single record for each separate binlog event/position. This is much safer as each binlog event (with specific position) is written atomically to the history stream. Also, since the database history stream is what the connector uses upon recovery, the database history records are now written _after_ any schema change records to ensure that, upon recovery after failure, no schema change records are lost (and instead have at-least-once delivery guarantees).	2016-05-23 11:01:27 -05:00
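The order-preserving batching described in DBZ-55 can be sketched as follows. This is only an illustration of the grouping rule; `Ddl`, `Batch`, and `batchByDatabase` are hypothetical names, not the connector's classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DdlBatchingDemo {
    /** Hypothetical pairing of a DDL statement with the database it affects. */
    static final class Ddl {
        final String database;
        final String statement;

        Ddl(String database, String statement) {
            this.database = database;
            this.statement = statement;
        }
    }

    /** Hypothetical schema change record: consecutive statements affecting one database. */
    static final class Batch {
        final String database;
        final List<String> statements = new ArrayList<>();

        Batch(String database) {
            this.database = database;
        }
    }

    /** Groups consecutive statements by affected database without reordering them. */
    static List<Batch> batchByDatabase(List<Ddl> ddls) {
        List<Batch> batches = new ArrayList<>();
        Batch current = null;
        for (Ddl ddl : ddls) {
            // Start a new batch whenever the affected database changes.
            if (current == null || !current.database.equals(ddl.database)) {
                current = new Batch(ddl.database);
                batches.add(current);
            }
            current.statements.add(ddl.statement);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Batch> batches = batchByDatabase(Arrays.asList(
                new Ddl("db1", "CREATE TABLE db1.a (id INT)"),
                new Ddl("db1", "CREATE TABLE db1.b (id INT)"),
                new Ddl("db2", "CREATE TABLE db2.c (id INT)"),
                new Ddl("db1", "DROP TABLE db1.a")));
        for (Batch b : batches) {
            System.out.println(b.database + ": " + b.statements);
        }
    }
}
```

With the input in `main`, the grouping matches the example in the commit message: two "db1" statements in one record, then one "db2" record, then a third record for "db1".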
Randall Hauch	bb40875b2b	DBZ-45 Confirmed and tested support for 'before' and 'after' states in UPDATE events Added integration test logic to verify that UPDATE events include both 'before' and 'after' states (previously added as part of DBZ-52), to verify that altering a table does not generate events for the rows in that table, and that the 'before' and 'after' states (read from the binlog) are always defined in terms of the _current_ table schema. IOW, no special logic is needed to handle a 'before' state that has different columns than defined in the current table's definition.	2016-05-20 12:06:06 -05:00
Randall Hauch	b6ca57c6c1	DBZ-45 Added proper support for FIRST and AFTER clauses in an ALTER TABLE column definition Added support for properly handling an ALTER TABLE statement that adds columns AFTER another existing column.	2016-05-20 12:05:53 -05:00
Randall Hauch	47a93b3ae1	DBZ-60 Added MySQL server ID and timestamp to event's source info Added to the Debezium event message's `source` information the MySQL server ID for the cluster process that recorded the event and the MySQL timestamp at which the event was recorded.	2016-05-20 09:32:35 -05:00
Randall Hauch	b3167cd264	DBZ-57 Added unit test to confirm that DDL parser supports multi-column primary keys	2016-05-20 08:33:39 -05:00
Randall Hauch	c20b49a8fc	DBZ-57 Added support for the shortened CHARSET alias for CHARACTER SET in MySQL DDL statements Added explicit support for handling `CHARSET` as an alias for `CHARACTER SET` in both tables and columns. `CREATE DATABASE` and `ALTER DATABASE` statements can also specify character sets, but the DDL parser handles those statements without explicitly parsing them, so no modification is needed for them. Several unit tests were added to confirm the behavior.	2016-05-20 08:23:50 -05:00
Randall Hauch	e06f5c596c	DBZ-43 Added explicit checking and validation of Schemas and Structs in integration tests	2016-05-19 17:06:22 -05:00
Randall Hauch	07315f2b4b	DBZ-43 Changed form of schema change topic to use schemas	2016-05-19 16:54:22 -05:00
Randall Hauch	c0b7114424	DBZ-52 Added top-level container structure to all messages The new envelope Struct contains fields for the local time at which the connector processed the event, the kind of operation (e.g., read, insert, update, or delete), the state of the record before and after the change, and the information about the event source. The latter two items are connector-specific. The timestamp is merely the time using the connector's process clock, and no guarantees are provided about accuracy, monotonicity, or relationship to the original source event. The envelope structure is now used as the value for each event message in the MySQL connector; the keys of the event messages remain unchanged. Note that to facilitate Kafka log compaction (which requires a null value), a delete event containing the envelope with details about the deletion is followed by a "tombstone" event that contains the same key but null value. An example of a message value with this new envelope is as follows: { "schema" : { "type" : "struct", "fields" : [ { "type" : "struct", "fields" : [ { "type" : "int32", "optional" : false, "name" : "org.apache.kafka.connect.data.Date", "version" : 1, "field" : "order_date" }, { "type" : "int32", "optional" : false, "field" : "purchaser" }, { "type" : "int32", "optional" : false, "field" : "quantity" }, { "type" : "int32", "optional" : false, "field" : "product_id" } ], "optional" : true, "name" : "connector_test.orders", "field" : "before" }, { "type" : "struct", "fields" : [ { "type" : "int32", "optional" : false, "name" : "org.apache.kafka.connect.data.Date", "version" : 1, "field" : "order_date" }, { "type" : "int32", "optional" : false, "field" : "purchaser" }, { "type" : "int32", "optional" : false, "field" : "quantity" }, { "type" : "int32", "optional" : false, "field" : "product_id" } ], "optional" : true, "name" : "connector_test.orders", "field" : "after" }, { "type" : "struct", "fields" : [ { "type" : "string", "optional" : false, 
"field" : "server" }, { "type" : "string", "optional" : false, "field" : "file" }, { "type" : "int64", "optional" : false, "field" : "pos" }, { "type" : "int32", "optional" : false, "field" : "row" } ], "optional" : false, "name" : "io.debezium.connector.mysql.Source", "field" : "source" }, { "type" : "string", "optional" : false, "field" : "op" }, { "type" : "int64", "optional" : true, "field" : "ts" } ], "optional" : false, "name" : "kafka-connect-2.connector_test.orders", "version" : 1 }, "payload" : { "before" : null, "after" : { "order_date" : 16852, "purchaser" : 1003, "quantity" : 1, "product_id" : 107 }, "source" : { "server" : "kafka-connect-2", "file" : "mysql-bin.000002", "pos" : 2887680, "row" : 4 }, "op" : "c", "ts" : 1463437199134 } } Notice how the Schema is significantly larger, since it must describe all of the envelope's fields even when those fields are not used. In this case, the event signifies that a record was created as the 4th record of a single event recorded in the binlog.	2016-05-19 12:40:16 -05:00
Randall Hauch	e6710a5300	DBZ-44 Generate a tombstone for old key when row's key is changed When a row is updated in the database and the primary/unique key for that table is changed, the MySQL connector continues to generate an update event with the new key and new value, but now also generates a tombstone event for the old key. This ensures that when a Kafka topic is compacted, all prior events with the old key will (eventually) be removed. It also ensures that consumers see that the row represented by the old key has been removed.	2016-05-13 17:43:29 -05:00
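The key-change behavior in DBZ-44 can be sketched roughly as below. The `Event` type and `eventsForUpdate` method are hypothetical, a tombstone is modeled as an event with a null value, and the relative order of the two events is an assumption of this sketch since the commit message does not state it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class KeyChangeDemo {
    /** Hypothetical event: a null value marks a tombstone for log compaction. */
    static final class Event {
        final String key;
        final String value;

        Event(String key, String value) {
            this.key = key;
            this.value = value;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Events emitted for a row update; adds a tombstone for the old key only when the key changed. */
    static List<Event> eventsForUpdate(String oldKey, String newKey, String newValue) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(newKey, newValue)); // the update event, keyed by the new key
        if (!Objects.equals(oldKey, newKey)) {
            events.add(new Event(oldKey, null)); // tombstone so compaction drops the old key
        }
        return events;
    }

    public static void main(String[] args) {
        for (Event e : eventsForUpdate("id=1", "id=2", "{\"name\":\"a\"}")) {
            System.out.println(e.key + " -> " + (e.isTombstone() ? "(tombstone)" : e.value));
        }
    }
}
```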
Randall Hauch	97d5caa2db	DBZ-49 MySQL DDL parser is more tolerant of REFERENCE clauses in CREATE TABLE statements MySQL 5.6 using the MyISAM engine will create the `help_relation` system table using a CREATE TABLE statement that does not have in the columns' REFERENCE clause a list of columns in the referenced table. MySQL 5.7 using the InnoDB engine does not include the REFERENCE clauses. Because Debezium's MySQL DDL parser is meant only to understand the statements recorded in the binlog, it does not have to validate the statements and therefore the DDL parser can be a bit more lenient by not requiring the list of columns in a REFERENCE clause in a CREATE TABLE statement's column definitions. This commit also adds several unit tests that validate all of the DDL statements used by MySQL 5.6 and 5.7 during startup (in the configurations used in our integration tests).	2016-05-13 09:32:47 -05:00
Randall Hauch	b1e6eb1028	DBZ-29 Refactored ColumnMappers and enabled ColumnMapper impls to add parameters to the Kafka Connect Schema.	2016-05-12 12:26:04 -05:00
Randall Hauch	18995abfbd	Merge pull request #38 from rhauch/dbz-29 DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns	2016-05-12 08:27:15 -05:00
Randall Hauch	ff9d0fc240	DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns Changed the MySQL connector to use comma-separated lists of regular expressions for the database and table whitelist/blacklists. Literals are still accepted and will match fully-qualified table names, although the '.' character used as a delimiter is also a special character in regular expressions and therefore may need to be escaped with a double backslash ('\\') to more carefully match fully-qualified table names. Added several new configuration properties for the MySQL connector that instruct it to hide, truncate, and/or mask certain columns. The properties' values are all lists of regular expressions or literal fully-qualified column names. For example, the following configuration property: column.blacklist=server.users.picture,server.users.other will cause the connector to leave out of change event messages for the `server.users` table those fields that correspond to the `picture` and `other` columns. This capability can be used to prevent dissemination of sensitive information in the change event stream. An alternative to blacklisting is masking. The following configuration property: column.mask.with.10.chars=server\\.users\\.(\\w*email) will cause the connector to mask in the change event messages for the `server.users` table all values for columns whose name ends in `email`. The values will be replaced in this case with a constant string of 10 asterisk ('*') characters, even when the email value is null. This capability can also be used to prevent dissemination of sensitive information in the change event stream. Another option is to truncate string values for specific columns. 
The following configuration property: column.truncate.to.120.chars=server[.]users[.](description|biography) will cause the connector to truncate to at most 120 characters the values of the `description` and `biography` columns in the change event messages for the `server.users` table. Although this example used a limit of 120 characters, any positive length can be specified; separate properties should be used when different lengths are required. Note how the '.' delimiter in the fully-qualified names is escaped since that same character is a special character in regular expressions. This capability can be used to reduce the size of change event messages.	2016-05-11 15:57:06 -05:00
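The matching behavior described in this commit message can be sketched with `java.util.regex`. This is an illustration under stated assumptions, not the connector's implementation: the class `ColumnFilterSketch` and its helpers are hypothetical, and the sketch assumes each list entry must match the whole fully-qualified column name. It also shows why an unescaped '.' is looser than intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ColumnFilterSketch {
    // Compile a comma-separated list of regular expressions (hypothetical helper).
    static List<Pattern> compile(String commaSeparated) {
        List<Pattern> patterns = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    // True if any pattern matches the entire fully-qualified column name (assumed semantics).
    static boolean matchesAny(List<Pattern> patterns, String fullyQualifiedColumn) {
        for (Pattern p : patterns) {
            if (p.matcher(fullyQualifiedColumn).matches()) {
                return true;
            }
        }
        return false;
    }

    // Constant mask of the given length, used regardless of the original value.
    static String mask(int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append('*');
        }
        return sb.toString();
    }

    // Truncate a string value to at most maxLength characters.
    static String truncate(String value, int maxLength) {
        return value.length() <= maxLength ? value : value.substring(0, maxLength);
    }

    public static void main(String[] args) {
        // In Java source, "\\." is the regex \. (a literal dot).
        List<Pattern> masked = compile("server\\.users\\.(\\w*email)");
        System.out.println(matchesAny(masked, "server.users.work_email")); // column name ends in "email"
        System.out.println(mask(10));

        // An unescaped '.' matches any character, so this literal is looser than it looks.
        List<Pattern> hidden = compile("server.users.picture,server.users.other");
        System.out.println(matchesAny(hidden, "serverXusersXpicture"));

        List<Pattern> truncated = compile("server[.]users[.](description|biography)");
        System.out.println(matchesAny(truncated, "server.users.biography"));
        System.out.println(truncate("a long biography", 6));
    }
}
```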
Christian Posta	8b736ef654	DBZ-48 Cannot parse COMMIT and flush statements	2016-05-05 15:36:24 -07:00
Christian Posta	ab2cdce279	DBZ-42 inherit from mysql images and add the custom config and startup scripts useful for integration testing	2016-04-26 08:49:27 -07:00
Randall Hauch	1fcb4b02cf	DBZ-38 Changed DROP VIEW and TABLE to include single-table statements in events Drop table/view statements that involve more than one table generate one event for each table/view. Previously, each of those statements had the original multi-table/view statement. Now, each event has a statement that applies to only that table (generated from the original with all the same clauses).	2016-04-12 18:18:13 -05:00
Randall Hauch	b1e428c986	DBZ-38 Adjusted how events are generated for RENAME TO statements The previous change did not correctly capture the statements for a `RENAME TO` that renamed multiple tables, so fixed the code so that it generates a single `RENAME TO` for each table rename.	2016-04-12 17:58:07 -05:00
David Chen	eeff81b65d	MySqlDdlParser should support "RENAME TABLE blue_table TO red_table, orange_table TO green_table, black_table TO white_table;" form. (#1 )	2016-04-12 17:40:00 -05:00
Randall Hauch	5b30568650	DBZ-38 Changed the listening framework of the DDL parser Refactored the mechanism by which components can listen to the activities of a DDL parser. The new approach should be significantly more flexible for additional types of DDL events while making it easier to maintain backward compatibility. It also will enable passing event-specific information on each DDL event.	2016-04-12 11:00:02 -05:00
Randall Hauch	137b9f6d4d	DBZ-38 Changed the DDL parser framework to notify listeners as statements are applied.	2016-04-11 15:16:04 -05:00
Randall Hauch	8f5487b2c0	[maven-release-plugin] prepare for next development iteration	2016-03-17 16:28:40 -05:00
Randall Hauch	c2b8ac50ae	[maven-release-plugin] prepare release v0.1.0	2016-03-17 16:28:40 -05:00
Randall Hauch	43f79aad5e	Added missing version element to modules	2016-03-17 16:14:17 -05:00
Randall Hauch	b5945a24ec	DBZ-32 Corrected assembly dependencies	2016-03-17 15:58:27 -05:00
Randall Hauch	0da37c8aee	DBZ-15 Fixed a problem when handling deleted rows, since the generated record should not have a value schema when the value is null.	2016-03-17 12:34:53 -05:00
Randall Hauch	5a002dbf62	DBZ-15 Cached converters are now dropped upon log rotation.	2016-03-17 11:03:28 -05:00
Randall Hauch	91d200df51	DBZ-15 Removed some of the unnecessary JARs from the MySQL connector plugin kit	2016-03-17 11:03:27 -05:00
Randall Hauch	4998325de7	DBZ-30 Changed the MySQL connector to include all columns in the record value	2016-03-04 10:51:14 -06:00
Randall Hauch	235fa12ead	DBZ-28 Fix formatting	2016-03-04 09:52:02 -06:00
Randall Hauch	2d99cb264c	DBZ-28 Prevent the MySQL connector from sending a record with a null key and null value There is no point in sending a record that contains a null key and null value. While this may not be likely for insert or update cases (since at least the value should not be null), it is possible when a row is deleted (meaning the record value will be null) but the table has no primary/unique key (meaning the record key will be null).	2016-03-04 09:51:32 -06:00
Randall Hauch	64d0e0b458	DBZ-28 Corrected MySQL connector's behavior for representing deletes Corrects a bug where a deleted row was written to Kafka in the same way as an insert, making them indistinguishable. Now, a deleted row is written with the row's primary/unique key as the record key, and a null record value. Note that if the row has no primary/unique key, no record is written to Kafka.	2016-03-04 09:48:52 -06:00
Randall Hauch	60d3307597	DBZ-26 Corrected how table info is recovered from the database history.	2016-03-03 15:27:39 -06:00
Randall Hauch	9034e26d1e	DBZ-26 Corrected the embedded connector framework to enable stopping. Also improved logging statements.	2016-03-03 15:27:11 -06:00
Randall Hauch	5ba6da702d	DBZ-26 Corrected the MySQL connector to not fail when it comes across an unknown table Depending upon how the source MySQL database is configured, its binlog might not contain all the history for the database. In particular, it might not have the CREATE TABLE statements for some tables, which are then "unknown" to the connector. When the connector reads the binlog and comes across a change event for a row in one of those "unknown" tables, it previously resulted in an NPE. With this change, the condition results in a warning message in the log, and all subsequent change events on that table will be skipped.	2016-03-03 12:10:11 -06:00
Randall Hauch	5a32da3fac	DBZ-26 Corrected how SourceInfo recovers persistent representations to better handle various converter capabilities	2016-03-03 12:06:33 -06:00
Chris Riccomini	b4d51c8946	DBZ-25 Properly set build version instead of build properties file path for MySQL Module	2016-03-02 16:41:57 -08:00
Randall Hauch	42e531dbe9	DBZ-23 Simplified MySQL Connector's use of Docker plugin	2016-02-25 10:24:39 -06:00
Randall Hauch	7d4a996406	DBZ-23 Docker image created by the module no longer is tagged	2016-02-25 09:43:11 -06:00
Randall Hauch	50e28d72a6	DBZ-17 Added plugin distribution ZIP that can be used for other Kafka Connector plugin modules	2016-02-23 13:23:36 -06:00
Randall Hauch	1d46e59048	DBZ-17 Minor changes to the POMs	2016-02-18 13:58:29 -06:00
Randall Hauch	0102f620a9	DBZ-13 Changed Maven build to attach JavaDoc JARs to each module Modified the 'docs' profile to build and attach JavaDoc JARs for each module's source and test source artifacts. The profile will be automatically used when releasing.	2016-02-17 11:14:50 -06:00
Randall Hauch	dab0440612	DBZ-14 Corrected the 'alt-mysql' Maven profile so that it can be used with any of the other Maven commands.	2016-02-16 16:37:30 -06:00
Christian Posta	c730685a01	add option to run without integration tests	2016-02-15 16:26:32 -07:00
Randall Hauch	73f3c9836b	DBZ-1 Completed integration testing and debugging of the MySQL connector	2016-02-15 14:46:12 -06:00
Randall Hauch	1a59f9b07c	DBZ-11 Build can skip long-running unit and integration tests	2016-02-04 15:35:27 -06:00
Randall Hauch	c501f8486f	DBZ-9 Added MySQL whitelist and blacklists on tables and databases.	2016-02-04 07:56:13 -06:00
Randall Hauch	37d6a5e7da	DBZ-1 Expanded documentation and improved EmbeddedConnector framework Changed the EmbeddedConnector framework to initialize all major components via configuration properties rather than through the public builder. This increases the size of the configurations, but it simplifies what embedding applications must do to obtain an EmbeddedConnector instance. The DatabaseHistory framework was also changed to be configurable in similar ways to the OffsetBackingStore. Essentially, connectors that want to use it (like the MySqlConnector) will describe it as part of the connector's configuration, allowing more flexibility in which DatabaseHistory implementation is used and how it is configured whether in Kafka Connector or as part of the EmbeddedConnector. Added a README.md to `debezium-embedded` to provide documentation and sample code showing how to use the EmbeddedConnector.	2016-02-03 14:11:53 -06:00
Randall Hauch	0e58dba9d6	DBZ-1 Renamed the connector modules and packages	2016-02-02 16:58:48 -06:00

1048 Commits