Commit Graph

437 Commits

Author SHA1 Message Date
Jenkins user
f4e151b23a [maven-release-plugin] prepare for next development iteration 2018-03-20 08:14:19 +00:00
Jenkins user
93b3252332 [maven-release-plugin] prepare release v0.7.5 2018-03-20 08:14:19 +00:00
Andrew Tongen
2792f4cac3 DBZ-665 Passing correct value for timeSinceLastCommit to commit offset policy 2018-03-19 12:23:18 +01:00
Jenkins user
daf27207be [maven-release-plugin] prepare for next development iteration 2018-03-07 08:31:07 +00:00
Jenkins user
9c73774928 [maven-release-plugin] prepare release v0.7.4 2018-03-07 08:31:07 +00:00
Jenkins user
6d0cd88e12 [maven-release-plugin] prepare for next development iteration 2018-02-15 04:15:34 +00:00
Jenkins user
7d1e1a989e [maven-release-plugin] prepare release v0.7.3 2018-02-15 04:15:34 +00:00
Gunnar Morling
2a724d9611 DBZ-537 Misc. adjustments:
* Renaming ConfigurationHelper to Instantiator
* Doc improvements and typo fixes
* Bringing getInstance() methods into consistent order
* Raising exception instead of logging error if instantiation fails
2018-02-09 15:45:57 +01:00
Jiri Pechanec
5be55ae8ff DBZ-537 OffsetCommitPolicy supports Configuration constructor initialization 2018-02-09 15:19:48 +01:00
Jiri Pechanec
a8750221cb DBZ-537 OffsetCommitPolicy is configurable via builder 2018-02-09 15:19:48 +01:00
Jiri Pechanec
eb7cc3f28e DBZ-537 Change OffsetCommitPolicy to use Duration in API 2018-02-09 15:19:48 +01:00
Jiri Pechanec
8c3daa387d DBZ-537 Move OffsetCommitPolicy to SPI 2018-02-09 15:19:48 +01:00
Jiri Pechanec
452f9af52d DBZ-537 Configurable offset commit strategy 2018-02-09 15:19:48 +01:00
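The DBZ-537 series above makes the offset commit policy a pluggable SPI that decides when the engine flushes offsets. A minimal sketch of a custom policy, assuming the `io.debezium.embedded.spi.OffsetCommitPolicy` interface with its `performCommit(long, Duration)` method introduced by these commits (package and method names are assumptions here, not verified against this exact release):

    import java.time.Duration;

    import io.debezium.embedded.spi.OffsetCommitPolicy;

    // Illustrative policy: commit once at least 1000 records have been processed
    // or 30 seconds have passed since the last commit, whichever comes first.
    public class BatchOrTimeoutCommitPolicy implements OffsetCommitPolicy {

        private static final Duration MAX_INTERVAL = Duration.ofSeconds(30);

        @Override
        public boolean performCommit(long numberOfMessagesSinceLastCommit, Duration timeSinceLastCommit) {
            return numberOfMessagesSinceLastCommit >= 1000
                    || timeSinceLastCommit.compareTo(MAX_INTERVAL) >= 0;
        }
    }

Such a policy could then be handed to the engine builder (per the "configurable via builder" commit above), roughly as `EmbeddedEngine.create().using(config).using(new BatchOrTimeoutCommitPolicy())`.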
Jiri Pechanec
9b592204ac DBZ-587 Centralize and unify thread management 2018-02-01 10:04:20 +01:00
Jenkins user
04624341f5 [maven-release-plugin] prepare for next development iteration 2018-01-25 09:39:44 +00:00
Jenkins user
898f6884e1 [maven-release-plugin] prepare release v0.7.2 2018-01-25 09:39:44 +00:00
Gunnar Morling
1bc9dec6f5 DBZ-555 Adding Denis Mikhaylov to COPYRIGHT.txt; formatting 2018-01-19 12:30:26 +01:00
Denis Mikhaylov
048f1323cd DBZ-555 Add missing config fields to EmbeddedEngine 2018-01-19 12:30:17 +01:00
Jenkins user
6bb34b42f9 [maven-release-plugin] prepare for next development iteration 2017-12-20 07:15:12 +00:00
Jenkins user
16dcd4c980 [maven-release-plugin] prepare release v0.7.1 2017-12-20 07:15:12 +00:00
Jenkins user
5e09932cb9 [maven-release-plugin] prepare for next development iteration 2017-12-15 05:10:23 +00:00
Jenkins user
6c1d61e03b [maven-release-plugin] prepare release v0.7.0 2017-12-15 05:10:23 +00:00
Jiri Pechanec
196f6b3571 DBZ-406 Fixing test, adding warning for disabled rollback 2017-12-13 14:12:27 +01:00
Gunnar Morling
5fbe742be8 DBZ-285 Specifying scope of dependencies in the individual POMs for the sake of comprehensibility 2017-11-10 16:48:32 +01:00
Ben Williams
a3b4fedd5f DBZ-363 Add support for BIGINT UNSIGNED handling for MySQL 2017-10-18 10:20:03 +02:00
Jiri Pechanec
0bc8129961 DBZ-258 Support for wal2json plugin 2017-10-18 09:21:22 +02:00
Jenkins user
75937711fa [maven-release-plugin] prepare for next development iteration 2017-09-21 04:42:02 +00:00
Jenkins user
a89b9332e4 [maven-release-plugin] prepare release v0.6.0 2017-09-21 04:42:02 +00:00
Jenkins user
214696ef0c [maven-release-plugin] prepare for next development iteration 2017-08-17 11:51:05 +00:00
Jenkins user
c867e6fea6 [maven-release-plugin] prepare release v0.5.2 2017-08-17 11:51:05 +00:00
Gunnar Morling
a8d1817c22 [maven-release-plugin] prepare for next development iteration 2017-06-09 16:14:31 +00:00
Gunnar Morling
3f512aace7 [maven-release-plugin] prepare release v0.5.1 2017-06-09 16:14:31 +00:00
Randall Hauch
709cd8f3fe [maven-release-plugin] prepare for next development iteration 2017-03-27 11:28:12 -05:00
Randall Hauch
2bc3d45954 [maven-release-plugin] prepare release v0.5.0 2017-03-27 11:28:11 -05:00
Randall Hauch
430d756062 [maven-release-plugin] prepare for next development iteration 2017-03-17 15:41:58 -05:00
Randall Hauch
536cbf6300 [maven-release-plugin] prepare release v0.4.1 2017-03-17 15:41:57 -05:00
Randall Hauch
8c60c29883 [maven-release-plugin] prepare for next development iteration 2017-02-07 14:22:12 -06:00
Randall Hauch
20134286e9 [maven-release-plugin] prepare release v0.4.0 2017-02-07 14:22:11 -06:00
Randall Hauch
fe17b246af DBZ-113 Added MySQL threads to the event’s source metadata
Changed the events’ `source` structure to optionally contain the identifier of the MySQL thread where appropriate. The thread is included on each `BEGIN` binlog event, so these are captured and added to all of the associated change events produced for that transaction.
2017-02-02 11:53:32 -06:00
Horia Chiorean
d035c4bc8d DBZ-173 Changes the MySQL ITs to not use TZ information for expected dates and fixes the character set for parsing test files 2017-01-27 14:53:10 +02:00
Horia Chiorean
a2154d3d32 DBZ-173 Changes the MySQL ITs to use the database.hostname system property instead of always hardcoding 'localhost' 2017-01-27 09:19:57 +02:00
Horia Chiorean
7dfdef3558 DBZ-173 Upgrades the Kafka artifact versions to 0.10.1.1 2017-01-27 09:19:57 +02:00
Randall Hauch
e8b06b0ec1 Merge pull request #144 from hchiorean/DBZ-3
DBZ-3 Implements a Debezium Connector for ingesting Postgresql changes via logical decoding
2017-01-11 13:32:09 -06:00
Ramesh Reddy
29a7043fe3 DBZ-178: Correcting error where error message is only logged in success scenarios in EmbeddedEngine 2017-01-11 09:39:47 -06:00
Horia Chiorean
737614a555 DBZ-3 Implements a connector for streaming changes from a Postgres database
The version of the DB server required for this to work is at least 9.4. To be able to stream logical changes, the code relies on enhancements to the JDBC driver which are not yet public. Therefore, the current codebase includes the sources for the JDBC driver.
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver and the protobuf code for now
* adds support for debugging Surefire and Failsafe
2016-12-27 14:44:32 +02:00
Randall Hauch
5dceb05f69 DBZ-151 Additional changes to improve test framework and MySQL integration tests 2016-12-20 10:58:56 -06:00
Randall Hauch
a3bece4472 DBZ-151 Added new integration test framework for easily comparing output of connectors to expected results. 2016-12-20 09:18:09 -06:00
Randall Hauch
eedc4fba00 DBZ-163 Corrected assembly profile in build
The Travis-CI builds run the Maven build using the `assembly` profile, and this has been failing quite a bit lately.

The first problem appears to be that the Travis-CI environment recently changed to have port 3306 taken, which means that our build fails to start any Docker containers for MySQL that attempt to use this port. A simple fix is to use different ports for the assembly build.

However, trying to change the port numbers for some of the profiles caused a lot of problems, and to correct these required refactoring how the properties are set. The Docker Maven plugin is now configured with separate properties that are set once (depending upon the profile) to determine the port assignments of the various Docker containers. The Failsafe plugin executions then use these Maven properties when setting the system variables (e.g., `database.host`) needed in the integration tests. This appears to have worked, but it still is a bit fragile. For example, the assembly profile defines several Failsafe executions, and during this profile these should be the only executions run; however, if not all the properties are set properly, the build seems to also run the default Failsafe execution in addition to the other `assembly` profile executions. (I think properties can’t only be defined in the execution, but need to also be defined in the Failsafe configuration.)

The “alternative” MySQL Docker images were removed, since they basically should not provide any different behavior than the `mysql/mysql-server` images we normally used. The extra containers required a lot more resources to run and dramatically increased the complexity of the build.

A few other trivial changes were made.
2016-12-05 16:37:59 -06:00
Horia Chiorean
968cf62b23 DBZ-156 Adds better error handling to the EmbeddedEngine 2016-11-18 11:04:00 +02:00
Horia Chiorean
506457c13b DBZ-156 Updates EmbeddedEngine to better handle exceptional cases and provide more feedback during startup
It also updates EmbeddedEngine to use the Kafka commit callbacks introduced after 0.10 and updates AbstractConnectorTest to better synchronize with the embedded engine
2016-11-17 19:18:07 +02:00
Randall Hauch
ea5f7983c7 DBZ-144 Corrected MySQL connector restart
Added tests to verify that the connector properly restarts in the binlog when it previously failed or stopped in the middle of a transaction. The tests showed that the connector was not able to restart properly whether or not GTIDs were used, because restarting from an arbitrary binlog event causes problems: the TABLE_MAP events for the affected tables are skipped.

The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.

This has an unfortunate side effect: the offsets only capture that a transaction was completed when the connector generates a source record for the subsequent transaction. This is because the connector generates source records (with their offsets) for the binlog events in a transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn’t be a problem, and will only slow restarts for very large transactions.
2016-11-09 08:11:41 -06:00
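A rough illustration of the restart strategy described above: the recorded offset carries the binlog coordinates of the transaction start plus the number of events of that transaction already produced, and on restart the connector re-reads but skips that many events. The offset key names below are placeholders for illustration, not necessarily the connector's actual keys:

    import java.util.HashMap;
    import java.util.Map;

    public class RestartSkipSketch {
        public static void main(String[] args) {
            // Offset recorded with the last source record before the restart.
            Map<String, Object> offset = new HashMap<>();
            offset.put("file", "mysql-bin.000003"); // binlog file at the start of the transaction
            offset.put("pos", 154L);                 // binlog position at the start of the transaction
            offset.put("events", 3L);                // events of that transaction already produced

            long eventsToSkip = (Long) offset.get("events");
            for (long i = 0; i < 7; i++) {           // pretend the transaction contains 7 events
                if (i < eventsToSkip) {
                    continue;                        // re-read but ignore: already produced before the restart
                }
                System.out.println("emit change event for binlog event #" + i);
            }
        }
    }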
Randall Hauch
4de56fd657 Merge pull request #94 from hchiorean/DZB-header-fix
Fixes the DBZ header required by checkstyle
2016-08-24 14:28:43 -05:00
Randall Hauch
ce2b2db80c DBZ-99 Added support for MySQL connector to connect securely to MySQL
Changed the MySQL connector to have several new configuration properties for setting up the SSL key store and trust store (which can be used in place of System or JDK properties) used for MySQL secure connections, and another property to specify what kind of SSL connection should be used.

Modified several integration tests to ensure all MySQL connections are made with `useSSL=false`.
2016-08-24 13:27:35 -05:00
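The SSL-related settings described above would be supplied like any other connector property; a sketch using Debezium's `Configuration` builder, with property names (`database.ssl.mode`, key store and trust store paths and passwords) and values that should be treated as assumptions rather than the exact names introduced by this commit:

    import io.debezium.config.Configuration;

    public class SecureMySqlConfigSketch {
        public static void main(String[] args) {
            // Illustrative only: points the connector at a key store and trust store
            // instead of relying on JVM-wide system properties.
            Configuration config = Configuration.create()
                    .with("database.hostname", "mysql.example.com")
                    .with("database.port", "3306")
                    .with("database.ssl.mode", "verify_ca")
                    .with("database.ssl.keystore", "/path/to/keystore.jks")
                    .with("database.ssl.keystore.password", "changeit")
                    .with("database.ssl.truststore", "/path/to/truststore.jks")
                    .with("database.ssl.truststore.password", "changeit")
                    .build();
            System.out.println(config.getString("database.ssl.mode"));
        }
    }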
Horia Chiorean
2732d26ff0 Fixes the DBZ header required by checkstyle
This commit removes an extra space character from the first blank line of the header
2016-08-24 13:41:15 +03:00
Randall Hauch
e86fb83459 [maven-release-plugin] prepare for next development iteration 2016-08-16 09:56:47 -05:00
Randall Hauch
ccdb0a1a63 [maven-release-plugin] prepare release v0.3.0 2016-08-16 09:56:47 -05:00
Horia Chiorean
ab24f013d1 DBZ-96 Removes some asserts on tables created by another test case 2016-08-08 14:25:38 +03:00
Chris Riccomini
265c2e8c88 Update README.md 2016-08-05 13:31:46 -07:00
Randall Hauch
6894e9c30d Merge pull request #78 from hchiorean/mysql-tests-fix
Fixes some more tests around date handling in the MySQL connector
2016-08-02 20:12:04 -05:00
Randall Hauch
8cb39eacf0 Reverted back to 0.3.0-SNAPSHOT, since the 0.3 candidate release was not acceptable. 2016-08-01 12:25:58 -05:00
Horia Chiorean
eaf295fbf0 Fixes some more tests around date handling in the MySQL connector 2016-07-29 08:57:47 +03:00
Randall Hauch
517272278d [maven-release-plugin] prepare for next development iteration 2016-07-25 17:50:31 -05:00
Randall Hauch
b89296e646 [maven-release-plugin] prepare release v0.3.0 2016-07-25 17:50:31 -05:00
Randall Hauch
447acb797d DBZ-62 Upgraded to Kafka and Kafka Connect 0.10.0.0
Upgraded from Kafka 0.9.0.1 to Kafka 0.10.0. The only required change was to override the `Connector.config()` method, which returns `null` or a `ConfigDef` instance that contains detailed metadata for each of the configuration fields, including supporting recommended values and marking fields as not visible (e.g., if they don't make sense given other configuration field values). This can be used by user interfaces to data-drive the configuration of a connector. Also, the default validation logic of the Connector implementations uses a `Validator` that is pretty restrictive in its functionality.

Debezium already had a fairly decent and simple `Configuration` framework. After several attempts to merge these concepts, it became clear that reconciling the two validation mechanisms would be very complicated and involve a lot of changes. It was easier to simply continue with Debezium-specific validation and to override the `Connector.validate(...)` method to use Debezium's `Configuration`-based validation. Connector-based validation logic includes determining recommended values, so Debezium's `Field` class (used to define each configuration property) was enhanced with a new `Recommender` class that is similar to Kafka's.

Additional integration tests were added to verify that the `ConfigDef` result is acceptable and that the new connector validation logic works as expected, including getting recommended values for some fields (e.g., database names, table/collection names) from MySQL and MongoDB by connecting and dynamically reading the values. This was done in a way that remains backward compatible with the regular expression formats of these fields, but in a user interface that uses the `ConfigDef` mechanism the user can simply select the databases and table/collection identifiers.
2016-07-25 14:21:31 -05:00
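For context, the Kafka Connect side of this is the `Connector.config()` method returning a `ConfigDef` that describes each field; a generic sketch of that mechanism (plain Kafka Connect API usage, not Debezium's `Field`/`Recommender` implementation):

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.common.config.ConfigDef.Importance;
    import org.apache.kafka.common.config.ConfigDef.Type;

    public class ExampleConfigDef {
        // What a connector's config() method might return: metadata for each configuration field,
        // which user interfaces can use to data-drive the connector's configuration.
        public static ConfigDef config() {
            return new ConfigDef()
                    .define("database.hostname", Type.STRING, Importance.HIGH,
                            "IP address or hostname of the database server")
                    .define("database.port", Type.INT, 3306, Importance.MEDIUM,
                            "Port of the database server")
                    .define("table.whitelist", Type.LIST, Importance.LOW,
                            "Regular expressions matching the tables to capture");
        }
    }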
Randall Hauch
12e7cfb8d3 DBZ-2 Created initial Maven module with a MongoDB connector
Added a new `debezium-connector-mongodb` module that defines a MongoDB connector. The MongoDB connector can capture and record the changes within a MongoDB replica set or, when seeded with the addresses of the configuration server of a MongoDB sharded cluster, capture the changes from each replica set used as a shard. In the latter case, the connector even discovers the addition or removal of shards.

The connector monitors each replica set using multiple tasks and, if needed, separate threads within each task. When a replica set is being monitored for the first time, the connector will perform an "initial sync" of that replica set's databases and collections. Once the initial sync has completed, the connector will then begin tailing the oplog of the replica set, starting at the exact point in time at which it started the initial sync. This is equivalent to how MongoDB replication works.

The connector always uses the replica set's primary node to tail the oplog. If the replica set undergoes an election and a different node becomes primary, the connector will immediately stop tailing the oplog, connect to the new primary, and start tailing the oplog using the new primary node. Likewise, if the connector experiences any problems communicating with the replica set members, it will try to reconnect (using exponential backoff so as to not overwhelm the replica set) and continue tailing the oplog from where it last left off. In this way the connector is able to dynamically adjust to changes in replica set membership and to automatically handle communication failures.

The MongoDB oplog contains limited information, and in particular the events describing updates and deletes do not actually have the before or after state of the documents. Instead, the oplog events are all idempotent, so updates contain the effective changes that were made during an update, and deletes merely contain the deleted document identifier. Consequently, the connector is limited in the information it includes in its output events. Create and read events do contain the initial state, but update events contain only the changes (rather than the before and/or after states of the document), and delete events do not have the before state of the deleted document. All connector events, however, do contain the local system timestamp at which the event was processed and _source_ information detailing the origins of the event, including the replica set name, the MongoDB transaction timestamp of the event, and the transaction identifier, among other things.

It is possible for MongoDB to lose commits in specific failure situations. For example, if the primary applies a change and records it in its oplog before it crashes unexpectedly, the secondary nodes may not have had a chance to read those changes from the primary's oplog before the primary crashed. If one such secondary is then elected as primary, its oplog is missing the last changes that the old primary had recorded. In these cases where MongoDB loses changes recorded in a primary's oplog, the MongoDB connector may or may not capture these lost changes.
2016-07-14 13:02:36 -05:00
Randall Hauch
d9cca5d254 DBZ-77 Corrected completion of offset snapshot mode
The snapshot mode within the offsets is now marked as complete with the last source record produced during the snapshot. This is the only sure way to update the offset.

Note that the `source` field shows the snapshot is in effect for _all_ records produced during the snapshot, including the very last one. This distinction w/r/t the offset was made possible due to recent changes for DBZ-73.

Previously, when the snapshot reader completed all generation of records, it then attempted to record an empty DDL statement. However, since this statement had no net effect on the schemas, no source record was produced and thus the offset's snapshot mode was never changed. Consequently, if the connector were stopped immediately after the snapshot completed but before other events could be read or produced, upon restart the connector would perform another snapshot.
2016-06-15 12:01:16 -05:00
Randall Hauch
6749518f66 [maven-release-plugin] prepare for next development iteration 2016-06-08 13:00:50 -05:00
Randall Hauch
d5bbb116ed [maven-release-plugin] prepare release v0.2.0 2016-06-08 13:00:50 -05:00
Randall Hauch
cf26a5c4e0 Removed duplicate versions in POMs 2016-06-08 09:46:05 -05:00
Randall Hauch
58a5d8c033 DBZ-31 Added support for possibly performing snapshot upon startup
Refactored the MySQL connector to break out the logic of reading the binlog into a separate class, added a similar class to read a full snapshot, and then updated the MySQL connector task class to use both. Added several test cases and updated the existing tests.
2016-06-01 21:40:53 -05:00
Randall Hauch
dc5a379764 DBZ-55 Corrected filtering of DDL statements based upon affected database
Previously, the DDL statements were being filtered and recorded based upon the name of the database that appeared in the binlog. However, that database name is actually the name of the database to which the client submitting the operation is connected, and is not necessarily the database _affected_ by the operation (e.g., when an operation includes a fully-qualified table name not in the connected-to database).

With these changes, the table/database affected by the DDL statements is now being used to filter the recording of the statements. The order of the DDL statements is still maintained, but since each DDL statement can apply to a separate database the DDL statements are batched (in the same original order) based upon the affected database. For example, two statements affecting "db1" will get batched together into one schema change record, followed by one statement affecting "db2" as a second schema change record, followed by another statement affecting "db1" as a third schema record.

Meanwhile, this change does not affect how the database history records the changes: it still records them as submitted using a single record for each separate binlog event/position. This is much safer as each binlog event (with specific position) is written atomically to the history stream. Also, since the database history stream is what the connector uses upon recovery, the database history records are now written _after_ any schema change records to ensure that, upon recovery after failure, no schema change records are lost (and instead have at-least-once delivery guarantees).
2016-05-23 11:01:27 -05:00
Randall Hauch
e06f5c596c DBZ-43 Added explicit checking and validation of Schemas and Structs in integration tests 2016-05-19 17:06:22 -05:00
Randall Hauch
07315f2b4b DBZ-43 Changed form of schema change topic to use schemas 2016-05-19 16:54:22 -05:00
Randall Hauch
c0b7114424 DBZ-52 Added top-level container structure to all messages
The new envelope Struct contains fields for the local time at which the connector processed the event, the kind of operation (e.g., read, insert, update, or delete), the state of the record before and after the change, and the information about the event source. The latter two items are connector-specific. The timestamp is merely the time using the connector's process clock, and no guarantees are provided about accuracy, monotonicity, or relationship to the original source event.

The envelope structure is now used as the value for each event message in the MySQL connector; the keys of the event messages remain unchanged. Note that to facilitate Kafka log compaction (which requires a null value), a delete event containing the envelope with details about the deletion is followed by a "tombstone" event that contains the same key but a null value.

An example of a message value with this new envelope is as follows:

{
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "before"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "after"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "string",
          "optional" : false,
          "field" : "server"
        }, {
          "type" : "string",
          "optional" : false,
          "field" : "file"
        }, {
          "type" : "int64",
          "optional" : false,
          "field" : "pos"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "row"
        } ],
        "optional" : false,
        "name" : "io.debezium.connector.mysql.Source",
        "field" : "source"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "op"
      }, {
        "type" : "int64",
        "optional" : true,
        "field" : "ts"
      } ],
      "optional" : false,
      "name" : "kafka-connect-2.connector_test.orders",
      "version" : 1
    },
    "payload" : {
      "before" : null,
      "after" : {
        "order_date" : 16852,
        "purchaser" : 1003,
        "quantity" : 1,
        "product_id" : 107
      },
      "source" : {
        "server" : "kafka-connect-2",
        "file" : "mysql-bin.000002",
        "pos" : 2887680,
        "row" : 4
      },
      "op" : "c",
      "ts" : 1463437199134
    }
}

Notice how the Schema is significantly larger, since it must describe all of the envelope's fields even when those fields are not used. In this case, the event signifies that a record was created as the 4th record of a single event recorded in the binlog.
2016-05-19 12:40:16 -05:00
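A consumer of these envelope messages can unpack them with the standard Kafka Connect `Struct` accessors; a small sketch (the handler class itself is hypothetical):

    import org.apache.kafka.connect.data.Struct;
    import org.apache.kafka.connect.source.SourceRecord;

    public class EnvelopeReader {
        public void handle(SourceRecord record) {
            Struct value = (Struct) record.value();
            if (value == null) {
                return;                                // tombstone event following a delete
            }
            String op = value.getString("op");         // e.g. "c" for create, as in the payload above
            Struct before = value.getStruct("before"); // row state before the change, may be null
            Struct after = value.getStruct("after");   // row state after the change, may be null
            Struct source = value.getStruct("source"); // server name, binlog file/position, row number
            Long ts = value.getInt64("ts");            // connector processing time, no accuracy guarantees
            System.out.printf("op=%s at %s:%d ts=%d%n",
                    op, source.getString("file"), source.getInt64("pos"), ts);
        }
    }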
Randall Hauch
e6710a5300 DBZ-44 Generate a tombstone for old key when row's key is changed
When a row is updated in the database and the primary/unique key for that table is changed, the MySQL connector continues to generate an update event with the new key and new value, but now also generates a tombstone event for the old key. This ensures that when a Kafka topic is compacted, all prior events with the old key will (eventually) be removed. It also ensures that consumers see that the row represented by the old key has been removed.
2016-05-13 17:43:29 -05:00
Randall Hauch
b1e6eb1028 DBZ-29 Refactored ColumnMappers and enabled ColumnMapper impls to add parameters to the Kafka Connect Schema. 2016-05-12 12:26:04 -05:00
Randall Hauch
ff9d0fc240 DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns
Changed the MySQL connector to use comma-separated lists of regular expressions for the database
and table whitelist/blacklists. Literals are still accepted and will match fully-qualified table names,
although the '.' character used as a delimiter is also a special character in regular expressions and
therefore may need to be escaped with a double backslash ('\\') to more carefully match fully-qualified
table names.

Added several new configuration properties for the MySQL connector that instruct it to hide,
truncate, and/or mask certain columns. The properties' values are all lists of regular expressions
or literal fully-qualified column names. For example, the following configuration property:

    column.blacklist=server.users.picture,server.users.other

will cause the connector to leave out of the change event messages for the `server.users` table those
fields that correspond to the `picture` and `other` columns. This capability can be used to
prevent dissemination of sensitive information in the change event stream.

An alternative to blacklisting is masking. The following configuration property:

    column.mask.with.10.chars=server\\.users\\.(\\w*email)

will cause the connector to mask in the change event messages for the `server.users` table
all values for columns whose name ends in `email`. The values will be replaced in this case with
a constant string of 10 asterisk ('*') characters, even when the email value is null.
This capability can also be used to prevent dissemination of sensitive information in the change event
stream.

Another option is to truncate string values for specific columns. The following configuration
property:

    column.truncate.to.120.chars=server[.]users[.](description|biography)

will cause the connector to truncate to at most 120 characters the values of the `description` and
`biography` columns in the change event messages for the `server.users` table. Although this example
used a limit of 120 characters, any positive length can be specified; separate properties should
be used when different lengths are required. Note how the '.' delimiter in the fully-qualified names
is escaped since that same character is a special character in regular expressions. This capability
can be used to reduce the size of change event messages.
2016-05-11 15:57:06 -05:00
Randall Hauch
8f5487b2c0 [maven-release-plugin] prepare for next development iteration 2016-03-17 16:28:40 -05:00
Randall Hauch
c2b8ac50ae [maven-release-plugin] prepare release v0.1.0 2016-03-17 16:28:40 -05:00
Randall Hauch
43f79aad5e Added missing version element to modules 2016-03-17 16:14:17 -05:00
Randall Hauch
9034e26d1e DBZ-26 Corrected the embedded connector framework to enable stopping. Also improved logging statements. 2016-03-03 15:27:11 -06:00
Randall Hauch
1d46e59048 DBZ-17 Minor changes to the POMs 2016-02-18 13:58:29 -06:00
Randall Hauch
73f3c9836b DBZ-1 Completed integration testing and debugging of the MySQL connector 2016-02-15 14:46:12 -06:00
Randall Hauch
70fc601c0f DBZ-8 Added documentation about embedded engines. 2016-02-03 16:09:43 -06:00
Randall Hauch
fbae6d75c8 DBZ-1 Renamed EmbeddedConnector to EmbeddedEngine and improved README 2016-02-03 15:33:57 -06:00
Randall Hauch
37d6a5e7da DBZ-1 Expanded documentation and improved EmbeddedConnector framework
Changed the EmbeddedConnector framework to initialize all major components via configuration properties rather than through the public builder. This increases the size of the configurations, but it simplifies what embedding applications must do to obtain an EmbeddedConnector instance.

The DatabaseHistory framework was also changed to be configurable in similar ways to the OffsetBackingStore. Essentially, connectors that want to use it (like the MySqlConnector) will describe it as part of the connector's configuration, allowing more flexibility in which DatabaseHistory implementation is used and how it is configured, whether in Kafka Connect or as part of the EmbeddedConnector.

Added a README.md to `debezium-embedded` to provide documentation and sample code showing how to use the EmbeddedConnector.
2016-02-03 14:11:53 -06:00
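Per the README mentioned above, an embedding application drives the framework (later renamed EmbeddedEngine; see the rename commit above) purely through configuration properties. A sketch using the post-rename class name, with illustrative connector and offset-store settings:

    import java.util.concurrent.Executors;

    import io.debezium.config.Configuration;
    import io.debezium.embedded.EmbeddedEngine;

    public class EngineRunner {
        public static void main(String[] args) {
            Configuration config = Configuration.create()
                    .with("name", "my-engine")
                    .with("connector.class", "io.debezium.connector.mysql.MySqlConnector")
                    .with("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore")
                    .with("offset.storage.file.filename", "/tmp/offsets.dat")
                    .with("database.hostname", "localhost")
                    .with("database.port", "3306")
                    .with("database.user", "debezium")
                    .with("database.password", "dbz")
                    .with("database.server.name", "my-app-connector")
                    .build();

            EmbeddedEngine engine = EmbeddedEngine.create()
                    .using(config)
                    .notifying(record -> System.out.println(record)) // invoked for each change event
                    .build();

            Executors.newSingleThreadExecutor().execute(engine);     // the engine is a Runnable
        }
    }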
Randall Hauch
2da5b37f76 DBZ-1 Added support for recording and recovering database schema
Adds a small framework for recording the DDL operations on the schema state (e.g., Tables) as they are read and applied from the log, and when restarting the connector task to recover the accumulated schema state. Where and how the DDL operations are recorded is an abstraction called `DatabaseHistory`, with three options: in-memory (primarily for testing purposes), file-based (for embedded cases and perhaps standalone Kafka Connect uses), and Kafka (for normal Kafka Connect deployments).

The `DatabaseHistory` interface methods take several parameters that are used to construct a `SourceRecord`. The `SourceRecord` type was not used, however, since that would result in this interface (and potential extension mechanism) having a dependency on and exposing the Kafka API. Instead, the more general parameters are used to keep the API simple.

The `FileDatabaseHistory` and `MemoryDatabaseHistory` implementations are both fairly simple, but the `FileDatabaseHistory` relies upon representing each recorded change as a JSON document. This is simple, is easily written to files, allows for recovery of data from the raw file, etc. Although this was done initially using Jackson, the code to read and write the JSON documents required a lot of boilerplate. Instead, the `Document` framework developed during Debezium's very early prototype stages was brought back. It provides a very usable API for working with documents, including the ability to compare documents semantically (e.g., numeric values are converted to be able to compare their numeric values rather than just compare representations) and with or without field order.

The `KafkaDatabaseHistory` is a bit more complicated, since it uses a Kafka broker to record all database schema changes on a single topic with a single partition, and then upon restart uses it to recover the history from that dedicated topic. This implementation also records the changes as JSON documents, keeping it simple and independent of the Kafka Connect converters.
2016-02-02 14:27:14 -06:00
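The `DatabaseHistory` implementation is likewise selected through configuration. A sketch of the file-based and Kafka-based options, with property names as documented for later Debezium releases (treat the exact names as assumptions here):

    import io.debezium.config.Configuration;

    public class DatabaseHistoryConfigSketch {
        public static void main(String[] args) {
            // File-based history: suited to embedded or standalone use.
            Configuration fileHistory = Configuration.create()
                    .with("database.history", "io.debezium.relational.history.FileDatabaseHistory")
                    .with("database.history.file.filename", "/tmp/dbhistory.dat")
                    .build();

            // Kafka-based history: a single-partition topic on a Kafka broker,
            // suited to normal Kafka Connect deployments.
            Configuration kafkaHistory = Configuration.create()
                    .with("database.history", "io.debezium.relational.history.KafkaDatabaseHistory")
                    .with("database.history.kafka.bootstrap.servers", "kafka:9092")
                    .with("database.history.kafka.topic", "schema-changes.inventory")
                    .build();

            System.out.println(fileHistory.getString("database.history"));
            System.out.println(kafkaHistory.getString("database.history"));
        }
    }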