2035 Commits

Author SHA1 Message Date
Randall Hauch
c20b49a8fc DBZ-57 Added support for the shortened CHARSET alias for CHARACTER SET in MySQL DDL statements
Added explicit support for handling `CHARSET` as an alias for `CHARACTER SET` in both tables and columns. `CREATE DATABASE` and `ALTER DATABASE` statements can also specify character sets, but the DDL parser handles those statements without explicitly parsing them, so no modification is needed for them. Several unit tests were added to confirm the behavior.
2016-05-20 08:23:50 -05:00
Randall Hauch
4f40cc8332 Merge pull request #39 from rhauch/dbz-43
DBZ-43 Changed form of schema change topic to use schemas
2016-05-19 17:20:04 -05:00
Randall Hauch
e06f5c596c DBZ-43 Added explicit checking and validation of Schemas and Structs in integration tests 2016-05-19 17:06:22 -05:00
Randall Hauch
07315f2b4b DBZ-43 Changed form of schema change topic to use schemas 2016-05-19 16:54:22 -05:00
Randall Hauch
6d66a0ed2d Merge pull request #45 from rhauch/dbz-52
DBZ-52 Added top-level container structure to all messages
2016-05-19 14:28:50 -05:00
Randall Hauch
c0b7114424 DBZ-52 Added top-level container structure to all messages
The new envelope Struct contains fields for the local time at which the connector processed the event, the kind of operation (e.g., read, insert, update, or delete), the state of the record before and after the change, and the information about the event source. The latter two items are connector-specific. The timestamp is merely the time using the connector's process clock, and no guarantees are provided about accuracy, monotonicity, or relationship to the original source event.

The envelope structure is now used as the value for each event message in the MySQL connector; the keys of the event messages remain unchanged. Note that to facilitate Kafka log compaction (which requires a null value), a delete event containing the envelope with details about the deletion is followed by a "tombstone" event that contains the same key but a null value.

An example of a message value with this new envelope is as follows:

{
    "schema" : {
      "type" : "struct",
      "fields" : [ {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "before"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "int32",
          "optional" : false,
          "name" : "org.apache.kafka.connect.data.Date",
          "version" : 1,
          "field" : "order_date"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "purchaser"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "quantity"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "product_id"
        } ],
        "optional" : true,
        "name" : "connector_test.orders",
        "field" : "after"
      }, {
        "type" : "struct",
        "fields" : [ {
          "type" : "string",
          "optional" : false,
          "field" : "server"
        }, {
          "type" : "string",
          "optional" : false,
          "field" : "file"
        }, {
          "type" : "int64",
          "optional" : false,
          "field" : "pos"
        }, {
          "type" : "int32",
          "optional" : false,
          "field" : "row"
        } ],
        "optional" : false,
        "name" : "io.debezium.connector.mysql.Source",
        "field" : "source"
      }, {
        "type" : "string",
        "optional" : false,
        "field" : "op"
      }, {
        "type" : "int64",
        "optional" : true,
        "field" : "ts"
      } ],
      "optional" : false,
      "name" : "kafka-connect-2.connector_test.orders",
      "version" : 1
    },
    "payload" : {
      "before" : null,
      "after" : {
        "order_date" : 16852,
        "purchaser" : 1003,
        "quantity" : 1,
        "product_id" : 107
      },
      "source" : {
        "server" : "kafka-connect-2",
        "file" : "mysql-bin.000002",
        "pos" : 2887680,
        "row" : 4
      },
      "op" : "c",
      "ts" : 1463437199134
    }
}

Notice how the Schema is significantly larger, since it must describe all of the envelope's fields even when those fields are not used. In this case, the event signifies that a record was created as the 4th record of a single event recorded in the binlog.
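
A consumer might interpret the envelope roughly as follows. This is a minimal sketch, not code from the connector: the function name `describe_change` is hypothetical, the schema part of the message is ignored, and only the `c` operation code appears in the example above, so the codes `r`, `u`, and `d` are assumptions.

```python
import json

# Only "c" appears in the example message; the other codes are assumed.
OPS = {"c": "create", "r": "read", "u": "update", "d": "delete"}

def describe_change(message_value):
    """Summarize one change event value; returns None for a tombstone (null value)."""
    if message_value is None:
        return None
    payload = json.loads(message_value)["payload"]
    source = payload["source"]
    return {
        "operation": OPS.get(payload["op"], payload["op"]),
        "before": payload["before"],
        "after": payload["after"],
        # Binlog file, position of the event, and row number within that event.
        "position": (source["file"], source["pos"], source["row"]),
        # Connector process clock only; "ts" is optional in the schema.
        "processed_at_ms": payload.get("ts"),
    }

example = json.dumps({
    "payload": {
        "before": None,
        "after": {"order_date": 16852, "purchaser": 1003, "quantity": 1, "product_id": 107},
        "source": {"server": "kafka-connect-2", "file": "mysql-bin.000002", "pos": 2887680, "row": 4},
        "op": "c",
        "ts": 1463437199134,
    }
})
summary = describe_change(example)
print(summary["operation"], summary["position"])
```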
2016-05-19 12:40:16 -05:00
Randall Hauch
69ec112a17 Merge pull request #43 from rhauch/dbz-44
DBZ-44 Generate a tombstone for old key when row's key is changed
2016-05-13 17:51:48 -05:00
Randall Hauch
e6710a5300 DBZ-44 Generate a tombstone for old key when row's key is changed
When a row is updated in the database and the primary/unique key for that table is changed, the MySQL connector continues to generate an update event with the new key and new value, but now also generates a tombstone event for the old key. This ensures that when a Kafka topic is compacted, all prior events with the old key will (eventually) be removed. It also ensures that consumers see that the row represented by the old key has been removed.
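
The effect on a compacted topic can be illustrated with a small model. This is a sketch under stated assumptions: `events_for_update` and `compact` are hypothetical helpers, the relative order of the update and the tombstone is assumed, and `compact` is a simplified stand-in for Kafka log compaction (latest value per key wins, keys whose latest value is null are dropped).

```python
def events_for_update(old_key, new_key, new_value):
    """Return the (key, value) records for a row update (hypothetical helper)."""
    events = [(new_key, new_value)]
    if old_key != new_key:
        # Tombstone: the old key with a null value.
        events.append((old_key, None))
    return events

def compact(log):
    """Simplified compaction: keep the latest value per key, drop tombstoned keys."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return {key: value for key, value in latest.items() if value is not None}

log = [(1, {"id": 1, "name": "a"})]                      # original insert
log += events_for_update(1, 2, {"id": 2, "name": "a"})   # primary key changed from 1 to 2
print(compact(log))
```

Without the tombstone, the record for key `1` would survive compaction indefinitely even though no such row exists any more.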
2016-05-13 17:43:29 -05:00
Randall Hauch
7c296b83d5 Merge pull request #42 from rhauch/dbz-49
DBZ-49 MySQL DDL parser should be more tolerant of REFERENCE clauses in CREATE TABLE statements
2016-05-13 09:38:46 -05:00
Randall Hauch
97d5caa2db DBZ-49 MySQL DDL parser is more tolerant of REFERENCE clauses in CREATE TABLE statements
MySQL 5.6 using the MyISAM engine will create the `help_relation` system table using a CREATE TABLE statement whose column REFERENCE clauses do not list the columns in the referenced table. MySQL 5.7 using the InnoDB engine does not include the REFERENCE clauses.

Because Debezium's MySQL DDL parser is meant only to understand the statements recorded in the binlog, it does not have to validate the statements and therefore the DDL parser can be a bit more lenient by not requiring the list of columns in a REFERENCE clause in a CREATE TABLE statement's column definitions.

This commit also adds several unit tests that validate all of the DDL statements used by MySQL 5.6 and 5.7 during startup (in the configurations used in our integration tests).
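
The leniency described above amounts to treating the column list as optional. The following is a rough regex-based sketch, not Debezium's token-based parser: `parse_references` is a hypothetical helper, the DDL fragments in the usage are made up, and the MySQL keyword is spelled `REFERENCES`.

```python
import re

# Referenced table name, followed by an optional parenthesized column list.
REFERENCES = re.compile(r"REFERENCES\s+([\w.`]+)\s*(?:\(([^)]*)\))?", re.IGNORECASE)

def parse_references(column_definition):
    """Return (table, columns) for a column's REFERENCES clause, or None if absent."""
    match = REFERENCES.search(column_definition)
    if match is None:
        return None
    table = match.group(1).strip("`")
    column_list = match.group(2)
    columns = [c.strip(" `") for c in column_list.split(",")] if column_list else []
    return table, columns

print(parse_references("parent_id INT NOT NULL REFERENCES parent"))
print(parse_references("parent_id INT REFERENCES parent (id)"))
```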
2016-05-13 09:32:47 -05:00
Randall Hauch
83967e0e53 Merge pull request #41 from rhauch/dbz-50
DBZ-50 Added parameters for length, maxLength and whether the field is masked
2016-05-12 16:55:37 -05:00
Randall Hauch
6d56a8f3d0 DBZ-50 Added parameters for truncated length and when the field is masked. 2016-05-12 16:31:33 -05:00
Randall Hauch
7ce096adaa Merge pull request #40 from rhauch/dbz-29b
DBZ-29 Refactored ColumnMappers
2016-05-12 12:59:33 -05:00
Randall Hauch
b1e6eb1028 DBZ-29 Refactored ColumnMappers and enabled ColumnMapper impls to add parameters to the Kafka Connect Schema. 2016-05-12 12:26:04 -05:00
Randall Hauch
18995abfbd Merge pull request #38 from rhauch/dbz-29
DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns
2016-05-12 08:27:15 -05:00
Randall Hauch
ff9d0fc240 DBZ-29 Changed MySQL connector to be able to hide, truncate, and mask specific columns
Changed the MySQL connector to use comma-separated lists of regular expressions for the database
and table whitelist/blacklists. Literals are still accepted and will match fully-qualified table names,
although the '.' character used as a delimiter is also a special character in regular expressions and
therefore may need to be escaped with a double backslash ('\\') to more carefully match fully-qualified
table names.
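
Why the escaping matters can be seen in a short sketch. It assumes each pattern must match the entire fully-qualified name, and `matches_any` is a hypothetical helper; the Python raw strings use a single backslash where the property value would use the doubled form.

```python
import re

def matches_any(patterns, qualified_name):
    """True if any comma-separated pattern matches the whole fully-qualified name."""
    return any(re.fullmatch(p.strip(), qualified_name) for p in patterns.split(","))

# A literal still matches the name it spells out.
print(matches_any("server.users", "server.users"))
# Unescaped, '.' matches any character, so an unrelated name also matches.
print(matches_any("server.users", "serverXusers"))
# Escaped, '.' matches only the delimiter.
print(matches_any(r"server\.users", "serverXusers"))
```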

Added several new configuration properties for the MySQL connector that instruct it to hide,
truncate, and/or mask certain columns. The properties' values are all lists of regular expressions
or literal fully-qualified column names. For example, the following configuration property:

    column.blacklist=server.users.picture,server.users.other

will cause the connector to leave out of change event messages for the `server.users` table those
fields that correspond to the `picture` and `other` columns. This capability can be used to
prevent dissemination of sensitive information in the change event stream.
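
The filtering can be sketched as follows. `drop_blacklisted` is a hypothetical helper, not the connector's code, and it assumes each pattern must match the whole fully-qualified column name.

```python
import re

def drop_blacklisted(table, row, blacklist):
    """Return a copy of row without the columns matched by the blacklist patterns."""
    patterns = [re.compile(p.strip()) for p in blacklist.split(",")]
    return {
        column: value
        for column, value in row.items()
        if not any(p.fullmatch(f"{table}.{column}") for p in patterns)
    }

row = {"id": 7, "name": "Ann", "picture": b"\x89PNG", "other": "x"}
visible = drop_blacklisted("server.users", row, "server.users.picture,server.users.other")
print(visible)
```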

An alternative to blacklisting is masking. The following configuration property:

    column.mask.with.10.chars=server\\.users\\.(\\w*email)

will cause the connector to mask in the change event messages for the `server.users` table
all values for columns whose name ends in `email`. The values will be replaced in this case with
a constant string of 10 asterisk ('*') characters, even when the email value is null.
This capability can also be used to prevent dissemination of sensitive information in the change event
stream.
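
A sketch of the masking behavior, with `mask_columns` as a hypothetical helper. It assumes whole-name matching and that the doubled backslashes in the property value are properties-file escaping for a single backslash, which is what the Python raw string below contains.

```python
import re

def mask_columns(table, row, pattern, length):
    """Replace values of matching columns with a constant string of asterisks."""
    regex = re.compile(pattern)
    mask = "*" * length
    return {
        # Null values are masked too, so the mask does not reveal which values are null.
        column: mask if regex.fullmatch(f"{table}.{column}") else value
        for column, value in row.items()
    }

row = {"id": 7, "email": "ann@example.com", "work_email": None}
masked = mask_columns("server.users", row, r"server\.users\.(\w*email)", 10)
print(masked)
```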

Another option is to truncate string values for specific columns. The following configuration
property:

    column.truncate.to.120.chars=server[.]users[.](description|biography)

will cause the connector to truncate to at most 120 characters the values of the `description` and
`biography` columns in the change event messages for the `server.users` table. Although this example
used a limit of 120 characters, any positive length can be specified; separate properties should
be used when different lengths are required. Note how the '.' delimiter in the fully-qualified names
is escaped since that same character is a special character in regular expressions. This capability
can be used to reduce the size of change event messages.
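
Truncation can be sketched the same way. `truncate_columns` is a hypothetical helper that assumes whole-name matching and leaves non-string values untouched.

```python
import re

def truncate_columns(table, row, pattern, max_length):
    """Truncate string values of matching columns to at most max_length characters."""
    regex = re.compile(pattern)
    result = {}
    for column, value in row.items():
        if isinstance(value, str) and regex.fullmatch(f"{table}.{column}"):
            value = value[:max_length]
        result[column] = value
    return result

row = {"id": 7, "description": "d" * 200, "biography": "short", "name": "n" * 200}
out = truncate_columns("server.users", row, r"server[.]users[.](description|biography)", 120)
print({column: len(str(value)) for column, value in out.items()})
```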
2016-05-11 15:57:06 -05:00
Randall Hauch
5c83d40187 Merge pull request #37 from christian-posta/ceposta-commit-parse-errors
DBZ-48 Cannot parse COMMIT and flush statements
2016-05-06 01:41:02 +02:00
Christian Posta
8b736ef654 DBZ-48 Cannot parse COMMIT and flush statements 2016-05-05 15:36:24 -07:00
Randall Hauch
56cb15cb3f Merge pull request #36 from christian-posta/hack-it
DBZ-42 Use custom mysql images with custom config and startup scripts for integration tests
2016-04-26 17:14:30 -05:00
Christian Posta
ab2cdce279 DBZ-42 inherit from mysql images and add the custom config and startup scripts useful for integration testing 2016-04-26 08:49:27 -07:00
Randall Hauch
cfc795cd75 Merge pull request #34 from rhauch/dbz-38
DBZ-38 Changed the listening framework of the DDL parser
2016-04-13 07:28:47 -05:00
Randall Hauch
1fcb4b02cf DBZ-38 Changed DROP VIEW and TABLE to include single-table statements in events
Drop table/view statements that involve more than one table generate one event for each table/view. Previously, each of those events had the original multi-table/view statement. Now, each event has a statement that applies to only that table (generated from the original with all the same clauses).
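
The rewriting can be illustrated with a regex-based sketch. This is not the parser's implementation: `split_drop` is a hypothetical helper, and it assumes table names contain no commas or quoted whitespace.

```python
import re

DROP = re.compile(
    r"^\s*(DROP\s+(?:TEMPORARY\s+)?(?:TABLE|VIEW)\s+(?:IF\s+EXISTS\s+)?)"  # leading clauses
    r"(.+?)"                                                              # comma-separated names
    r"(\s+(?:RESTRICT|CASCADE))?\s*;?\s*$",                               # optional trailing clause
    re.IGNORECASE,
)

def split_drop(statement):
    """Return one single-table DROP statement per table or view, keeping all clauses."""
    match = DROP.match(statement)
    if match is None:
        raise ValueError("not a DROP TABLE/VIEW statement")
    prefix, names, suffix = match.group(1), match.group(2), match.group(3) or ""
    return [f"{prefix}{name.strip()}{suffix}" for name in names.split(",")]

for single in split_drop("DROP TABLE IF EXISTS a, b, c CASCADE;"):
    print(single)
```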
2016-04-12 18:18:13 -05:00
Randall Hauch
b1e428c986 DBZ-38 Adjusted how events are generated for RENAME TO statements
The previous change did not correctly capture the statements for a `RENAME TO` that renamed multiple tables, so this commit fixes the code to generate a single `RENAME TO` for each table rename.
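
A sketch of splitting a multi-table rename into one statement per rename. `split_rename` is a hypothetical helper, and the exact text of the generated statements is an assumption; it also assumes table names contain no commas or whitespace.

```python
import re

def split_rename(statement):
    """Return one single-table RENAME TABLE statement per rename pair."""
    match = re.match(r"^\s*RENAME\s+TABLE\s+(.+?)\s*;?\s*$", statement, re.IGNORECASE)
    if match is None:
        raise ValueError("not a RENAME TABLE statement")
    renames = []
    for pair in match.group(1).split(","):
        old, new = re.split(r"\s+TO\s+", pair.strip(), flags=re.IGNORECASE)
        renames.append(f"RENAME TABLE {old} TO {new}")
    return renames

statement = ("RENAME TABLE blue_table TO red_table, orange_table TO green_table, "
             "black_table TO white_table;")
for single in split_rename(statement):
    print(single)
```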
2016-04-12 17:58:07 -05:00
David Chen
eeff81b65d MySqlDdlParser should support "RENAME TABLE blue_table TO red_table, orange_table TO green_table, black_table TO white_table;" form. (#1) 2016-04-12 17:40:00 -05:00
Randall Hauch
5b30568650 DBZ-38 Changed the listening framework of the DDL parser
Refactored the mechanism by which components can listen to the activities of a DDL parser. The new approach
should be significantly more flexible for additional types of DDL events while making it easier to maintain
backward compatibility. It also will enable passing event-specific information on each DDL event.
2016-04-12 11:00:02 -05:00
Randall Hauch
75955945ee Merge pull request #33 from rhauch/dbz-38
DBZ-38 Changed the DDL parser framework to notify listeners as statements are applied
2016-04-11 15:44:39 -05:00
Randall Hauch
137b9f6d4d DBZ-38 Changed the DDL parser framework to notify listeners as statements are applied. 2016-04-11 15:16:04 -05:00
Randall Hauch
453a3730f6 Corrected Maven Central badge in README 2016-03-17 22:50:02 -05:00
Randall Hauch
8f5487b2c0 [maven-release-plugin] prepare for next development iteration 2016-03-17 16:28:40 -05:00
Randall Hauch
c2b8ac50ae [maven-release-plugin] prepare release v0.1.0 2016-03-17 16:28:40 -05:00
Randall Hauch
43f79aad5e Added missing version element to modules 2016-03-17 16:14:17 -05:00
Randall Hauch
fa4ae33ba2 Removed unused modules 2016-03-17 16:13:50 -05:00
Randall Hauch
eea175a5aa DBZ-32 Corrected assembly plugin descriptor in parent POM 2016-03-17 16:04:53 -05:00
Randall Hauch
b5945a24ec DBZ-32 Corrected assembly dependencies 2016-03-17 15:58:27 -05:00
Randall Hauch
fc2c83e406 Merge pull request #32 from rhauch/dbz-32
DBZ-32 Changed Maven build to support releasing to Maven Central via Sonatype OSSRH
2016-03-17 15:30:42 -05:00
Randall Hauch
0867bd7961 DBZ-32 Changed Maven build to support releasing to Maven Central via the Sonatype OSSRH. 2016-03-17 15:16:31 -05:00
Randall Hauch
026c92f5c6 Merge pull request #31 from rhauch/dbz-15
DBZ-15 Fixed several issues discovered during testing
2016-03-17 12:44:54 -05:00
Randall Hauch
0da37c8aee DBZ-15 Fixed a problem when handling deleted rows, since the generated record should not have a value schema when the value is null. 2016-03-17 12:34:53 -05:00
Randall Hauch
5a002dbf62 DBZ-15 Cached converters are now dropped upon log rotation. 2016-03-17 11:03:28 -05:00
Randall Hauch
91d200df51 DBZ-15 Removed some of the unnecessary JARs from the MySQL connector plugin kit 2016-03-17 11:03:27 -05:00
Randall Hauch
74c5adcc8d Merge pull request #30 from rhauch/dbz-30
DBZ-30 Changed the MySQL connector to include all columns in the record value
2016-03-04 13:06:17 -06:00
Randall Hauch
4998325de7 DBZ-30 Changed the MySQL connector to include all columns in the record value 2016-03-04 10:51:14 -06:00
Randall Hauch
5c8e68d6d2 Merge pull request #29 from rhauch/dbz-28
DBZ-28 Corrected MySQL connector's behavior for representing deletes
2016-03-04 10:37:36 -06:00
Randall Hauch
fd3a0d992f Change Gitter notifications for Travis builds to always report successful builds. 2016-03-04 09:59:49 -06:00
Randall Hauch
235fa12ead DBZ-28 Fix formatting 2016-03-04 09:52:02 -06:00
Randall Hauch
2d99cb264c DBZ-28 Prevent the MySQL connector from sending a record with a null key and null value
There is no point in sending a record that contains a null key and null value. While this may not be likely for insert or update cases (since at least the value should not be null), it is possible when a row is deleted (meaning the record value will be null) but the table has no primary/unique key (meaning the record key will be null).
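
The rule at this point in the history can be sketched in a few lines. `record_for_delete` is a hypothetical helper, not the connector's code.

```python
def record_for_delete(key):
    """Return the (key, value) record for a deleted row, or None when nothing is sent."""
    value = None  # a deleted row is represented by a null record value
    if key is None:
        # No primary/unique key: key and value would both be null, so skip the record.
        return None
    return (key, value)

print(record_for_delete({"id": 5}))
print(record_for_delete(None))
```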
2016-03-04 09:51:32 -06:00
Randall Hauch
64d0e0b458 DBZ-28 Corrected MySQL connector's behavior for representing deletes
Corrects a bug where a deleted row was written to Kafka in the same way as an insert, making them indistinguishable. Now, a deleted row is written with the row's primary/unique key as the record key, and a null record value. Note that if the row has no primary/unique key, no record is written to Kafka.
2016-03-04 09:48:52 -06:00
Randall Hauch
b6b982d711 Merge pull request #28 from rhauch/dbz-26
DBZ-26 Corrected how MySQL table info is recovered from db history upon connector restart
2016-03-03 16:11:41 -06:00
Randall Hauch
60d3307597 DBZ-26 Corrected how table info is recovered from the database history. 2016-03-03 15:27:39 -06:00
Randall Hauch
9034e26d1e DBZ-26 Corrected the embedded connector framework to enable stopping. Also improved logging statements. 2016-03-03 15:27:11 -06:00