When the `table.ignore.built` is set to `false`, the MySQL connector included all of the ~250 system tables yet the database and table filters were never applied to these system tables and therefore all would be captured. With this change, the database filters and table filters now apply to the system tables should they not be ignored.
Note that this does change the behavior of the connector when `table.ignore.built` is set to `false`. However, it is unlikely that anyone is seriously capturing all of the system table changes, so correcting the behavior is the preferred solution.
MySQL has special handling of 2-digit years that it deems are ambiguous, such as the year value `17` that is actually treated as `2017`. Apparently the 2-digit values are stored in MySQL and the interpretation is performed when the data is extracted, so therefore the connector needs to also perform this adjustment of the year values. This commit uses the JDK’s `TemporalAdjuster` interface and passes this down to the requisite temporal-related datatype handling code. The MySQL connector then provides its own `TemporalAdjuster` implementation that adjusts the year values via the excellend JDK `Temporal` methods.
A row in one of the MySQL test databases was changed to use a 2-digit year of `16` while the test method still checks that the year is still 2016`, verifying that the year value is properly adjusted.
The parser now handles `BEGIN…END` blocks better by properly handling `IF()` functions that are not `IF…THEN…END IF` control blocks, and `CASE … WHEN … END CASE` control blocks.
From L49 - one DDL test case for PROCEDURE ,
From L278 - one DDL test case for FUNCTION ,
From L433 - one more DDL test case for PROCEDURE with CURSOR,
From L713 - ONE more DDL TEST FOR PROCEDURE,
From L755 - ONE more DDL TEST CASE FOR FUNCTION
Added a test case that uses the MySQL DDL parser to parse similar DDL statements to those reported in the issue, but these are properly handled with the current state of the `master` branch.
Added a table and inserted rows that tries to replicate the problem reported in DBZ-195, but the test was unable to replicate the problem. In fact, this really is no different than existing tests. Changed the log messages so that if/when this happens again it will be possible to know which row was problematic.
MySQL represents an invalid enum literal in the binlog events as an empty string or an value of `0`. Now, when the connector comes across such a value in the binlog, it will instead use an empty string for the enum literal.
The MySQL parser now properly handles control blocks such as `BEGIN…END`, `IF…END IF`, `REPEAT…END REPEAT`, and `LOOP…END LOOP`, even in cases where the block is preceded by and terminated by a label.
Apparently not all reserved words must be quoted when using them as colum names, so refactored MySQL’s DDL parser to better handle a variety of unquoted colum names that are reserved words.
The MySQL connector’s built-in table filter now just filters out all tables within the known built-in databases, and does not check the names of the tables. Thus, the connector should no longer filter out tables in other databases that happen to have the same names as the tables in the built-in databases.
Corrected the MySQL DDL parser to correctly handle `FULLTEXT` indexes within a `CREATE TABLE` statement. The parser was incorrectly using `canConsume(…)` with a list of options instead of `canConsumeAnyOf(…)`.
With GTIDs enabled, each transaction in the binlog contains a GTID event, which gives us access to the GTID of the transaction. The GTID has the following format: source_id:transaction_id, where source_id is the UUID of the mysql server the transaction was written to.
I propose to allow a debezium instance to be configured with a UUID pattern to check against before producing DML events into Kafka. Debezium would produce a DML event into kafka if and only if the UUID in the event's GTID matches the pattern with which debezium was configured.
The configuration for the UUID patterns will make use of the existing gtid.source.includes and gtid.source.excludes options. The DML event filtering will only be performed if the new option gtid.source.filter.dml.events is true.
Changed the GTID source filters in the MySQL connector to be far more efficient when the filters specify literal UUIDs rather than regex patterns. In these cases, the predicate just checks whether a supplied value is in a hash set, and no regular expression patterns are used.
The GTID source filters can still be a combination of UUID literals and regular expressions, and the predicate will use the best implementation for each. For example, if the filters include all UUID literals, then regular expressions will never be used.
Corrects how the MySQL connector reloads database history to take into account the included and excluded GTID sources. This only affects a connector configured to capture changes from _multiple_ MySQL database servers when GTID sources are explicitly excluded or included.
Improved the MySQL connector's logic to better handle Amazon RDS that does not allow giving user `SUPER` privileges. As before, the connector starts a transaction and attempts to get a global read lock via `FLUSH TABLES WITH READ LOCK` to prevent writes to the database so that the binlog position can be accurately read _and_ the table schemas can be read without interference from other clients. Once that is done, the connector releases the global read lock and continues in the same transaction to read all table rows. This means that our snapshot is consistent, but we maintain the global read lock for a very short period of time.
Amazon's RDS and Aurora are hosted MySQL instances that do not allow users to have the `SUPER` privilege, which means the user cannot get a global read lock. In this case, the connector detects this error, continues to read the database and table names (without any lock), and _then_ uses `FLUSH TABLES <tableName> WITH READ LOCK` on each table that satisfies the filters to prevent changes from other clients. The connector then reads the table schemas, reads _all_ table rows, commits the transaction, and _finally_ releases the table locks.
Therefore, there are two very different behaviors/requirements when the user can't obtain a global read lock because of lack of privilege, like on RDS:
# The RDS user that the connector makes use of must also have the `LOCK TABLES` privilege; without it the connector will fail during the snapshot.
# The connector must hold the table read locks _until it has completed reading all of the tables_, since release the table locks using `UNLOCK TABLES` would prematurely commit our transaction and prevent us from getting a consistent snapshot. From the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/flush.html):
> `UNLOCK TABLES` implicitly commits any active transaction only if any tables currently have been locked with `LOCK TABLES`. The commit does not occur for `UNLOCK TABLES` following `FLUSH TABLES WITH READ LOCK` because the latter statement does not acquire table locks.
The MySQL DDL parser was not correclty handling `DEFINER` clauses within `CREATE TRIGGER` or `CREATE EVENT` statements. Support for `DEFINER` clauses was recently added for the various forms of `CREATE PROCEDURE`, `CREATE FUNCTION` and `CREATE VIEW` statements. These are the only kinds of statements that have the definer attribute, per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/stored-programs-security.html).
Changed the events’ `source` structure to optionally contain the identifier of the MySQL thread where appropriate. The thread is included on each `BEGIN` binlog event, so these are captured and added to all of the associated change events produced for that transaction.
The version of the DB server required for this to work is at least 9.4
The commit also updates the general DBZ build system for:
* custom checkstyle package exclusions - required by the Postgres driver the protobuf code for now
* adds support for debugging Surefire and Failsafe
Added more logic to the snapshot reader to better handle errors when reading the list of table names in each database. Now, any errors with a single database (e.g., some of the not-quite-a-database names described in the JIRA issue) will cause the snapshot reader to simply skip that database name and continue on (with proper logging).
This change also quotes all of the database and table names when used in SQL statements.