Parse and ignore any `DELETE` statements that might be seen in the binlog.
Theoretically, the binlog of a properly-configured MySQL server with row-level binlog enabled should never see these statements. However, users on Amazon RDS run into this quite frequently, and we should just handle and ignore them.
It’s not clear how valuable these recommenders actually are. First, it’s not clear about the expected semantics: can the user use values that don’t appear in the recommended values? Second, the recommenders that return large numbers of values can be slow and can result in very large REST API responses.
Debezium was using recommenders to return the database and table/collection names, but these lists can be very large for large databases. Rather than cap the number of recommended values and have the recommender return a subset of all potential values, we will instead remove the recommenders altogether.
When the `table.ignore.built` is set to `false`, the MySQL connector included all of the ~250 system tables yet the database and table filters were never applied to these system tables and therefore all would be captured. With this change, the database filters and table filters now apply to the system tables should they not be ignored.
Note that this does change the behavior of the connector when `table.ignore.built` is set to `false`. However, it is unlikely that anyone is seriously capturing all of the system table changes, so correcting the behavior is the preferred solution.
MySQL has special handling of 2-digit years that it deems are ambiguous, such as the year value `17` that is actually treated as `2017`. Apparently the 2-digit values are stored in MySQL and the interpretation is performed when the data is extracted, so therefore the connector needs to also perform this adjustment of the year values. This commit uses the JDK’s `TemporalAdjuster` interface and passes this down to the requisite temporal-related datatype handling code. The MySQL connector then provides its own `TemporalAdjuster` implementation that adjusts the year values via the excellend JDK `Temporal` methods.
A row in one of the MySQL test databases was changed to use a 2-digit year of `16` while the test method still checks that the year is still 2016`, verifying that the year value is properly adjusted.
The parser now handles `BEGIN…END` blocks better by properly handling `IF()` functions that are not `IF…THEN…END IF` control blocks, and `CASE … WHEN … END CASE` control blocks.
From L49 - one DDL test case for PROCEDURE ,
From L278 - one DDL test case for FUNCTION ,
From L433 - one more DDL test case for PROCEDURE with CURSOR,
From L713 - ONE more DDL TEST FOR PROCEDURE,
From L755 - ONE more DDL TEST CASE FOR FUNCTION
Added a test case that uses the MySQL DDL parser to parse similar DDL statements to those reported in the issue, but these are properly handled with the current state of the `master` branch.
Added a table and inserted rows that tries to replicate the problem reported in DBZ-195, but the test was unable to replicate the problem. In fact, this really is no different than existing tests. Changed the log messages so that if/when this happens again it will be possible to know which row was problematic.
The MySQL parser now properly handles control blocks such as `BEGIN…END`, `IF…END IF`, `REPEAT…END REPEAT`, and `LOOP…END LOOP`, even in cases where the block is preceded by and terminated by a label.
Apparently not all reserved words must be quoted when using them as colum names, so refactored MySQL’s DDL parser to better handle a variety of unquoted colum names that are reserved words.
The MySQL connector’s built-in table filter now just filters out all tables within the known built-in databases, and does not check the names of the tables. Thus, the connector should no longer filter out tables in other databases that happen to have the same names as the tables in the built-in databases.
Corrected the MySQL DDL parser to correctly handle `FULLTEXT` indexes within a `CREATE TABLE` statement. The parser was incorrectly using `canConsume(…)` with a list of options instead of `canConsumeAnyOf(…)`.
Corrects how the MySQL connector reloads database history to take into account the included and excluded GTID sources. This only affects a connector configured to capture changes from _multiple_ MySQL database servers when GTID sources are explicitly excluded or included.
The MySQL DDL parser was not correclty handling `DEFINER` clauses within `CREATE TRIGGER` or `CREATE EVENT` statements. Support for `DEFINER` clauses was recently added for the various forms of `CREATE PROCEDURE`, `CREATE FUNCTION` and `CREATE VIEW` statements. These are the only kinds of statements that have the definer attribute, per the [MySQL documentation](https://dev.mysql.com/doc/refman/5.7/en/stored-programs-security.html).
Changed the events’ `source` structure to optionally contain the identifier of the MySQL thread where appropriate. The thread is included on each `BEGIN` binlog event, so these are captured and added to all of the associated change events produced for that transaction.
These new modules run during the '-Passembly' profile and use the new integration test framework that compares all
output produced by a connector to expected results that were previously recorded and verified. These integration test modules
can be run manually with a simple build of those modules or their parent; only the top-level 'integration-tests' module is run
during the assembly profile during builds of the entire codebase.
The MySQL connector uses several threads, so previously upon connector shutdown these threads were simply cancelled. This is fine for the binlog reader (which can stop at any moment), but is a poor approach for the snapshot as we didn’t always properly release the database resources and also didn’t complete the writing of the DDL history.
With this change, the snapshot reader stops in a very controlled manner, basically by having the 10-step snapshot procedure frequently check whether the reader is to continue working, and to completely avoid thread interruption altogether. And, the snapshot procedure will always clean up its database resources (locks, transactions, etc.), even if the procedure is stopped before completion.
This change also refactors how the snapshot and binlog reader are managed. This is no longer done in the MySqlConnectorTask class (which is busy enough), but rather the logic has been encapsulated in a new `ChainedReader` that makes use of a new `Reader` interface. This makes testing of `ChainedReader` easier, and ensure that `ChainedReader` relies only upon the primary methods of `Reader` rather than upon `AbstractReader`. `ChainedReader` handles multiple readers generically, and ensures that when stopped the readers are all handled correctly and completely process all records, yet avoid accidentally starting a subsequent reader(s) when stopping the previous reader.
The MySQL DDL parser was not properly consuming function declarations. For functions, the parser consumes the entire statement without handline the various expressions within the function declaration, but the parser was not properly finding the end of the statement and instead was continuing to try to consume values beyond the end of the statement.
Specifically, when the parser consumes a `BEGIN`, it looks for a corresponding `END`. However, if it encountered an `END IF`, the `IF` plus any remaining tokens were left on the token stream and unprocessed. This confused the parser, which keep looking for statements and ultimately ended with a `No more content` error.
This case was replicated in integration tests, and the code fixed to properly find the end of the statements.
The Travis-CI builds run the Maven build using the `assembly` profile, and this has been failing quite a bit lately.
The first problem appears to be that the Travis-CI environment recently changed to have port 3306 taken, which means that our build fails to start any Docker containers for MySQL that attempt to use this port. A simple fix is to use different ports for the assembly build.
However, trying to change the port numbers for some of the profiles caused a lot of problems, and to correct these required refactoring how the properties are set. The Docker Maven plugin is now configured with separate properties that are set once (depending upon the profile) to determine the port assignments of the various Docker containers. The Failsafe plugin executions then use these Maven properties when setting the system variables (e.g., `database.host`) needed in the integration tests. This appears to have worked, but it still is a bit fragile. For example, the assembly profile defines several Failsafe executions, and during this profile these should be the only executions run; however, if not all the properties are set properly, the build seems to also run the default Failsafe execution in addition to the other `assembly` profile executions. (I think properties can’t only be defined in the execution, but need to also be defined in the Failsafe configuration.)
The “alternative” MySQL Docker images were removed, since they basically should not provide any different behavior than the `mysql/mysql-server` images we normally used. The extra containers required a lot more resources to run and dramatically increased the complexity of the build.
A few other trivial changes were made.
By default the MySQL connector handles `DECIMAL` and `NUMERIC` columns using `java.math.BigDecimal` values and describing them using the `org.apache.kafka.connect.data.Decimal` schema type, which serializes the values to a binary form.
This change adds a configuration option that will keep the default behavior, but will instead allow handling `DECIMAL` adn `NUMERIC` values as Java `double` and a schema type of `FLOAT64`.
Added tests to verify whether the connector is properly restarting in the binlog when previously the connector failed or stopped in the middle of a transaction. The tests showed that the connector is not able to properly start when using or not using GTIDs, since restarting from an arbitrary binlog event causes problems since the TABLE_MAP events for the affected tables are skipped.
The logic was changed significantly to record in the offsets the binlog coordinates at the start of the transaction, which should work whether or not GTIDs are used. Upon restart, the connector may have to re-read the events that were previously processed, but now the offset also includes the number of events that were previously processed so that these can be skipped upon restart.
This has an unforunate side effect since the offsets capture a transaction was completed only when it generates a source record for the subsequent transaction. This is because the connector generates source records (with their offsets) for the binlog events in the transaction before the transaction's commit is seen. And, since no additional source records are produced for the transaction commit, the recorded offsets will show that the prior transaction is complete and that all of the events in the subsequent transaction are to be skipped. Thus, upon restart the connector has to re-read (but ignore) all of the binlog events associated with the completed transaction. This shouldn’t be a problem, and will only slow restarts for very large transactions.