# Debezium End-to-end Benchmark
The E2E benchmark is a Python script that inserts data into a dedicated table in a database.
One column holds a timestamp that records when the row was inserted into the table.
The test compares this timestamp with the timestamp of the corresponding Kafka message in the topic.
The script writes the resulting data as a CSV file, together with several diagrams in PNG format, to the `tpcdata` directory.
![](images/tpc_100000_1.png)
![](images/tpc_100000_1-t.png)
![](images/tpc_100000_1-t-d.png)
![](images/tpc_100000_1-h.png)
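
The measurement described above comes down to subtracting the insert timestamp from the Kafka message timestamp. The following is a minimal sketch of that calculation, not the benchmark's actual code: the function names are hypothetical, and it assumes the column value is available as a timezone-aware `datetime` and the Kafka timestamp as epoch milliseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(ts: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def latency_ms(inserted_at: datetime, kafka_timestamp_ms: int) -> int:
    """Delay between the row insert and the Kafka message, in milliseconds."""
    return kafka_timestamp_ms - to_epoch_ms(inserted_at)


# Example values, made up for illustration only.
inserted_at = datetime(2020, 3, 25, 8, 21, 57, 250000, tzinfo=timezone.utc)
kafka_timestamp_ms = to_epoch_ms(inserted_at) + 480
print(latency_ms(inserted_at, kafka_timestamp_ms))  # 480
```

Working in integer milliseconds avoids floating-point rounding when the two timestamps are only a few milliseconds apart.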
All the SQL statements required to run the tests are specified in the [tpc-config.json](py/tpc-config.json) file.
The number of commits to run and the commit intervals of the data are controlled in this part:
```
"tpc": {
    "count": 100000,
    "commit.intervals": [
        1,
        100,
        1000,
        10000
    ]
},
```
Each entry in the `commit.intervals` array runs one benchmark test.
This parameter should not be set to very high values.
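
As a rough illustration of how the `tpc` section maps to benchmark runs, the sketch below parses the values shown above and lists one run per `commit.intervals` entry. It is only an assumption about how such a config could be read, not the benchmark's implementation, and `planned_runs` is a hypothetical helper.

```python
import json
from typing import List, Tuple

# The "tpc" section from above, wrapped in braces so that it parses as JSON.
CONFIG_SNIPPET = """
{
    "tpc": {
        "count": 100000,
        "commit.intervals": [1, 100, 1000, 10000]
    }
}
"""


def planned_runs(config: dict) -> List[Tuple[int, int]]:
    """Return one (count, commit interval) pair per entry in commit.intervals."""
    tpc = config["tpc"]
    return [(tpc["count"], interval) for interval in tpc["commit.intervals"]]


runs = planned_runs(json.loads(CONFIG_SNIPPET))
for count, interval in runs:
    print(f"run with count={count}, commit interval={interval}")
```

With the values above this yields four runs, one for each commit interval.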
The `jdbc` section is necessary for the JDBC driver connection information.
Only the driver information for the `connector.class` set in `register.json` is needed, e.g.:

    "jdbc": {
        "db2": {
            "jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
            "jar" : "jcc.jar",
            ....

Additional parameters are needed for a test run in a self-contained environment.
The parameters for Db2 are complete; for other database flavors, please fill in the values accordingly.

    "tpctable": "",
    "initsql": [ ... ],
    "enablecdctablesql": [ ... ]

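
When filling in the entry for another database, a small check can catch parameters that were left empty. This is a hedged sketch: `missing_keys` is a hypothetical helper, it only knows the keys named in this README (the real file may contain more), and the sample entry is deliberately incomplete.

```python
# Keys named in this README; the real tpc-config.json may define more.
REQUIRED_KEYS = ("jdbcdriver", "jar", "tpctable", "initsql", "enablecdctablesql")


def missing_keys(jdbc_entry: dict) -> list:
    """Return the required keys that are absent or empty in one database entry."""
    return [key for key in REQUIRED_KEYS if not jdbc_entry.get(key)]


# Hypothetical, incomplete entry used for illustration only.
entry = {
    "jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
    "jar": "jcc.jar",
    "tpctable": "",
}
print(missing_keys(entry))  # ['tpctable', 'initsql', 'enablecdctablesql']
```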
## Benchmark on existing environment (DB Server / Kafka / Connector)
If you already have a running Debezium environment, you can run the benchmark by following these steps:
- Build the benchmark docker image
``` docker build -t debezium-benchmark . ```
- Run the docker container
``` docker run -itd --name benchmark -v <result path>:/home/tpc/tpcdata debezium-benchmark```
- Create the table in your database
  - SQL create table for Db2
    ``` CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INTEGER not null GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), T0 TIMESTAMP NOT NULL GENERATED BY DEFAULT FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP, PRIMARY KEY (ID) ) ORGANIZE BY ROW ```
  - SQL create table for SQL Server
    ``` CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INT IDENTITY(1,1) PRIMARY KEY, T0 DATETIME NULL DEFAULT GETDATE() ) ```
  - SQL create table for MySQL
    ``` CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, T0 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) ```
- Include the TPC.TEST table in your Debezium connector config JSON
``` "table.include.list" : "TPC.TEST" ```
- Enable the table for CDC on the database
  - SQL for Db2
    - ``` CALL ASNCDC.ADDTABLE('TPC','TEST') ```
    - ``` UPDATE ASNCDC.IBMSNAP_CAPPARMS SET COMMIT_INTERVAL=1 , SLEEP_INTERVAL=1 ```
    - ``` VALUES ASNCDC.ASNCDCSERVICES('reinit','asncdc') ```
    - ``` VALUES ASNCDC.ASNCDCSERVICES('status','asncdc') ```
  - SQL for SQL Server
    - for details see [SQL Server Connector](https://debezium.io/documentation/reference/connectors/sqlserver.html)
  - SQL for MySQL
    - for details see [MySQL Connector](https://debezium.io/documentation/reference/connectors/mysql.html)
- Log in to the Docker container
``` docker exec -it benchmark /bin/bash ```
- Copy the Debezium connector configuration JSON into the home directory as `$HOME/register.json`
- Go to the directory that contains the Python code
``` cd $HOME/py ```
- Edit `tpc-config.json` to set the correct `debezium.connect.server` FQDN:port
- Now run the tests
``` python3 tpc-run-test.py ```
- Create plots
``` python3 runplots.py ```
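
The step above that adds `TPC.TEST` to `table.include.list` can also be scripted. The sketch below is one possible way to do it, assuming `register.json` follows the usual Kafka Connect layout with a top-level `config` object; `include_table`, the connector name, and the existing table are hypothetical.

```python
import json


def include_table(register: dict, table: str) -> dict:
    """Return a copy of the registration with `table` added to table.include.list."""
    config = dict(register.get("config", {}))
    raw = config.get("table.include.list", "")
    tables = [name.strip() for name in raw.split(",") if name.strip()]
    if table not in tables:
        tables.append(table)
    config["table.include.list"] = ",".join(tables)
    return {**register, "config": config}


# Hypothetical registration document used for illustration only.
register = {
    "name": "example-connector",
    "config": {"table.include.list": "DB2INST1.CUSTOMERS"},
}
updated = include_table(register, "TPC.TEST")
print(json.dumps(updated["config"]))
```

The function returns a new dictionary instead of modifying its input, so the original registration can still be compared against the updated one before writing it back.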
## Benchmark in a self-contained environment
You will need the following to run the tests on CentOS:
```
yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum update
yum -y install wget
yum -y install docker
yum -y install docker-compose
yum -y install maven
yum -y install git
```
### Run a self-contained TPC setup for Db2 with Docker Compose
```
export DEBEZIUM_VERSION=1.2
export DEBEZIUM_DB2_VOLUME=$HOME/dockerdata
export DEBEZIUM_TPC_VOLUME=$HOME/tpcdata
mkdir $DEBEZIUM_DB2_VOLUME
mkdir $DEBEZIUM_TPC_VOLUME
chmod 777 $DEBEZIUM_TPC_VOLUME
cd $HOME
git clone https://github.com/debezium/debezium-examples
git clone https://github.com/debezium/debezium
cd debezium/debezium-e2e-benchmark
# To use another database, adapt the docker-compose file to your preferred database and update the SQL in tpc-config.json accordingly
docker-compose -f docker-compose-db2-tpc.yaml up --build
```
### Run tests and create plots
Once everything is set up, run the tests and create the plots:
```
docker-compose -f docker-compose-db2-tpc.yaml exec tpc bash
python3 tpc-run-test.py
python3 runplots.py
```
The results are stored in the `$DEBEZIUM_TPC_VOLUME` directory.
You can list the topics, delete a topic, and inspect the data flowing through Kafka:
```
/kafka/bin/kafka-topics.sh --bootstrap-server <FQDN bootstrap server>:9092 --list
/kafka/bin/kafka-topics.sh --bootstrap-server <FQDN bootstrap server>:9092 --topic TESTDB.DB2INST1.CUSTOMERS --delete
/kafka/bin/kafka-console-consumer.sh --bootstrap-server <FQDN bootstrap server>:9092 --topic CPRODUCER --from-beginning --property print.key=true --property print.timestamp=true
```
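
With `print.timestamp=true` and `print.key=true`, the console consumer prints the message timestamp and key in front of each value. The sketch below parses such a line, assuming the default tab separator and a leading `CreateTime:<epoch ms>` (or `LogAppendTime:<epoch ms>`) field; `parse_consumer_line` is a hypothetical helper and the sample line is made up.

```python
def parse_consumer_line(line: str):
    """Split one console-consumer line into (timestamp_ms, key, value)."""
    timestamp_field, key, value = line.rstrip("\n").split("\t", 2)
    label, _, millis = timestamp_field.partition(":")
    if label not in ("CreateTime", "LogAppendTime"):
        raise ValueError(f"unexpected timestamp field: {timestamp_field!r}")
    return int(millis), key, value


# Made-up sample line in the assumed format: timestamp, key, value separated by tabs.
sample = 'CreateTime:1585124517730\t{"ID":1}\t{"USERNAME":"jdoe"}'
timestamp_ms, key, value = parse_consumer_line(sample)
print(timestamp_ms, key, value)
```

The extracted timestamp is the value that the benchmark compares with the `T0` column of the corresponding row.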