Peter Urbanetz 2020-04-24 07:19:04 +02:00 committed by Jiri Pechanec
parent edd59596a2
commit 59162675c5
8 changed files with 37 additions and 25 deletions


@ -7,15 +7,6 @@ RUN dnf -y install gcc gcc-c++ python3-devel python3-requests
RUN python3 -m pip install JPype1==0.6.3
RUN python3 -m pip install JayDeBeApi matplotlib kafka-python scipy
#RUN pip install ibm-db
## SQL SERVER connector
#RUN pip install pyodbc
#RUN pip install mysql-connector-python
# https://docs.omnisci.com/v3.6.0/mapd-core-guide/jaydebeapi/
RUN useradd -ms /bin/bash tpc
USER tpc


@ -1,8 +1,14 @@
# Debezium End-to-end Benchmark
The E2E benchmark is a Python script which inserts data into a dedicated table in a database. One column is a timestap stating when data is inserted into the table. The test compares this time with the time of the correspondig timestamp of the Kafka massage in the topic. The script creates in the `tpcdata` directory the resulting data in a CSV file and some diagrams in PNG format.
The E2E benchmark is a Python script that inserts data into a dedicated table in a database.
One column is a timestamp stating when data is inserted into the table.
The test compares this time with the time of the corresponding timestamp of the Kafka message in the topic.
The script creates in the `tpcdata` directory the resulting data in a CSV file and some diagrams in PNG format.
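The comparison described above can be sketched as follows. This is a minimal illustration, not the benchmark script itself: the function name `e2e_latency_ms` is hypothetical, and it assumes that the database timestamp is in UTC and that the Kafka record timestamp is given in milliseconds since the epoch (as kafka-python exposes it on a consumed record).

```python
from datetime import datetime, timezone


def e2e_latency_ms(t0: datetime, kafka_timestamp_ms: int) -> float:
    """Delay in ms between the row's insert timestamp and the Kafka record timestamp."""
    if t0.tzinfo is None:
        # Assumption: naive timestamps read from the database are in UTC.
        t0 = t0.replace(tzinfo=timezone.utc)
    return kafka_timestamp_ms - t0.timestamp() * 1000.0


# Illustrative values only: a row inserted at a given time and a Kafka
# record whose timestamp is 250 ms later.
inserted = datetime(2020, 4, 24, 7, 19, 4, tzinfo=timezone.utc)
record_ts = int(inserted.timestamp() * 1000) + 250
print(e2e_latency_ms(inserted, record_ts))
```

If the database server and the Kafka broker do not share a synchronized clock, the difference also includes the clock offset between them.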
All the SQL statements required to run the tests are specified in the [tpc-config.json](py/tpc-config.json) file. The number of commits run and the commit interval of the data is controlled in this part:
<img src="./images/tpc_100000_1.png" width="20%"><img src="./images/tpc_100000_1-t.png" width="20%"><img src="./images/tpc_100000_1-t-d.png" width="20%"><img src="./images/tpc_100000_1-h.png" width="20%">
All the SQL statements required to run the tests are specified in the [tpc-config.json](py/tpc-config.json) file.
The number of commits run and the commit interval of the data is controlled in this part:
```
"tpc": {
"count": 100000,
@ -15,12 +21,28 @@ All the SQL statements required to run the tests are specified in the [tpc-conf
},
```
Each entry in the `commit.intervals` array runs one benchmark test. This parameter should not be set to very high values.
Each entry in the `commit.intervals` array runs one benchmark test.
This parameter should not be set to very high values.
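As a rough sketch of how one test per `commit.intervals` entry could be driven: the helper `commit_points` is hypothetical, and the sketch assumes that a commit interval means the number of inserted rows per commit, which the README does not state explicitly. The values are illustrative and much smaller than a real run.

```python
def commit_points(count, interval):
    """Return the 1-based row numbers after which a commit would be issued."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    if count < 1:
        return []
    points = list(range(interval, count + 1, interval))
    if not points or points[-1] != count:
        # Final commit for the rows left over after the last full interval.
        points.append(count)
    return points


# Illustrative configuration fragment, not the shipped defaults.
tpc = {"count": 10, "commit.intervals": [1, 4, 10]}

# One benchmark run per entry in the array.
for interval in tpc["commit.intervals"]:
    commits = commit_points(tpc["count"], interval)
    print(interval, len(commits))
```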
The test section `jdbc` is necessary for the JDBC connection driver information.
It only needs the driver information from "connector.class" in the register.json. For example:
"jdbc": {
"db2": {
"jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
"jar" : "jcc-11.5.0.0.jar",
....
Additional parameters are needed for a test run in a self-contained environment.
The parameters for Db2 are complete; for other database flavors, please fill in the values accordingly.
"tpctable": "",
"initsql": [ ... ],
"enablecdctablesql": [ ... ]
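To illustrate how the `jdbcdriver` and `jar` entries might be turned into a JayDeBeApi connection, here is a hedged sketch. The helper `jdbc_connect_args`, the JDBC URL, the credentials and the jar directory are all assumptions made for the example and are not taken from the benchmark script.

```python
import os


def jdbc_connect_args(jdbc_config, flavor, url, user, password, jar_dir="."):
    """Build the positional arguments for jaydebeapi.connect() from a jdbc entry."""
    entry = jdbc_config[flavor]
    jar_path = os.path.join(jar_dir, entry["jar"])
    # Order matches jaydebeapi.connect(jclassname, url, driver_args, jars).
    return entry["jdbcdriver"], url, [user, password], jar_path


jdbc_config = {
    "db2": {
        "jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
        "jar": "jcc-11.5.0.0.jar",
    }
}

# Hypothetical URL, credentials and jar directory.
args = jdbc_connect_args(
    jdbc_config, "db2",
    "jdbc:db2://localhost:50000/TESTDB", "db2inst1", "secret",
    jar_dir="jdbcdriver",
)
print(args[0], args[3])

# With a JVM and the driver jar available, the connection would be opened with:
# import jaydebeapi
# conn = jaydebeapi.connect(*args)
```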
## Benchmark on existing environment (DB Server / Kafka / Connector)
If you have an existing up und running Debezium environment, you can do the benchmark test by following these steps:
If you have an existing up and running Debezium environment, you can do the benchmark test by following these steps:
- Build the benchmark docker image
``` docker build -t debezium-benchmark . ```
@ -31,7 +53,7 @@ If you have an existing up und running Debezium environment, you can do the benc
``` CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INTEGER not null GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), T0 TIMESTAMP NOT NULL GENERATED BY DEFAULT FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP, PRIMARY KEY (ID) ) ORGANIZE BY ROW ```
- SQL create table for SQLServer
``` CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INT IDENTITY(1,1) PRIMARY KEY, T0 TIMESTAMP DATETIME NULL DEFAULT GETDATE() ) ```
- SQL crete table for MySQL
- SQL create table for MySQL
``` CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INTEGER NOT NULL AUTO_INCREMENT, T0 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) ```
- Whitelist the TPC.TEST table in your Debezium connector config JSON
@ -62,7 +84,7 @@ If you have an existing up und running Debezium environment, you can do the benc
## Benchmark in self-contained environment
## Benchmark in a self-contained environment
You will need the following to run the tests on CentOS:

Binary files not shown: four images added (12 KiB, 15 KiB, 17 KiB, 48 KiB).


@ -31,28 +31,28 @@
]
},
"mysql": {
"jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
"jdbcdriver": "com.mysql.cj.jdbc.Driver",
"jar": "mysql-connector-java-8.0.19.jar",
"tpctable": "CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INTEGER NOT NULL AUTO_INCREMENT, T0 TIMESTAMP DEFAULT CURRENT_TIMESTAMP ) ",
"initsql": [],
"enablecdctablesql": []
},
"oracle": {
"jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
"jdbcdriver": "oracle.jdbc.OracleDriver",
"jar": "ojdbc10-19.3.0.0.jar",
"tpctable": "",
"initsql": [],
"enablecdctablesql": []
},
"sqlserver": {
"jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
"jdbcdriver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"jar": "mssql-jdbc-8.2.0.jre8.jar",
"tpctable": "CREATE TABLE TPC.TEST ( USERNAME VARCHAR(32) NOT NULL, NAME VARCHAR(64), BLOOD_GROUP CHAR(3), RESIDENCE VARCHAR(200), COMPANY VARCHAR(128), ADDRESS VARCHAR(200), BIRTHDATE DATE, SEX CHAR(1), JOB VARCHAR(128), SSN CHAR(11), MAIL VARCHAR(128), ID INT IDENTITY(1,1) PRIMARY KEY, T0 TIMESTAMP DATETIME NULL DEFAULT GETDATE() )",
"initsql": [],
"enablecdctablesql": []
},
"postgress": {
"jdbcdriver": "com.ibm.db2.jcc.DB2Driver",
"postgresql": {
"jdbcdriver": "org.postgresql.Driver",
"jar": "postgresql-9.1-901.jdbc4.jar",
"tpctable": "",
"initsql": [],


@ -13,7 +13,6 @@ from pprint import pprint
import requests
import datetime
import threading
import jpype
@ -154,12 +153,12 @@ def main(argv):
print('tpc-connector deleted')
pass
dockerbootstrapserver = config['config']['database.history.kafka.bootstrap.servers']
bootstrapserver = config['config']['database.history.kafka.bootstrap.servers'].split(",")
bootstrapserver = config['config']['database.history.kafka.bootstrap.servers'].split(
",")
# check integrated test ( all in one docker)
if dockerbootstrapserver == 'kafka:9092' :
if dockerbootstrapserver == 'kafka:9092':
print(bootstrapserver)
kafkaadmin = KafkaAdminClient(bootstrap_servers=bootstrapserver)