Updated the README and added a CONTRIBUTE.md file with details for developers.

This commit is contained in:
Randall Hauch 2016-01-25 13:03:25 -06:00
parent 671172a6d3
commit b7f2221107
2 changed files with 202 additions and 14 deletions

157
CONTRIBUTE.md Normal file
View File

@ -0,0 +1,157 @@
## Contributing to Debezium
The Debezium community welcomes anyone that wants to help out in any way, whether that includes reporting problems, helping with documentation, or contributing code changes to fix bugs, add tests, or implement new features. This document outlines the basic steps required to work with and contribute to the Debezium codebase.
### Install the tools
The following software is required to work with the Debezium codebase and build it locally:
* [Git 2.2.1](https://git-scm.com) or later
* [JDK 8](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) or [OpenJDK 8](http://openjdk.java.net/projects/jdk8/)
* [Maven 3.2.1](https://maven.apache.org/index.html) or later
* [Docker Engine 1.6](http://docs.docker.com/engine/installation/) or later
See the links above for installation instructions on your platform. You can verify the versions are installed and running:
$ git --version
$ javac -version
$ mvn -version
$ docker --version
### GitHub account
Debezium uses [GitHub](GitHub.com) for its primary code repository and for pull-requests, so if you don't already have a GitHub account you'll need to [join](https://github.com/join).
### Fork the Debezium repository
Go to the [Debezium repository](https://github.com/debezium/debezium) and press the "Fork" button near the upper right corner of the page. When finished, you will have your own "fork" at `https://github.com/<your-username>/debezium`, and this is the repository to which you will upload your proposed changes and create pull requests. For details, see the [GitHub documentation](https://help.github.com/articles/fork-a-repo/).
### Clone your fork
At a terminal, go to the directory in which you want to place a local clone of the Debezium repository, and run the following commands to use HTTPS authentication:
$ git clone https://github.com/<your-username>/debezium.git
If you prefer to use SSH and have [uploaded your public key to your GitHub account](https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/), you can instead use SSH:
$ git clone git@github.com:<your-username>/debezium.git
This will create a `debezium` directory, so change into that directory:
$ cd debezium
This repository knows about your fork, but it doesn't yet know about the official or ["upstream" Debezium repository](https://github.com/debezium/debezium). Run the following commands:
$ git remote add upstream https://github.com/debezium/debezium.git
$ git fetch upstream
$ git branch --set-upstream-to=upstream/master master
Now, when you check the status using Git, it will compare your local repository to the *upstream* repository.
### Get the latest upstream code
You will frequently need to get all the of the changes that are made to the upstream repository, and you can do this with these commands:
$ git fetch upstream
$ git pull upstream master
The first command fetches all changes on all branches, while the second actually updates your local `master` branch with the latest commits from the `upstream` repository.
### Making changes
Everything the community does with the codebase -- fixing bugs, adding features, making improvements, adding tests, etc. -- should be described by an issue in our [JIRA](http://issues.jboss.org/browse/DEBEZIUM). If no such issue exists for what you want to do, please create an issue with a meaningful and easy-to-understand description, and assign the issue to yourself. On the other hand, if you're working on an existing issue, simply assign the issue to yourself so that other people know you're working on it.
Before you make any changes, be sure to switch to the `master` branch and pull the latest commits on the `master` branch from the upstream repository. Also, it's probably good to run a build and verify all tests pass *before* you make any changes.
$ git checkout master
$ git pull upstream master
$ mvn clean install
Once everything builds, create a *topic branch* named with the issue number (e.g,. `DEBEZIUM-1234`):
$ git checkout -b DEBEZIUM-1234
This branch exists locally and it is here that you should make all of your proposed changes for the issue. As you'll soon see, it will ultimately correspond to a single pull request that the Debezium committers will review and merge (or reject) as a whole. (Some issues are big enough that you may want to make several separate but incremental sets of changes. In that case, you can create subsequent topic branches for the same issue by appending a short suffix to the branch name.)
Your changes should include changes to existing tests or additional unit and/or integration tests that verify your changes work. We recommend frequently running related unit tests (in your IDE or using Maven) to make sure your changes didn't break anything else, and that you also periodically run a complete build using Maven to make sure that everything still works:
$ mvn clean install
Feel free to commit your changes locally as often as you'd like, though we generally prefer that each commit represent a complete and atomic change to the code. Often, this means that most issues will be addressed with a single commit in a single pull-request, but other more complex issues might be better served with a few commits that each make separate but atomic changes. (Some developers prefer to commit frequently and to ammend their first commit with additional changes. Other developers like to make multiple commits and to then squash them. How you do this is up to you. However, *never* change, squash, or ammend a commit that appears in the history of the upstream repository.) When in doubt, use a few separate atomic commits; if the Debezium reviewers think they should be squashed, they'll let you know when they review your pull request.
Committing is as simple as:
$ git commit .
which should then pop up an editor of your choice in which you should place a good commit message. We do expect that all commit messages begin with a line starting with the JIRA issue and ending with a short phrase that summarizes what changed in the commit. If that phrase is not sufficient to explain your changes, then the first line should be followed by a blank line and one or more paragraphs with additional details. For example:
DEBEZIUM-1234 Expanded the MySQL integration test and correct a unit test.
or
```
DEBEZIUM-1234 Added support for ingesting data from PostgreSQL.
The new ingest library supports PostgreSQL 9.4 or later. It requires a small plugin to be installed
on the database server, and the database to be configured so that the ingest component can connect
to the database and use logical decoding to read the transaction log. Several new unit tests and one
integration test were added.
```
### Rebasing
If its been more than a day or so since you created your topic branch, we recommend *rebasing* your topic branch on the latest `master` branch. This requires switching to the `master` branch, pulling the latest changes, switching back to your topic branch, and rebasing:
$ git checkout master
$ git pull upstream master
$ git checkout DEBEZIUM-1234
$ git rebase master
If your changes are compatible with the latest changes on `master`, this will complete and there's nothing else to do. However, if your changes affect the same files/lines as other changes have since been merged into the `master` branch, then your changes conflict with the other recent changes on `master`, and you will have to resolve them. The git output will actually tell you you need to do (e.g., fix a particular file, stage the file, and then run `git rebase --continue`), but if you have questions consult Git or GitHub documentation or spend some time reading about Git rebase conflicts on the Internet.
### Creating a pull request
Once you're finished making your changes, your topic branch should have your commit(s) and you should have verified that your branch builds successfully. At this point, you can shared your proposed changes and create a pull request. To do this, first push your topic branch (and its commits) to your fork repository (called `origin`) on GitHub:
$ git push origin DEBEZIUM-1234
Then, in a browser go to https://github.com/debezium/debezium, and you should see a small section near the top of the page with a button labeled "Create pull request". GitHub recognized that you pushed a new topic branch to your fork of the upstream repository, and it knows you probably want to create a pull request with those changes. Click on the button, and GitHub will present you with a short form that you should fill out with information about your pull request. The title should start with the JIRA issue and ending with a short phrase that summarizes the changes included in the pull request. (If the pull request contains a single commit, GitHub will automatically prepopulate the title and description fields from the commit message.)
When completed, press the "Create" button and copy the URL to the new pull request. Go to the corresponding JIRA issue and record the pull request by pasting the URL into the "Pull request" field. (Be sure to not overwrite any URLs that were already in this field; this is how a single issue is bound to multiple pull requests.) Also, please add a JIRA comment with a clear description of what you changed. You might even use the commit message (except for the first line).
At this point, you can switch to another issue and another topic branch. The Debezium committers will be notified of your new pull request, and will review it in short order. They may ask questions or make remarks using line notes or comments on the pull request. (By default, GitHub will send you an email notification of such changes, although you can control this via your GitHub preferences.)
If the reviewers ask you to make additional changes, simply switch to your topic branch for that pull request:
$ git checkout DEBEZIUM-1234
and then make the changes on that branch and either add a new commit or ammend your previous commits. When you've addressed the reviewers' concerns, push your changes to your `origin` repository:
$ git push origin DEBEZIUM-1234
GitHub will automatically update the pull request with your latest changes, but we ask that you go to the pull request and add a comment summarizing what you did. This process may continue until the reviewers are satisfied.
By the way, please don't take offense if the reviewers ask you to make additional changes, even if you think those changes are minor. The reviewers have a broach understanding of the codebase, and their job is to ensure the code remains as uniform as possible, is of sufficient quality, and is thoroughly tested. When they believe your pull request has those attributes, they will merge your pull request into the official upstream repository.
Once your pull request has been merged, feel free to delete your topic branch both in your local repository:
$ git branch -d DEBEZIUM-1234
and in your fork:
$ git push origin :DEBEZIUM-1234
(This last command is a bit strange, but it basically is pushing an empty branch (the space before the `:` character) to the named branch. Pushing an empty branch is the same thing as removing it.)
### Summary
Here's a quick check list for a good pull request (PR):
* Discussed and approved on IRC or the mailing list
* A JIRA associated with your PR (include the JIRA issue number in commit comment)
* One commit per PR
* One feature/change per PR
* No changes to code not directly related to your change (e.g. no formatting changes or refactoring to existing code, if you want to refactor/improve existing code that's a separate discussion and separate JIRA issue)
* A full build completes succesfully
* Do a rebase on upstream `master`

View File

@ -1,49 +1,62 @@
[![Build Status](https://travis-ci.org/debezium/debezium.svg?branch=master)](https://travis-ci.org/debezium/debezium)
[![License](http://img.shields.io/:license-apache-brightgreen.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)
Copyright 2008-2015 Red Hat and contributors. Licensed under the Apache License, Version 2.0.
Copyright 2008-2016 Debezium Authors. Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
# Debezium
Debezium is an open source project for change data capture tools. With its libraries, your application can monitor databases and receive detailed events for each row-level change made to the database. Only committed changes are visible, so your application doesn't have to worry about transactions or changes that are rolled back. Debezium provides a single model of all change events, so your application does not have to worry about the intricacies of each kind of database management system. Additionally, when your application restarts it is able to consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely.
Debezium is an open source project that provides a near real-time data streaming platform for change data capture (CDC). You setup and configure Debezium to monitor your databases, and then your application uses Debezium libraries to receive detailed events for each row-level change made to the database. Only committed changes are visible, so your application doesn't have to worry about transactions or changes that are rolled back. Debezium provides a single model of all change events, so your application does not have to worry about the intricacies of each kind of database management system. Additionally, since Debezium records the history of data changes in its logs, your application can be stopped and restarted at any time, and it will be able to consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely.
Monitoring databases and being notified when data changes has always been complicated. Relational database triggers can be useful, but are often limited to updating state within the same database (not communicating with external processes) and often don't work with views or schema tables. Some databases offer APIs or frameworks for monitoring changes, but there is no standard so each database's approach is different and requires a lot of knowledged and specialized code.
Monitoring databases and being notified when data changes has always been complicated. Relational database triggers can be useful, but are specific to each database and often limited to updating state within the same database (not communicating with external processes). Some databases offer APIs or frameworks for monitoring changes, but there is no standard so each database's approach is different and requires a lot of knowledged and specialized code. It still is very challenging to ensure that all changes are seen and processed in the same order while minimally impacting the database.
Debezium provides modules that do this work for you. Some modules are generic and work with multiple database management systems, but are also a bit more limited in functionality and performance. Other modules are tailored for specific database management systems, so they are often far more capable and they leverage the specific features of the system.
## Common use cases
There are a number of scenarios in which Debezium can be extremely valuable, but here we outline just a few of the more common ones.
There are a number of scenarios in which Debezium can be extremely valuable, but here we outline just a few of them that are more common.
### Cache invalidation
Automatically purge and/or update a second-level cache when the record(s) for entries change or are removed. The cache and monitoring logic can be in separate processes from the application(s).
Automatically invalidate entries in a cache as soon as the record(s) for entries change or are removed. If the cache is running in a separate process (e.g., Redis, Memcache, Infinispan, and others), then the simple cache invalidation logic can be placed into a separate process or service, simplifying the main application. In some situations, the logic can be made a little more sophisticated and can use the updated data in the change events to update the affected cache entries.
### Updating multiple data systems
### Simplifying monolithic applications
Many applications need to update a database and then do additional work after the commit, such as update separate search indexes, update a cache, send notifications, activate business logic, etc. If the application were to crash after the commit but before all of these other activities are performed, updates might be lost or other systems might become inconsistent or invalid. Using change data capture, these other activities can be performed in separate threads or separate processes/services when the data is committed in the original database. This approach is more tolerant of failures, does not miss events, scales better, and more easily supports upgrading and operations.
Many applications update a database and then do additional work after the changes are committed: update search indexes, update a cache, send notifications, run business logic, etc. This is often called "dual-writes" since the application is writing to multiple systems outside of a single transaction. Not only is the application logic complex and more difficult to maintain, dual writes also risk losing data or making the various systems inconsistent if the application were to crash after a commit but before some/all of the other updates were performed. Using change data capture, these other activities can be performed in separate threads or separate processes/services when the data is committed in the original database. This approach is more tolerant of failures, does not miss events, scales better, and more easily supports upgrading and operations.
### Pre-computed
### Sharing databases
### Event processing
When multiple applications share a single database, it is often non-trivial for one application to become aware of the changes committed by another application. One approach is to use a message bus, although non-transactional message busses suffer from the "dual-writes" problems mentioned above. However, this becomes very straightforward with Debezium: each application can monitor the database and react to the changes.
###
### Data integration
Data is often stored in multiple places, especially when it is used for different purposes and has slightly different forms. Keeping the multiple systems synchronized can be challenging, but simple ETL-type solutions can be implemented quickly with Debezium and simple event processing logic.
### CQRS
The [Command Query Responsibility Separation (CQRS)](http://martinfowler.com/bliki/CQRS.html) architectural pattern uses a one data model for updating and one or more other data models for reading. As changes are recorded on the update-side, those changes are then processed and used to update the various read representations. As a result CQRS applications are usually more complicated, especially when they need to ensure reliable and totally-ordered processing. Debezium and CDC can make this more approachable: writes are recorded as normal, but Debezium captures those changes in durable, totally ordered streams that are consumed by the services that asynchronously update the read-only views. The write-side tables can represent domain-oriented entities, or when CQRS is paired with [Event Sourcing](http://martinfowler.com/eaaDev/EventSourcing.html) the write-side tables are the append-only event log of commands.
## Requirements
## Building Debezium
The following software is required to work with the Debezium codebase and build it locally:
* [Git 2.2.1](https://git-scm.com) or later
* [JDK 8](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) or [OpenJDK 8](http://openjdk.java.net/projects/jdk8/)
* [Maven 3.2.1](https://maven.apache.org/index.html) or later
* [Docker Engine 1.4](http://docs.docker.com/engine/installation/) or later
* [Docker Engine 1.6](http://docs.docker.com/engine/installation/) or later
See the links above for installation instructions on your platform. You can verify the versions are installed and running:
$ git --version
$ javac -version
$ mvn -version
$ docker --version
See the links above for installation instructions on your platform.
### Why Docker?
Many open source software projects use Java and Maven, but requiring Docker is less common. Debezium is designed to talk to a number of external systems, such as various databases and services, and our integration tests verify Debezium does this correctly. But rather than expect you have all of these software systems installed locally, Debezium's build system uses Docker to automatically download or create the necessary images and start containers for each of the systems. The integration tests can then use these services and verify Debezium behaves as expected, and when the integration tests finish, Debezum's build will automatically stop any containers that it started.
Many open source software projects use Git, Java, and Maven, but requiring Docker is less common. Debezium is designed to talk to a number of external systems, such as various databases and services, and our integration tests verify Debezium does this correctly. But rather than expect you have all of these software systems installed locally, Debezium's build system uses Docker to automatically download or create the necessary images and start containers for each of the systems. The integration tests can then use these services and verify Debezium behaves as expected, and when the integration tests finish, Debezum's build will automatically stop any containers that it started.
Debezium also has a few modules that are not written in Java, and so they have to be required on the target operating system. Docker lets our build do this using images with the target operating system(s) and all necessary development tools.
@ -55,3 +68,21 @@ Using Docker has several advantages:
1. All builds are consistent. When multiple developers each build the same codebase, they should see exactly the same results -- as long as they're using the same or equivalent JDK, Maven, and Docker versions. That's because the containers will be running the same versions of the services on the same operating systems. Plus, all of the tests are designed to connect to the systems running in the containers, so nobody has to fiddle with connection properties or custom configurations specific to their local environments.
1. No need to clean up the services, even if those services modify and store data locally. Docker *images* are cached, so building them reusing them to start containers is fast and consistent. However, Docker *containers* are never reused: they always start in their pristine initial state, and are discarded when they are shutdown. Integration tests rely upon containers, and so cleanup is handled automatically.
### Building the code
First obtain the code by cloning the Git repository:
$ git clone https://github.com/debezium/debezium.git
$ cd keycloak
Then build the code using Maven:
$ mvn clean install
The build starts and uses several Docker containers for different DBMSes. Note that if Docker is not running, you'll likely get an arcane error -- if this is the case, always verify that Docker is running, perhaps by using `docker ps` to list the running containers.
## Contributing
The Debezium community welcomes anyone that wants to help out in any way, whether that includes reporting problems, helping with documentation, or contributing code changes to fix bugs, add tests, or implement new features. See [this document](CONTRIBUTE.md) for details.