Summary of features/the value of Sqlch

Continuous Integration

Sqlch is a set of tools that are intended to provide developers with the ability to recreate a database quickly and easily. Just as regularly doing a complete Java code rebuild is a good idea, doing one for the database at the same time is also a good idea. Frequent rebuilds ensure you are always using the data (and schema) to go with your code and reduces the risk of build-breaks for other people when you check in. Sqlch is designed to be as simple and quick to use as possible, lowering the barriers to usage. Currently Sqlch only really deals with loading and saving data - it is up to developers to provide scripts that rebuild the schema itself. I would suggest you use the Ant SqlExec task until Sqlch is extended to provide this.

If you're not interested in rebuilding your schema every time, you can use Sqlch to clear out all or part of your data before loading in a fresh set.

Revision control

Sqlch makes it easy to store your data in a revision control system. Sqlch allows you to specify the order in which rows are retrieved by allowing users to specify the column order and row order. If users do not specify, the tool chooses sensible defaults. This means that identical data should result in identical data files being produced that can be compared easily using the diff command (e.g. as used by cvs). This allows developers to see the changes they have made quickly without the risk that they check in changes they were unable to spot but didn't mean to make.

Loose/human readable data format

The datafiles that Sqlch uses are very simple to understand. All non-binary data is converted into human-readable Strings. This means that data types can be changed without always having to change the data files. For instance, change a column from a numeric to a varchar should have no impact.

Sqlch data files are all UTF-8 meaning that a large number of text editors can load and save them.

Database independence

JDBC is used throughout. There are hooks to allow developers to write to driver-specific APIs for non-standard datatypes or driver features, but at it's heart, the tool uses pure JDBC.

If you're using a fairly standard database you shouldn't have much of a problem configuring Sqlch: most of the abstraction has been done for you by JDBC.

A Java tool

If you're a Java developer, you may only understand Java and Ant. If something goes wrong with Sqlch you stand an excellent chance of working out what is going wrong. Also, there is a good chances that Sqlch uses the same mechanisms to insert data into your database as your Java application - this means you shouldn't have to worry about formats or inconsistencies creeping into your data.

Although this may seem obvious, Sqlch is platform independent. The tool should run equally happily on any platform that supports it's requirements. The data loaded is not impacted by the platform it was loaded from.

Flexibility/Extensibility

Sqlch has been designed to be flexible. It has various hooks that enable you to configure your own classes and your own scripts (or script templates) to be used as part of the data-load and data-save process.

Sqlch does not assume that you always want to do a complete data-load - it allows you to have multiple data-sets: you can choose to load certain tables or rows if you desire. By configuring your data-sets you can choose to re-load only data in particular datasets.

Dependency resolution

Sqlch can deal with all the inter-table dependencies for you. If you choose to, you can have Sqlch configured so that tables are only loaded once all the tables they depend on (i.e. have foreign key relationship with) have already been loaded. As you change your schema Sqlch can recognise the changes and change the order that tables are loaded.

Note: Sqlch recognises that some databases can have dependency checking turned off. Hooks and commands are provided to take advantage of delayed/disabled dependency checking. Disabling dependency checks is one way of making maximum use of Sqlch's multi-threaded capabilities.

Performance

Sqlch uses JDBC and consequently it is never going to be as fast as a native, database-specific tool. If you're looking for a tool to use for your latest data warehouse, you haven't found it. However, Sqlch uses several mechanisms to make it as quick as it can be without sacrificing any of the other goals of the tool.

The main approaches to achieving good performance during data-load:

Infrequent commits - Transactions can be used to batch up multiple inserts (even if they have been inserted individually). This makes a large performance impact on most database implementations.
Batch inserts - The tool provided the ability to use JDBC insert batching. Depending on your database and JDBC driver implementation, this can make a reasonable impact on performance. Single inserts are also supported for less capable drivers and databases.
Multiple threads - The load tool can load tables concurrently (when safe to do so). This is suitable situations where you have a large database server with multiple processors and a fast network (or if you are connecting to a locally hosted database).
Simple data file format - The tool uses a simple-to-parse data format that requires minimal overhead to interpret.
Connection re-use - Most of the connections used by Sqlch are re-used for multiple table loads.

The main approaches to achieving good performance during data-save:

Configurable fetch-size - the tool can be configured to use the JDBC fetch count to determine the number of rows to be retrieved from the database in one operation.

What Sqlch is not/Disadvantages of Sqlch

DDL/Schema tool

While you can use some parts of Sqlch to alter your schema using DDL, Sqlch currently assumes your schema is in place when you come to execute an operation.

Loading DDL can be accomplished by executing SQL scripts either using the Ant SqlExec task or through a database-specific mechanism.

Driver Bugs

Bitter experience teaches us that most JDBC drivers have annoying or fatal bugs, often in the lesser used parts of the JDBC API. Since Sqlch uses some of these parts (mainly associated with retrieiving table and column metadata), it can often stumble into the bugs rendering it useless. Where possible, Sqlch provides the ability for users to specify using files the information that would otherwise have been gleaned from the driver. This is often at the expense of having Sqlch automatically detect changes to your schema but allows users to continue using the tool until the driver bug is fixed.

Sqlch also doesn't automatically trust Java datatype information coming from the JDBC driver and it allows users to specify the mapping from SQL data type to Java datatype.

Sqlch SourceForge Distribution: 0.1Alpha8 20041026 2208