Datasets

Dataset files allow the user to specify a number of tables and information about the data associated with those tables. Some of the information is used for data loading, while the rest can be used for data saving.

Why do datasets exist?

Some applications require different sets of data to be loaded depending on the nature of the database build being done. For instance, perhaps there is a database build that must be done to support a unit testing or functional testing strategy, while there may be a production dataset for production builds. If you find you have several different database builds, there is also likely to be a common set of data that is needed regardless of the database build, implying a shared dataset. For example, tables that contain constants often fall into the category of shared datasets.

Datasets example

All projects are likely to vary, however here is a common scenario:

Example datasets:

Constants data - shared between all builds.
Configuration data - may change between platforms or scenarios
Functional test application data - used for functional test script
Unit test data - needed for unit test execution

Example database builds:

For new deployments, you may choose to load the constants data plus the correct configuration data. No "dynamic" application data (e.g. data produced by an application as it is used) is loaded.
For functional testing you may choose to load the constants data, plus the configuration data plus a set of functional test data that represents a running application.
For unit tests you require the constants data, the configuration data plus the unit test data.

By using datasets the developer does not have to maintain identical data in multiple places. Done correctly this can be a powerful feature. Using datasets correctly requires some discipline and management otherwise the dataset files can fall out of sync with the data and you run the risk of data appearing in more than one datafile or no datafiles at all! Applying datasets correctly usually requires that you think about ID-ranges before you put Sqlch in place.

For situations where datasets overlap (i.e. information for a single table exists in multiple datasets), the user should specify range information in the dataset file. This allows DbExecute commands to selectively save and delete data from a table depending on the datasets "in play". Being selective ensures that a dataset can be saved and loaded correctly. E.g. if data from multiple datasets is loaded into a table, when the database is saved back to file, the appropriate data can be selected back out into the correct datasets.

Fundamentally, multiple datasets are an optional feature. If you are content to have a single dataset you can have a single dataset file and just have all data from each table in its own datafile.

Dataset file format

The dataset file is specified in XML. The root of the dataset file is always an element named <dataset>

A dataset is made up of a number of <table> elements that describe the tables present in the dataset. The attributes and sub-elements can vary depending on your database and your choices

Example dataset files

Basic:

This example provides the bare minimum for describing two tables in a dataset file. Note: Without specifying a fileName element, this wouldn't be particularly useful except to tell Sqlch that the table exists!

<dataset>
	<table name="DEMO_CHANNEL" />
	<table name="DEMO_SITE" />
</dataset>

More complex:

This example provides the additional information for describing two tables in a dataset file. It specifies that data should be dumped in a particular order, specfied by the column name to order the data by.

<dataset>
	<table name="DEMO_CHANNEL">
 		<fileName>DEMO_CHANNEL.dat</fileName>
		<orderBy>CHANNEL_ID</orderBy>
	</table>
	<table name="DEMO_SITE">
 		<fileName>DEMO_SITE.dat</fileName>
		<orderBy>SITE_ID</orderBy>
	</table>
</dataset>

Table element attributes

The <table> element has the following attributes:

Attribute	Description	Required
catalog	The name of the catalog (as recognised by JDBC) for the table. The use of this attribute is dependent on the database being used as well as your chosen database setup. Developers can choose the value * (asterisk) as a wildcard to match "any" catalog. See the table identifier page for more information.	No.
schema	The name of the JDBC schema for the table. The use of this attribute is dependent on the database being used as well as your chosen database setup. Developers can choose the value * (asterisk) as a wildcard to match "any" shema. See the table identifier page for more information.	No
name	The name of the table. See the table identifier page for more information.	Yes

Table element sub-elements

fileName

This element is optional and can only appear at most once per table.

The <fileName> element allows users to specify the name of the file where the data for this table should be found. The filename should use "/" characters for directory separators. The location is relative - the base is dependent on the interpreting command.

orderBy

This element is optional and can only appear at most once per table.

The <orderBy> element allows users to specify the order the data should appear in datafiles. This is used when saving data from a database. Good choice of ordering ensures that datafiles are easy to compare with earlier versions. Poor choice can lead to major headaches when comparing datafile versions. A poor choice of orderBy is when you choose a set of columns that would not be suitable as a primary key since it could lead to a random order to rows that have the same values for those columns.

The orderBy is effectively the expression you would expect to see in the ORDER BY clause of a SQL statement. By default, the user can only specify columns from the table being defined itself. However, sometimes the data in a dataset can be related to data held in another table, for this reason the users can specify "related" tables which are included. See the related table element for further information on importing tables.

If you do not specify the order, the primary key of the table is used (if there is one).

Important Note: In self-referential tables, users must be careful to ensure that the order is correct such that "root" rows are loaded first and dependent rows are loaded later. Sqlch does not do this for you. Failure to do this could mean that a table that saves correctly cannot be loaded again without re-ordering the datafile manually or disabling constraint checking.

dataRange

This element is optional and should only appear at most once per table. Failure to specify a dataRange will lead to a default being picked - read below to see the rules for the picked value.

The <dataRange> element indicates the range of data present in this dataSet. There are three values this element can take:

all - this dataset contains all table data. Use this when you only have one dataSet and therefore all your data exists in that one dataSet.
none - this dataset contains no table data. Use this when the table is being included in the data set for purposes other than data loading/saving/deleting. E.g. for constraint disabling.
criteria - this dataset contains a subset of data, specified by the <criteria> and <relatedTable> elements. Use this when you have multiple dataSets that contain data for a table.

If the <dataRange> element is not present, Sqlch picks a sensible default value based on the other elements present:

If a <fileName> element is not present, it is assumed you have no data to load or save, so the dataSet is contains no data. In this situation a default value of "none" is used.
If a <fileName> element is present, but no <criteria> element is found, it is assumed that the dataSet contains all table data. In this situation a value of "all" is used.
If a <criteria> element is found. In this situation a value of "criteria" is used.

Note: Specifying a criteria but no fileName is permitted by Sqlch but the load and save commands will error since they will have nowhere to load and save the specified rows to.

Simple "all" example:

The following is a simple example that indicates that all data for this table should be part of this dataset.

    ...
	<table name="DEMO_LANGUAGE">
		<fileName>DEMO_LANGUAGE.dat</fileName>
		<dataRange>all</dataRange>
	</table>
    ...

Note: The dataRange line is optional since a fileName element has been supplied.

Simple "none" example:

The following is a simple example that indicates that no rows should be part of this dataset.

    ...
	<table name="DEMO_LANGUAGE">
		<dataRange>none</dataRange>
	</table>
    ...

Note: The dataRange line is optional since a fileName element has not supplied.

Note: If you specify a value of "none", specifying a file ceases to make any sense. Since there is no data to load or to save. Specifying both a value of "none" and a file name will cause some commands to fail with an error.

criteria

This element is mandatory when the dataRange is set to "criteria" and should only appear at most once per table.

The <criteria> element allows users to specify the range of data present in the dataset using a specified criteria. This is used when saving or deleting a dataset to ensure that only the rows that are part of a dataset are saved in the datafiles.

The criteria is effectively the expression you would expect to see in the WHERE clause of a SQL statement. By default, the user can only specify columns from the table being defined itself. However, sometimes the data in a dataset can be related to data held in another table, for this reason the users can specify "related" tables which are included. See the related table element for further information on importing tables.

Simple example:

The following is a simple example that indicates that all languages besides the one with ID == 1 should be part of this dataset.

    ...
	<table name="DEMO_LANGUAGE">
		<fileName>DEMO_LANGUAGE.dat</fileName>
		<dataRange>criteria</dataRange>
		<criteria>LANGUAGE_ID NOT IN ( 1 )</criteria>
	</table>
    ...

relatedTable

This element is optional and should only appear at most once per table. It can only be used with a dataRange value of "criteria".

The <relatedTable> element is used to "import" a table into the selection criteria for commands. By default only the table being defined is visible. The relatedTable element allows users to define additional tables and their aliases. By default, the table being defined is given the alias "THIS".

The relatedTable element has the following attributes:

Attribute	Description	Required
sqlId	The ID of the related table. This ID must be recognised by the database engine being used and therefore is not guaranteed to be portable between databases.	Yes
alias	The alias for the table.	Yes

Example

This example shows how the "DEMO_ELEMENT_ROOT" table can be "imported" into the range criteria for the "DEMO_ELEMENT" table. Only those elements that have a parent with the name "Test" are included in the dataset.

    ...
	<table name="DEMO_ELEMENT">
           <fileName>DEMO_ELEMENT.dat</fileName>
           <dataRange>criteria</dataRange>
           <relatedTable sqlId="MYSCHEMA.DEMO_ELEMENT_ROOT" alias="ELEMENT_ROOT"/>
           <criteria>THIS.ROOT_ID = ELEMENT_ROOT.ROOT_ID AND ELEMENT_ROOT.NAME = 'Test' </criteria>
	</table>
    ...

columns

This element is optional and should only appear at most once per table.

The <columns> element is used to specify the columns to be included in a datafile and the order they should appear. The columns element contains sub-elements that specify the column names. This element is useful for excluding columns when some are auto-generated by the database - e.g. auto increment/sequence ids or where the value is unimportant (e.g. if NULL is mapped to a particular value).

If you do not specify the columns, all columns are selected in alphabetical order of the column name. This allows users to not specify and not be left at the mercy of the database choosing a "random" default order.

Example

    ...
	<table name="DEMO_LANGUAGE">
        <fileName>DEMO_LANGUAGE.dat</fileName>
        <dataRange>criteria</dataRange>
        <criteria>LANGUAGE_ID NOT IN ( 1 )</criteria>
        <columns>
             <column>LANGUAGE_ID</column>
             <column>LANGUAGE_CODE</column>
        </columns>
	</table>
    ...

Sqlch SourceForge Distribution: 0.1Alpha8 20041026 2208