Wednesday, July 14, 2010

Table Partitioning in DB2 Express - Part II

Hi

As a direct continuation to the previous post on that subject, I decided to post here the solution that I have developed for one of my clients, so you all can enjoy it. I do ask that if anyone improves the implementation, please send me a copy so I can post it here for everyone.

All the scripts that create the objects required for this solution can be found in the resources page on our website.

Architecture

The implementation uses a feature called partitioned view. This feature allows one to create several identical tables, and then create a view which perform select against those tables with the union all set operator. In order to be able to insert records into this view, a check constraint must be defined on every base table in a way that no overlapping exists. When a row is inserted into the view, DB2 knows into which table to insert the row according to the check constraints that are defined on the tables.
In order to help the optimizer access only the relevant tables when a query is issued against the view, it is recommended to specify where predicates in the view definition which filter the rows of every table with the same predicate as the check constraint.

Control Tables

CREATE TABLE T_CONTROL_PARTITIONED_TABLES (
    VIEW_NAME              VARCHAR (128) NOT NULL,
    VIEW_SCHEMA            VARCHAR (128) NOT NULL,
    MODEL_TABLE_NAME       VARCHAR (128) NOT NULL,
    MODEL_TABLE_SCHEMA     VARCHAR (128) NOT NULL,
    TARGET_TABLE_NAME      VARCHAR (128) NOT NULL,
    TARGET_TABLE_SCHEMA    VARCHAR (128) NOT NULL,
    PARTITION_COLUMN       VARCHAR (128) NOT NULL
);

The table T_CONTROL_PARTITIONED_TABLES holds a single record for each partitioned view. VIEW_NAME and VIEW_SCHEMA specifies the object name of the partitioned view, this name will be used by the application (for example, DB2SP.T_EVENT).
When creating a partition in the partitioned view, the mechanism needs to create a table and indexes for that partition (the table needs to be identical to the other partitions, the indexes do not have to be identical). The columns MODEL_TABLE_NAME and MODEL_TABLE_SCHEMA holds the name of the table that will be duplicated as the new partition. The new partition is created using the statement
CREATE TABLE X.Y LIKE MODEL_TABLE_SCHEMA.MODEL_TABLE_NAME INCLUDING COLUMN DEFAULTS;

This implementation does not support ROW CHANGE TIMESTAMP columns, so if such a column is needed, it is possible to change the implementation so use regular create statement.

Index creation is done by generating create statements for the indexes that are defined on the model table.
The new table will be created in the same tablespace as the model table.
The new indexes will be created in the tablespace that was specified in the model table creation.

The columns TARGET_TABLE_NAME and TARGET_TABLE_SCHEMA contains the name of the target tables that will be created for each partition. Each partition will be based on a table named TARGET_TABLE_SCHEMA.TARGET_TABLE_NAME_XXXXXXXXXX where XXXXXXXXXX is the partition number, for example, the first partition will be  TARGET_TABLE_NAME_0000000001.

PARTITION_COLUMN is the name of the column that is used to partition the rows between the tables.


CREATE TABLE T_CONTROL_PARTITIONS (
    VIEW_NAME       VARCHAR (128) NOT NULL,
    VIEW_SCHEMA     VARCHAR (128) NOT NULL,
    SEQ             INT  NOT NULL,
    RANGE_START     VARCHAR(1000) NOT NULL,
    RANGE_END       VARCHAR(1000) NOT NULL,
    IS_ACTIVE       SMALLINT
);

The table T_CONTROL_PARTITIONS holds a single record for each partition in each partitioned view.
The columns VIEW_NAME and VIEW_SCHEMA identifies the partitioned view. The column SEQ holds the partition number.
RANGE_START and RANGE_END are the values that are contained in this partition. When we add a new partition, the range of values that will be in that partition is checked by the check constraint : PARTITION_COLUMN >= RANGE_START AND PARTITION_COLUMN < RANGE_END
ׂ(Note that the RANGE_END value is exclusive, that is, the value of RANGE_END itself will not be in this partition).
The IS_ACTIVE column is an indication if this partition is active in the partitioning view, that is, when the mechanism generates the partitioning view, it will only use partitions with IS_ACTIVE=1 (the same goes for deleting - only partitions with IS_ACTIVE=1 can be deleted).

The IS_ACTIVE column gives the user the chance to "Archive" old partitions. By setting the IS_ACTIVE column to 0 for a specific partition, and recreating the view, that partition is archived. If the user would like to "Restore" this archived partitioned, all it needs to do is to set IS_ACTIVE=1 and recreate the view.


CREATE TABLE T_CONTROL_LOG (
    REC_ID         INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1),
    REC_TIME     timestamp with default,
    MSG_TEXT     VARCHAR(32000)
);

The T_CONTROL_LOG table will hold log records that are generated by the various procedures. Log records are written only if everything works fine, otherwise a rollback is performed and all log records are discarded.


API - Stored Procedures

SP_ADD_PARTITION (IN V_VIEW_NAME          VARCHAR(128),
                  IN V_VIEW_SCHEMA        VARCHAR(128),
                  IN V_NEW_RANGE_START    VARCHAR(2000),
                  IN V_NEW_RANGE_END      VARCHAR(2000),
                  OUT V_RC                INTEGER,
                  OUT V_ERROR_TEXT        VARCHAR(32672) )

V_VIEW_NAME and V_VIEW_SCHEMA - these parameters holds the name of the partitioned view to which we want to add a partition. The values in these columns are used to perform the lookup in the control tables.
V_NEW_RANGE_START and V_NEW_RANGE_END - these values represents the values that will be stored in this partition.

NOTE - when specifying V_NEW_RANGE_START and V_NEW_RANGE_END you must specify a legal value that can be compared without the need of conversion to the data type of the partitioning columns, that is, if for example the partitioning column is of type timestmap, you need these values to be surrounded with quotes, for example, '2010-06-20 15:13:34.123456', so, if you want to call this procedure from java, and you pass the variable j_startTime for the V_NEW_RANGE_START parameter, you will need to assign the java variable by using
j_startTime = "'2010-06-20 15:13:34.123456'"
and not 
j_startTime = "2010-06-20 15:13:34.123456"

For a partitioning column of type int for example, you can use
j_startRange = "1"
j_endRange = "100"

(the data type of the stored procedure parameter is varchar in order to support all possible data types of the partitioning column).

V_RC and V_ERROR_TEXT are used to return status to the application. If V_RC = 0 and V_ERROR_TEXT = 'NO ERROR' it means everything went OK.

The procedure SP_ADD_PARTITION performs the following actions:
1. Create a table for the new partition
2. Create a check constraint on that table
3. Create all necessary indexes
4. Regenerate the partitioned view

Possible Errors:
V_RC = -8, V_ERROR_TEXT = 'NO ROWS FOUND IN T_CONTROL_PARTITIONED_TABLES FOR VIEW_NAME X.Y' - this means that the view that was specified with V_VIEW_NAME and V_VIEW_SCHEMA could not be found in the T_CONTROL_PARTITIONED_TABLES, so you probably have a misprint, or, you forgot to insert a record to the control table.
  
V_RC = (some SQLCODE from DB2), V_ERROR_TEXT = 'EXECUTING RETURNED ' - lets hope this will not happen, but if it does, it depends on the error.


SP_RECREATE_PVIEW (IN  V_VIEW_NAME     VARCHAR(128),
                   IN  V_VIEW_SCHEMA   VARCHAR(128),
                   OUT V_RC            INTEGER,
                   OUT V_ERROR_TEXT    VARCHAR(32672) )

This procedure recreates the partitioned view, based on the active partitions in T_CONTROL_PARTITIONS.
Note - If at any point you would like to grant permissions on this view to any user except the owner of the view (which is determined by V_VIEW_SCHEMA) you will have to modify this procedure to execute the necessary grant statements.

V_VIEW_NAME and V_VIEW_SCHEMA represents the partitioned view name.
V_RC and V_ERROR_TEXT are status variables that the procedure returns. If V_RC = 0 and V_ERROR_TEXT = 'NO ERROR' it means everything went OK.
Possible Errors:
V_RC = -8, V_ERROR_TEXT = 'NO ROWS FOUND IN T_CONTROL_PARTITIONED_TABLES FOR VIEW_NAME X.Y' - this means that the view that was specified with V_VIEW_NAME and V_VIEW_SCHEMA could not be found in the T_CONTROL_PARTITIONED_TABLES, so you probably have a misprint, or, you forgot to insert a record to the control table.
  
V_RC = -8, V_ERROR_TEXT = 'NO ROWS FOUND IN T_CONTROL_PARTITIONS FOR VIEW_NAME X.Y' - this means that the view that was specified with V_VIEW_NAME and V_VIEW_SCHEMA has no active partitions in T_CONTROL_PARTITIONS, so you probably have a misprint, or, you forgot to call SP_ADD_PARTITION before calling this procedure.


V_RC = (some SQLCODE from DB2), V_ERROR_TEXT = 'EXECUTING RETURNED ' - lets hope this will not happen, but if it does, it depends on the error.

This procedure is invoked by SP_ADD_PARTITION and SP_REMOVE_PARTITION but it can be invoked by user at any time.



CREATE PROCEDURE SP_REMOVE_PARTITION 
   (IN V_VIEW_NAME      VARCHAR(128),
    IN V_VIEW_SCHEMA    VARCHAR(128),
    IN V_RANGE_START    VARCHAR(2000),
    IN V_RANGE_END      VARCHAR(2000),
    IN V_RECS_TO_DEL    INTEGER,
    IN V_PARTS_TO_KEEP  INTEGER,
    IN V_DELETE_TABLES  INTEGER,
    OUT V_RC            INTEGER,
    OUT V_ERROR_TEXT    VARCHAR(32672) )

This is the most complex procedure and it is used to remove partitions from the partitioned table.

This procedure can work in 3 modes:
1) Remove partitions by a range a of values of the partitioning key. Passing values X and Y will cause all partitions whose range is between X and Y to be removed. For example, if we have a table with the following partitions:
P1: 1    100
P2: 100 200
P3: 200 300
P4: 300 400
P5: 400 500
P6: 500 600
P7: 600 700

and you request to remove partitions in the range of 120 to 550, the partitions that will be removed are P3,P4,P5.

2) Remove partitions by a number of records to delete. In this mode you request to delete the oldest X records from the partitioned table. The procedure will check how many records exists in each partition according to the results of the latest runstats that was executed, and delete partitions from the oldest to the newest as long as the total number of deleted records does not exceed the number of records you specified. The procedure will stop delete partitions once it has deleted enough records, or when it reaches a partition that never had runstats executed against it.

For example: assume the following is the number of records in the table according to runstats:
P1 - 1000
P2 - 1000
P3 - 1000
P4 - 1000
P5 - ? (runstats was never executed)
P6 - 1000
P7 - 1000

When you request to delete 3500 rows, the procedure will delete partitions P1,P2,P3.
If you request to delete 6500 rows, the procedure will delete partitions P1,P2,P3,P4 (the search will stop at P5 since it has no statistics)

NOTE - the procedure assumes that the statistics are correct, so if in fact the partitions contain more rows than what appears in the card column of SYSCAT.TABLES, more rows might be deleted than excepted.

3) Remove partitions by keeping the X most recent partitions. When working in this mode, the procedure will remove all partitions except the most recent X partitions.


The logic of the procedure gives precedence to the methods according to the order that was specified, so, if it can, it will use method 1, if not, it will try method 2, if it can't use method 2 it will use method 3.

Parameters:
 
V_VIEW_NAME and V_VIEW_SCHEMA specify the name of the partitioned view from which we want to remove partitions.

V_RANGE_START and V_RANGE_END specify the range of values we wish to remove from the partitioned table (using method 1). If either one of these variables is null, then method 1 can not be used.
NOTE - the value of V_RANGE_END will never be removed from the table. If that value falls in the middle of the range of some partition, that partition will not be removed in order to preserve the greater values. If that value is the exact END RANGE of an existing partition, then that partition will be removed but since when we defined the partitions, we created the check constraint as - partition_columns >= range_start and partition_column < range_end, again, the actual value of V_RANGE_END will not be deleted.

V_RECS_TO_DEL - number of records to delete (method 2). If method 1 can not be used and this value is not null, the method 2 will be used.

V_PARTS_TO_KEEP - number of partitions to keep (method 3). If method 1 and 2 can not be used this method will be used, only if the value of this parameter is not null and larger than 1.
NOTE - the value of V_PARTS_TO_KEEP will always be respected, that is, if you wish to delete a range of partitions, and that range would have caused less than V_PARTS_TO_KEEP partition to remain in the partitioned view, than less partitions will be deleted so V_PARTS_TO_KEEP partitions will remain in the table. For example, If you have partititons
P1, P2, P3, P4, P5, P6, P7, P8, P9, P10 and you provided a delete by range that would cause partitions P2,P3,P4,P5,P6,P7, but V_PARTS_TO_KEEP = 6, then only partitions P2,P3,P4 will be deleted.
If V_PARTS_TO_KEEP is null, then 1 is used instead.

V_DELETE_TABLES - determines if the partitions will be physically deleted or just marked as inactive.
If V_DELETE_TABLES = 1 then the actual tables of the removed partitions will be dropped and the records of these partitions will be deleted from the table T_CONTROL_PARTITIONS.
If V_DELETE_TABLES <> 1 (any other value) then the partitions are not physically dropped, they are just marked with IS_ACTIVE=0 in the control table.

V_RC and V_ERROR_TEXT are status variables that the procedure returns. If V_RC = 0 and V_ERROR_TEXT = 'NO ERROR' it means everything went OK.
Possible Errors:
V_RC = -4, V_ERROR_TEXT = 'SP_REMOVE_PARTITION => NO PARTITIONS WERE FOUND IN THE SPECIFIED RANGE' - means that no partitions were found according to the delete specifications that were passed to the procedure (this is returned from method 1).

V_RC = -8, V_ERROR_TEXT = 'SP_REMOVE_PARTITION => VIEW DOES NOT EXISTS OR HAS 0 PARTITIONS' - means that either the given view does not exists, or, this view has no active partitions (this is returned from method 2 or 3).

V_RC = -8, V_ERROR_TEXT = 'THE RANGE PROVIDED TO SP_REMOVE_PARTITION IS INVALID' - the procedure could not use any method to perform the remove.



General Considerations:

Range Values - The responsibility of assigning the range values to each partition is in the hands of the user. Unlike regular partitioning where you only specify the upper limit of each partition, our implementation provides the user with the ability yo specify where each partition begins and ends. This means, that unless the user is careful, gaps in the value range may be created. If such a gap is created, a record who's partition column value falls inside this gap can not be inserted into the partitioned table.

Archiving - the mechanism supports easy archiving of old partitions. If you want to archive a partition, you can remove it from the view and signal the procedure not to physically delete it. You can also do it manually (bypass the sp_remove_partitions stored procedure) - just update IS_ACTIVE to 0 for the partitions you want to archive / remove and then call the procedure SP_RECREATE_PVIEW.
If you choose not to physically delete the partitions you remove, you will be able to restore them into the partitioned table at any point by setting their IS_ACTIVE value to 1 and regenerate the view. 
This flexibility provides you with a way to swap in and swap out partitions as you wish.

Indexing - as opposed to a partitioned table, this implementation provides the flexibility of having different indexes on each partition. So, for example, if you know that older partitions are queried differently than newer ones, you can adjust the indexes that are defined on each partition accordingly (this will have to be done manually for now).

Migration - migrating an existing table to the new mechanism. Migration can be done in two ways:
1) create the partitioned view and copy all the information from the old table to the new partitioned view. This might take some time if the table is very big, but once this is over, you can use sp_remove_partition and sp_add_partition as you wish to, and you will be able to do it rather automatically:

* rename table xxx to xxx_old

* create the partitioned view and all relevant partitions

* insert into xxx

   select * from xxx_old

  (you should do this using batches so you won't cause the transaction log to run out of space).
* drop table xxx_old

2) rename the existing table to xxx_0000000001 and use it as the first partition (you will need to update the control tables manually with the range of values).
This can be done using this script:

rename table yyy.xxx to xxx_0000000001;

INSERT INTO T_CONTROL_PARTITIONED_TABLES
VALUES ('xxx', 'yyy', 'xxx_MODEL', 'yyy', 'xxx', 'yyy', 'part_col')

create yyy.xxx_model and all indexes

INSERT INTO T_CONTROL_PARTITIONS VALUES ( 'yyy', 'xxx', 1, min_value_of_part_col, max_val_for_first_partition, 1)

add more partitions as you need

This migration is faster, but the use of sp_remove_partition is limited until you want to remove the entire first partition.


CompressionCurrently the partitions that are created are not compress. If you wish to turn on compression you can either add a parameter to the stored procedure that adds partitions (for maximum control over which partition is compressed and which is not), or you can hard code it inside the create table statement in this procedure.
Either way, since the partition is created empty, compression will not be optimal.
It might be better

Identity Columns - this solution does not support identity columns in the tables that participate in the partitioned view. I don't understand why DB2 does not support identity columns in the underlying tables, but that is the way it is.


That is it for now,
I hope you will find this usefull, and as I previously asked, please share your changes with me.

Cheers
Momi