Monday, May 17, 2010

Table Partitioning in DB2 Express

Hi,

It's been a while since my last post and I want to apologize, this one is about DB2 LUW ... (although everything here behaves the same on zOS if I'm not mistaken).

A customer I was helping today had a problem with one of his tables. He has a really big table that keeps getting records inserted into, and he needs to periodically delete old records (just a plain log table), and the delete operation takes forever.

Another problem is that client is using DB2 Express, so he is not able to use table partitioning to solve this problem (by deleting partitions instead of rows).
As a solution I recommended him to use a mechanism I remembered from zOS of partitioned views. A partitioned view is a view that performs a UNION ALL operation against some tables.
The special feature of a partitioned view is that the DB2 compiler and optimizer knows how to deal with it very efficiently, which means, when you insert a row to the view, DB2 will know into which table to insert the row and when you select rows from the view, DB2 will access only the relevant base tables, and not all of them (just like partition pruning).

How does this magic work? when you create the partitioned view, you base it on tables with check constraint. The check constraints of the underlying tables must not overlap because these check constraints tells DB2 to which table every row belongs.

Let's look at an example:

create table t1 (id int, rtype int check(rtype=1));
create table t2 (id int, rtype int check(rtype=2));

create view v as
select * from t1 where rtype=1
union all
select * from t2 where rtype=2;

-- basically you don't have to code the where clauses in the view definition itself, but it is possible that it provides the optimizer with more information for run time optimizations so I would leave it there

insert into v values (1, 1);
insert into v values (1, 2);
insert into v values (2, 1);
insert into v values (2, 2);

-- at this point, the rows were inserted into the appropriate tables

select * from v;
select * from t1;
select * from t2;

-- now lets make sure that DB2 access only the relevant tables for select processing:
create index idx1_t1 on t1 (rtype);
create index idx2_t2 on t2 (rtype);

-- the following query will show us that the indexes were not used yet
select INDEX_SCANS from table (mon_get_index('','T1',-2));
select INDEX_SCANS from table (mon_get_index('','T2',-2));

-- now we issue the select
select rtype from v where rtype=1;

-- running the above queries against mon_get_index will show that the index of T1 was scanned once

select rtype from v;

-- running the above queries against mon_get_index will show that the index of T1 was accessed again and that the index of T2 was accessed for the first time


There are many advantages to this design, for example:
1) You can change underlying table design without changing the application as long as you can make sure the view reflects the same results
2) Each underlying table can belong to a different tablespace with different characteristics on different storage
3) Each table can have different indexes
4) Removing / Adding bulks of data is very easy by simply changing the view to include / exclude new / old tables

etc...

I think this feature is amazing and it can be used as a replacement for table partitioning, especially in the express edition which does not support table partitioning.

I have also found a very interesting paper on this feature that was written way back at 2002 but I still recommend reading it, if you intend to use this feature, or want to learn more about it.
http://www.ibm.com/developerworks/data/library/techarticle/0202zuzarte/0202zuzarte.pdf

At first I though to title this post as - A poor man's partitioning, but now I think this feature is too cool to be named like that

I talked to much already,
Have fun with this feature and take care

Momi

1 comment:

  1. Funny approach.
    - Table partitioning was introduced to formalize/replace this (old & proven) technique.
    - When your tables are that big then Express-C is no longer the ideal version, I'd say.

    It is too bad that this is not implemented so you cannot educate yourself with this feature, I agree on that.

    Gr, Dick

    ReplyDelete