Monday, December 14, 2009

Copy DB2 Statistics – Modeling Production Systems

This post is somewhat related to a previous post about comparing access path selection between test and production systems.

In the previous post I talked about the problem that verifying a package has optimal access paths in the test system, will not guarantee it will have the same access paths in the production system. The post also had a link to a REXX script that compares plan_tables between DB2 systems and reports if any differences exist for a given package.

So how can one minimize the number of times such differences will appear? The best approach is to try modeling production systems in the test systems.

What does modeling production systems means? Trying to have a test system with as similar configuration as possible to the production system.

Most organizations will not spend the same amount of money on computing resources for the test system as they do on the production system for obvious reasons, but that does not mean all hope is lost. DB2 will choose access paths for queries based on its own statistics as well as the hardware resources that are available to it.

According to what I have seen mostly, the DB2 statistics have far more influence on query access path selection than the available hardware resources, thus, by duplicating the statistics from your production system to your test system, you can increase the probability that DB2 will choose the same access path in both systems. This method is very efficient since you do not have to actually copy the data from the production system to the test system, you only copy the statistics.

So what can we copy?

In the Utility Guide and Reference manual, under the RUNSTATS utility documentation you can find a list of columns in the catalog which are used by the optimizer for access path selection. Most of these columns are updateable by SQL statements so there is no problem to update them manually.

I have written a REXX script (DB2CPSTS) which receives the following parameters and produces the appropriate statements that are required to copy the statistics from one system to another:

DB2SYS – DB2 system where the REXX will execute
DB2SRC - DB2 system from which we get the statistics
DB2CPY – Name of database for which we want to copy statistics (like pattern)
TS2CPY – Name of tablespace for which we want to copy statistics (like pattern)

There are some additional parameters which are not implemented yet.

The REXX also access the following DD files which have to be defined in the JCL:
OUTSQL – this file will eventually contain the SQL statements
SCMAP – contains mapping of schema / creator names
OBMAP – contains mapping of object names
DBMAP – contains mapping of database names
SPMAP – contains mapping of tablespaces / indexspaces names

The purpose of the mapping files is to provide a way of altering the objects names between systems, so for example, if your schema contains the letter P in the production system and the letter T in the test system, you can translate the names in order to make sure the appropriate objects are updated. Also, if you have several copies of the same schema in a single system, you can also use this script in order to copy statistics between these schemas.

The format of the mapping rules are:
Source_Pattern Target_Pattern

Each file can contain several mapping rules which are evaluated by their order. The first rule that will match will be used and later rules will not be checked, so for example, having these rules:

Will translate A to B and stop.

The rules

Both patterns can use % and _ as used with the like predicate.
Not all patterns are legal, I implemented the translation according to what I find logical. In my algorithm translation is done in the following way:
A single non special character must match a single non-special character
The _ character must match either a regular character or a _ character
The % character must match the % character
If a special character is used, the number of characters up to that character must be identical.

Here are some examples for mapping rules that work:
%M %D
___I% DUPI%
___I% ___B%
SYSTABLES SYSTEMTABS (static translation)

Names that don't match any rule will not be translated.

In addition to this REXX script, I wrote another script (DB2PCKCS) which receives these parameters:
DB2SYS – DB2 system where the REXX will execute
DB2SRC - DB2 system from which we get the statistics
COLLID – collection id of a package
PACK – package name

The REXX also need to have the same DD cards as the DB2CPSTS script. This REXX script will invoke the previous one for all the tablespaces that contains tables upon which the specified package depends (according to SYSIBM.SYSPACKDEP), so if you want to copy all the statistics that are relevant for a specific package, the second REXX will do the trick for you).

I also uploaded a sample job (CPSTSJOB) that executes DB2CPSTS.

You should note that even after you copy the statistics from your production system to your test system, you might get different access paths since access paths also depends on:

• The Processor model
• Number of processors (effects parallelism)
• Buffer pool sizes
• RID pool sizes
• Services levels at each system
• Values of some system parameters such as PARAMDEG, NPGTHRSH, STARJOIN
• Distribution statistics in SYSCOLDIST

Some future improvements that might be implemented into this script and thing to keep in mind:

• The script was tested on DB2 V8 compatibility mode. V9 has new types of statistics such as histograms that are not included in this script.
• The script does not generate commit statements so if a very large number of statements were generated, you might create a long contention on the DB2 catalog when running these statements.
• Column distribution statistics are inserted into SYSCOLDIST and not updated, you might need to delete any previous rows from this table.
• The scripts update both access path statistics and space related statistics. In the future, maybe I will implement a parameter that will govern this behavior.
• In the future, maybe a parameter will be added that will indicate if the SQL statements should be written to file or executed immediately.

Any thoughts / ideas are more than welcomed.
Hope you find this useful.

Take care


  1. Stumbled here from google, am an LUW guy, so my assumption is bound to be false.Isn't there a -m option - mimic mode in runstats that generates the update statements that could be run in test to get the correct statistics of prod in test???

  2. The mimic function is available with db2look in LUW but there is no such feature in zOS.
    Also, my script provides the option for changing objects names if you have different names in prod and test, or if you want to have several copies of the same schema within the same system. As far as I know, the db2look command does not provide such capability

  3. My Previous Employer had BMC tool, so using which i had easily migrated the catalog statistics, but present employer doesn't have any universal tool for statistics migration, so your Post is a great break through and really useful. Thanks for the REXX program.