Monday, October 26, 2009

Load Resume Shrlevel Change Behavior - The Results

Hi

As promised, here are the answers to the questions from the previous post.
For my experiment I have created the following tables:

CREATE TABLE MOMI.MOMITEST (
ID INT GENERATED ALWAYS AS IDENTITY (START WITH 1 CACHE 10000 ),
TMSTAMP TIMESTAMP NOT NULL WITH DEFAULT ,
-- DMY1 CHAR(250) NOT NULL WITH DEFAULT 'MOMI',
-- DMY2 CHAR(250) NOT NULL WITH DEFAULT 'MOMI',
-- DMY3 CHAR(250) NOT NULL WITH DEFAULT 'MOMI',
-- DMY4 CHAR(200) NOT NULL WITH DEFAULT 'MOMI',
DMY CHAR(36) NOT NULL
) IN DB.MOMI;


CREATE TABLE MOMI.LOAD_MAX_ID (
ID INT GENERATED ALWAYS AS IDENTITY (START WITH 1 CACHE 10000 ),
REC_TMSTAMP TIMESTAMP NOT NULL WITH DEFAULT,
MAX_ID INT NOT NULL
) IN DB.MOM4;

The LOAD was performed against the MOMITEST table, and only the DMY columns were loaded (ID and TMSTAMP are generated automatically by DB2).

1) Which part of the data is available for concurrent access? only the data that existed in the table prior to the beginning of the utility, or the data that was actually being loaded?

The data that is being loaded is available for update. As the utility progresses, more data becomes available for update. I executed a REXX script that ran the following update statement in a loop against the table being loaded (MOMI.MOMITEST):
UPDATE MOMI.MOMITEST
SET TMSTAMP = CURRENT TIMESTAMP,
DMY = 'UPDATED'
WHERE MOD(ID,5120) = 0
AND DMY <> 'UPDATED'

The results showed clearly that the update statement executed several times, updating different records. Also, records that were loaded early on were updated before the later records were loaded (that is, the timestamp generated by the update statement was earlier than the default timestamp generated for rows loaded later). The conclusion is that data becomes available for update as it is being loaded. The LOAD utility performs internal commits which make the data available (the data is not available immediately; my REXX script did wait for locks).
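For anyone who wants to verify the same thing, a query along these lines (a sketch only, not part of the original test) counts updated rows whose update timestamp precedes the load-generated timestamp of a row that was loaded later (i.e. with a higher ID):

SELECT COUNT(*)
FROM MOMI.MOMITEST U
WHERE U.DMY = 'UPDATED'
  AND EXISTS (SELECT 1
              FROM MOMI.MOMITEST L
              WHERE L.DMY <> 'UPDATED'
                AND L.ID > U.ID
                AND L.TMSTAMP > U.TMSTAMP)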

2) Is the loaded data available for read only or also for updates?
That question is also answered by the first answer: the loaded data is available for both reads and updates.

3) Does DB2 perform internal commits to make the data available? if so, what is the frequency of these commits and how is the frequency calculated?

DB2 performs internal commits in order to make the data available.
Test description: a REXX script runs a loop that performs

INSERT INTO MOMI.LOAD_MAX_ID (MAX_ID)
SELECT MAX(ID) FROM MOMI.MOMITEST

while the table MOMITEST is being loaded. This statement provides the maximum committed ID at each point in time (the LOAD_MAX_ID table has a timestamp column with a default value of current timestamp).

(Actually, when I first designed the test, I thought I would record current timestamp against the MAX of the timestamps generated in the loaded table, but then I faced a problem with the values generated by current timestamp - see the "When Exactly is Current Timestamp" post below for more details.)

Test results (the test was executed several times for different row sizes and different numbers of rows to load):
Record Size - table record size in bytes
Num of Rows - number of rows in the load input file
CPU / Elapsed Time - CPU / elapsed time of the LOAD utility run
Check Interval - number of seconds the REXX script sleeps in each loop iteration
Min / Avg / Max Rows - minimum, average and maximum number of rows between two consecutive records in the control table (in other words, the number of records committed by the LOAD utility between two checks)
Min / Avg / Max Tm Diff - difference, in seconds, between the insert timestamps of two consecutive records
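For reference, the Min / Avg / Max Rows figures can be derived from the control table with a self-join along these lines (a sketch only; it assumes the ID identity values are consecutive within a single test run):

SELECT MIN(B.MAX_ID - A.MAX_ID) AS MIN_ROWS,
       AVG(B.MAX_ID - A.MAX_ID) AS AVG_ROWS,
       MAX(B.MAX_ID - A.MAX_ID) AS MAX_ROWS
FROM MOMI.LOAD_MAX_ID A,
     MOMI.LOAD_MAX_ID B
WHERE B.ID = A.ID + 1;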

With row level locking and maxlocks 1000, these were the results:

Record Size | Num of Rows | CPU Time | Elapsed Time | Check Interval | Min Rows | Avg Rows | Max Rows | Min Tm Diff | Avg Tm Diff | Max Tm Diff
  50 | 2500 |  0.16 |     0.42 | 0 |  2500 |  2500 |  2500 | 0.00359 | 0.00359 | 0.00359
  50 |  25K |  0.66 |     1.22 | 0 |  4520 |  5120 | 10240 | 0.00162 | 0.16598 | 0.36
  50 | 250K |  5.57 |    11.53 | 1 | 15360 | 26738 | 30720 |       - |       - | -
  50 | 250K |  5.53 |     11.5 | 0 |  4240 |  5120 |  5120 | 0.00179 | 0.22088 | 0.54724
  50 | 500K | 11.16 |    22.99 | 1 | 18720 | 23810 | 30720 | 1.0032  | 1.0931  | 1.2434
  50 | 500K |    11 |    21.89 | 0 |  3360 |  5120 | 10240 | 0.00227 | 0.21512 | 0.4225
  50 | 2.5M | 54.72 | 01:55.37 | 0 |  1440 |  5120 | 10240 | 0.00151 | 0.22882 | 0.64663
 150 | 2.5M | 53.87 | 01:49.77 | 0 |  1440 |  5120 | 10240 | 0.00111 | 0.21624 | 0.56245
 300 | 2.5M | 56.42 | 02:14.24 | 0 |  1440 |  5120 | 10240 | 0.00133 | 0.26629 | 0.65413
1000 | 1.9M | 52.84 | 04:06.45 | 0 |  5120 |  5120 |  5120 | 0.00128 | 0.64057 | 16.39

So, the most obvious thing that pops right out is that DB2 committed every 5120 records.
I also measured the average time between the insert operations of the REXX script when the LOAD utility was not running. The average wait time in seconds was:
for 150-byte rows - 0.21456, which is 99.2% of the average Tm Diff measured while the LOAD was running
for 300-byte rows - 0.26432, which is 99.3%
for 1000-byte rows - 0.63843, which is 99.6%
In other words, the interval between control-table inserts was almost entirely the REXX script's own overhead, so the inserts were barely delayed by locks held by the LOAD.


With Page Level locking and maxlocks 1000, these were the results:

Record Size | Num of Rows | CPU Time | Elapsed Time | Check Interval | Min Rows | Avg Rows | Max Rows | Min Tm Diff | Avg Tm Diff | Max Tm Diff
  50 | 250K |     5.16 |    10.26 | 0 | 4240 | 5120 | 10240 | 0.00267 | 0.19626 | 0.36561
 300 | 2.5M |    58.06 | 02:49.86 | 0 | 1440 | 5120 | 10240 | 0.00156 | 0.3354  | 16.574
1000 | 2.5M | 01:10.59 | 04:56.98 | 0 | 1440 | 5120 | 10240 | 0.00145 | 0.59305 | 7.236

And the insert wait time was roughly 99% also.

With Row Level locking and maxlocks 100, these were the results:

Record Size | Num of Rows | CPU Time | Elapsed Time | Check Interval | Min Rows | Avg Rows | Max Rows | Min Tm Diff | Avg Tm Diff | Max Tm Diff
  50 | 250K |     7.81 |    34.74 | 0 | 8 | 59*  |  584 | 0.00118 | 0.00803 | 0.19021
 300 | 2.5M | 01:21.48 | 06:55.43 | 0 | 8 | 65** | 1332 | 0.00115 | 0.01083 | 0.38246
1000 | 2.5M | 01:31.14 | 08:31.13 | 0 | 8 | 90   | 1296 | 0.00123 | 0.01836 | 0.39555

* out of 4214 intervals, 2638 had a value of 36, 881 had 72, 372 had 108 and 139 had 144, so 36 is probably the commit interval.
** out of 38090 intervals, 21459 had a value of 36, 8696 had 72, 3897 had 108 and 1717 had 144, so 36 is probably the commit interval.

With Row Level locking and maxlocks 100000, these were the results:

Record Size | Num of Rows | CPU Time | Elapsed Time | Check Interval | Min Rows | Avg Rows | Max Rows | Min Tm Diff | Avg Tm Diff | Max Tm Diff
  50 | 250K |     6.21 |    11.02 | 0 | 4240 | 5120 | 5120 | 0.00161 | 0.20642 | 0.37181
1000 | 2.5M | 01:14.69 | 04:50.63 | 0 | 1440 | 5120 | 5120 | 0.00136 | 0.58    | 17.302


So what can we actually learn from all these numbers?
1) For default settings (page level locking with maxlocks defaulting to the NUMLKTS parameter), DB2 will commit every 5120 rows, which in this test happened rather quickly, even for a big table with 1000 bytes per row and several indexes. Even when using a much higher value for maxlocks, DB2 does not change the commit frequency.
2) When constrained by maxlocks, DB2 will commit more frequently (our test showed every 36 rows), and performance is impacted severely (see the sketch after this list for checking and adjusting the tablespace LOCKMAX).
3) It seems that the commit frequency does not depend on the number of records we load.
4) It seems that the commit frequency does not depend on the record length of the table being loaded.
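For completeness, here is a sketch of how the lock size and LOCKMAX in effect for the test tablespace could be checked and raised (the DB.MOMI names come from the DDL at the top of this post; adjust them to your own objects):

-- Check the current locking attributes of the tablespace
SELECT LOCKRULE, LOCKMAX
FROM SYSIBM.SYSTABLESPACE
WHERE DBNAME = 'DB'
  AND NAME = 'MOMI';

-- Raise the lock limit so the LOAD is not constrained by maxlocks
ALTER TABLESPACE DB.MOMI LOCKMAX 100000;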

That sums up answer number 3.
As for question number 4:
4) Is it possible that the load utility will experience a lock escalation?

The LOAD utility will not cause lock escalation, no matter what maximum number of locks it is allowed to acquire. This conclusion is based on three facts:
1) No lock escalation message was issued to DB2MSTR and no lock escalation appeared in the lock escalation counter in Omegamon.
2) If lock escalation had taken place, there would be no reason for the LOAD operation to take significantly longer to execute, as we observed it did.
3) The Omegamon lock counters show that when the LOCKMAX parameter is set to a small value, the LOAD utility issued many lock requests and many lock change requests (which would be redundant had it escalated to a table lock).

And one last nugget - the LOAD utility acquires locks at the same level the application does, so if your tablespace is defined with LOCKSIZE TABLESPACE, the LOAD utility will also acquire an S tablespace lock, which will cause any user application that tries to update the table to wait.

That was a long one, hope it answered all your questions.

Till next time
Momi

Wednesday, October 21, 2009

When Exactly is Current Timestamp ???

Well, I know I still owe some answers regarding the load resume yes shrlevel change post, but this post is about an issue I faced while conducting the LOAD utility testing.

Assuming we have this scenario:
table1, with a column named tmstamp that has a default value of current timestamp
table2, with two columns - rec_tmstamp and max_tmstamp

Some process is constantly inserting rows into table1 (with commits)
What would you expect table2 to hold after this statement runs in a loop:
insert into table2 (Rec_tmstamp, max_tmstamp)
select current timestamp, max(tmstamp) from table1

I had expected to see records with a rec_tmstamp later than max_tmstamp (which would allow me to estimate the average time it takes the inserting process to commit), but I was surprised to find that rec_tmstamp was always earlier than max_tmstamp.

Confused by the results, I tried a different approach. I changed the column rec_tmstamp in table2 to have a default value of current timestamp and changed the insert statement to:
insert into table2 (max_tmstamp) select max(tmstamp) from table1

Again, no luck. I got the same confusing results, where rec_tmstamp was earlier than max_tmstamp. Maybe DB2 sets the default values before it evaluates the other values in the insert statement itself, so let's try something else.

Time for plan 3 - instead of using the default value, I created a before trigger that sets the value of the column just before the row is inserted:

create trigger myTrigger
no cascade before insert on table2
referencing new as n
for each row mode db2sql
set n.rec_tmstamp = current timestamp;

Still, the same results.
Ok, now it's personal - I have to get to the bottom of this one.
I dropped the before trigger and created an after trigger:
create trigger myTrigger
after insert on table2
referencing new as n
for each row mode db2sql
update table2 set rec_tmstamp = current timestamp where id = n.id;

(I added an identity column named id so I can uniquely identify each row).
Once again, I received the same results.

So, to answer the question "when exactly is current timestamp?" - as it turns out, current timestamp is the time at which the statement began executing. The ultimate proof of that claim was observed in this test:
process1 - lock table table1 in exclusive mode
process2 - insert into table2 (rec_tmstamp, max_tmstamp) select current timestamp, max(tmstamp) from table1
process1 - wait for 30 seconds
process1 - insert into table1 (tmstamp) values(current timestamp)
process1 - commit

process2 had a 30-second lock wait due to process1, and, as I had expected, the record that was inserted into table2 had a rec_tmstamp that was 30 seconds earlier than max_tmstamp.
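A simpler illustration of the same behavior (a minimal sketch, not part of the original test): referencing the special register twice in one statement returns the same value for both columns, because current timestamp is evaluated once per statement:

SELECT CURRENT TIMESTAMP AS T1,
       CURRENT TIMESTAMP AS T2
FROM SYSIBM.SYSDUMMY1;
-- T1 and T2 are identical, no matter how long the statement takes to run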

Now I can go on with the load shrlevel change testing...

Tuesday, October 20, 2009

Load Resume Shrlevel Change Behavior

Hi

Usually, I find it very hard to complain about the DB2 manuals, since they are rather detailed, but as for the LOAD utility, I could not find in the manuals any description of the actual behavior when using RESUME YES SHRLEVEL CHANGE.

Some questions I could not find answers to in the manual were
1) Which part of the data is available for concurrent access? only the data that existed in the table prior to the beginning of the utility, or the data that was actually being loaded?
2) Is the loaded data available for read only or also for updates?
3) Does DB2 perform internal commits to make the data available? if so, what is the frequency of these commits and how is the frequency calculated?
4) Is it possible that the load utility will experience a lock escalation?

I will try to answer these questions in the upcoming posts, so stay tuned.

To Be Continued...

Sunday, October 18, 2009

DB2CFIMP / Configuration Assistant hangs when importing

Hi

Just a little problem with the DB2 LUW client I faced today. I needed to copy the database manager configuration from one server to another. I used DB2CFEXP in order to export the configuration at the source instance, and then tried to import it using DB2CFIMP on the target instance.

The DB2CFIMP utility just froze and I got no reply for several minutes. I tried to perform the same import operation using the DB2 Configuration Assistant, but it could not do it either (it just sat on the processing window with the little spinning wheel).

When I tried to achieve the same result using the CATALOG NODE command through the CLP, I got an error message indicating that my user was not authorized.
Once I logged in using an authorized user, I was able to perform the import using both db2cfimp and the configuration assistant (and it took a fraction of a second).

Wednesday, October 14, 2009

Compare Static SQL Access Paths

What I am going to talk about this time is one of the most critical factors (in my opinion) of application performance - the access paths that were selected for your SQL statements.

There are two types of SQL statements - dynamic and static. This post is devoted to static SQL statements, that is, statements for which an access path is chosen at bind time.

As part of a project I did for a big bank, I was asked to review the explain output for packages before they were moved from the test environment to the production environment. In that bank, every package was manually approved by a DBA before the program could go into production.

Since the test environment was not identical to the production environment in terms of computer resources and database capacity, and since I don't like to waste my time, I immediately asked them if they had any mechanism that verified that the access path chosen in production was identical to the one chosen in the test environment (otherwise, the time spent checking each package was wasted).

Since they did not have such a mechanism, I wrote one myself. The idea is pretty simple: you bind your packages with the EXPLAIN(YES) parameter and then compare the output written to the PLAN_TABLE in the two environments. I implemented this procedure using REXX.

As it turned out, they had hundreds of packages with different access paths (we simply copied the production PLAN_TABLE to the test environment and compared the two using the same SQL from my script). Today, each time they move a program to the production environment, my script is executed in order to verify that the access path has not changed.
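The comparison itself boils down to SQL along these lines (a simplified sketch of the idea rather than the actual REXX; the qualifiers, package name and column list are illustrative, and it assumes a DB2 level that supports EXCEPT):

-- Access path rows present in the source PLAN_TABLE but missing from the target
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, ACCESSTYPE,
       ACCESSNAME, MATCHCOLS, INDEXONLY, PREFETCH
FROM QSRC.PLAN_TABLE
WHERE PROGNAME = 'MYPACK'
EXCEPT
SELECT QUERYNO, QBLOCKNO, PLANNO, METHOD, ACCESSTYPE,
       ACCESSNAME, MATCHCOLS, INDEXONLY, PREFETCH
FROM QTAR.PLAN_TABLE
WHERE PROGNAME = 'MYPACK';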

My REXX code can be found here.
The code is very straightforward. The procedure receives the following parameters:
DB2SYS - DB2 system to which the REXX connects
DB2SRC - Source DB2 system
QSRC - PLAN_TABLE qualifier in source system
DB2TAR - Target DB2 system
QTAR - PLAN_TABLE qualifier in target system
PACKNAME - The package name
CIDSRC - Collection ID of source package.
CIDTAR - Collection ID of the target package.

The REXX code first connects to the source DB2 system and gets the version name and collection of the package that was bound last. The source collection ID that is passed as a parameter can use a LIKE pattern; if no value is supplied, % is used. If no value is provided for the target collection ID, the source collection ID is used.
Then the REXX fetches all the data from the PLAN_TABLE in the source DB2 system, fetches all the data from the target DB2 system, and compares the two.
Not all of the PLAN_TABLE columns are compared - just the ones I found relevant. If a difference is found, it is printed and the exit code is 4. If no differences are found, the return code is 0. If a SQL error is encountered, the REXX exits with code 8.

I hope you can find this REXX useful. If you have any comments or suggestions, please post them in the comments section.

Till next time…

Job generation utility / a smarter Listdef

As a DBA, I sometimes find myself needing to run jobs on a list of objects that I can only identify using a SQL statement. In these cases, the IBM LISTDEF utility is not sophisticated enough and a massive amount of manual labor is needed.


In order to shorten the time it takes me to prepare such jobs, I wrote a simple REXX script that receives the following parameters (description of execution logic follows):

DB2SYS - DB2 system to connect to
SKELNAME - Skeleton name to be used
NUMJOBS - Number of jobs to generate
JOBNAME - Jobname prefix

The REXX script reads a SQL statement from the SQLIN DD name (it reads the entire content of the file, so make sure you have a single statement with no semicolon). The SQL statement must return two columns that identify the objects on which you wish to operate.

Then the REXX script loops through the objects and divides them among the number of jobs specified in the NUMJOBS parameter in a round-robin fashion. So if, for example, you wish to generate many RUNSTATS jobs, you can add ORDER BY SPACE to the SQL statement in SQLIN. This way, each job receives roughly the same mix of big and small objects as the other jobs, and all jobs should take roughly the same time to execute.
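For example, an SQLIN statement for tablespace-level RUNSTATS jobs might look like this (a sketch; note there is no trailing semicolon, and the DBNAME filter is only an illustration):

SELECT DBNAME, NAME
FROM SYSIBM.SYSTABLESPACE
WHERE DBNAME LIKE 'MYDB%'
ORDER BY SPACE DESC

The two columns returned feed &ONAME1 and &ONAME2 in the skeleton described below.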

The SKELNAME parameter specifies a skeleton name that will be used to generate the jobs. This skeleton can contain the following parameters:

&ONAME1 – will be replaced with the first column returned from the SQL statement in SQLIN.
&ONAME2 – will be replaced with the second column returned from the SQL statement in SQLIN.
&DB2SYS – will be replaced with the DB2SYS parameter
&JOBNAME – will be replaced with the JOBNAME parameter value, appended with the job number, up to an eight-character name. For example, if you passed the value ABCDEFGH, your first job will be named ABCDEFG1, the 10th job will be named ABCDEF10, and so on.

It's pretty simple. The REXX code, skeleton examples and JCL examples can be found here.
If you have any comments or suggestions, please post them.

Hope this one will save you some time; it did save me some …

Wednesday, October 7, 2009

Debugging with The Dynamic Statement Cache

Hi

In DB2 version 8, one of my favorite features was introduced: the EXPLAIN STMTCACHE statement. This statement provides information about the content of the dynamic statement cache (which statements are cached and what their cached access paths are), and if you have IFCID 318 turned on, you also get statistics such as the number of executions, CPU time, etc.
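A minimal usage sketch (the MOMI qualifier is illustrative - the DSN_STATEMENT_CACHE_TABLE must exist under the qualifier your EXPLAIN uses):

-- Externalize the dynamic statement cache into DSN_STATEMENT_CACHE_TABLE
EXPLAIN STMTCACHE ALL;

-- Then look at the cached statements and their IFCID 318 statistics
SELECT STMT_ID, STAT_EXEC, STAT_GPAG, STAT_CPU, STMT_TEXT
FROM MOMI.DSN_STATEMENT_CACHE_TABLE
ORDER BY STAT_CPU DESC;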

Using that information, I was able to discover what I believe is a bug. I had a table with an indexed VARCHAR(100) column. I issued the following query:
select *
from myTable
where varchar_col like 'mumu%'

This query returned no rows, but took rather long to execute considering the fact that it was doing a matching index scan (you can get the chosen access path by explaining the statement from the dynamic cache).
Using the explain output, I was able to see that this query performed about 4000 getpage requests. This figure seemed strange, since the index had about 4000 leaf pages, so the only idea I could come up with is that DB2 performed a non-matching index scan.

When I tried a different query such as
Select *
from myTable
where varchar_col like 'knut%'

the query did not return any rows either, but it performed only 3 getpage operations (which seems reasonable, since my index level was 3).

I believe this is a bug in DB2, since I can't think of any explanation for the above phenomenon. I tried some other strings as well (that return no rows); some did 4000 getpage operations, others did 3.

The lesson to be learned here is: get to know your dynamic statement cache table. It has information that might come in handy in time of need.

Momi

Monday, October 5, 2009

Does Skip Uncommitted Inserts really work?

Hi

Recently, I tried to use the new feature called Skip Uncommitted Inserts at one of my DB2 customers, so in order to activate the feature, we turned on the SKIPUNCI zparm. Unfortunately, DB2 did not behave exactly as I thought it would. Here are the results of a little experiment I conducted, and I would like to hear your opinion - are these the results you would accept?

I created a two-column table - mytable (col1 int not null, col2 int) - with row level locking and a primary key on col1.
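In DDL terms, the setup was roughly the following (a sketch; the database, tablespace, schema and index names are made up for illustration):

CREATE TABLESPACE SKIPTS IN DB LOCKSIZE ROW;

CREATE TABLE MOMI.MYTABLE (
  COL1 INT NOT NULL,
  COL2 INT,
  PRIMARY KEY (COL1)
) IN DB.SKIPTS;

CREATE UNIQUE INDEX MOMI.MYTABLE_IX ON MOMI.MYTABLE (COL1);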

Now, I used two sessions to run the following SQL statements (naturally, without autocommit, in order to hold the locks that are acquired):

Session #1:
insert into mytable values (1,1)

Session #2
insert into mytable values (2,2)

Session #1
select * from mytable with ur - returned both rows (as expected)

select * from mytable - this returned a single row with col1=1 (as expected, skipping the row that was inserted but not committed by session #2)


Session #2
select * from mytable with ur - returned both rows (as expected)

select * from mytable - this statement was blocked, waiting for a lock held by session #1. I would have expected this statement to return the row with col1=2.

Session #1
commit - at this point the lock held by session #1 is released

Session #2
select * from mytable - returned both rows (as expected)

Session #1
select * from mytable - this statement blocked waiting for a lock held by Session #2.

etc...

As you can see, it seems that Skip Uncommitted Inserts only works for the thread that performed the first insert against the table; other threads get blocked, even though they shouldn't.
If you have also encountered this problem, you are not alone... :-)

till next time

Momi