Datagenerator Fix

I’ve updated datagenerator to fix a few bugs, improve the performance and give a little more feedback when running from the command line. You can find it in the usual place.

Broken clusteroverview in later builds

It appears that I’ve broken some of the functionality in clusteroverview in the latest builds... In particular the scalability portion and the reporting of CPU.

I’ll fix this and get a new build out ASAP.



Flash version of the screencast now available

I’ve just uploaded a flash version of the walkthrough screencast. Hope this helps the Linux users.

New builds of swingbench and datagenerator

I’ve uploaded new builds of swingbench and datagenerator to fix a few niggly bugs. You can download them from here

New Swingbench Screencast

I’ve added a new swingbench screencast which is a complete walkthrough from the installation of swingbench to the running of a benchmark. I’ve also provided a commentary. So if you ever wondered what I sound like, try it out here.

Minor Change to the look and feel of the web site

I’ve taken the opportunity to update the look and feel of the web site. Hopefully it’s made it a little cleaner and easier to navigate. Let me know if you think I should go back to the old style. I’ll slowly update some of the content too.

Large scale data sets for "SOE" and "SH"

I’ve added some new pages describing how to build large scale “SOE” and “SH” schemas. I’ve tested them both to 500GB in size and will create larger scale versions as soon as I can borrow hardware to test them at multi terabyte levels. If you’re interested in giving it a go let me know via the comments page and I can try and assist. You can find the instructions on how to create 100GB+ schemas for “SOE” here and how to create 100GB+ “SH” schemas here.

Update : I noticed that the scripts for SH use compression by default. This can slow down a load and make the loaded schema much smaller than expected. I’ll upload a new build shortly that fixes this and a few other issues.

Datagenerator 0.4

I've uploaded a new build of datagenerator. New features include

  • Support for indexes and sequences
  • New command line options
  • Better multi threading support
  • New scaleable data builds
  • Number generators can reference row counts from other tables
  • Better database performance
  • Ability to generate only the DDL of a schema
  • Numerous bug fixes

The new build can be downloaded here

I’ve also updated the install, and added some additional walk throughs (in the swingbench section)

Let's go through some of the new features in a little more detail.

Indexes and Sequences

You can now include indexes and sequences inside of a datagenerator definition

This makes it easy to build an entire schema for a benchmark run removing the need to run additional scripts afterwards. Currently I don't support their reverse engineering but that will come.

Better multithreading support

Previously it was possible to specify multiple threads for a datageneration run but each table was allocated a single thread. In this version a user can soft partition a table and hence break the build into smaller units which can each have a thread allocated to them.

This means that if you have a 32 CPU server you'll be able to build a 10 billion row table much faster if you soft partition the table into 32 units and allocate 32 threads for the build. The partition key can be either a date or number. This is also useful to avoid resource contention when inserting data into a partitioned table.
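As a rough illustration of the idea (this is not datagenerator's own code), soft partitioning on a numeric key can be thought of as cutting the key range into contiguous chunks and handing each chunk to a worker thread. The function names and the stand-in "build" step below are assumptions made purely for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def soft_partition(first_key, last_key, units):
    """Split the inclusive key range [first_key, last_key] into contiguous chunks."""
    total = last_key - first_key + 1
    base, extra = divmod(total, units)
    ranges = []
    start = first_key
    for i in range(units):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more units than keys: skip empty chunks
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_chunk(key_range):
    """Hypothetical stand-in for generating and inserting one chunk's rows."""
    low, high = key_range
    return high - low + 1  # number of rows this worker would build


def build_table(first_key, last_key, units, threads):
    """Build every chunk on a pool of threads and return the total row count."""
    chunks = soft_partition(first_key, last_key, units)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(build_chunk, chunks))
```

Calling build_table(1, 1000, 32, 32) would split a 1000 row table into 32 units with one thread each. Because every chunk covers its own key range, the workers never compete for the same rows.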

New command line options

It's now possible to run the entire data generation to file or database from the command line. The available options are
[dgiles@macbook bin]$ ./datagenerator -h
usage: parameters:
-async perform async commits
-bs batch size of inserts (defaults to 50)
-c specify config file
-cl use command line interface
-commit number of inserts between commits
-cs connectring for database insertion
-d output directory (defaults to "data")
-db write data direct to database
-ddl just generate the ddl to be used
-debug turn on debug information
-dt driver type (oci|thin)
-f write data to file
-g use graphical user interface
-h,--help print this message
-ni don't create any indexes after data creation
-nodrop don't drop tables if they exist
-p password for database insertion
-s run silent
-scale mulitiplier for default config
-tc number of generation threads (defaults to 2)
-u username for database insertion
-z compress the results file

Scaleable data builds

The config files for the soe and sh schema are now by default configured for a 1GB build. These can be scaled up by using the -scale option. To build a 100GB sh schema the following command can be used.

./datagenerator -c sh.xml -cl -scale 100

This functionality is supplemented by a new flag on a table definition.

Only tables with this flag enabled will be scaled up.
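The effect of the multiplier can be sketched as follows. This is only a model of the behaviour described above, not the tool's implementation: the dictionary layout, the "scalable" key and the row counts are made up for the example.

```python
def scaled_row_counts(tables, scale):
    """Return {table name: row count} after applying the multiplier.

    Only tables whose (hypothetical) "scalable" flag is set are multiplied;
    the others keep their configured row count.
    """
    return {
        name: spec["rows"] * scale if spec["scalable"] else spec["rows"]
        for name, spec in tables.items()
    }


# Illustrative numbers only, not the real sizes in sh.xml.
tables = {
    "sales": {"rows": 1_000_000, "scalable": True},
    "channels": {"rows": 5, "scalable": False},
}
```

With a scale of 100 the large fact table grows a hundredfold while the small lookup table stays the same size, which is normally what you want when scaling a schema.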

Referenceable row counts

It is now possible to use the row count of a table as the maximum value of a number generator. This is useful when scaling up/down a datageneration and maintaining data coverage and referential integrity.

As the number of rows in the referenced table increases, so does the maximum value of the data generator.
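A small sketch of the concept, assuming a simple in-memory table of row counts (the names here are illustrative and not part of datagenerator): the generator looks up the referenced table's row count each time it produces a value, so a foreign key column always stays within the range of keys that exist in the parent table.

```python
import random


def make_number_generator(row_counts, referenced_table, minimum=1, seed=None):
    """Return a function producing numbers between `minimum` and the row
    count of `referenced_table` (inclusive)."""
    rng = random.Random(seed)

    def next_value():
        # Look the count up on every call so a scaled-up table widens the range.
        return rng.randint(minimum, row_counts[referenced_table])

    return next_value


# Hypothetical parent table with 1000 rows.
row_counts = {"customers": 1000}
next_customer_id = make_number_generator(row_counts, "customers", seed=42)
```

If the customers table is later scaled to 100,000 rows, the same generator starts producing IDs across the whole new range without any change to its definition.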

Better database performance

This build supports the use of asynchronous commits. This results in performance increases of about 10-30% when this option is enabled. I’ve also undergone several database

Generate only DDL

It is sometimes useful to only create the DDL that will be used to create tables and indexes.

The files that are created can be edited and modified to include additional information such as storage definitions.

Timing groups of SQL operations

Sometimes I feel like I’ve missed out on a whole chunk of functionality in Oracle products. One little nugget is the “timing” function in SQL*Plus. This allows you to time groups of operations.

Turning on timing for individual statements is done with the “set timing on” command, i.e.

SQL > set timing on

SQL > select count(1) from all_objects;


Elapsed: 00:00:03.95


Which is great, but what if you want to time multiple operations? Use the timing function and simply give the timer a name, in this case statement_timer.

SQL> timing start statement_timer
SQL> select count(1) from all_objects;


SYS@orcl > /


SQL> timing show statement_timer;
timing for: statement_timer
Elapsed: 00:00:30.85

Which times anything that went on in between the timer starting and finishing. In this case that also includes my typing of the commands. It's a fantastic utility for timing stages in a batch job, including call outs to OS operations.

New build and fixes in swingbench

I've uploaded a new build of swingbench which includes a number of fixes (one for a bug that broke charbench). Apologies for this. I thought I had released the fixes in the previous build. You can find it here.

As usual let me know if you find obvious bugs.

On the subject of ISCSI and udev

Ok so I admit I haven't been updating the web site recently. I'm pretty much overrun with work (and I took a week's holiday) so updates to swingbench have had to be put on hold. That said I haven't been idle. I've been setting up a RAC cluster under Oracle Virtual Server (NOTE : This is not currently supported).

The environment is built using 3 Dell Latitude notebooks and a 1Gb Netgear switch. I've installed one notebook with OpenFiler as the storage server and two additional notebooks with Oracle Virtual Server. (NOTE : this really isn't advised as a production implementation). I've used the Oracle Enterprise Template to create the Linux Servers running Oracle 11g. It was pretty straight sailing with a few notable exceptions
  • Many of the shipping templates benefit (and some require) the activation of hardware acceleration. This can be done via the BIOS and will make a big difference to performance in most circumstances.
  • UDev and ISCSI. This sadly caused me real problems and I'm still not sure why. I was under the impression that the devices presented from the Openfiler (the ISCSI Target) would have consistent IDs. Now I've no reason to make this claim, it just seemed like a sensible thing to do. I originally used the ENV{ID} to uniquely identify the devices and create persistent links to the disks. However subsequent reboots showed these to change. We eventually ended up using a rule similar to the following
ENV{ID_PATH}=="", OWNER="oracle", GROUP="dba", MODE="0640", SYMLINK+="disk/by-name/quorumdisk"
  • We also encountered some library issues (rpm). Some of the openfiler devices couldn't be seen, however upgrading to the very latest initiator libraries solved the problem (At the time of writing this iscsi-initiator-utils-
Other than that the Oracle install went very smoothly and the cluster appears stable and performs well given the hardware it's running on.


Minor update to datagenerator

Just a minor update to datagenerator to fix some scripts and config files. There are also a few fixes to the code.

Swingbench with AWR support

I'm uploading a new build of swingbench that includes support for performing AWR snaps at the start and end of a benchmark run.

This is a common request and automates a task that a lot of people do. I've also included a new section in the stats that are generated which lists the top 10 (11 to be exact if you include CPU) wait events for a run if you choose to collect database statistics.

I've had a lot of requests about 9i support in swingbench which I appear to have broken. Can you let me know if you still think this is important (I guess it's implied by the requests I've had). I need to know whether to focus on this or finish the 2.3 doc.

I'd also appreciate it if you could let me know if the wait events I report are in line with the stats you get from AWR... if they're not it's down to me not AWR.


New build of swingbench 2.3 (Build 370)

Just a quick build to fix cursors not being closed in ccwizard. Still looking into the "-be" and "-bs" issue.

You can get it here

New build of swingbench 2.3

I know things have been a little slow of late in terms of the releases of code but I've just had too much on both at work and at home. The good news is that I've found time to squeeze in a new build of swingbench. This has a number of bug fixes in and the following new functionality.
  • Clusteroverview now uses the groupID attribute to determine members of a load generation group. This provides a lot more flexibility for clusteroverview to be used to test technologies such as Oracle Data Guard
  • The output from the command line option -v can now be directed to a named file with the -vo option

I've also included a walk through of how to set up clusteroverview in 2.3 with the new GroupID functionality, you can find it here.

Any problems report them through the usual channels...