New build of swingbench 2.4

I’ve just uploaded a new build of swingbench with the following fixes

  • Wizard now writes output to debug.log correctly with the -debugFile option
  • Fixed bug where partitioning was always installed regardless. This meant the wizards couldn't install against a standard edition database
  • Fixed a bug where swingbench wouldn't start unless all users were able to log on. Swingbench will now start when it has logged on as many users as it can.
  • Fixed a bug where users were incorrectly counted as being logged on
  • Fixed a bug where logging wasn't correctly written to a debug file
  • Changed -debugFile to -debugf to provide consistency with other tools
  • Changed to way users logged on is reported in verbose mode of charbench 
  • Updated the secure shell libraries to support Solaris 11

You can find it
here. Let me know if you find any problems.
Comments

2.4 makes it to stable status

At last I feel that I’ve fixed enough bugs and had enough feed back to change 2.4 to stable status. This means that this should be the default client most groups use. Based on the feedback I get I’ll remove 2.3 from the downloads page.

There has been a few changes in this release. These include
  • Improvements to the Overviewchart to provide dynamic YXAxis and floating legends
  • The overview chart now uses a 5 second rolling window for response time metrics as opposed to a meaningless average
  • “Minibench” has been made a little bigger Happy
  • Numerous fixes to the wizards
  • Numerous fixes to Universal Connection Pooling
  • New command line options

The new Overview chart is show below


As per usual you can download it here
Comments

New Build of swingbench 2.4

It’s been 2 or 3 months since the last drop of swingbench and so it will come as no surprise that I’ve released another build. This release features
  • Fix to enable the jdbc version of the SOE benchmark to work without errors. I’ll be updating it over the coming week to try and make it as close as possible to the PL/SQL version
  • A new parameter (and UI change) to support the disconnect/connection of sessions after a fixed number of transactions
  • Various fixes

Comments

New build of swingbench 2.4

I’ve just uploaded a new build of swingbench 2.4 it has a number of bug fixes which sort out some of the following issues
  • Unexpected termination due to driver issues
  • Unclear timeline text on the overview chart
  • Numerous UI and java issues

You can download it
here
Let me know if you have any problems via the
comment page
Comments

Notes on pre-parsing data for Oracle data loads

Sometimes data simply isn't in a form that is easy to load into an Oracle database i.e. column form. It would be great if everybody exchanged data in a simple CSV form with a single file to table mapping. Sadly that isn't the case and sometimes you have to do a little work to get it into a form thats useable. A recent benchmark highlighted this issue very well. The customer provided the data in compressed CSV form (so far so good) but the data was held in key value pairs (not so good). They also provided us with a mapping file that describes how it all fits together. 

Now typically the approach many people would take would be to develop some form of program that parses all of the data and writes it to staging  area and then loads all of it in one go to the target database. I make no criticism of this approach since it works well and as long as its not time critical. It's by far the simplest method. However Im a big fan of taking advantage of whats already available and one of the most underused and powerful features of the Oracle database is  the preparser. It enables you to pipeline various operations so they all run as quickly as possible. So going back to my benchmark we used this approach to load data into out target database. It consisted of 4 steps
  • Read the data of the filesystem as efficiently as possible and write it to stdout
  • Read from stdin and Unzip the the data writing it to stdout
  • Read from stdin into a java program to do the key value mapping and error detection/correction writing the output to stdout
  • Read from stdin into sqlloader
I will at this time point out I'm not really using Oracle's pre-parser I'm just using good old "Pipes" but why this is important will become clearer later This approach gave us a great deal of flexibility and simplified the code we had to write. It operates in some respects as a serialised map reduce flow but I'll come back to that another day and explain how it can be integrated directly into a massively parallel approach. It's also possible to get Java to natively read the zipped file as well having said that I offloaded that process to the os to enable me to use different compression formats when needed.

The java program simple reads from stdin and writes to stdout. To handle key value pairs just required the program to read the mapping file in and split and parse the values from stdin. The data was then written to stdout in a well know order.

Java extract from my program... 


            BufferedReader br = new BufferedReader(new InputStreamReader(System.in), OneMB);
            String line = null;
            HashMap keyValuePairs = null;
            MyTokenizer mt = null;
            while ((line = br.readLine()) != null) {
                keyValuePairs = new HashMap(200);
                mt = new MyTokenizer(line, delimitor);
                for (String token : mt) {
                    int loc = token.indexOf("=");
                    if (loc != -1) {
                        String i = token.substring(0, loc);
                        String s = token.substring(loc + 1, token.length());
                        keyValuePairs.put(i, s);
                    }
                StringBuffer outRec = new StringBuffer(1000);
                outRec.append(checkForNull(keyValuePairs.get("uniqueID"), "")).append(seperator);
                // mapping logic similar to above repeats
                System.out.println(outRec.toString());
            }

All that was needed for sqlloader to process the files was a control file that understood the order of the columns and any additional formatting. 

One of the additional benefits is that we can load the data via "direct path" and implement other features such as multi table insert. The Java preparser enables you to add all of the additional formatting to make this a trivial process.

The following diagram illustrates the process.

This equates into a Unix/Linux statement such as

/bin/dd if=myverybigfile.txt bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn | sqlloader bench/bench control=kv.ctl data=\"-\" direct=TRUE;

NOTE : one thing you may have noticed is that Im using dd to do 1MB I/Os. This just an efficiency operation and works well on structures such as DBFS, you could skip this part of the operation if needed.

Which brings us onto external tables and the preparser


External Tables and pre-parsers 

As I mentioned earlier I like to take advantage of functionality that's already available and one of those features in the Oracle database is external tables. I don't intend to go into much detail as to why you should use external tables other than they do much of the heavy lifting for you and they provide a seamless interface between the filesystem and the database. They effectively make files look like tables. 

This means it's trivial to implement parallelism for our pre-parser. We don't need to worry about how to handle the files and how to schedule everything, external tables take care of all of that for you. In our benchmark we used them in the following way




Our previous pipeline remains the same except that we don't need sqlloader its all managed by the table definition itself. So we end up with something similar to the following for the table definition (I've abbreviated it quite substantially and highlighted the important bits)

create table staging_ext_mydata_jan01
(    uniqueid NUMBER,
..
-- Lots of columns
..
 )
ORGANIZATION EXTERNAL
(
   TYPE oracle_loader
   DEFAULT DIRECTORY load_dir_jan01
   ACCESS PARAMETERS
   (
      RECORDS DELIMITED BY NEWLINE
      PREPROCESSOR exec_dir:'external_tab.sh'
      BADFILE log_dir: 'external.bad'
      LOGFILE log_dir: 'external.log'
fields terminated by '|'
OPTIONALLY ENCLOSED BY '"' AND '"'
   ( uniqueid char(100),
..
-- Lots of defintions
..
)
)
LOCATION('data1.txt',data2.txt'...'data100.txt'))
REJECT LIMIT UNLIMITED;

One of the things to note is that I've included the pipelined preprocessor inside of a shell script which looks like this 

/bin/dd if=$1 bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn

The important part of this script is the parameter ($1) that is passed to the shell script. This is the file name that the external table wants to process. 

A simple select statement from my "staging_ext_mydata_jan01" unzips and parses the data converting it to usable columns. Whats more if I issue the statement in parallel Oracle takes care of creating the processes for me and making sure everything is scheduled in an orderly fashion.

To finish the load we simply used a multi table insert to put the data into the correct tables in an efficient fashion. Using this approach we were able to read zipped files, parse them and insert them into our three target tables at over 1.5 million source records/sec.
Comments