Setting up Swingbench for Oracle Autonomous Data Warehouse (ADW)

The following blog details how to install swingbench against ADW. It’s very similar in structure to my previous blog for ATP, with the major difference being the benchmark we are using. The example below uses the TPC-DS-like benchmark, but the approach and parameters apply to the SH benchmark as well. To install swingbench on ADW you’ll need to run through the following steps (Step 8 is optional).

Step 1/ Make sure you have an SSH public key


You are likely to already have an ssh key, but it’s possible that you’ll want to create another purely for this exercise. You’ll need this key to create your application server. You can find details on how to do this here

https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key

It’s the .pub file or more precisely its contents that you’ll need. The public key file is typically created in the hidden .ssh directory in your home directory. The public key will look something like this (modified)

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfO/80wleUCYxY7Ws8c67PmqL2qRUfpdPGOduHmy9xT9HkCzjoZHHIk1Zx1VpFtQQM+RwJzArZQHXrMnvefleH20AvtbT9bo2cIIZ8446DX0hHPGaGYaJNn6mCeLi/wXW0+mJmKc2xIdasnH8Q686zmv72IZ9UzD12o+nns2FgCwfleQfyVIacjfi+dy4DB8znpb4KU5rKJi5Zl004pd1uSrRtlDKR9OGILvakyf87CnAP/T8ITSMy0HWpqc8dPHJq74S5jeQn/TxrZ6TGVA+xGLzLHN4fLCOGY20gH7w3rqNTqFuUIWuIf4OFdyZoFBQyh1GWMOaKjplUonBmeZlV

You’ll need this in step 3.

Step 2/ Create the ADW Instance


You’ll have to have gone through the process of acquiring an Oracle Cloud account but that’s beyond the scope of this walkthrough. Once you have the account and have logged into Oracle Cloud Infrastructure, click on the menu button in the top left of the screen and select “Autonomous Data Warehouse”. Then simply follow these steps.

[Image: Napkin 28-08-18, 1.38.46 pm]

Step 3/ Create a compute resource for the application server


Whilst the ADW instance is creating we can create our application server to run swingbench. For any reasonable load you’ll need a minimum of two cores on the application server; for larger workloads you may need a bigger application server or potentially a small cluster of them.

In this walkthrough we’ll create a small 2 core Linux Server VM.

[Image: Iaas Creation 11-08-18, 9.48.29 am]

This should only take a couple of minutes. On completion we’ll need to use the public IP address of the application server we created in the previous step.

Step 4/ Log onto the application server and set up the environment


In this step we’ll use ssh to log onto the application server and set up the environment to run swingbench. ssh is available natively on MacOS and Linux; on platforms like Windows you can use PuTTY. You’ll need the IP address of the application server you created in the previous step.

First bring up a terminal on Linux/Mac, or launch a new ssh session in PuTTY. The username will be “opc”

ssh opc@< IP Address of Appserver >

You should see something similar to

ssh opc@129.146.65.101
ECDSA key fingerprint is SHA256:kNbpKWL3M1wB6PUFy2GOl+JmaTIxLQiggMzn6vl2qK1tM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '129.146.65.101' (ECDSA) to the list of known hosts.
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa':


By default Java isn’t installed on this VM, so we’ll need to install it via yum. We’ll need to update yum’s cache first

$> sudo yum makecache fast

Then we can install java and its dependencies

$> sudo yum install java-1.8.0-openjdk-headless.x86_64

We should now make sure that Java works correctly

$> java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

We can now pull the swingbench code from the website

$> curl http://www.dominicgiles.com/swingbench/swingbenchlatest.zip -o swingbench.zip

and unzip it

$> unzip swingbench.zip

Step 5/ Download the credentials file


The next step is to get the credentials file for ADW. You can do this by following these steps.

[Image: Napkin 2 28-08-18, 2.25.22 pm]

You’ll need to upload this to the application server with a command similar to

scp wallet_DOMADW.zip opc@< IP address of App Server >:

This will place our credentials file in the home directory of the application server. You don’t need to unzip it to use it.

Step 6/ Install a workload schema into the ADW instance

We can now install a schema to run our queries against. We do this by first changing to the swingbench bin directory

cd swingbench/bin

And then running the following command replacing your passwords with those that you specified during the creation of the ADW instance.
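
Something similar to the following should work (a sketch: the wallet file and service names are the ones used in this walkthrough, and the passwords are placeholders for the ones you chose):

$> ./tpcdswizard -cf ~/wallet_DOMADW.zip -cs domadw_high -ts DATA -its DATA -dba admin -dbap < your admin password > -u tpcds -p < your tpcds password > -async_off -scale 10 -create -cl -v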

A quick explanation of the parameters we are using
  • -cf tells tpcdswizard the location of the credentials file
  • -cs is the connect string for the service of the ADW instance. It is based on the name of the instance followed by one of _low, _medium or _high (e.g. domadw_high)
  • -ts is the name of the tablespace to install swingbench into. It is currently always “DATA”
  • -its is the name of the index tablespace to install swingbench into. It is currently always “DATA”
  • -dba is the admin user, currently this is always admin
  • -dbap is the password you specified at the creation of the ADW instance
  • -u is the name you want to give to the user you are installing swingbench into (I used tpcds)
  • -p is the password for the user. It needs to follow the password complexity rules of ADW
  • -async_off disables the wizard’s default behaviour of using async commits. This is currently prohibited on ADW
  • -scale indicates the size of the schema you want to create where a scale of 10 will generate 10GB of data. The indexes will take an additional amount of space roughly half the size of the data.
  • -create tells swingbench to create the schema (-drop will delete the schema)
  • -cl tells swingbench to run in character mode
  • -v tells swingbench to output what’s going on (verbose mode)

You should see the following output. A scale of 10 should take just over 25 mins to create depending on how many CPUs you allocated to your compute and ADW instances.

SwingBench Wizard
Author : Dominic Giles
Version : 2.6.0.1086

Running in Lights Out Mode using config file : ../wizardconfigs/tpcdswizard.xml
Connecting to : jdbc:oracle:thin:@domadw_high
Connected
Starting run
Starting script ../sql/tpcds_droptables.sql
Script completed in 0 hour(s) 0 minute(s) 6 second(s) 594 millisecond(s)
Starting script ../sql/tpcds_createtables.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 740 millisecond(s)
Inserting data into table STORE_SALES_21603029
Inserting data into table STORE_SALES_2
Inserting data into table STORE_SALES_14402020
Inserting data into table STORE_SALES_7201011
Inserting data into table CATALOG_SALES_10811609
Inserting data into table CATALOG_SALES_2
Inserting data into table CATALOG_SALES_3603871
Inserting data into table CATALOG_SALES_7207740
Inserting data into table WEB_SALES_5395379
Inserting data into table WEB_SALES_2
Inserting data into table STORE_RETURNS_179999
Inserting data into table WEB_SALES_3596920
Inserting data into table WEB_SALES_1798461
Inserting data into table INVENTORY_2452180
Inserting data into table INVENTORY_2451725
. . .
. . .
Connection cache closed
Starting script ../sql/tpcds_analyzeschema.sql
Script completed in 0 hour(s) 1 minute(s) 25 second(s) 439 millisecond(s)
Starting script ../sql/tpcds_constraints.sql
Script completed in 0 hour(s) 11 minute(s) 25 second(s) 427 millisecond(s)
Starting script ../sql/tpcds_transactions_pkg.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 675 millisecond(s)

============================================
| Datagenerator Run Stats |
============================================
Connection Time 0:00:00.003
Data Generation Time 0:12:58.906
DDL Creation Time 0:12:58.965
Total Run Time 0:25:57.876
Rows Inserted per sec 135,690
Data Generated (MB) per sec 10.6
Actual Rows Generated 105,530,092
Commits Completed 5,366
Batch Updates Completed 528,519


It may report that the schema was created unsuccessfully because we asked it not to create any indexes on the schema. Just ignore this and carry on.

To be sure it installed correctly you can validate the schema using the following command

./sbutil -tpcds -cf ~/wallet_DOMADW.zip -cs domadw_high -u tpcds -p < Your user password > -val

You should see something similar to

[Image: Screenshot of RapidWeaver (06-09-2018, 20-13-48)]

It should report that you are missing a single index, INVENTORY_NUK. This is expected; a future fix will take into consideration that we asked for indexes (except those used for constraints) not to be created.

You can check on the number of rows that have been created with the following command

./sbutil -tpcds -cf ~/wallet_DOMADW.zip -cs domadw_high -u tpcds -p < Your user password > -tables

You should see something similar to this

[Image: Screenshot of RapidWeaver (06-09-2018, 20-15-27)]
The size may be smaller or larger than this depending on a number of factors, but in this case HCC has attempted to compress the data even though it was loaded via DML operations. A higher compression ratio could be achieved by reorganising the data.


Step 7/ Run a workload


In this step we’ll be using swingbench’s load generator to create a load against ADW. Since we are running on a command line interface with no graphical UI we’ll be using the “charbench” command line utility. Before we begin we’ll also want to disable a couple of the TPC-DS queries because they take a long time to complete; the -di parameter in the command below takes care of this.


Now we can run a query workload against the TPCDS schema with a command similar to this
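
As a sketch (the config file name, the -v display options and the -di query identifiers are assumptions; check the configs directory of your install for the exact values):

$> ./charbench -c ../configs/TPCDS_Client_Side.xml -cf ~/wallet_DOMADW.zip -cs domadw_high -u tpcds -p < your tpcds password > -v users,tpm,tps,vresp -intermin 0 -intermax 0 -di Q10,Q27 -rt 0:30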


I won’t explain the parameters that I detailed earlier when running the wizard, but the new ones do the following
  • -v indicates what info should be shown in the terminal when running the command. In this instance I’ve asked for the users logged on, the time remaining in the run, Tx/Min, Tx/Sec and the average response time of the queries (in milliseconds)
  • -intermin and -intermax indicate the time to sleep between each transaction.
  • -rt indicates how long to run the benchmark before automatically stopping it

You should see output similar to the following

[Image: Screenshot of RapidWeaver (06-09-2018, 20-19-41)]

NOTE : The value used in the example above for the run time might not be enough to complete all of the queries (approximately 100) in which case either increase the run time or up the number of CPUs being used.

One thing to try whilst running the load against the server is to scale the number of CPUs available to the ADW instance up and down. You should see an increase in the number of queries being processed.

[Image: Screenshot of Google Chrome (30-08-2018, 13-22-38)]

Some things to note: at the end of each run you’ll end up with a results file in XML format in the directory you ran charbench from, i.e.

$ ls
bmcompare clusteroverview debug.log oewizard results00003.xml results00006.xml results00009.xml sbutil swingbench
ccwizard coordinator jsonwizard results00001.xml results00004.xml results00007.xml results2pdf shwizard tpcdswizard
charbench data minibench results00002.xml results00005.xml results00008.xml results.xml sqlbuilder


These XML files contain the detailed results of each run, i.e. average transactions per second, completed transactions, percentile response times etc. Whilst these are difficult to read, you can install swingbench on a Windows or Mac machine and use a utility called results2pdf to convert them into a more readable form. You can find some details on how to do that here.

http://www.dominicgiles.com/blog/files/86668db677bc5c3fc1f0a0231d595ebc-139.html

Using the methods above you should be able to create scripts that test the performance of the ADW server. i.e. running loads with different CPU counts, users, think times etc.

But beware: before comparing the results with on-premises servers, note that there are a lot of features enabled on the ADW server, like db_block_checking and db_block_checksum, that may not be enabled on another Oracle instance.

Step 8/ Optional functionality


To make the demo more interactive you could show the charts on the service console. The only issue is that the refresh rate is a little slow. You can improve on this by using some utilities I provide. The first of these is Database Time Monitor (http://www.dominicgiles.com/dbtimeviewer.html).

To install it you first need to download it (http://www.dominicgiles.com/downloads.html) to your PC or Mac and make sure that you’ve installed a Java 8 JRE. Once you’ve done that you simply need to unzip it and change into the bin directory. From there all you need to do is run a command similar to

$> ./dbtimemonitor -u admin -p < your admin password > -cs domadw_low -cf /Users/dgiles/Downloads/wallet_DOMADW.zip

or on Windows

$> dbtimemonitor.bat -u admin -p < your admin password > -cs domadw_low -cf /Users/dgiles/Downloads/wallet_DOMADW.zip

where the -cf parameter references the credentials file you downloaded and the -cs parameter references the service for ADW. You should see a screen similar to this.

[Image: Screenshot of ChildMain (10-08-2018, 19-07-22)]

You can use this to monitor the activity of the ADW instance in real time.

The final tool, cpumonitor (http://www.dominicgiles.com/cpumonitor.html), allows you to monitor the activity of the application server. This can be downloaded from here (http://www.dominicgiles.com/downloads.html) and again should be installed on your PC or Mac by simply unzipping the download. Then change into the bin directory on Mac or Linux, or the winbin directory on a Windows machine.

You’ll need to edit the XML file to reflect the location of the application server.



Save the file and then launch it with the command

$> ./cpumonitor
[Image: Screenshot of Terminal (11-08-2018, 19-04-00)]

Conclusion

This walkthrough should show you how to install swingbench and its DSS and data warehousing benchmarks against ADW.

Setting up Swingbench for Oracle Autonomous Transaction Processing (ATP)

The following blog details how to install swingbench against ATP and use it to run load tests. The example below uses the Simple Order Entry (SOE) benchmark, but it’s possible to follow the methodology laid out below to run any of the other supplied benchmarks (SH, TPC-DS etc.). To do this you’ll need to run through the following steps (Step 8 is optional).

Step 1/ Make sure you have an SSH public key


You are likely to already have an ssh key, but it’s possible that you’ll want to create another purely for this exercise. You’ll need this key to create your application server. You can find details on how to do this here

https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key

It’s the .pub file or more precisely its contents that you’ll need. The public key file is typically created in the hidden .ssh directory in your home directory. The public key will look something like this (modified)

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfO/80wleUCYxY7Ws8c67PmqL2qRUfpdPGOduHmy9xT9HkCzjoZHHIk1Zx1VpFtQQM+RwJzArZQHXrMnvefleH20AvtbT9bo2cIIZ8446DX0hHPGaGYaJNn6mCeLi/wXW0+mJmKc2xIdasnH8Q686zmv72IZ9UzD12o+nns2FgCwfleQfyVIacjfi+dy4DB8znpb4KU5rKJi5Zl004pd1uSrRtlDKR9OGILvakyf87CnAP/T8ITSMy0HWpqc8dPHJq74S5jeQn/TxrZ6TGVA+xGLzLHN4fLCOGY20gH7w3rqNTqFuUIWuIf4OFdyZoFBQyh1GWMOaKjplUonBmeZlV

You’ll need this in step 3.

Step 2/ Create the ATP Instance


You’ll have to have gone through the process of acquiring an Oracle Cloud account but that’s beyond the scope of this walkthrough. Once you have the account and have logged into Oracle Cloud Infrastructure, click on the menu button in the top left of the screen and select “Autonomous Transaction Processing”. Then simply follow these steps.

[Image: ATP 11-08-18, 9.48.53 am]


Step 3/ Create a compute resource for the application server


Whilst the ATP instance is creating we can create our application server to run swingbench. For any reasonable load you’ll need a minimum of two cores on the application server; for larger workloads you may need a bigger application server or potentially a small cluster of them.

In this walkthrough we’ll create a small 2 core Linux Server VM.

[Image: Iaas Creation 11-08-18, 9.48.29 am]

This should only take a couple of minutes. On completion we’ll need to use the public IP address of the application server we created in the previous step.

Step 4/ Log onto the application server and set up the environment


In this step we’ll use ssh to log onto the application server and set up the environment to run swingbench. ssh is natively available on MacOS and Linux; on platforms like Windows you can use PuTTY. You’ll need the IP address of the application server you created in the previous step.

First bring up a terminal on Linux/Mac, or launch a new ssh session in PuTTY. The username will be “opc”
ssh opc@< IP Address of Appserver >

You should see something similar to
$> ssh opc@129.146.65.101
ECDSA key fingerprint is SHA256:kNbpKWL3M1wB6PUFy2GOl+JmaTIxLQiggMzn6vl2qK1tM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '129.146.65.101' (ECDSA) to the list of known hosts.
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa': 
[opc@swingbench-client ~]$ 



By default Java isn’t installed on this VM, so we’ll need to install it via yum. We’ll need to update yum’s cache first

sudo yum makecache fast

Then we can install java and its dependencies

sudo yum install java-1.8.0-openjdk-headless.x86_64

We should now make sure that java works correctly

java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

We can now pull the swingbench code from the website

curl http://www.dominicgiles.com/swingbench/swingbenchlatest.zip -o swingbench.zip

and unzip it

unzip swingbench.zip

Step 5/ Download the credentials file


The next step is to get the credentials file for ATP. You can do this by following these steps.

[Image: Credentials 11-08-18, 4.09.07 pm]

You’ll need to upload this to our application server with a command similar to

$> scp wallet_SBATP.zip opc@< IP Address of Appserver >:

This will place our credentials file in the home directory of the application server. You don’t need to unzip it to use it.

Step 6/ Install a workload schema into the ATP instance


We can now install a schema to run our transactions against. We do this by first changing into the swingbench bin directory

$> cd swingbench/bin

And then running the following command replacing your passwords with those that you specified during the creation of the ATP instance.
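
Something similar to the following should work (a sketch: the wallet file and service names are the ones used in this walkthrough, and the passwords are placeholders for the ones you chose):

$> ./oewizard -cf ~/wallet_SBATP.zip -cs sbatp_medium -ts data -dba admin -dbap < your admin password > -u soe -p < your soe password > -async_off -hashpart -scale 1 -create -cl -v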

A quick explanation of the parameters we are using

  • -cf tells oewizard the location of the credentials file
  • -cs is the connect string for the service of the ATP instance. It is based on the name of the instance followed by one of _low, _medium, _high or _parallel (e.g. sbatp_medium)
  • -ts is the name of the tablespace to install swingbench into. It is currently always “data”
  • -dba is the admin user, currently this is always admin
  • -dbap is the password you specified at the creation of the ATP instance
  • -u is the name you want to give to the user you are installing swingbench into (I’d recommend soe)
  • -p is the password for the user. It needs to follow the password complexity rules of ATP
  • -async_off disables the wizard’s default behavior of using async commits. This is currently prohibited on ATP
  • -scale indicates the size of the schema you want to create, where a scale of 1 will generate 1GB of data. The indexes will take an additional amount of space roughly half the size of the data. A scale of 10 will generate 10GB of data and roughly 5GB of indexes
  • -hashpart tells the wizard to use hash partitioning
  • -create tells swingbench to create the schema (-drop will delete the schema)
  • -cl tells swingbench to run in character mode
  • -v tells swingbench to output what’s going on (verbose mode)

You should see the following output. A scale of 1 should take just over 5 mins to create. If you specified more CPUs for the application server or ATP instance you should see some improvement in performance, but this is unlikely to be truly linear because of the nature of the code.
SwingBench Wizard
Author  :	 Dominic Giles
Version :	 2.6.0.1082

Running in Lights Out Mode using config file : ../wizardconfigs/oewizard.xml
Connecting to : jdbc:oracle:thin:@sbatp_medium
Connected                                                                  
Starting run                                                               
Starting script ../sql/soedgdrop2.sql                                      
Script completed in 0 hour(s) 0 minute(s) 2 second(s) 691 millisecond(s)   
Starting script ../sql/soedgcreatetableshash2.sql                          
Script completed in 0 hour(s) 0 minute(s) 1 second(s) 433 millisecond(s)   
Starting script ../sql/soedgviews.sql                                      
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 31 millisecond(s)    
Starting script ../sql/soedgsqlset.sql                                     
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 196 millisecond(s)   
Inserting data into table ADDRESSES_1124999                                
Inserting data into table ADDRESSES_2                                      
Inserting data into table ADDRESSES_375001                                 
Inserting data into table ADDRESSES_750000                                 
Inserting data into table CUSTOMERS_749999                                 
Inserting data into table CUSTOMERS_250001                                 
Inserting data into table CUSTOMERS_500000                                 
Inserting data into table CUSTOMERS_2                                      
Run time 0:00:19 : Running threads (8/8) : Percentage completed : 5.36

You can then validate the schema created correctly using the following command

$> ./sbutil -soe -cf ~/wallet_SBATP.zip -cs sbatp_medium -u soe -p < a password for the soe user > -val
The Order Entry Schema appears to be valid.
--------------------------------------------------
|Object Type    |     Valid|   Invalid|   Missing|
--------------------------------------------------
|Table          |        10|         0|         0|
|Index          |        26|         0|         0|
|Sequence       |         5|         0|         0|
|View           |         2|         0|         0|
|Code           |         1|         0|         0|
--------------------------------------------------

You may have noticed that the stats failed to collect during the creation of the schema (known problem), so you’ll need to collect stats using the following command

$> ./sbutil -soe -cf ~/wallet_SBATP.zip -cs sbatp_medium -u soe -p < your soe password > -stats

And see the row counts for the tables with

$ ./sbutil -soe -cf ~/wallet_SBATP.zip -cs sbatp_medium -u soe -p < your soe password > -tables
Order Entry Schemas Tables
----------------------------------------------------------------------------------------------------------------------
|Table Name                  |                Rows|              Blocks|           Size|   Compressed?|  Partitioned?|
----------------------------------------------------------------------------------------------------------------------
|ORDER_ITEMS                 |           4,271,594|              64,448|        512.0MB|              |           Yes|
|ADDRESSES                   |           1,500,000|              32,192|        256.0MB|              |           Yes|
|LOGON                       |           2,382,984|              32,192|        256.0MB|              |           Yes|
|CARD_DETAILS                |           1,500,000|              32,192|        256.0MB|              |           Yes|
|ORDERS                      |           1,429,790|              32,192|        256.0MB|              |           Yes|
|CUSTOMERS                   |           1,000,000|              32,192|        256.0MB|              |           Yes|
|INVENTORIES                 |             897,672|               2,386|         19.0MB|      Disabled|            No|
|PRODUCT_DESCRIPTIONS        |               1,000|                  35|          320KB|      Disabled|            No|
|PRODUCT_INFORMATION         |               1,000|                  28|          256KB|      Disabled|            No|
|ORDERENTRY_METADATA         |                   4|                   5|           64KB|      Disabled|            No|
|WAREHOUSES                  |               1,000|                   5|           64KB|      Disabled|            No|
----------------------------------------------------------------------------------------------------------------------
                                                            Total Space           1.8GB

Step 7/ Run a workload


The first thing we need to do is to configure the load generator to load the users on in a sensible fashion (i.e. to not exceed the login rate). You can do this by editing the config file before starting the run.

We can now run a workload against the newly created schema using a command similar to the one shown at the start of the output below.

I won’t explain the parameters that I detailed earlier when running the wizard, but the new ones do the following

• -v indicates what info should be shown in the terminal when running the command. In this instance I’ve asked for the users logged on, Tx/Min, Tx/Sec and the average response time for each transaction.
• -min and -max indicate the time to sleep between each DML operation in a transaction (intra sleep). A transaction is made up of many DML operations.
• -intermin and -intermax indicate the time to sleep between each transaction.
• -di indicates that I want to disable the following transactions: SQ, WQ, WA. These are reporting queries and aren’t really needed.
• -rt indicates how long to run the benchmark before stopping it

You should see output similar to the following
$> ./charbench -c ../configs/SOE_Server_Side_V2.xml -cf ~/wallet_SBATP.zip -cs sbatp_low -u soe -p < your soe password > -v users,tpm,tps,vresp -intermin 0 -intermax 0 -min 0 -max 0 -uc 128 -di SQ,WQ,WA -rt 0:0.30
Author  :	 Dominic Giles
Version :	 2.6.0.1082

Results will be written to results.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS	NCR	UCD	BP	OP	PO	BO	SQ	WQ	WA
17:29:53	[0/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:54	[0/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:55	[0/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:56	[40/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:57	[45/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:58	[51/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:59	[60/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:00	[69/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:01	[78/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:02	[84/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:03	[95/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:04	[101/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:05	[104/128]	0	0	419	395	547	554	0	570	0	0	0
17:30:06	[108/128]	118	118	653	110	379	1576	375	647	0	0	0
17:30:07	[116/128]	325	207	355	220	409	406	499	450	0	0	0
17:30:08	[128/128]	547	222	423	100	203	504	403	203	0	0	0
17:30:09	[128/128]	831	284	420	306	303	396	501	505	0	0	0
17:30:10	[128/128]	1133	302	344	234	232	884	603	217	0	0	0
17:30:11	[128/128]	1438	305	564	367	355	375	559	376	0	0	0
17:30:12	[128/128]	1743	305	443	150	323	319	233	143	0	0	0
17:30:13	[128/128]	2072	329	1712	179	108	183	325	179	0	0	0
17:30:14	[128/128]	2444	372	1036	102	147	204	194	134	0	0	0
17:30:15	[128/128]	2807	363	1584	85	182	234	179	169	0	0	0
17:30:16	[128/128]	3241	434	741	159	157	250	256	251	0	0	0
17:30:17	[128/128]	3653	412	517	91	178	181	176	137	0	0	0

We specified a runtime of 30 seconds (-rt 0:0.30) which meant the workload ran for only a short period of time. You could increase this by changing the -rt parameter to something larger like

-rt 1:30

This would run the benchmark for 1 hour 30 mins. Or you could leave the -rt parameter off altogether and the benchmark would run until you hit return.

One thing to try whilst running the load against the server is to scale the number of CPUs available to the ATP instance up and down. You should see an increase in the number of transactions being processed.

[Image: Screenshot of Google Chrome (10-08-2018, 18-45-10)]

Some things to note: at the end of each run you’ll end up with a results file in XML format in the directory you ran charbench from, i.e.
$ ls 
bmcompare  clusteroverview  debug.log   oewizard          results00003.xml  results00006.xml  results00009.xml  sbutil      swingbench
ccwizard   coordinator      jsonwizard  results00001.xml  results00004.xml  results00007.xml  results2pdf       shwizard    tpcdswizard
charbench  data             minibench   results00002.xml  results00005.xml  results00008.xml  results.xml       sqlbuilder


These XML files contain the detailed results of each run, i.e. average transactions per second, completed transactions, percentile response times etc. Whilst these are difficult to read, you can install swingbench on a Windows or Mac machine and use a utility called results2pdf to convert them into a more readable form. You can find some details on how to do that here.

http://www.dominicgiles.com/blog/files/86668db677bc5c3fc1f0a0231d595ebc-139.html

Using the methods above you should be able to create scripts that test the performance of the ATP server. i.e. running loads with different CPU counts, users, think times etc.

But beware: before comparing the results with on-premises servers, note that there are a lot of features enabled on the ATP server, like db_block_checking and db_block_checksum, that may not be enabled on another Oracle instance.

Step 8/ Optional functionality


To make the demo more interactive you could show the charts on the service console. The only issue is that the refresh rate is a little slow. You can improve on this by using some utilities I provide. The first of these is Database Time Monitor (http://www.dominicgiles.com/dbtimeviewer.html).

To install it you first need to download it (http://www.dominicgiles.com/downloads.html) to your PC or Mac and make sure that you’ve installed a Java 8 JRE. Once you’ve done that you simply need to unzip it and change into the bin directory. From there all you need to do is run a command similar to

$> ./dbtimemonitor -u admin -p < your admin password > -cs sbatp_low -cf /Users/dgiles/Downloads/wallet_SBATP.zip

or on Windows

$> dbtimemonitor.bat -u admin -p < your admin password > -cs sbatp_low -cf /Users/dgiles/Downloads/wallet_SBATP.zip

The -cf parameter references the credentials file you downloaded and the -cs parameter references the service for ATP. You should see a screen similar to this.

[Image: Screenshot of ChildMain (10-08-2018, 19-07-22)]

You can use this to monitor in real time the activity of the ATP instance. Currently it reports all of the cores on the server you are running on. This will be fixed shortly to just show the cores available to you.

The final tool, cpumonitor (http://www.dominicgiles.com/cpumonitor.html), allows you to monitor the activity of the application server. This can be downloaded from here (http://www.dominicgiles.com/downloads.html) and again should be installed on your PC or Mac by simply unzipping the download. Then change into the bin directory on Mac or Linux, or the winbin directory on a Windows machine.

You’ll need to edit the XML file to reflect the location of the application server.


Save the file and then launch it with the command

$> ./cpumonitor
[Image: Screenshot of Terminal (11-08-2018, 19-04-00)]

Conclusion


That's it. We’ve installed swingbench against Oracle Autonomous Transaction Processing. In my next blog entry we'll take a look at some automated tests you can use to kick the tyres on ATP. I'll also include a version that shows how to install the TPC-DS-like schema that comes with swingbench against Oracle Autonomous Data Warehouse.



Accessing the Oracle Object Store


OCI Object Store Examples

The following are a series of examples showing the loading of data into the Oracle Object Store. For these to work with your own data you'll need your own Oracle Cloud account and to have uploaded a key. You can find details on how to achieve this here

I'll be using the Oracle OCI Python SDK, which wraps the REST API. You can find details on the API here

Before we do anything we'll need to load the required Python modules.

In [127]:
import oci
import keyring
import ast
import os

Configuration needed to connect

I'm using the "keyring" Python module to hold the config for my connection to OCI (to avoid needlessly exposing sensitive information). It's of the form

{
    "user": "your user ocid",
    "key_file": "the path to your private key file",
    "fingerprint": "the fingerprint of your public key",
    "tenancy": "your tenancy ocid",
    "region": "the region you are working with"
}

After retrieving it from my keyring store I then need to convert it into a dictionary before using it. You can also validate the config you are using as well. Handy if this is the first time you've configured it.

In [128]:
my_config = ast.literal_eval(keyring.get_password('oci_opj','doms'))
oci.config.validate_config(my_config)

Create object storage client

Then I just need to retrieve an Object Storage client to start working with the data

In [129]:
object_storage_client = oci.object_storage.ObjectStorageClient(my_config)
In [130]:
namespace = object_storage_client.get_namespace().data
bucket_name = "doms_object_store"

Upload the contents of user directory to a bucket

I'll create a bucket and then select all of the files from a user defined directory and upload them to the newly created bucket

In [131]:
import os, io

directory = '/Users/dgiles/datagenerator/bin/generateddata'
files_to_process = [file for file in os.listdir(directory) if file.endswith('csv')]

Create a bucket named "Sales_Data" and give it the tenancy ocid from your config.

In [132]:
try:
    create_bucket_response = object_storage_client.create_bucket(
        namespace,
        oci.object_storage.models.CreateBucketDetails(
            name='Sales_Data',
            compartment_id=my_config['tenancy']
        )
    )
except Exception as e:
    print(e.message)

Then we just need to loop through the list of files in the directory specified and upload them to the newly created bucket

In [133]:
bucket_name = 'Sales_Data'
for upload_file in files_to_process:
    print('Uploading file {}'.format(upload_file))
    object_storage_client.put_object(namespace, bucket_name, upload_file, io.open(os.path.join(directory,upload_file),'r'))
Uploading file CUSTOMERS.csv
Uploading file PRODUCTS.csv
Uploading file COUNTRIES.csv
Uploading file PROMOTIONS.csv
Uploading file CHANNELS.csv
Uploading file SUPPLEMENTARY_DEMOGRAPHICS.csv
Uploading file SALES.csv

Retrieve a list of objects in a bucket

The following retrieves a bucket and gets a list of objects in the bucket

In [134]:
bucket = object_storage_client.get_bucket(namespace, bucket_name)
object_list = object_storage_client.list_objects(namespace, bucket_name)

for o in object_list.data.objects:
    print(o.name)
CHANNELS.csv
COUNTRIES.csv
CUSTOMERS.csv
PRODUCTS.csv
PROMOTIONS.csv
SALES.csv
SUPPLEMENTARY_DEMOGRAPHICS.csv

Download the contents of an object

The following downloads a file from a named bucket in chunks and writes it to a user-defined directory on the client

In [135]:
# Attempt to download a file

object_name = "CUSTOMERS.csv"
destination_dir = '/Users/dgiles/Downloads'
get_obj = object_storage_client.get_object(namespace, bucket_name, object_name)
with open(os.path.join(destination_dir,object_name), 'wb') as f:
    for chunk in get_obj.data.raw.stream(1024 * 1024, decode_content=False):
        f.write(chunk)

Delete a bucket

We can just as simply delete the bucket we've just created but first we'll need to delete all of the objects inside of it.

In [136]:
object_list = object_storage_client.list_objects(namespace, bucket_name)

for o in object_list.data.objects:
    print('Deleting object {}'.format(o.name))
    object_storage_client.delete_object(namespace, bucket_name, o.name)

print('Deleting bucket')    
response = object_storage_client.delete_bucket(namespace, bucket_name)
Deleting object CHANNELS.csv
Deleting object COUNTRIES.csv
Deleting object CUSTOMERS.csv
Deleting object PRODUCTS.csv
Deleting object PROMOTIONS.csv
Deleting object SALES.csv
Deleting object SUPPLEMENTARY_DEMOGRAPHICS.csv
Deleting bucket

Making the alert log just a little more readable

One of the most valuable sources of information about what the Oracle database has done and is currently doing is the alert log. It's something that every Oracle Database professional should be familiar with. So what can you do to improve your chances of not missing important pieces of info? The obvious answer is that you should use a tool like Enterprise Manager, particularly if you are looking after hundreds of databases.

But what if you are only looking after one or two or just testing something out? Well the most common solution is to simply tail the alert log file.

The only issue is that it's not the most exciting thing to view, though this of course could be said of any terminal-based text file. But there are things you can do to make it easier to parse visually and improve your chances of catching an unexpected issue.

The approach I take is to push the alert log file through python and use the various libraries to brighten it up. It's very easy to go from this (tail -f)

[Image: Screenshot of ScreenFloat (20-03-2018, 08-25-26)]

To this

[Image: Screenshot of ScreenFloat (20-03-2018, 08-25-58)]

The reason this works is that Python provides a rich set of libraries which can add a little bit of colour and formatting to the alert file.
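
As a rough sketch of the idea (this isn't the gist's code, just a minimal illustration of the colorama approach; the highlighting rules are assumptions):

import sys
from colorama import Fore, Style, init

init()  # enable ANSI colour handling (needed on Windows)

# Usage: tail -f alert_orcl.log | python colourise_alert.py
for raw in sys.stdin:
    line = raw.rstrip()
    if 'ORA-' in line:
        # make errors stand out in red
        print(Fore.RED + line + Style.RESET_ALL)
    elif line[:2] == '20':
        # assumed: timestamp lines (e.g. 2018-03-20...) shown in cyan
        print(Fore.CYAN + line + Style.RESET_ALL)
    else:
        print(line)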

You can find the code to achieve this in the gist below



Just a quick note on installing this. You'll need either python 2.7 or 3 available on your server.

I'd also recommend installing pip and then the following libraries

pip install humanize psutil colorama python-dateutil

After you've done that it's just a case of running the script. If you have $ORACLE_BASE and $ORACLE_SID set, the script will try to guess the location of the alert file, i.e.

python alertlogparser.py

But if that doesn't work or you get an error you can also explicitly specify the location of the alert log with something like


python alertlogparser.py -a $ORACLE_BASE/diag/rdbms/orcl/orcl/trace/alert_orcl.log


This script isn't supposed to be an end product, just a simple example of what can be achieved to make things a little easier. And whilst I print information like CPU load and memory, there's nothing to stop you from modifying the script to display the number of warnings or errors found in the alert log and update it as things change. Or if you really want to go wild, implement something similar but a lot more sophisticated using Python and curses.

The age of "Terminal" is far from over….

ORDS : From Start to Finish

Introduction


This is likely to be a pretty long posting on ORDS describing the workflow for creating a REST application from start to finish. With that said, I'm going to have to make a few compromises to keep it tolerable to follow. I thought long and hard about what would be a meaningful example that would be easy to understand but show most of the aspects of building a REST based application using Oracle ORDS. With this in mind I settled on a simple micro service to enable users to "like" products/rows, in a similar fashion to being able to like posts on Facebook/Twitter or other social media networks. Whilst there are plenty of examples of how to do this out there already, I couldn't find many that explained the entire process from start to finish using the Oracle Database as the target. I'll try and keep it up to date as we are expecting to see a few changes in SQLDeveloper and ORDS shortly. On completion we should have a web page that looks a little like the one below.

[Image: untitled]

The code for this blog is maintained here
https://github.com/domgiles/ORDS_Example

To create a working environment simply go to the link above, click on the "Clone or download" button and then click on the "Download ZIP" link. When it's finished downloading uncompress the file into your preferred location.

This post should largely serve as a reference for those wishing to test out this functionality themselves. I'll also be releasing a version of this code where we store JSON in the database, as opposed to a relational table, to offer a comparison between the approaches.

ORDS Description


First of all, what is ORDS and why do we care? Well, this is Oracle's description:

"Oracle REST Data Services (ORDS) makes it easy to develop modern REST interfaces for relational data in the Oracle Database and now, with ORDS 3.0, the Oracle Database 12c JSON Document Store and Oracle NoSQL Database. ORDS is available both as an Oracle Database Cloud Service and on premise.

REST has become the dominant interfaces for accessing services on the Internet, including those provided by major vendors such as Google, Facebook, Twitter, and Oracle, and within the enterprise by leading companies throughout the world. REST provides a powerful yet simple alternative to standards such as SOAP with connectivity to virtually every language environment, without having to install client drivers, because it is based on simple HTTP calls which virtually all language environments support.

For relational data, ORDS 3.0 now automatically generates REST endpoints for about a dozen major single table operations including create, query, update, and delete. Easy to write JSON filters allow query predicates and sorts to be specified in a query-by-example format. For more complex operations, REST calls are mapped to SQL and PL/SQL routines you can write which return data in JSON and other formats.

ORDS 3.0 enables a wide range of operations to be performed on JSON documents in the Oracle Database 12c Document Store using the Simple Oracle Document Access (SODA) API over REST. These operations include create, query, update, and delete with support for JSON filters.
"

With that said, let's get started.

Prerequisites


If you plan to install this example you'll need to have the following components available. I won't be describing their installation. I'll assume you'll have done this ahead of time
  • Oracle Database (SE or EE) 11.1 or later. Oracle Database 12c is the preferred version. It can be configured either as a PDB or not. You can download it from here
  • Oracle SQL Developer 4.1.3. It needs to be at least this version. You can download it here
  • Oracle ORDS 3.0.6. You can download it here. (NOTE : I will walkthrough this installation later in the document)
  • A Java Virtual Machine on the server/laptop/VM you are running the example on. I'd recommend Java 8 available here
  • Alternatively you could download a Virtual Box VM such as the "Hands-on Database Application Development" VM from here, which will provide everything you need in a ready configured package.

I'm also going to assume that you'll be running on a Unix-type OS (Linux, Mac OS, Solaris, AIX), although this will also work fine on Windows (you'll just need to convert some of the commands).

My setup


For my setup I have the following configuration.
[Image: Architecture@2x]
In reality everything is running on my workstation but the diagram above provides a working model

Install of ORDS

Assuming you aren't going down the pre-installed Virtual Box VM route mentioned in the previous section, the install of ORDS is very simple. After you've downloaded it to your VM/server/laptop all you need to do is unzip it and change into the directory, i.e.
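
A sketch (the zip file name will depend on the exact version you downloaded):

$> unzip ords.3.0.6.zip -d ords
$> cd ords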

In the directory you unzipped the download run ORDS. The first time you run this it will lead you through the install asking you for the connection to the database. I'd recommended skipping configuring APEX at this stage (by selecting 2 when asked) and running ORDS using http. Whilst these options might not be ideal for production environments they will enable you to get started.
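
A sketch, assuming the standard ords.war launcher:

$> java -jar ords.war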

After this completes hit control-c to exit the program. At this stage you should probably add a user with SQLDeveloper development support. I've named mine "appdev" but any name would be fine. We'll use this user later in the example. You can do this using the following command and supplying passwords.
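
A sketch ("appdev" is just the name I chose; "SQL Developer" is the role that grants SQLDeveloper development support):

$> java -jar ords.war user appdev "SQL Developer"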

And then restart ORDS. This time it won't prompt you for any passwords but simply run in the foreground
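
$> java -jar ords.war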


That's all that's needed to install ORDS.

Install Database Tables


The next step is to create a user and tablespace to hold the tables. I've provided a simple script "create_user.sql" that will create a database user "SOE" and tablespace "SOE"; to run it simply connect to the database as a user with DBA privileges. (NOTE : This could also be done inside of SQLDeveloper)
The script creates a user "soe" and a tablespace "soe" and then grants the user access to the tablespace. It finishes off by granting the user enough privileges to create tables, views and procedures.
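
As a sketch of what the script does (the real DDL is in create_user.sql; the datafile clause and password are placeholders):

create tablespace soe datafile 'soe.dbf' size 1g autoextend on;
create user soe identified by < a password > default tablespace soe quota unlimited on soe;
grant create session, create table, create view, create procedure to soe;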
At this stage it's worth starting SQLDeveloper, where the bulk of the work will be done. You'll need to create a connection to the database. You'll do this from the connections tab as shown below.

[Image: Oracle SQL Developer]

Then fill in the details and hit connect

[Image: New - Select Database Connection]

The next step is to create the example tables and test data. I've provided a simple script "create_tables.sql" to do this for you. It will create one table to hold the "product_information" that would typically be part of an existing application, and another to hold the likes. This table is called "user_likes".


The "USER_LIKES" table is an Index Organised Table. Whilst the trivial amount of data we are holding in this example won't benefit from this approach, it would likely have an impact on production scale workloads. The script will insert a few sample rows as well.
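
As a sketch of the USER_LIKES table (the real DDL is in create_tables.sql; the column sizes are assumptions):

create table user_likes
( table_name   varchar2(128),
  column_value varchar2(128),
  user_id      varchar2(256),
  constraint user_likes_pk primary key (table_name, column_value, user_id)
) organization index;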

To run this script, copy and paste the SQL into SQLDeveloper.
[Image: Oracle SQL Developer - oracle12c2 - soe]


Install REST Service Application Code



At this point we'll connect to ORDS from SQLDeveloper. You'll be using the username you used during the installation of ORDS; in my case this was "appdev". From the "View" menu you'll need to select "REST Data Services" and then the "Development" option as shown below

[Image: Oracle SQL Developer]

This will launch the REST development UI (by default on the left hand side of the screen) as shown below. Add a new connection and fill in the details as shown i.e. Server Path = "/ords" and Schema Workspace = "/soe/".

[Image: Affinity Designer]

The next step is to run a script, "utilities_module.sql", which will enable the schema to support ORDS and create the ORDS modules used by the application. This is purely for convenience; I'll describe the process of creating one of the modules in a blog shortly after this one. To install the REST modules simply copy and paste the code from "utilities_module.sql" into the SQLDeveloper session you've already created.

As shown below
[Image: Oracle SQL Developer - oracle12c2 - soe]

After this script completes we can then download the modules into SQLDeveloper from ORDS's repository. The reason we need to do this is that we've simply imported the modules into the ORDS repository and not made them visible to SQLDeveloper at this point. To achieve this all we need to do is

[Image: slice1@2x]

This will display the modules we created by running the script. The two modules are
  • Products : This will simply retrieve the rows from the PRODUCTS table. It's not really necessary to explicitly declare this module as ORDS base functionality already has a REST API that could be used instead.
  • Utilities : This module contains three templates or services we can call to like a row in a table
    • get likes : Gets the total number of likes of a row by all users and the user specified in the parameter
    • like : Enables you to like a row if you haven't already
    • unlike : reverses a like operation by the user specified if they've liked it before
You can drill into the details of each service by clicking on it and then clicking on the HTTP operation it uses in SQLDeveloper. The details of the service comprise three sections (here we'll take a look at the "like" service).

The first tab lists the SQL or PL/SQL that will implement the business logic. In this particular example we are using PL/SQL to first check whether a user has liked a row by checking the USER_LIKES table. If they haven't already liked the row we insert a new one and commit the transaction.

[Image: Oracle SQL Developer]

The next tab lists the parameters that are passed to our logic or SQL statement. In our particular example we are passing three parameters that tell the procedure the table that is being liked, the unique identifier for the row (in our simple case it's assumed to be a string but could as equally be a rowid or number) and the user identifier (typically application specific but we'll assume it's something like an email).

[Image: Oracle SQL Developer]

The final tab summarises some of the settings for the service and provides you with a url to call to test the service.

[Image: Oracle SQL Developer]

You can test most "GET" REST calls pretty trivially within a browser, but it takes just a little bit more for POST, PUT and DELETE. To test the POST call shown in the screen shot above we are going to use a command line utility called curl (installed by default on Linux and MacOS). We'll also use a file that contains the JSON we want to send to the REST service. This file is called "test.json" and has contents similar to the following
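
A sketch (the column_value is the product_id of a row in the PRODUCT_INFORMATION table, so adjust it to your data):

{
  "table_name": "product_information",
  "column_value": "2",
  "user_id": "dominic.giles"
}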


To call the REST service all we need to do is issue a command similar to
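
Something like this, assuming the host and port used elsewhere in this post (replace oracle12c2:8080 with your own server):

$> curl -i -X POST -H "Content-Type: application/json" -d @test.json http://oracle12c2:8080/ords/soe/utilities/like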


You should get an HTTP return code of 200, indicating that the call ran successfully.

We can also test the GET REST call that returns the "like" we just made using a command similar to
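
Again a sketch using the same assumed host and parameters:

$> curl "http://oracle12c2:8080/ords/soe/utilities/getlikes?table_name=product_information&column_value=2&user_id=dominic.giles"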


You should get a small JSON snippet with the number of likes by everybody and by the user "dominic.giles"

Calling the REST Services from a web page


Now that we've got the services up and running, we can use a simple web page that displays all of the rows from the PRODUCT_INFORMATION table with a "thumb" to enable us to like the row. Like all of the code used in this blog, you can find it on my github repository. Let me start by saying that the web page we are going to create is a very simple example and would normally require a lot more validation and error handling before it would ever be considered for a production environment.

The web page we'll put together uses JQuery and a javascript library called "DataTables" (link here). However it's important to point out that implementing this functionality doesn't require these tools. It could have been achieved just as easily using Oracle Application Express or a framework like Jet. But to simplify things this approach allows us to minimise the steps required and reduce the number of moving parts.

The final app should look like this

[Image: slicescreen]

When the form is first loaded the javascript calls the "GetLikes" REST service (once for each row displayed), which asynchronously returns the total count of "likes" a product has received and sets the colour of the thumb depending on whether the current user has liked it. It will stay grey if the current user hasn't liked it and turn blue if they have. On clicking a thumb the javascript checks whether the user has previously liked it. If they haven't it calls the "Like" REST call, and it calls the "Unlike" REST call if they have.

The code is broken up into two files: one containing the HTML and the other containing the javascript.



And the javascript



This really isn't the place to go into all of the details of the code but there are one or two things that are worth pointing out. The first is at the start of liketable.js file. You'll need to change the following entries to reflect your own environment. It's likely you'll only need to change the "hostport" variable to start with. The username is hardcoded in this example but in the real world it would be derived from the system context.

var $userName = 'dominic.giles';
var $tableName = 'product_information';
var $hostport = 'http://oracle12c2:8080';
var $uniqueIDColumn = "product_id";

The other piece of code worth looking at is the calls to the REST service. The code below uses the javascript ajax functionality to make an asynchronous call to ORDS which will then return the JSON response.

function getRowLikes(un, tn, rID) {
    var likeResult = $.ajax({
        url: $hostport + "/ords/soe/utilities/getlikes?table_name=" + tn + "&column_value=" + rID + "&user_id=" + un,
        type: "GET"
    });
    return likeResult;
}


Likewise the following shows the AJAX call to the like function. This call, unlike the previous one, is performed synchronously but follows the same principles.

function likeRow(un, tn, rID) {
    $.ajax({
        url: $hostport + "/ords/soe/utilities/like",
        type: "POST",
        data: {"table_name": tn, "column_value": rID, "user_id": un},
        async: false
    });
}


Once you've made the changes to the hostport variable you should be able to open the products.html file in a browser.

Over the coming days I'll knock up a little screen cast to accompany this blog.

A new member of the family

I’ve just added a new member to the family: MonitorDB. It’s a simple tool that allows you to trivially chart and record the values of a SQL query. You can have as many charts as you want, and you can chart pretty much anything as long as it’s expressed in SQL form. Take a look at a short description here or download it here
[Image: Monitor IO]

Notes on pre-parsing data for Oracle data loads

Sometimes data simply isn't in a form that is easy to load into an Oracle database, i.e. column form. It would be great if everybody exchanged data in a simple CSV form with a single file-to-table mapping. Sadly that isn't the case, and sometimes you have to do a little work to get it into a form that's usable. A recent benchmark highlighted this issue very well. The customer provided the data in compressed CSV form (so far so good) but the data was held in key value pairs (not so good). They also provided us with a mapping file that describes how it all fits together.

Now typically the approach many people would take would be to develop some form of program that parses all of the data, writes it to a staging area and then loads it all in one go into the target database. I make no criticism of this approach since it works well, as long as it's not time critical. It's by far the simplest method. However I'm a big fan of taking advantage of what's already available, and one of the most underused and powerful features of the Oracle database is the pre-parser. It enables you to pipeline various operations so they all run as quickly as possible. So going back to my benchmark, we used this approach to load data into our target database. It consisted of 4 steps
  • Read the data of the filesystem as efficiently as possible and write it to stdout
  • Read from stdin and Unzip the the data writing it to stdout
  • Read from stdin into a java program to do the key value mapping and error detection/correction writing the output to stdout
  • Read from stdin into sqlloader
I will at this point note that I'm not really using Oracle's pre-parser yet, just good old "pipes", but why this is important will become clearer later. This approach gave us a great deal of flexibility and simplified the code we had to write. It operates in some respects as a serialised map reduce flow, but I'll come back to that another day and explain how it can be integrated directly into a massively parallel approach. It's also possible to get Java to natively read the zipped file; having said that, I offloaded that process to the OS to enable me to use different compression formats when needed.

The java program simply reads from stdin and writes to stdout. Handling the key value pairs just required the program to read the mapping file in and then split and parse the values from stdin. The data was then written to stdout in a well known order.

Java extract from my program... 


            BufferedReader br = new BufferedReader(new InputStreamReader(System.in), OneMB);
            String line = null;
            HashMap keyValuePairs = null;
            MyTokenizer mt = null;
            while ((line = br.readLine()) != null) {
                keyValuePairs = new HashMap(200);
                mt = new MyTokenizer(line, delimitor);
                for (String token : mt) {
                    int loc = token.indexOf("=");
                    if (loc != -1) {
                        String i = token.substring(0, loc);
                        String s = token.substring(loc + 1, token.length());
                        keyValuePairs.put(i, s);
                    }
                }
                StringBuffer outRec = new StringBuffer(1000);
                outRec.append(checkForNull(keyValuePairs.get("uniqueID"), "")).append(seperator);
                // mapping logic similar to above repeats
                System.out.println(outRec.toString());
            }

All that was needed for sqlloader to process the files was a control file that understood the order of the columns and any additional formatting. 

One of the additional benefits is that we can load the data via "direct path" and implement other features such as multi table inserts. The Java pre-parser enables you to add all of the additional formatting needed to make this a trivial process.

The following diagram illustrates the process.

This equates to a Unix/Linux statement such as

/bin/dd if=myverybigfile.txt bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn | sqlldr bench/bench control=kv.ctl data=\"-\" direct=TRUE;

NOTE : one thing you may have noticed is that I'm using dd to do 1MB I/Os. This is just an efficiency operation and works well on structures such as DBFS; you could skip this part of the operation if needed.

Which brings us onto external tables and the preparser


External Tables and pre-parsers 

As I mentioned earlier I like to take advantage of functionality that's already available, and one of those features in the Oracle database is external tables. I don't intend to go into much detail as to why you should use external tables, other than that they do much of the heavy lifting for you and provide a seamless interface between the filesystem and the database. They effectively make files look like tables.

This means it's trivial to implement parallelism for our pre-parser. We don't need to worry about how to handle the files and how to schedule everything; external tables take care of all of that for you. In our benchmark we used them in the following way




Our previous pipeline remains the same except that we don't need sqlldr; it's all managed by the table definition itself. So we end up with something similar to the following for the table definition (I've abbreviated it quite substantially and highlighted the important bits)

create table staging_ext_mydata_jan01
(    uniqueid NUMBER,
..
-- Lots of columns
..
 )
ORGANIZATION EXTERNAL
(
   TYPE oracle_loader
   DEFAULT DIRECTORY load_dir_jan01
   ACCESS PARAMETERS
   (
      RECORDS DELIMITED BY NEWLINE
      PREPROCESSOR exec_dir:'external_tab.sh'
      BADFILE log_dir: 'external.bad'
      LOGFILE log_dir: 'external.log'
fields terminated by '|'
OPTIONALLY ENCLOSED BY '"' AND '"'
   ( uniqueid char(100),
..
-- Lots of defintions
..
)
)
LOCATION('data1.txt','data2.txt', ... ,'data100.txt'))
REJECT LIMIT UNLIMITED;

One of the things to note is that I've included the pipelined preprocessor inside of a shell script which looks like this 

/bin/dd if=$1 bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn

The important part of this script is the parameter ($1) that is passed to the shell script. This is the file name that the external table wants to process. 

A simple select statement from my "staging_ext_mydata_jan01" table unzips and parses the data, converting it into usable columns. What's more, if I issue the statement in parallel, Oracle takes care of creating the processes for me and making sure everything is scheduled in an orderly fashion.

To finish the load we simply used a multi table insert to put the data into the correct tables in an efficient fashion. Using this approach we were able to read zipped files, parse them and insert them into our three target tables at over 1.5 million source records/sec.

New Build of Datagenerator

I’m releasing a new build of Datagenerator simply because there hasn’t been one for a while. That’s not to say it hasn’t undergone significant changes. Most of them are as a result of enhancements to support schema creation for swingbench. In particular there is the introduction of pre and post generation scripts, which allow me to run a complete schema creation from within Datagenerator. These scripts appear as top level items within the tree (see below).



In the side panel you can now include scripts and parameters for the scripts.



I’ve also included the script files used for generating the SH and SOE schemas used by swingbench. This should make it easier to understand what is going on and potentially create your own versions of the schemas.

In this release I’ve also improved the threading model and included one or two other performance enhancements....

In the next release I’m going to try and add support for well known data items such as zip/post codes, NI numbers, Social Security numbers etc., as well as allowing users to plug in their own data generators.

You can download it from the usual place and as before leave comments below or via the comments page.

Database Time Monitor

I’ve released another little utility called “Database Time Monitor”. I think the name speaks for itself. You can find more details here and download it here.
