Setting up Swingbench for Oracle Autonomous Data Warehousing (ADW)

The following blog details how to install swingbench against ADW. It’s very similar in structure to my previous blog for ATP, with the major difference being the benchmark we are using. The example below will use the TPC-DS-like benchmark, but the approach and parameters apply to the SH benchmark as well. To install swingbench on ADW you’ll need to run through the following steps (Step 8 is optional).

Step 1/ Make Sure you have an SSH Public key


You are likely to already have an ssh key but it is possible that you want to create another purely for this exercise. You’ll need this key to create your application server. You can find details on how to do this here

https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key
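
If you need to generate a new key pair, something along the lines of the following (standard OpenSSH, accepting the default file location) will create id_rsa and id_rsa.pub in the .ssh directory of your home directory

$> ssh-keygen -t rsa -b 2048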

It’s the .pub file or more precisely its contents that you’ll need. The public key file is typically created in the hidden .ssh directory in your home directory. The public key will look something like this (modified)

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfO/80wleUCYxY7Ws8c67PmqL2qRUfpdPGOduHmy9xT9HkCzjoZHHIk1Zx1VpFtQQM+RwJzArZQHXrMnvefleH20AvtbT9bo2cIIZ8446DX0hHPGaGYaJNn6mCeLi/wXW0+mJmKc2xIdasnH8Q686zmv72IZ9UzD12o+nns2FgCwfleQfyVIacjfi+dy4DB8znpb4KU5rKJi5Zl004pd1uSrRtlDKR9OGILvakyf87CnAP/T8ITSMy0HWpqc8dPHJq74S5jeQn/TxrZ6TGVA+xGLzLHN4fLCOGY20gH7w3rqNTqFuUIWuIf4OFdyZoFBQyh1GWMOaKjplUonBmeZlV

You’ll need this in step 3.

Step 2/ Create the ADW Instance


You’ll have to have gone through the process of acquiring an Oracle Cloud account but that’s beyond the scope of this walkthrough. Once you have the account and have logged into Oracle Cloud Infrastructure, click on the menu button in the top left of the screen and select “Autonomous Data Warehouse”. Then simply follow these steps.

[Image: steps for creating the ADW instance in the Oracle Cloud console]

Step 3/ Create a compute resource for the application server


Whilst the ADW instance is creating we can create our application server to run swingbench. To generate any reasonable load you’ll need a minimum of two cores; for larger workloads you may need a bigger application server or potentially a small cluster of them.

In this walkthrough we’ll create a small 2 core Linux Server VM.

[Image: steps for creating the compute instance (VM) for the application server]

This should only take a couple of minutes. On completion we’ll need to use the public IP address of the application server we created in the previous step.

Step 4/ Log onto application server and setup the environment


In this step we’ll use ssh to log onto the application server and setup the environment to run swingbench. ssh is available on MacOS and Linux. On platforms like Windows you can use Putty. You’ll need the IP address of the application server you created in the previous step.

First bring up a terminal on Linux/Mac. On Putty launch a new ssh session. The username will be “opc”

ssh opc@< IP Address of Appserver >

You should see something similar to

ssh opc@129.146.65.101
ECDSA key fingerprint is SHA256:kNbpKWL3M1wB6PUFy2GOl+JmaTIxLQiggMzn6vl2qK1tM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '129.146.65.101' (ECDSA) to the list of known hosts.
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa':


By default java isn’t installed on this VM so we’ll need to install it via yum. We’ll need to update yum first

$> sudo yum makecache fast

Then we can install java and its dependencies

$> sudo yum install java-1.8.0-openjdk-headless.x86_64

We should now make sure that java works correctly

$> java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

We can now pull the swingbench code from the website

$> curl http://www.dominicgiles.com/swingbench/swingbenchlatest.zip -o swingbench.zip

and unzip it

$> unzip swingbench.zip

Step 5/ Download the credentials file


The next step is to get the credentials file for ADW. You can do this by following these steps.

[Image: steps for downloading the ADW credentials (wallet) file]

You’ll need to upload this to our application server with a command similar to

scp wallet_DOMADW.zip opc@< IP address of App Server >:

This will place our credentials file in the home directory of the application server. You don’t need to unzip it to use it.

Step 6/ Install a workload schema into the ADW instance

We can now install a schema to run our queries against. We do this by first changing to the swingbench bin directory

cd swingbench/bin

And then running the following command replacing your passwords with those that you specified during the creation of the ADW instance.
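
It will look something like this (the wallet, service and user names below are the ones used throughout this walkthrough; substitute your own passwords)

$> ./tpcdswizard -cf ~/wallet_DOMADW.zip -cs domadw_high -ts DATA -its DATA -dba admin -dbap < your admin password > -u tpcds -p < your tpcds password > -scale 10 -async_off -create -cl -v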

A quick explanation of the parameters we are using
  • -cf tells tpcdswizard the location of the credentials file
  • -cs is the connect string for the service of the ADW instance. It is based on the name of the instance followed by one of _low, _medium or _high (e.g. domadw_high)
  • -ts is the name of the table space to install swingbench into. It is currently always “DATA”
  • -its is the name of the index table space to install swingbench into. It is currently always “DATA”
  • -dba is the admin user, currently this is always admin
  • -dbap is the password you specified at the creation of the ADW instance
  • -u is the name you want to give to the user you are installing swingbench into (I used tpcds)
  • -p is the password for the user. It needs to follow the password complexity rules of ATP
  • -async_off disables the wizard’s default behaviour of using async commits. This is currently prohibited on ADW
  • -scale indicates the size of the schema you want to create where a scale of 10 will generate 10GB of data. The indexes will take an additional amount of space roughly half the size of the data.
  • -create tells swingbench to create the schema (-drop will delete the schema)
  • -cl tells swingbench to run in character mode
  • -v tells swingbench to output what’s going on (verbose mode)

You should see the following output. A scale of 10 should take just over 25 mins to create depending on how many CPUs you allocated to your compute and ADW instances.

SwingBench Wizard
Author : Dominic Giles
Version : 2.6.0.1086

Running in Lights Out Mode using config file : ../wizardconfigs/tpcdswizard.xml
Connecting to : jdbc:oracle:thin:@domadw_high
Connected
Starting run
Starting script ../sql/tpcds_droptables.sql
Script completed in 0 hour(s) 0 minute(s) 6 second(s) 594 millisecond(s)
Starting script ../sql/tpcds_createtables.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 740 millisecond(s)
Inserting data into table STORE_SALES_21603029
Inserting data into table STORE_SALES_2
Inserting data into table STORE_SALES_14402020
Inserting data into table STORE_SALES_7201011
Inserting data into table CATALOG_SALES_10811609
Inserting data into table CATALOG_SALES_2
Inserting data into table CATALOG_SALES_3603871
Inserting data into table CATALOG_SALES_7207740
Inserting data into table WEB_SALES_5395379
Inserting data into table WEB_SALES_2
Inserting data into table STORE_RETURNS_179999
Inserting data into table WEB_SALES_3596920
Inserting data into table WEB_SALES_1798461
Inserting data into table INVENTORY_2452180
Inserting data into table INVENTORY_2451725
. . .
. . .
Connection cache closed
Starting script ../sql/tpcds_analyzeschema.sql
Script completed in 0 hour(s) 1 minute(s) 25 second(s) 439 millisecond(s)
Starting script ../sql/tpcds_constraints.sql
Script completed in 0 hour(s) 11 minute(s) 25 second(s) 427 millisecond(s)
Starting script ../sql/tpcds_transactions_pkg.sql
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 675 millisecond(s)

============================================
| Datagenerator Run Stats |
============================================
Connection Time 0:00:00.003
Data Generation Time 0:12:58.906
DDL Creation Time 0:12:58.965
Total Run Time 0:25:57.876
Rows Inserted per sec 135,690
Data Generated (MB) per sec 10.6
Actual Rows Generated 105,530,092
Commits Completed 5,366
Batch Updates Completed 528,519


It may report that the schema was created unsuccessfully because we asked it to not create any indexes on the schema. Just ignore this and carry on.

To be sure it installed correctly you can validate the schema using the following command

./sbutil -tpcds -cf ~/wallet_DOMADW.zip -cs domadw_high -u tpcds -p < Your user password > -val

You should see something similar to

[Image: sbutil validation output for the tpcds schema]

It should report that you are missing a single index, INVENTORY_NUK. This is expected, and a fix in the future will take into consideration that we asked for indexes (except those used for constraints) not to be created.

You can check on the number of rows that have been created with the following command

./sbutil -tpcds -cf ~/wallet_DOMADW.zip -cs domadw_high -u tpcds -p < Your user password > -tables

You should see something similar to this

[Image: sbutil output showing row counts and table sizes for the tpcds schema]
The size may be smaller or larger than this depending on a number of factors but in this case HCC has attempted to compress the data even though it was loaded via DML operations. A higher compression ratio can be achieved by reorganising the data.


Step 7/ Run a workload


In the next step we’ll be using swingbench’s load generator to create a load against ADW. Since we are running on a command line interface with no graphical UI we’ll be using the “charbench” command line utility. But before we begin we’ll disable a couple of the TPC-DS queries because they take a long time to complete. To do this just run the following command.


Now we can run a query workload against the TPCDS schema with a command similar to this
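
A command along the following lines should work (the config file path is illustrative, so point it at the TPC-DS config shipped in swingbench’s configs directory, and the run time is just an example)

$> ./charbench -c ../configs/< TPC-DS config file > -cf ~/wallet_DOMADW.zip -cs domadw_high -u tpcds -p < your tpcds password > -v users,tpm,tps,vresp -intermin 0 -intermax 0 -rt 0:30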


I won’t explain the parameters that I detailed earlier when running the wizard but for the new ones do the following
  • -v indicates what info should be shown in the terminal when running the command. In this instance I’ve asked that the users logged on, the time remaining in the run, Tx/Min, Tx/Sec and the average response time of the queries (in milliseconds) are shown
  • -intermin -intermax indicates the time to sleep between each transaction.
  • -rt indicates how long to run the benchmark before automatically stopping it

You should see output similar to the following

[Image: charbench output while running the TPC-DS query workload]

NOTE : The value used in the example above for the run time might not be enough to complete all of the queries (approximately 100) in which case either increase the run time or up the number of CPUs being used.

One thing to try whilst running the load against the server is scaling the number of CPUs available to the ADW instance up and down. This should see an increase in the number of queries being processed.

[Image: ADW service console showing activity as CPUs are scaled]

Some things to note: at the end of each run you’ll end up with a results file in XML format in the directory you ran charbench from, i.e.

$ ls
bmcompare  clusteroverview  debug.log   oewizard          results00003.xml  results00006.xml  results00009.xml  sbutil      swingbench
ccwizard   coordinator      jsonwizard  results00001.xml  results00004.xml  results00007.xml  results2pdf       shwizard    tpcdswizard
charbench  data             minibench   results00002.xml  results00005.xml  results00008.xml  results.xml       sqlbuilder


These XML files contain the detailed results of each run, i.e. average transactions per second, completed transactions, percentile response times etc. Whilst these are difficult to read, you can install swingbench on a Windows or Mac machine and use a utility called results2pdf to convert them into a more readable form. You can find some details on how to do that here.

http://www.dominicgiles.com/blog/files/86668db677bc5c3fc1f0a0231d595ebc-139.html

Using the methods above you should be able to create scripts that test the performance of the ADW server. i.e. running loads with different CPU counts, users, think times etc.

But beware that before comparing the results with on-premises servers, there are a lot of features enabled on the ADW instance, like db_block_checking and db_check_sum, that may not be enabled on another Oracle instance.

Step 8/ Optional functionality


To make the demo more interactive you could show the charts on the service console. The only issue is that the refresh rate is a little slow. You can improve on this by using some utilities I provide. The first of these is Database Time Monitor (http://www.dominicgiles.com/dbtimeviewer.html).

To install it you first need to download it (http://www.dominicgiles.com/downloads.html) to your PC or Mac and make sure that you’ve installed a Java 8 JRE. Once you’ve done that you simply need to unzip it and change into the bin directory. From there all you need to do is to run a command similar to

$> ./dbtimemonitor -u admin -p < your admin password > -cs domadw_low -cf /Users/dgiles/Downloads/wallet_DOMADW.zip

or on Windows

$> dbtimemonitor.bat -u admin -p < your admin password > -cs domadw_low -cf /Users/dgiles/Downloads/wallet_DOMADW.zip

where the -cf parameter references the credential file you downloaded and the -cs parameter references the service for ADW. You should see a screen similar to this.

[Image: Database Time Monitor connected to the ADW instance]

You can use this to monitor in real time the activity of the ADW instance.

The final tool, cpumonitor (http://www.dominicgiles.com/cpumonitor.html), allows you to monitor the activity of the application server. This can be downloaded from here (http://www.dominicgiles.com/downloads.html) and again should be installed on your PC or Mac. This is done by simply unzipping the download. Then change into the bin directory on mac or linux or the winbin directory on a windows machine.

You’ll need to edit the XML file to reflect the location of the application server.



Save the file and then launch it with the command

$> ./cpumonitor
[Image: cpumonitor showing CPU activity on the application server]

Conclusion

This walkthrough should show you how to install swingbench and its DSS and data warehousing benchmarks against ADW.

Setting up Swingbench for Oracle Autonomous Transaction Processing (ATP)

The following blog details how to install swingbench against ATP and use it to run load tests. The example below uses the Simple Order Entry (SOE) benchmark but it’s possible to follow the methodology laid out below to run any of the other supplied benchmarks (SH, TPC-DS etc). To do this you’ll need to run through the following steps (Step 8 is optional).

Step 1/ Make Sure you have an SSH Public key


You are likely to already have an ssh key but it is possible that you want to create another purely for this exercise. You’ll need this key to create your application server. You can find details on how to do this here

https://git-scm.com/book/en/v2/Git-on-the-Server-Generating-Your-SSH-Public-Key

It’s the .pub file or more precisely its contents that you’ll need. The public key file is typically created in the hidden .ssh directory in your home directory. The public key will look something like this (modified)

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfO/80wleUCYxY7Ws8c67PmqL2qRUfpdPGOduHmy9xT9HkCzjoZHHIk1Zx1VpFtQQM+RwJzArZQHXrMnvefleH20AvtbT9bo2cIIZ8446DX0hHPGaGYaJNn6mCeLi/wXW0+mJmKc2xIdasnH8Q686zmv72IZ9UzD12o+nns2FgCwfleQfyVIacjfi+dy4DB8znpb4KU5rKJi5Zl004pd1uSrRtlDKR9OGILvakyf87CnAP/T8ITSMy0HWpqc8dPHJq74S5jeQn/TxrZ6TGVA+xGLzLHN4fLCOGY20gH7w3rqNTqFuUIWuIf4OFdyZoFBQyh1GWMOaKjplUonBmeZlV

You’ll need this in step 3.

Step 2/ Create the ATP Instance


You’ll have to have gone through the process of acquiring an Oracle Cloud account but that’s beyond the scope of this walkthrough. Once you have the account and have logged into Oracle Cloud Infrastructure, click on the menu button in the top left of the screen and select “Autonomous Transaction Processing”. Then simply follow these steps.

[Image: steps for creating the ATP instance in the Oracle Cloud console]


Step 3/ Create a compute resource for the application server


Whilst the ATP instance is creating we can create our application server to run swingbench. To generate any reasonable load you’ll need a minimum of two cores; for larger workloads you may need a bigger application server or potentially a small cluster of them.

In this walkthrough we’ll create a small 2 core Linux Server VM.

[Image: steps for creating the compute instance (VM) for the application server]

This should only take a couple of minutes. On completion we’ll need to use the public IP address of the application server we created in the previous step.

Step 4/ Log onto application server and setup the environment


In this step we’ll use ssh to log onto the application server and setup the environment to run swingbench. Ssh is natively available on MacOS and Linux. On platforms like Windows you can use Putty. You’ll need the IP address of the application server you created in the previous step.

First bring up a terminal on Linux/Mac. On Putty launch a new ssh session. The username will be “opc”
ssh opc@< IP Address of Appserver >

You should see something similar to
$> ssh opc@129.146.65.101
ECDSA key fingerprint is SHA256:kNbpKWL3M1wB6PUFy2GOl+JmaTIxLQiggMzn6vl2qK1tM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '129.146.65.101' (ECDSA) to the list of known hosts.
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa': 
[opc@swingbench-client ~]$ 



By default java isn’t installed on this VM so we’ll need to install it via yum. We’ll need to update yum first

sudo yum makecache fast

Then we can install java and its dependencies

sudo yum install java-1.8.0-openjdk-headless.x86_64

We should now make sure that java works correctly

java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

We can now pull the swingbench code from the website

curl http://www.dominicgiles.com/swingbench/swingbenchlatest.zip -o swingbench.zip

and unzip it

unzip swingbench.zip

Step 5/ Download the credentials file


The next step is to get the credentials file for ATP. You can do this by following these steps.

[Image: steps for downloading the ATP credentials (wallet) file]

You’ll need to upload this to our application server with a command similar to

$> scp wallet_SBATP.zip opc@< IP Address of Appserver >:

This will place our credentials file in the home directory of the application server. You don’t need to unzip it to use it.

Step 6/ Install a workload schema into the ATP instance


We can now install a schema to run our transactions against. We do this by first changing in to the swingbench bin directory

$> cd swingbench/bin

And then running the following command replacing your passwords with those that you specified during the creation of the ATP instance.
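
It will look something like this (the wallet, service and user names below are the ones used throughout this walkthrough; substitute your own passwords)

$> ./oewizard -cf ~/wallet_SBATP.zip -cs sbatp_medium -ts data -dba admin -dbap < your admin password > -u soe -p < your soe password > -scale 1 -hashpart -async_off -create -cl -v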

A quick explanation of the parameters we are using

  • -cf tells oewizard the location of the credentials file
  • -cs is the connect string for the service of the ATP instance. It is based on the name of the instance followed by one of _low, _medium, _high or _parallel (e.g. sbatp_medium)
  • -ts is the name of the table space to install swingbench into. It is currently always “data”
  • -dba is the admin user, currently this is always admin
  • -dbap is the password you specified at the creation of the ATP instance
  • -u is the name you want to give to the user you are installing swingbench into (I’d recommend soe)
  • -p is the password for the user. It needs to follow the password complexity rules of ATP
  • -async_off disables the wizard’s default behaviour of using async commits. This is currently prohibited on ATP
  • -scale indicates the size of the schema you want to create, where a scale of 1 will generate 1GB of data. The indexes will take an additional amount of space roughly half the size of the data. A scale of 10 will generate 10GB of data and roughly 5GB of indexes
  • -hashpart tells the wizard to use hash partitioning
  • -create tells swingbench to create the schema (-drop will delete the schema)
  • -cl tells swingbench to run in character mode
  • -v tells swingbench to output what’s going on (verbose mode)

You should see the following output. A scale of 1 should take just over 5 mins to create. If you specified more CPUs for the application server or ATP instance you should see some improvement in performance, but this is unlikely to be truly linear because of the nature of the code.
SwingBench Wizard
Author  :	 Dominic Giles
Version :	 2.6.0.1082

Running in Lights Out Mode using config file : ../wizardconfigs/oewizard.xml
Connecting to : jdbc:oracle:thin:@sbatp_medium
Connected                                                                  
Starting run                                                               
Starting script ../sql/soedgdrop2.sql                                      
Script completed in 0 hour(s) 0 minute(s) 2 second(s) 691 millisecond(s)   
Starting script ../sql/soedgcreatetableshash2.sql                          
Script completed in 0 hour(s) 0 minute(s) 1 second(s) 433 millisecond(s)   
Starting script ../sql/soedgviews.sql                                      
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 31 millisecond(s)    
Starting script ../sql/soedgsqlset.sql                                     
Script completed in 0 hour(s) 0 minute(s) 0 second(s) 196 millisecond(s)   
Inserting data into table ADDRESSES_1124999                                
Inserting data into table ADDRESSES_2                                      
Inserting data into table ADDRESSES_375001                                 
Inserting data into table ADDRESSES_750000                                 
Inserting data into table CUSTOMERS_749999                                 
Inserting data into table CUSTOMERS_250001                                 
Inserting data into table CUSTOMERS_500000                                 
Inserting data into table CUSTOMERS_2                                      
Run time 0:00:19 : Running threads (8/8) : Percentage completed : 5.36

You can then validate the schema created correctly using the following command

$> ./sbutil -soe -cf ~/wallet_SBATP.zip -cs sbatp_medium -u soe -p < a password for the soe user > -val
The Order Entry Schema appears to be valid.
--------------------------------------------------
|Object Type    |     Valid|   Invalid|   Missing|
--------------------------------------------------
|Table          |        10|         0|         0|
|Index          |        26|         0|         0|
|Sequence       |         5|         0|         0|
|View           |         2|         0|         0|
|Code           |         1|         0|         0|
--------------------------------------------------

You may have noticed that the stats failed to collect in the creation of the schema (known problem) so you’ll need to collect stats using the following command

$> ./sbutil -soe -cf ~/wallet_SBATP.zip -cs sbatp_medium -u soe -p < your soe password > -stats

And see the row counts for the tables with

$ ./sbutil -soe -cf ~/wallet_SBATP.zip -cs sbatp_medium -u soe -p < your soe password > -tables
Order Entry Schemas Tables
----------------------------------------------------------------------------------------------------------------------
|Table Name                  |                Rows|              Blocks|           Size|   Compressed?|  Partitioned?|
----------------------------------------------------------------------------------------------------------------------
|ORDER_ITEMS                 |           4,271,594|              64,448|        512.0MB|              |           Yes|
|ADDRESSES                   |           1,500,000|              32,192|        256.0MB|              |           Yes|
|LOGON                       |           2,382,984|              32,192|        256.0MB|              |           Yes|
|CARD_DETAILS                |           1,500,000|              32,192|        256.0MB|              |           Yes|
|ORDERS                      |           1,429,790|              32,192|        256.0MB|              |           Yes|
|CUSTOMERS                   |           1,000,000|              32,192|        256.0MB|              |           Yes|
|INVENTORIES                 |             897,672|               2,386|         19.0MB|      Disabled|            No|
|PRODUCT_DESCRIPTIONS        |               1,000|                  35|          320KB|      Disabled|            No|
|PRODUCT_INFORMATION         |               1,000|                  28|          256KB|      Disabled|            No|
|ORDERENTRY_METADATA         |                   4|                   5|           64KB|      Disabled|            No|
|WAREHOUSES                  |               1,000|                   5|           64KB|      Disabled|            No|
----------------------------------------------------------------------------------------------------------------------
                                                            Total Space           1.8GB

Step 7/ Run a workload


The first thing we need to do is to configure the load generator to load the users on in a sensible fashion (i.e. to not exceed the login rate). You could do this manually by editing the config file or use the following command.

We can now run a workload against the newly created schema using a command similar to

I won’t explain the parameters that I detailed earlier when running the wizard but for the new ones do the following

• -v indicates what info should be shown in the terminal when running the command. In this instance I’ve asked that the users logged on, Tx/Min, Tx/Sec and the average response time for each transaction are shown.
• -min and -max indicate the time to sleep between each DML operation in a transaction (intra sleep). A Transaction is made up of many DML operations
• -intermin -intermax indicates the time to sleep between each transaction.
• -di indicates that I want to disable the following transactions SQ,WQ,WA. These are reporting queries and aren’t really needed.
• -rt indicates how long to run the benchmark before stopping it

You should see output similar to the following
$> ./charbench -c ../configs/SOE_Server_Side_V2.xml -cf ~/wallet_SBATP.zip -cs sbatp_low -u soe -p < your soe password > -v users,tpm,tps,vresp -intermin 0 -intermax 0 -min 0 -max 0 -uc 128 -di SQ,WQ,WA -rt 0:0.30
Author  :	 Dominic Giles
Version :	 2.6.0.1082

Results will be written to results.xml.
Hit Return to Terminate Run...

Time		Users	TPM	TPS	NCR	UCD	BP	OP	PO	BO	SQ	WQ	WA
17:29:53	[0/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:54	[0/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:55	[0/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:56	[40/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:57	[45/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:58	[51/128]	0	0	0	0	0	0	0	0	0	0	0
17:29:59	[60/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:00	[69/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:01	[78/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:02	[84/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:03	[95/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:04	[101/128]	0	0	0	0	0	0	0	0	0	0	0
17:30:05	[104/128]	0	0	419	395	547	554	0	570	0	0	0
17:30:06	[108/128]	118	118	653	110	379	1576	375	647	0	0	0
17:30:07	[116/128]	325	207	355	220	409	406	499	450	0	0	0
17:30:08	[128/128]	547	222	423	100	203	504	403	203	0	0	0
17:30:09	[128/128]	831	284	420	306	303	396	501	505	0	0	0
17:30:10	[128/128]	1133	302	344	234	232	884	603	217	0	0	0
17:30:11	[128/128]	1438	305	564	367	355	375	559	376	0	0	0
17:30:12	[128/128]	1743	305	443	150	323	319	233	143	0	0	0
17:30:13	[128/128]	2072	329	1712	179	108	183	325	179	0	0	0
17:30:14	[128/128]	2444	372	1036	102	147	204	194	134	0	0	0
17:30:15	[128/128]	2807	363	1584	85	182	234	179	169	0	0	0
17:30:16	[128/128]	3241	434	741	159	157	250	256	251	0	0	0
17:30:17	[128/128]	3653	412	517	91	178	181	176	137	0	0	0

We specified a runtime of 30 seconds (-rt 0:0.30) which meant the workload ran for a short period of time. You could increase this by changing the -rt parameter to something larger like

-rt 1:30

Which would run the benchmark for 1 hour 30 mins. Or you could leave the -rt parameter off altogether and the benchmark would run until you hit return.

One thing to try whilst running the load against the server is scaling the number of CPUs available to the ATP instance up and down. This should see an increase in the number of transactions being processed.

[Image: ATP service console showing activity as CPUs are scaled]

Some things to note: at the end of each run you’ll end up with a results file in XML format in the directory you ran charbench from, i.e.
$ ls 
bmcompare  clusteroverview  debug.log   oewizard          results00003.xml  results00006.xml  results00009.xml  sbutil      swingbench
ccwizard   coordinator      jsonwizard  results00001.xml  results00004.xml  results00007.xml  results2pdf       shwizard    tpcdswizard
charbench  data             minibench   results00002.xml  results00005.xml  results00008.xml  results.xml       sqlbuilder


These XML files contain the detailed results of each run, i.e. average transactions per second, completed transactions, percentile response times etc. Whilst these are difficult to read, you can install swingbench on a Windows or Mac machine and use a utility called results2pdf to convert them into a more readable form. You can find some details on how to do that here.

http://www.dominicgiles.com/blog/files/86668db677bc5c3fc1f0a0231d595ebc-139.html

Using the methods above you should be able to create scripts that test the performance of the ATP server. i.e. running loads with different CPU counts, users, think times etc.
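
As a sketch of what such a script might look like (the user counts and run time below are purely illustrative, and SOEPASS is a hypothetical environment variable holding the soe password)

#!/bin/bash
# Illustrative only: run the SOE workload at increasing user counts, 30 minutes each.
for users in 16 32 64 128; do
  ./charbench -c ../configs/SOE_Server_Side_V2.xml -cf ~/wallet_SBATP.zip -cs sbatp_low \
              -u soe -p "$SOEPASS" -uc $users -min 0 -max 0 -intermin 0 -intermax 0 \
              -di SQ,WQ,WA -v users,tpm,tps,vresp -rt 0:30
done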

But beware that before comparing the results with on-premises servers, there are a lot of features enabled on the ATP server, like db_block_checking and db_check_sum, that may not be enabled on another Oracle instance.

Step 8/ Optional functionality


To make the demo more interactive you could show the charts on the service console. The only issue is that the refresh rate is a little slow. You can improve on this by using some utilities I provide. The first of these is Database Time Monitor (http://www.dominicgiles.com/dbtimeviewer.html).

To install it you first need to download it (http://www.dominicgiles.com/downloads.html) to your PC or mac and make sure that you’ve installed a Java 8 JRE. Once you’ve done that you simply need to unzip it and change into the bin directory. From there all you need to do is to run a command similar to

$> ./dbtimemonitor -u admin -p < your admin password > -cs sbatp_low -cf /Users/dgiles/Downloads/wallet_SBATP.zip

or on Windows

$> dbtimemonitor.bat -u admin -p < your admin password > -cs sbatp_low -cf /Users/dgiles/Downloads/wallet_SBATP.zip

The -cf parameter references the credential file you downloaded and the -cs parameter references the service for ATP. You should see a screen similar to this.

[Image: Database Time Monitor connected to the ATP instance]

You can use this to monitor in real time the activity of the ATP instance. Currently it reports all of the cores on the server you are running on. This will be fixed shortly to just show the cores available to you.

The final tool, cpumonitor (http://www.dominicgiles.com/cpumonitor.html), allows you to monitor the activity of the application server. This can be downloaded from here (http://www.dominicgiles.com/downloads.html) and again should be installed on your PC or Mac. This is done by simply unzipping the download. Then change into the bin directory on mac or linux or the winbin directory on a windows machine.

You’ll need to edit the XML file to reflect the location of the application server.


Save the file and then launch it with the command

$> ./cpumonitor
[Image: cpumonitor showing CPU activity on the application server]

Conclusion


That's it. We’ve installed swingbench against Oracle Autonomous Transaction Processing. In my next blog entry we'll take a look at some automated tests you can use to kick the tyres on ATP. I'll also include a version that shows how to install the TPC-DS like schema that comes with swingbench against Oracle Autonomous Data Warehouse.



Creating, Querying and Loading Data in to the Oracle Autonomous Data Warehouse

ADW

Oracle Autonomous Data Warehouse (ADW) access via Python

The following shows how to access the Oracle Autonomous Data Warehouse using Python and how to load the data using the DBMS_CLOUD package via the cx_Oracle module. This is obviously simpler via the graphical front end or SQL Developer, but using Python provides a simple scriptable model whilst hiding some of the complexities of using native REST APIs.

This simple example assumes that you've got an Oracle Cloud account and that you've created or got access to an ADW database. You'll also have to download the credentials file to provide SQL*Net access. You can find the details on how to do that here. We'll be using Python in this short example but most of what we're doing could be achieved using the GUI and/or REST calls.

Connecting to ADW Instance

To start with we'll make sure we can connect to the ADW instance we've previously created. To do that we need to import the required libraries. If you don't have these I recommend using pip (and virtualenv)

In [ ]:
import cx_Oracle
import keyring
import os

We need to use an environment variable to reflect the location of the downloaded credentials files to be used by SQL*Net.

In [ ]:
os.environ['TNS_ADMIN'] = '/Users/dgiles/Downloads/wallet_DOMSDB'

This is equivalent to bash export TNS_ADMIN=/Users/dgiles/Downloads/wallet_DOMSDB and points to the unzipped directory containing the tnsnames.ora, sqlnet.ora etc. NOTE: you'll need to update the sqlnet.ora to ensure the wallet points to the same directory specified in the TNS_ADMIN environment variable. i.e.

WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/Users/dgiles/Downloads/wallet_DOMSDB")))
SSL_SERVER_DN_MATCH=yes

In the example above I've changed DIRECTORY to the location where I downloaded and unzipped the credentials file.

The next steps are to connect to the Oracle ADW instance. In the example below I've stored my password using the Python module "keyring". I'm also using the jupyter notebook magic sql functionality. We'll test the connection using the admin user and the connect string "domsdb_medium", which is one of the services included in the tnsnames.ora file.

In [ ]:
%load_ext sql
In [257]:
password = keyring.get_password('adw','admin')
%sql oracle+cx_oracle://admin:$password@domsdb_medium
Out[257]:
'Connected: admin@None'
In [258]:
%%sql admin@domsdb_medium
select 1 from dual
0 rows affected.
Out[258]:
1
1

Generating test data

We've connected to the oracle database and can now start uploading data to the instance. In this example we'll use datagenerator to generate the data into flat files and then place these on Oracle Object Store and load them from there.

The first step is to install datagenerator. You can find details on how to do that here. We can now simply generate data for the "SH" benchmark.

In [259]:
import subprocess

# Change the 2 following parameters to reflect your environment
generated_data_dir = '/Users/dgiles/Downloads/generated_data'
datagenerator_home = '/Users/dgiles/datagenerator'

# Change the following paramters relating to the way datagenerator will create the data
scale = 100
parallel = 8

dg_command = '{dg}/bin/datagenerator -c {dg}/bin/sh.xml -scale {s} -cl -f -d {gdd} -tc {p}'.format(
    dg = datagenerator_home,
    s = scale,
    gdd = generated_data_dir,
    p = parallel
)
# Typically we'd use a command similar to the one below but since we're in a notebook it's easier to use the default functionality
# p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# (output, err) = p.communicate()

print(dg_command)
!{dg_command}
/Users/dgiles/datagenerator/bin/datagenerator -c /Users/dgiles/datagenerator/bin/sh.xml -scale 100 -cl -f -d /Users/dgiles/Downloads/generated_data -tc 8
Started Datagenerator, Version 0.4.0.1083

============================================
|           Datagenerator Run Stats        |
============================================
Connection Time                        0:00:00.000
Data Generation Time                   0:00:26.274
DDL Creation Time                      0:00:00.000
Total Run Time                         0:00:26.284
Rows Inserted per sec                        5,367
Data Generated (MB) per sec                    0.5
Actual Rows Generated                      137,000
Commits Completed                                0
Batch Updates Completed                          0

We should now have a series of files in the "generated_data_dir" directory. These will be a mix of csv files, create table scripts, loader scripts etc.

In [255]:
!ls {generated_data_dir}
CHANNELS.csv                   SALES.csv
CHANNELS.ctl                   SALES.ctl
COUNTRIES.csv                  SUPPLEMENTARY_DEMOGRAPHICS.csv
COUNTRIES.ctl                  SUPPLEMENTARY_DEMOGRAPHICS.ctl
CUSTOMERS.csv                  constraints.sql
CUSTOMERS.ctl                  createindexes.sql
PRODUCTS.csv                   createsequences.sql
PRODUCTS.ctl                   createtables.sql
PROMOTIONS.csv                 droptables.sql
PROMOTIONS.ctl

Uploading the data to the Oracle Object Store

We're really only interested in the "csv" files so we'll upload just those. But before we do this we'll need to establish a connection to the Oracle Object Store. I give some detail behind how to do this in this notebook. I'll be using the object storage out of the Frankfurt Region.

In [256]:
import oci
import ast

my_config = ast.literal_eval(keyring.get_password('oci_opj','doms'))
my_config['region'] = 'eu-frankfurt-1'
object_storage_client = oci.object_storage.ObjectStorageClient(my_config)
namespace = object_storage_client.get_namespace().data

We've now got a handle to the Oracle Object Store client, so we can create a bucket, which we'll call "Sales_Data", and upload the CSV files.

In [ ]:
import os, io

bucket_name = 'Sales_Data'

files_to_process = [file for file in os.listdir(generated_data_dir) if file.endswith('csv')]

try:
    create_bucket_response = object_storage_client.create_bucket(
        namespace,
        oci.object_storage.models.CreateBucketDetails(
            name=bucket_name,
            compartment_id=my_config['tenancy']
        )
    )
except Exception as e:
    print(e.message)


for upload_file in files_to_process:
    print('Uploading file {}'.format(upload_file))
    object_storage_client.put_object(namespace, bucket_name, upload_file, io.open(os.path.join(generated_data_dir,upload_file),'r'))    

We need to create an authentication token that can be used by the ADW instance to access our Object storage. To do this we need to create an identity client.

In [ ]:
identity_client = oci.identity.IdentityClient(my_config)
In [ ]:
token = identity_client.create_auth_token(
    oci.identity.models.CreateAuthTokenDetails(
        description = "Token used to provide access to newly loaded files"
    ),
    user_id = my_config['user']
)

Creating users and tables in the ADW Instance

The following steps will feel very familiar to any DBA/developer of an Oracle database. We need to create a schema and associated tables to load the data into.

First we'll need to create a user/schema and grant it the appropriate roles

In [ ]:
%sql create user mysh identified by ReallyLongPassw0rd default tablespace data

Grant the "mysh" user the DWROLE

In [ ]:
%sql grant DWROLE to mysh
In [ ]:
%sql oracle+cx_oracle://mysh:ReallyLongPassw0rd@domsdb_medium

We can now create the tables we'll use to load the data into.

In [ ]:
%%sql mysh@domsdb_medium
CREATE TABLE COUNTRIES (
  COUNTRY_ID           NUMBER       NOT NULL,
  COUNTRY_ISO_CODE     CHAR(2)      NOT NULL,
  COUNTRY_NAME         VARCHAR2(40) NOT NULL,
  COUNTRY_SUBREGION    VARCHAR2(30) NOT NULL,
  COUNTRY_SUBREGION_ID NUMBER       NOT NULL,
  COUNTRY_REGION       VARCHAR2(20) NOT NULL,
  COUNTRY_REGION_ID    NUMBER       NOT NULL,
  COUNTRY_TOTAL        NUMBER(9)    NOT NULL,
  COUNTRY_TOTAL_ID     NUMBER       NOT NULL,
  COUNTRY_NAME_HIST    VARCHAR2(40)
)
In [ ]:
%%sql mysh@domsdb_medium
CREATE TABLE SALES (
  PROD_ID       NUMBER     NOT NULL,
  CUST_ID       NUMBER     NOT NULL,
  TIME_ID       DATE       NOT NULL,
  CHANNEL_ID    NUMBER     NOT NULL,
  PROMO_ID      NUMBER     NOT NULL,
  QUANTITY_SOLD NUMBER(10) NOT NULL,
  AMOUNT_SOLD   NUMBER(10) NOT NULL
)
In [ ]:
%%sql mysh@domsdb_medium
CREATE TABLE SUPPLEMENTARY_DEMOGRAPHICS (
  CUST_ID                 NUMBER NOT NULL,
  EDUCATION               VARCHAR2(21),
  OCCUPATION              VARCHAR2(21),
  HOUSEHOLD_SIZE          VARCHAR2(21),
  YRS_RESIDENCE           NUMBER,
  AFFINITY_CARD           NUMBER(10),
  BULK_PACK_DISKETTES     NUMBER(10),
  FLAT_PANEL_MONITOR      NUMBER(10),
  HOME_THEATER_PACKAGE    NUMBER(10),
  BOOKKEEPING_APPLICATION NUMBER(10),
  PRINTER_SUPPLIES        NUMBER(10),
  Y_BOX_GAMES             NUMBER(10),
  OS_DOC_SET_KANJI        NUMBER(10),
  COMMENTS                VARCHAR2(4000)
)
In [ ]:
%%sql mysh@domsdb_medium
CREATE TABLE CUSTOMERS (
  CUST_ID                NUMBER       NOT NULL,
  CUST_FIRST_NAME        VARCHAR2(20) NOT NULL,
  CUST_LAST_NAME         VARCHAR2(40) NOT NULL,
  CUST_GENDER            CHAR(1)      NOT NULL,
  CUST_YEAR_OF_BIRTH     NUMBER(4)    NOT NULL,
  CUST_MARITAL_STATUS    VARCHAR2(20),
  CUST_STREET_ADDRESS    VARCHAR2(40) NOT NULL,
  CUST_POSTAL_CODE       VARCHAR2(10) NOT NULL,
  CUST_CITY              VARCHAR2(30) NOT NULL,
  CUST_CITY_ID           NUMBER       NOT NULL,
  CUST_STATE_PROVINCE    VARCHAR2(40) NOT NULL,
  CUST_STATE_PROVINCE_ID NUMBER       NOT NULL,
  COUNTRY_ID             NUMBER       NOT NULL,
  CUST_MAIN_PHONE_NUMBER VARCHAR2(25) NOT NULL,
  CUST_INCOME_LEVEL      VARCHAR2(30),
  CUST_CREDIT_LIMIT      NUMBER,
  CUST_EMAIL             VARCHAR2(40),
  CUST_TOTAL             VARCHAR2(14) NOT NULL,
  CUST_TOTAL_ID          NUMBER       NOT NULL,
  CUST_SRC_ID            NUMBER,
  CUST_EFF_FROM          DATE,
  CUST_EFF_TO            DATE,
  CUST_VALID             VARCHAR2(1)
)
In [ ]:
%%sql mysh@domsdb_medium
CREATE TABLE CHANNELS (
  CHANNEL_ID       NUMBER       NOT NULL,
  CHANNEL_DESC     VARCHAR2(20) NOT NULL,
  CHANNEL_CLASS    VARCHAR2(20) NOT NULL,
  CHANNEL_CLASS_ID NUMBER       NOT NULL,
  CHANNEL_TOTAL    VARCHAR2(13) NOT NULL,
  CHANNEL_TOTAL_ID NUMBER       NOT NULL
)
In [ ]:
%%sql mysh@domsdb_medium
CREATE TABLE PRODUCTS (
  PROD_ID               NUMBER(6)      NOT NULL,
  PROD_NAME             VARCHAR2(50)   NOT NULL,
  PROD_DESC             VARCHAR2(4000) NOT NULL,
  PROD_SUBCATEGORY      VARCHAR2(50)   NOT NULL,
  PROD_SUBCATEGORY_ID   NUMBER         NOT NULL,
  PROD_SUBCATEGORY_DESC VARCHAR2(2000) NOT NULL,
  PROD_CATEGORY         VARCHAR2(50)   NOT NULL,
  PROD_CATEGORY_ID      NUMBER         NOT NULL,
  PROD_CATEGORY_DESC    VARCHAR2(2000) NOT NULL,
  PROD_WEIGHT_CLASS     NUMBER(3)      NOT NULL,
  PROD_UNIT_OF_MEASURE  VARCHAR2(20),
  PROD_PACK_SIZE        VARCHAR2(30)   NOT NULL,
  SUPPLIER_ID           NUMBER(6)      NOT NULL,
  PROD_STATUS           VARCHAR2(20)   NOT NULL,
  PROD_LIST_PRICE       NUMBER(8)      NOT NULL,
  PROD_MIN_PRICE        NUMBER(8)      NOT NULL,
  PROD_TOTAL            VARCHAR2(13)   NOT NULL,
  PROD_TOTAL_ID         NUMBER         NOT NULL,
  PROD_SRC_ID           NUMBER,
  PROD_EFF_FROM         DATE,
  PROD_EFF_TO           DATE,
  PROD_VALID            VARCHAR2(1)
)
In [ ]:
%%sql mysh@domsdb_medium
CREATE TABLE PROMOTIONS (
  PROMO_ID             NUMBER(6)    NOT NULL,
  PROMO_NAME           VARCHAR2(30) NOT NULL,
  PROMO_SUBCATEGORY    VARCHAR2(30) NOT NULL,
  PROMO_SUBCATEGORY_ID NUMBER       NOT NULL,
  PROMO_CATEGORY       VARCHAR2(30) NOT NULL,
  PROMO_CATEGORY_ID    NUMBER       NOT NULL,
  PROMO_COST           NUMBER(10)   NOT NULL,
  PROMO_BEGIN_DATE     DATE         NOT NULL,
  PROMO_END_DATE       DATE         NOT NULL,
  PROMO_TOTAL          VARCHAR2(15) NOT NULL,
  PROMO_TOTAL_ID       NUMBER       NOT NULL
)
In [245]:
%%sql mysh@domsdb_medium
CREATE TABLE times AS
SELECT udate time_id,
  TO_CHAR(udate,'Day') day_name,
  TO_CHAR(udate,'DD') day_number_in_month,
  TO_CHAR(udate,'DDD') day_number_in_year,
  TO_CHAR(udate,'YYYY' ) calendar_year,
  TO_CHAR(udate,'Q' ) calendar_quarter_number,
  TO_CHAR(udate,'MM' ) calendar_month_number,
  TO_CHAR(udate,'WW' ) calendar_week_number,
  TO_CHAR(udate,'YYYY-MM') calendar_month_desc,
  TO_CHAR(udate,'YYYY-Q') calendar_quarter_desc
FROM
  (SELECT to_date('31/12/1994','DD/MM/YYYY')+rownum udate
  FROM all_objects
  WHERE to_date('31/12/1994','DD/MM/YYYY')+rownum <= to_date( '31/12/2014','DD/MM/YYYY')
  )
7305 rows affected.
Out[245]:
[]
In [248]:
%%sql mysh@domsdb_medium
select * from tab
0 rows affected.
Out[248]:
tname tabtype clusterid
COUNTRIES TABLE None
SALES TABLE None
SUPPLEMENTARY_DEMOGRAPHICS TABLE None
CUSTOMERS TABLE None
CHANNELS TABLE None
PRODUCTS TABLE None
PROMOTIONS TABLE None
TIMES TABLE None

Copying the data from the object store

We need to add the authorisation token to the newly created schema to allow it to access the object store's files. We can't do this using the sql magic syntax we've been using up to this point, so we'll do it using standard cx_Oracle calls.

In [ ]:
connection = cx_Oracle.connect('mysh', 'ReallyLongPassw0rd', 'domsdb_medium')
cursor = connection.cursor();
In [ ]:
cursor.callproc('DBMS_CLOUD.create_credential', keywordParameters = {'credential_name':'SALES_DATA_AUTH',
                                                                    'username':'dominic.giles@oracle.com',
                                                                    'password':token.data.token})

We can access the object storage using a url of the format

https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenancy>/<bucket name>/<object name>
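
For example, with the Frankfurt region used earlier and the "Sales_Data" bucket we just created, the SALES.csv object would be addressed as something like

https://swiftobjectstorage.eu-frankfurt-1.oraclecloud.com/v1/<tenancy>/Sales_Data/SALES.csv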

We can use this to dynamically generate a url for each of the objects inside of the bucket we've just created and use the DBMS_CLOUD package to copy the data into the ADW instance. The code below gets the names of all of the tables we've just created and then loops through each table, copying the associated csv file into the ADW instance.

In [ ]:
from tqdm import tqdm

format = '''{"delimiter" : ",",
          "skipheaders" : 1,
          "ignoremissingcolumns" : "true",
          "removequotes" : "true",
          "dateformat" : "DD-MON-YYYY HH24:MI:SS",
          "blankasnull" : "true"}'''

file_location = '''https://swiftobjectstorage.{region}.oraclecloud.com/v1/{tenancy}/{bucket_name}/{table_name}.csv'''
region = my_config['region']
tenancy= 'oracleonpremjava'


rs = cursor.execute("select table_name from user_tables where table_name not like 'COPY%'")
rows = rs.fetchall()
for row in tqdm(rows):
    url = file_location.format(region=region, tenancy=tenancy, bucket_name=bucket_name, table_name=row[0])
    cursor.callproc('DBMS_CLOUD.copy_data', keywordParameters= {'table_name':row[0],
                                                           'credential_name':'SALES_DATA_AUTH',
                                                           'file_uri_list':url, 
                                                           'format': format
                                                           })
                               

We can now take a look and see how many rows we've loaded into the tables

In [261]:
rs = cursor.execute("select table_name from user_tables where table_name not like 'COPY%'")
rows = rs.fetchall()

for row in rows:
    rs2 = cursor.execute("select count(*) from {}".format(row[0]))
    rows2 = rs2.fetchone()
    print('{tn: <35}{rc:>10,}'.format(tn=row[0],rc=rows2[0]))
COUNTRIES                                  22
SALES                                  97,799
SUPPLEMENTARY_DEMOGRAPHICS             19,599
CUSTOMERS                              19,599
CHANNELS                                    4
PRODUCTS                                   71
PROMOTIONS                                502
TIMES                                   7,305

It's now possible to run standard queries against the newly loaded data. No need to create anything else (indexes etc.)

In [249]:
%%sql mysh@domsdb_medium
SELECT channels.channel_desc, 
       countries.country_iso_code,
       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id 
  AND sales.cust_id=customers.cust_id 
  AND customers.country_id = countries.country_id
  AND sales.channel_id = channels.channel_id 
  AND channels.channel_desc IN ('Tele Sales','Internet') 
  AND times.calendar_year = '2006' 
  AND countries.country_iso_code IN ('GB','DE','FR','DK')
GROUP BY 
  ROLLUP(channels.channel_desc,countries.country_iso_code)
ORDER BY 1
0 rows affected.
Out[249]:
channel_desc country_iso_code sales$
Internet DE          3,130
Internet DK          1,676
Internet FR          2,545
Internet GB          2,320
Internet None          9,671
Tele Sales DE          3,077
Tele Sales DK          3,116
Tele Sales FR          3,184
Tele Sales GB          2,386
Tele Sales None         11,763
None None         21,434

Tidying up the object store and database

You can run the following steps if you want to remove all of the tables from the schema and purge the object store of files. Let's start by removing the tables. NOTE: The code below will remove all of the tables from the schema. Make sure you've not got anything in the schema that you want to keep before running it.

In [ ]:
from tqdm import tqdm

rs = cursor.execute("select table_name from user_tables")
rows = rs.fetchall()

for row in tqdm(rows):
    rs2 = cursor.execute("drop table {} purge".format(row[0]))

And then removing the object store files and bucket

In [ ]:
object_list = object_storage_client.list_objects(namespace, bucket_name)

for o in object_list.data.objects:
    print('Deleting object {}'.format(o.name))
    object_storage_client.delete_object(namespace, bucket_name, o.name)

print('Deleting bucket')    
response = object_storage_client.delete_bucket(namespace, bucket_name)

Making the alert log just a little more readable

One of the most valuable sources of information about what the Oracle database has done and is currently doing is the alert log. It's something that every Oracle Database professional should be familiar with. So what can you do to improve your chances of not missing important pieces of info? The obvious answer is that you should use a tool like Enterprise Manager. This is particularly true if you are looking after hundreds of databases.

But what if you are only looking after one or two or just testing something out? Well the most common solution is to simply tail the alert log file.

The only issue is that it's not the most exciting thing to view, this of course could be said for any terminal based text file. But there are things you can do to make it easier to parse visually and improve your chances of catching an unexpected issue.

The approach I take is to push the alert log file through python and use the various libraries to brighten it up. It's very easy to go from this (tail -f)

[Image: the alert log viewed with plain tail -f]

To this

[Image: the alert log viewed via the Python script, with colour and formatting]

The reason this works is that python provides a rich set of libraries which can add a little bit of colour and formatting to the alert file.

You can find the code to achieve this in the gist below
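
The full script is in the gist, but a minimal sketch of the idea (not the full script) looks something like this: follow the file and use colorama to highlight anything interesting.

import time
from colorama import Fore, Style, init

def follow_alert_log(path):
    init()  # make ANSI colours work on Windows terminals too
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            line = line.rstrip()
            if 'ORA-' in line:
                print(Fore.RED + line + Style.RESET_ALL)   # highlight errors
            elif line.startswith('20'):                    # crude timestamp check
                print(Fore.CYAN + line + Style.RESET_ALL)
            else:
                print(line)

if __name__ == '__main__':
    # example location; point this at your own alert log
    follow_alert_log('/u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log')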



Just a quick note on installing this. You'll need either python 2.7 or 3 available on your server.

I'd also recommend installing pip and then the following libraries

pip install humanize psutil colorama python-dateutil

After you've done that it's just a case of running the script. If you have $ORACLE_BASE and $ORACLE_SID set the library will try and make a guess at the location of the alert file. i.e

python alertlogparser.py

But if that doesn't work or you get an error you can also explicitly specify the location of the alert log with something like


python alertlogparser.py -a $ORACLE_BASE/diag/rdbms/orcl/orcl/trace/alert_orcl.log


This script isn't supposed to be an end product, just a simple example of what can be achieved to make things a little easier. And whilst I print information like CPU load and memory, there's nothing to stop you from modifying the script to display the number of warnings or errors found in the alert log and update it as things change. Or if you really want to go wild, implement something similar but a lot more sophisticated using python and curses.

The age of "Terminal" is far from over….

Changing the size of redo logs in python

I create a lot of small databases to do testing on. The trouble is that I often need to change the size of redo log files when I'm testing large transaction workloads or loading a lot of data. Now there are lots of better ways to do what's shown in the code below, but this approach gave me the chance to keep brushing up my python skills and use the mighty cx_Oracle driver. The following should never be considered anything but a nasty hack but it does save me a little bit of time, i.e. don't use this on anything but a test database… Clearly the sensible way to do this is to write my own scripts to build databases.

The following code works its way through the redo log files, drops one that's inactive and then simply recreates it at the new size. It finishes when it has set all of the redo logs to the right size.
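
A stripped-down sketch of that loop (not the full script) looks something like this in Python with cx_Oracle

import cx_Oracle

def resize_redo(connection, size_mb):
    target = size_mb * 1024 * 1024
    cursor = connection.cursor()
    while True:
        cursor.execute("select group#, status, bytes from v$log")
        logs = cursor.fetchall()
        if all(size == target for _, _, size in logs):
            break  # every group is now the requested size
        resized = False
        for group, status, size in logs:
            # only an INACTIVE group can safely be dropped and recreated
            if size != target and status == 'INACTIVE':
                cursor.execute("alter database drop logfile group {}".format(group))
                cursor.execute("alter database add logfile group {} size {}".format(group, target))
                resized = True
        if not resized:
            # cycle the remaining groups towards INACTIVE
            cursor.execute("alter system switch logfile")
            cursor.execute("alter system checkpoint")

# connect as sysdba and resize every group to 300MB, as in the example run below
connection = cx_Oracle.connect("sys", "welcome1", "myserver/orclcdb", mode=cx_Oracle.SYSDBA)
resize_redo(connection, 300)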

Running the script is simply a case of running it with the parameters shown below

python ChangeRedoSize -u sys -p welcome1 -cs myserver/orclcdb --size 300

Note : the user is the sysdba of the container database if you are using the multitenant architecture and the size is in megabytes.

You should then see something similar to the following



Current Redo Log configuration
+-----------+------------+--------------+-----------+---------------+----------+
| Group No. | Thread No. | Sequence No. | Size (MB) | No of Members |  Status  |
+-----------+------------+--------------+-----------+---------------+----------+
|     1     |     1      |     446      | 524288000 |       1       | INACTIVE |
|     2     |     1      |     448      | 524288000 |       1       | CURRENT  |
|     3     |     1      |     447      | 524288000 |       1       |  ACTIVE  |
+-----------+------------+--------------+-----------+---------------+----------+
alter system switch logfile
alter system switch logfile
alter database drop logfile group 2
alter database add logfile group 2 size 314572800
alter system switch logfile
alter database drop logfile group 1
alter database add logfile group 1 size 314572800
alter system switch logfile
alter system switch logfile
alter system switch logfile
alter system switch logfile
alter database drop logfile group 3
alter database add logfile group 3 size 314572800
alter system switch logfile
All logs correctly sized. Finishing...
New Redo Log configuration
+-----------+------------+--------------+-----------+---------------+----------+
| Group No. | Thread No. | Sequence No. | Size (MB) | No of Members |  Status  |
+-----------+------------+--------------+-----------+---------------+----------+
|     1     |     1      |     455      | 314572800 |       1       |  ACTIVE  |
|     2     |     1      |     454      | 314572800 |       1       | INACTIVE |
|     3     |     1      |     456      | 314572800 |       1       | CURRENT  |
+-----------+------------+--------------+-----------+---------------+----------+

Notes on pre-parsing data for Oracle data loads

Sometimes data simply isn't in a form that is easy to load into an Oracle database, i.e. column form. It would be great if everybody exchanged data in a simple CSV form with a single file to table mapping. Sadly that isn't the case, and sometimes you have to do a little work to get it into a form that's usable. A recent benchmark highlighted this issue very well. The customer provided the data in compressed CSV form (so far so good) but the data was held in key value pairs (not so good). They also provided us with a mapping file that describes how it all fits together.

Now typically the approach many people would take would be to develop some form of program that parses all of the data, writes it to a staging area and then loads all of it in one go to the target database. I make no criticism of this approach since it works well, as long as it's not time critical. It's by far the simplest method. However I'm a big fan of taking advantage of what's already available, and one of the most underused and powerful features of the Oracle database is the pre-parser. It enables you to pipeline various operations so they all run as quickly as possible. So going back to my benchmark, we used this approach to load data into our target database. It consisted of 4 steps
  • Read the data off the filesystem as efficiently as possible and write it to stdout
  • Read from stdin, unzip the data and write it to stdout
  • Read from stdin into a java program to do the key value mapping and error detection/correction, writing the output to stdout
  • Read from stdin into sqlloader
I will at this point say that I'm not really using Oracle's pre-parser here, I'm just using good old "pipes", but why this is important will become clearer later. This approach gave us a great deal of flexibility and simplified the code we had to write. It operates in some respects as a serialised map reduce flow, but I'll come back to that another day and explain how it can be integrated directly into a massively parallel approach. It's also possible to get Java to natively read the zipped file; having said that, I offloaded that process to the OS to enable me to use different compression formats when needed.

The java program simply reads from stdin and writes to stdout. To handle the key value pairs it just needed to read in the mapping file and then split and parse the values from stdin. The data was then written to stdout in a well known order.

Java extract from my program... 


            // OneMB, delimitor, seperator, MyTokenizer and checkForNull are defined elsewhere in the program
            BufferedReader br = new BufferedReader(new InputStreamReader(System.in), OneMB);
            String line = null;
            HashMap keyValuePairs = null;
            MyTokenizer mt = null;
            while ((line = br.readLine()) != null) {
                keyValuePairs = new HashMap(200);
                mt = new MyTokenizer(line, delimitor);
                // split each "key=value" token and store it in the map
                for (String token : mt) {
                    int loc = token.indexOf("=");
                    if (loc != -1) {
                        String i = token.substring(0, loc);
                        String s = token.substring(loc + 1, token.length());
                        keyValuePairs.put(i, s);
                    }
                }
                // write the mapped values to stdout in a well known order
                StringBuffer outRec = new StringBuffer(1000);
                outRec.append(checkForNull(keyValuePairs.get("uniqueID"), "")).append(seperator);
                // mapping logic similar to above repeats
                System.out.println(outRec.toString());
            }

All that was needed for sqlloader to process the files was a control file that understood the order of the columns and any additional formatting. 

One of the additional benefits is that we can load the data via "direct path" and implement other features such as multi table insert. The Java preparser enables you to add all of the additional formatting to make this a trivial process.

The following diagram illustrates the process.

This equates to a Unix/Linux statement such as

/bin/dd if=myverybigfile.txt bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn | sqlldr bench/bench control=kv.ctl data=\"-\" direct=TRUE;

NOTE : one thing you may have noticed is that I'm using dd to do 1MB I/Os. This is just an efficiency measure and works well on structures such as DBFS; you could skip this part of the operation if needed.

Which brings us onto external tables and the preparser


External Tables and pre-parsers 

As I mentioned earlier I like to take advantage of functionality that's already available and one of those features in the Oracle database is external tables. I don't intend to go into much detail as to why you should use external tables other than they do much of the heavy lifting for you and they provide a seamless interface between the filesystem and the database. They effectively make files look like tables. 

This means it's trivial to implement parallelism for our pre-parser. We don't need to worry about how to handle the files and how to schedule everything, external tables take care of all of that for you. In our benchmark we used them in the following way




Our previous pipeline remains the same except that we don't need sqlloader; it's all managed by the table definition itself. So we end up with something similar to the following for the table definition (I've abbreviated it quite substantially and highlighted the important bits)

create table staging_ext_mydata_jan01
(  uniqueid NUMBER,
..
-- Lots of columns
..
)
ORGANIZATION EXTERNAL
(
   TYPE oracle_loader
   DEFAULT DIRECTORY load_dir_jan01
   ACCESS PARAMETERS
   (
      RECORDS DELIMITED BY NEWLINE
      PREPROCESSOR exec_dir:'external_tab.sh'
      BADFILE log_dir:'external.bad'
      LOGFILE log_dir:'external.log'
      FIELDS TERMINATED BY '|'
      OPTIONALLY ENCLOSED BY '"' AND '"'
      ( uniqueid char(100),
..
-- Lots of definitions
..
      )
   )
   LOCATION ('data1.txt', 'data2.txt', ... , 'data100.txt')
)
REJECT LIMIT UNLIMITED;

One of the things to note is that I've included the pipelined preprocessor inside of a shell script which looks like this 

/bin/dd if=$1 bs=1024k status=noxfer 2>/dev/null | /bin/gunzip -c | java -classpath /home/oracle/loader.jar com.dom.KeyValueParserStdIn

The important part of this script is the parameter ($1) that is passed to the shell script. This is the file name that the external table wants to process. 

A simple select statement from my "staging_ext_mydata_jan01" table unzips and parses the data, converting it into usable columns. What's more, if I issue the statement in parallel, Oracle takes care of creating the processes for me and making sure everything is scheduled in an orderly fashion.
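As a sketch (the degree of parallelism is purely illustrative), a simple hinted query is enough to have the external table scanned, and therefore the preprocessor run, in parallel across the location files

select /*+ parallel(s, 16) */ count(*)
from staging_ext_mydata_jan01 s;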

To finish the load we simply used a multi table insert to put the data into the correct tables in an efficient fashion. Using this approach we were able to read zipped files, parse them and insert them into our three target tables at over 1.5 million source records/sec.
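As an illustration only (the target table names, the record_type column and the predicates below are invented; the real mapping came from the customer's mapping file), the multi table insert looked something like

insert all
  when record_type = 'A' then
    into target_table_a (uniqueid, col_a) values (uniqueid, col_a)
  when record_type = 'B' then
    into target_table_b (uniqueid, col_b) values (uniqueid, col_b)
  else
    into target_table_c (uniqueid, col_c) values (uniqueid, col_c)
select uniqueid, record_type, col_a, col_b, col_c
from staging_ext_mydata_jan01;

All of the unzipping and key value parsing happens transparently as the select against the external table runs.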

Timing groups of SQL operations

Sometimes I feel like I've missed out on a whole chunk of functionality in Oracle products. One little nugget is the “timing” command in SQL*Plus. This allows you to time groups of operations.

Obviously turning it on is achieved with the “set timing on” command, i.e.

SQL > set timing on

SQL > select count(1) from all_objects;

COUNT(1)
----------
68653

Elapsed: 00:00:03.95

SQL>


Which is great, but what if you want to time multiple operations? Use the timing command and simply give the timer a name, in this case statement_timer.

SQL> timing start statement_timer
SQL> select count(1) from all_objects;

COUNT(1)
----------
68653

SYS@orcl > /

COUNT(1)
----------
68653

SQL> timing show statement_timer;
timing for: statement_timer
Elapsed: 00:00:30.85
SQL>


Which times anything that went on in between the timer starting and finishing, in this case including my typing of the commands. It's a fantastic utility for timing stages in a batch job, including call outs to os operations.
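As a sketch of how that might look in a batch script (the script names and commands below are just placeholders)

-- time the load stage, including a call out to the os
timing start load_stage
host gunzip /tmp/extract.dat.gz
@load_data.sql
timing stop
-- time the index build stage
timing start index_stage
@build_indexes.sql
timing stop

timing stop reports the elapsed time for the most recently started timer and then removes it.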

Datawarehousing benchmark

Recently some colleagues and I worked on a data warehousing benchmark. This comprised a 1TB data set consisting of a 14 billion row fact table with 5 much smaller dimensions. Nothing outrageous, a classical star schema... bread and butter to Oracle. We used the following hardware configuration

2 Intel based servers.
  • 4 CPUs with Dual Cores (hyper threading enabled).
  • 64 GB of memory in each
  • 2 dual ported 4Gb HBAs
  • 2 dual ported 1Gb NICs
The storage consisted of
  • 10 low cost storage arrays
  • 20 controllers
  • 20 trays of disk with 14 15k 36GB drives in each
The fabric consisted of
  • 2 mid range 16 port fibre switches
The network consisted of
  • 1 mid range 16 port 1Gb switch (I know we really should have had two of these)

The point here is that the hardware was low end but with the ability to deliver plenty of CPU and I/O at a very attractive price point. For those that aren't that interested in any more of the details, the headline figures were that we performed full table scans at over 1.6GB/sec, and due to compression we logically achieved three times this figure.

We used a stock 2.6 64 bit Linux kernel and modified the following kernel settings
  • wmem and rmem were set to 512k to reflect the fact that we were going to use 32k block sizes
  • Huge pages were enabled... it's a much more sensible way of managing the shared memory required for the SGA
  • We enabled jumbo frames in the switch and set the MTU on the interconnect to 9000 bytes
  • We used the deadline scheduler to improve I/O prioritisation.
  • Multipath was used to provide DMP over the 4 paths to disk from each of the servers.

The Oracle database was largely unchanged from the defaults with the exception of the following settings
  • 16GB for SGA and PGA
  • The fact table was compressed
  • 32k block size to improve compression ratios
  • Two parallel instance groups to ensure execution of certain queries on a given node.
  • Query rewrite was enabled.
  • ocfs2 was used for the quorum and ocr files
  • ASM was used to provide storage for the datafiles etc. Each physical tray of disk was presented to ASM as a logical disk of roughly 500GB.
  • ASMlib was used to persist disk identity and simplify configuration.

The tests consisted of a series of SQL statements that were run serially, concurrently and as a series of streams. I can't give much in the way of specifics for obvious reasons, but the machine ran at about 80% utilisation during the test; most of the queries returned in under a minute when run against the full data set, and many in under a second when materialized views were exploited.
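Just to illustrate the materialized view point (the table and column names below are invented, the real schema can't be shown), a summary such as the following, together with query_rewrite_enabled, lets the optimizer answer matching aggregate queries from the summary rather than the 14 billion row fact table

create materialized view sales_by_month_mv
build immediate
refresh on demand
enable query rewrite
as
select d.month_id, sum(f.amount) total_amount, count(*) row_count
from sales_fact f, date_dim d
where f.date_id = d.date_id
group by d.month_id;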

However the real benefit came from the fact that the low cost commodity hardware made the system relatively cheap and easy to put together. Like any system there were things we would have liked to have done differently if we'd had the chance... The benchmark was done in 10 days (OS install, disk layout, Oracle install, data load etc.) and so things were done in a rush; with a little more thought they could have been done much more efficiently... We are fairly certain we could have improved the I/O performance but we didn't have the luxury of experimentation, so we made a call and lived with the consequences... I strongly recommend that anyone building one of these systems spends some time with the Oracle Orion disk benchmarking tool to determine an optimal layout. A recent customer evaluation of several possible disk layout configurations showed over 100% difference between two subtly different versions.

On the subject of I/O

One of the things that constantly surprises me when talking with clients about hardware for a new database server is that I/O is always at the bottom of the list. Typically the list will look something like this (listed in order of perceived importance)
  • CPUs, have we enough. Fast as possible.

  • Memory, as much as we can put in the box. Oracle don't charge us for that

  • SAN, big as possible.

At this stage the purchase order is usually given the nod and the hardware supplier will ship yet another run of the mill box. Don't get me wrong, many experienced DBAs have been through this process many times before and realise that not only is the list in the wrong order but it's missing some critical components.
  • HBAs, need to specify these in proportion to the CPUs and attached storage

  • NICs, might need a lot of these i.e. public, cluster interconnect, storage, management, backup. And typically in multiples for resilience or performance.

  • Backup, are we using the existing backup infrastructure?

I don't blame anyone for this way of thinking, it's the way it's always been. When discussing a new server the first question that people tend to ask is "So what's this monster packing? 16 CPUs!!!" followed by lots of very macho grunts and hollering. The standard licensing model (not just Oracle's) doesn't help. It starts with the premise of a CPU describing the power of a server, and to a large degree it does, but it misses the point of what a database is all about, and that's information. Typically that information is held in ones and zeros on a bunch of spinning scrap metal. The real power of a database comes from its ability to aggregate, analyse and process those ones and zeros, turn them into information and push results out to interested parties. Paraphrasing a little, "It's all about I/O, stupid".

With this in mind I'm constantly surprised by the imbalance of I/O put into servers, both disk and network. It's not unusual to see a 4 cpu server running the latest generation Intel and AMD CPUs but with a single HBA and a dual ported NIC. Whilst memory is cheap, many of these servers still run 32 bit kernels. This typically means only a small proportion of the database is cached in memory, be it in the SGA or the file cache (don't get me started on file cache). I'd make a rash guess that whilst the amount of memory in a typical database server has increased, the average size of the SGA hasn't increased in line with this trend. To make matters worse, the typical size of a database has got significantly bigger. This has to lead us to the conclusion that less of the database is cached and as a result a bigger proportion of it is located on disk. As I said this is just a guess, but it's backed up by real customer engagements. What would be interesting is an analysis over the last 10 years to see whether the wait events for scattered and sequential reads have decreased or increased as a proportion of the total wait time in production databases.

What I'm driving at is the need to move I/O way up the agenda when sizing a server for databases. The number of CPUs needs to be married to the number of I/O channels available. It makes no sense to buy database licenses for a machine that will simply sit and wait on I/O; it's simply wasting money. Equally it makes no sense to stuff a 4 cpu machine full of HBAs for a database application that will perform index lookups on an index that fits comfortably in the cache. Adding HBAs later to an existing server isn't necessarily a simple option either, especially for a mission critical application or one that has hard coded paths to disk.

The next obvious question is "well that's all well and good but how do I size the ratio of HBAs to CPUs?", and in a typically vague fashion I reply "well that depends". The type of application and the type of processor should heavily influence the decision. Certainly the CPU has been winning the race in terms of performance over the last few years and it needs a lot more I/O to keep it busy. But the equation also needs to be balanced with the amount of memory available on the box; a large SGA will certainly reduce the need to visit disk. The best advice I can give is to speak to your hardware supplier and find out what the current state of play is. Also check what the latest TPC-C and TPC-H figures show. Whilst these generally edge towards the extremes of performance, they do show what a hardware supplier believed was needed to present their hardware in the best light.


SQL Developer goes production.

SQL Developer has gone production. Congratulations to the entire development team. I've been using it every day now; it feels solid, performs well and I continue to find new tricks and features each day. If you've not given it a go, try it...

SQLDeveloper

If you haven't tried this tool out I strongly recommend you pop over here. It's a massive step forward for database developers/DBAs who really have felt a little neglected by Oracle over the last few years. I've had countless complaints about how SQL*Plus/vi/notepad are still used by many as their development tools of choice and how it really isn't good enough. Well I've known about the tool for a while now and have had to keep quiet, but I'm glad the cat's out of the bag and it's got such positive reviews... especially because of its price... free.

It features much of the functionality you'd expect in a top end development tool, plus features that many of its competitors charge top dollar for. The best piece of news is that it's an extensible framework, and plugins have started to pop up all over the place... One of my particular favourites is at fourthelephant. Perhaps a little over the top for a text man like myself, but I appreciate the work that must have gone into it.

They've inspired me to think about putting one together myself... The API is pretty simple, so it shouldn't be too taxing... Any ideas? Drop me a line and I'll see what I can do.

rlwrap : Command line editing in sqlplus

I imagine a lot of people in the Linux community already use this. However if you're new to Linux and struggling with recalling text on the command line from within sqlplus, rlwrap might be the tool for you... it's just a wrapper you put around a command line program that provides command line editing... I've been using it for the last few months and couldn't live without it.

You can find the tool here

Server side failover

I've had a number of emails recently with requests for help with the new server side failover functionality in 10g release 2. This functionality is described in the Oracle10g release 2 documentation but I've been told it's not really obvious.

Let's start by explaining what it is. Server side failover allows the sysadmin/dba to configure the profile of connection availability on the server using a service. Users are effectively unaware of what will happen in the event of a node in the cluster failing. Previously (in 9i and 10g) users needed an entry in their local tnsnames file to describe which nodes they could fail over to and which nodes were used to load balance connections. Unless you used a remote naming service to maintain the connection information, every time you added or removed nodes from a cluster it meant an update to potentially hundreds or thousands of tnsnames files.

This was simplified with easy connect in 10g release 1, which allowed the creation of, and connection to, a service specified on the server. For the first time users only needed to connect to a nominated "listener node" and know the name of the service. For example, imagine we have a nominated server inside our organisation called "oracleservice"; this of course is the name used for our virtual ip that will float between our cluster of listeners. In 10g release 1 we could create a service called "orderentry" using either dbca or srvctl that would allow our users to connect to it using a connect string of the form

sqlplus soe/soe@//oracleservice/orderentry

This greatly simplifies Oracle network maintenance. In some cases it could mean the removal of tnsnames files from the client or application server. It has other advantages for the DBA as well. If some business event occurs that requires the provisioning of a new application or a new resource profile for a short period of time the DBA can provision it in seconds and trivially remove it when it is no longer required.

Sadly in Oracle10g release 1 this functionality didn't support Transparent Application Failover (TAF), which meant that DBAs still needed to maintain tnsnames files containing a description of which nodes a service could fail over on. The good news is that in Oracle10g release 2 this all changed. DBAs could set up a service specifying TAF and the Oracle OCI layer would use the definition provided by the server to describe the load balancing and failover profile.

Implementing this functionality is pretty trivial but there is a step that might catch you out. So let's go through it step by step.

To set up the service you can use either Oracle DBCA, the DBMS_SERVICE package, Enterprise Manager or srvctl. The choice is entirely dependent on what you have running. DBCA or Enterprise Manager provide the simplest mechanism, but you will still have to run the final step using the DBMS_SERVICE package to tell the database about the service's failover profile.

I'll use the DBMS_SERVICE package and srvctl for the sake of brevity. In the following example I have a database called db10g2 with two instances, db10g21 and db10g22. I'm going to create a service called "orderentry" that will provide transparent application failover between the two instances.

The first step is to create the service using srvctl

srvctl add service -d db10g2 -s orderentry -r "db10g21,db10g22" -a "db10g21,db10g22" -P BASIC

and check on its status

$ > srvctl status service -d db10g2 -s "orderentry"
Service orderentry is not running.

So we'll have to start the service first

$ > srvctl start service -d db10g2 -s "orderentry"

if we now use sqlplus connecting as system/sys we can see the service.

SYSTEM@db10g21 > select SERVICE_ID, NAME, NETWORK_NAME, failover_method from dba_services;

 id Name               Network Name              Failover
--- ------------------ ------------------------- ------------
  1 SYS$BACKGROUND
  2 SYS$USERS
  3 orderentry         orderentry


The thing to note is that the service hasn't got a failover profile associated with it, so we'll have to modify it using the DBMS_SERVICE package.

SYS@db10g21 > get t1.sql
1 begin
2 DBMS_SERVICE.MODIFY_SERVICE(
3 service_name => 'orderentry',
4 failover_method => DBMS_SERVICE.FAILOVER_METHOD_BASIC,
5 failover_type => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
6 failover_retries => 180,
7 failover_delay => 5);
8* end;
SYS@db10g21 >

if we now select the service information again

 id Name               Network Name              Failover
--- ------------------ ------------------------- ------------
  1 SYS$BACKGROUND
  2 SYS$USERS
  3 orderentry         orderentry                BASIC

We can now test the service using sqlplus.

sqlplus soe/soe@//node1/orderentry

SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jan 18 14:11:47 2006

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SOE@//node1/orderentry >

So all we need to do now is to fire up swingbench and use the service we've created.

[oracle@node1 bin]$ ./charbench -cs //node1/orderentry -dt oci -uc 30 -a
Author : Dominic Giles
Version : 2.2

Results will be written to results.xml.
Users : 30 TPM : 272 Nested TPM : 0

If we log onto the database we can see that the connections have been balanced across the two nodes

SYS@db10g21 >;

1 select instance_name, count(1) usercount, nvl(username,'INTERNAL') user_name,
2 failover_type, failover_method
3 from gv$session s, gv$instance i
4 where s.inst_id = i.inst_id
5 group by instance_name, username, failover_type, failover_method
6* order by username, instance_name
SYS@db10g21 > /

Instance   No. of Users Username   Fail Over Type     Fail Over Method
---------- ------------ ---------- ------------------ ------------------
db10g21              15 SOE        SELECT             BASIC
db10g22              15 SOE        SELECT             BASIC
db10g21               6 SYS        NONE               NONE
db10g22               6 SYS        NONE               NONE
db10g21              23 INTERNAL   NONE               NONE
db10g22              25 INTERNAL   NONE               NONE

So let's shut down one of the instances

SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jan 18 15:28:22 2006

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SYS@db10g22 > shutdown abort;
ORACLE instance shut down.
SYS@db10g22 >

And re-query the session profile

Instance   No. of Users Username   Fail Over Type     Fail Over Method
---------- ------------ ---------- ------------------ ------------------
db10g21              30 SOE        SELECT             BASIC
                      6 SYS        NONE               NONE
                     25 INTERNAL   NONE               NONE


There's a lot more that's possible using the service approach to database connections, but I'll discuss that in another blog.




