Creating, Querying and Loading Data in to the Oracle Autonomous Data Warehouse
Oracle Autonomous Data Warehouse (ADW) access via Python¶
The following shows how to access the Oracle Autonomous Data Warehouse using Python and how to load the data using the DBMS_CLOUD package via the cx_Oracle module. This is obviously simpler via the Graphical front end or SQL Developer but using Python provdes a simple scriptable model whilst hiding some of the complexities of using native REST APIs.
This simple example assumes that you've got an Oracle Cloud account and that you've created or got access an ADW database. You'll also have to download the credentials file to provide SQL*Net access. You can find the details on how to do that here. We'll be using Python in this short example but most of what we're doing could be achieved using the GUI and/or REST Calls.
Connecting to ADW Instance¶
To start with we'll make sure we can connect to the ADW Instance we've previously created. To do that we need to import the required libraries. If you dodn't have these I reccommend using PIP (and virtualenv)
import cx_Oracle
import keyring
import os
We need to use an environment variable to reflect the location of the downloaded credentials files to be used by SQL*Net.
os.environ['TNS_ADMIN'] = '/Users/dgiles/Downloads/wallet_DOMSDB'
This is equivlent to bash
export TNS_ADMIN=/Users/dgiles/Downloads/wallet_DOMSDB
and points to the unzipped directory containing the tnsnames.ora, sqlnet.ora etc. NOTE: you'll need to update the sqlnet.ora to ensure the wallet points to the same directory specified in the TNS_ADMIN environment variable. i.e.
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/Users/dgiles/Downloads/wallet_DOMSDB")))
SSL_SERVER_DN_MATCH=yes
In the example above I've changed DIRECTORY to the location where I downloaded and unzipped the credentials file.
The next steps are to connect to the Oracle ADW instance. In the example below I've store my password using the Python Module "keyring". I'm also using the jupyter notebook magic sql functionality. We'll test the connection using the admin user and the connect string "domsdb_medium" which is one of the services included in the tnsnames.ora file.
%load_ext sql
password = keyring.get_password('adw','admin')
%sql oracle+cx_oracle://admin:$password@domsdb_medium
%%sql admin@domsdb_medium
select 1 from dual
Generating test data¶
We've connected to the oracle database and can now start uploading data to the instance. In this example we'll use datagenerator to generate the data into flat files and then place these on Oracle Object Store and load them from there.
The first step is to install datagenerator. You can find details on how to do that here. We can now simply generate data for the "SH" benchmark.
import subprocess
# Change the 2 following parameters to reflect your environment
generated_data_dir = '/Users/dgiles/Downloads/generated_data'
datagenerator_home = '/Users/dgiles/datagenerator'
# Change the following paramters relating to the way datagenerator will create the data
scale = 100
parallel = 8
dg_command = '{dg}/bin/datagenerator -c {dg}/bin/sh.xml -scale {s} -cl -f -d {gdd} -tc {p}'.format(
dg = datagenerator_home,
s = scale,
gdd = generated_data_dir,
p = parallel
)
# Typically we'd use a command similiar to the one below but since we're in a notebook it's easier to use the default functionality
# p = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# (output, err) = p.communicate()
print(dg_command)
!{dg_command}
We should now have a series of files in the "generated_data_dir" directory. These will be a mix of csv files, create table scripts, loader scripts etc.
!ls {generated_data_dir}
Uploading the data to the Oracle Object Store¶
We're really only interested in the "csv" files so we'll upload just those. But before we do this we'll need to establish a connection to the Oracle Object Store. I give some detail behind how to do this in this notebook. I'll be using the object storage out of the Frankfurt Region.
import oci
import ast
my_config = ast.literal_eval(keyring.get_password('oci_opj','doms'))
my_config['region'] = 'eu-frankfurt-1'
object_storage_client = oci.object_storage.ObjectStorageClient(my_config)
namespace = object_storage_client.get_namespace().data
We've now got a handle to the Oracle Object Store Client so we can now create a bucket which we'll call and upload the "CSV" Files.
import os, io
bucket_name = 'Sales_Data'
files_to_process = [file for file in os.listdir(generated_data_dir) if file.endswith('csv')]
try:
create_bucket_response = object_storage_client.create_bucket(
namespace,
oci.object_storage.models.CreateBucketDetails(
name=bucket_name,
compartment_id=my_config['tenancy']
)
)
except Exception as e:
print(e.message)
for upload_file in files_to_process:
print('Uploading file {}'.format(upload_file))
object_storage_client.put_object(namespace, bucket_name, upload_file, io.open(os.path.join(generated_data_dir,upload_file),'r'))
We need to create an authentication token that can be used by the ADW instance to access our Object storage. To do this we need to create an identity client.
indentity_client = oci.identity.IdentityClient(my_config)
token = indentity_client.create_auth_token(
oci.identity.models.CreateAuthTokenDetails(
description = "Token used to provide access to newly loaded files"
),
user_id = my_config['user']
)
Creating users and tables in the ADW Instance¶
The following steps will feel very familiar to any DBA/developer of and Oracle database. We need to create a schema and assocated tables to load the data into.
First we'll need to create a user/schema and grant it the appropriate roles
%sql create user mysh identified by ReallyLongPassw0rd default tablespace data
Grant the "mysh" user the DWROLE
%sql grant DWROLE to mysh
%sql oracle+cx_oracle://mysh:ReallyLongPassw0rd@domsdb_medium
We can now create the tables we'll use to load the data into.
%%sql mysh@domsdb_medium
CREATE TABLE COUNTRIES (
COUNTRY_ID NUMBER NOT NULL,
COUNTRY_ISO_CODE CHAR(2) NOT NULL,
COUNTRY_NAME VARCHAR2(40) NOT NULL,
COUNTRY_SUBREGION VARCHAR2(30) NOT NULL,
COUNTRY_SUBREGION_ID NUMBER NOT NULL,
COUNTRY_REGION VARCHAR2(20) NOT NULL,
COUNTRY_REGION_ID NUMBER NOT NULL,
COUNTRY_TOTAL NUMBER(9) NOT NULL,
COUNTRY_TOTAL_ID NUMBER NOT NULL,
COUNTRY_NAME_HIST VARCHAR2(40)
)
%%sql mysh@domsdb_medium
CREATE TABLE SALES (
PROD_ID NUMBER NOT NULL,
CUST_ID NUMBER NOT NULL,
TIME_ID DATE NOT NULL,
CHANNEL_ID NUMBER NOT NULL,
PROMO_ID NUMBER NOT NULL,
QUANTITY_SOLD NUMBER(10) NOT NULL,
AMOUNT_SOLD NUMBER(10) NOT NULL
)
%%sql mysh@domsdb_medium
CREATE TABLE SUPPLEMENTARY_DEMOGRAPHICS (
CUST_ID NUMBER NOT NULL,
EDUCATION VARCHAR2(21),
OCCUPATION VARCHAR2(21),
HOUSEHOLD_SIZE VARCHAR2(21),
YRS_RESIDENCE NUMBER,
AFFINITY_CARD NUMBER(10),
BULK_PACK_DISKETTES NUMBER(10),
FLAT_PANEL_MONITOR NUMBER(10),
HOME_THEATER_PACKAGE NUMBER(10),
BOOKKEEPING_APPLICATION NUMBER(10),
PRINTER_SUPPLIES NUMBER(10),
Y_BOX_GAMES NUMBER(10),
OS_DOC_SET_KANJI NUMBER(10),
COMMENTS VARCHAR2(4000)
)
%%sql mysh@domsdb_medium
CREATE TABLE CUSTOMERS (
CUST_ID NUMBER NOT NULL,
CUST_FIRST_NAME VARCHAR2(20) NOT NULL,
CUST_LAST_NAME VARCHAR2(40) NOT NULL,
CUST_GENDER CHAR(1) NOT NULL,
CUST_YEAR_OF_BIRTH NUMBER(4) NOT NULL,
CUST_MARITAL_STATUS VARCHAR2(20),
CUST_STREET_ADDRESS VARCHAR2(40) NOT NULL,
CUST_POSTAL_CODE VARCHAR2(10) NOT NULL,
CUST_CITY VARCHAR2(30) NOT NULL,
CUST_CITY_ID NUMBER NOT NULL,
CUST_STATE_PROVINCE VARCHAR2(40) NOT NULL,
CUST_STATE_PROVINCE_ID NUMBER NOT NULL,
COUNTRY_ID NUMBER NOT NULL,
CUST_MAIN_PHONE_NUMBER VARCHAR2(25) NOT NULL,
CUST_INCOME_LEVEL VARCHAR2(30),
CUST_CREDIT_LIMIT NUMBER,
CUST_EMAIL VARCHAR2(40),
CUST_TOTAL VARCHAR2(14) NOT NULL,
CUST_TOTAL_ID NUMBER NOT NULL,
CUST_SRC_ID NUMBER,
CUST_EFF_FROM DATE,
CUST_EFF_TO DATE,
CUST_VALID VARCHAR2(1)
)
%%sql mysh@domsdb_medium
CREATE TABLE CHANNELS (
CHANNEL_ID NUMBER NOT NULL,
CHANNEL_DESC VARCHAR2(20) NOT NULL,
CHANNEL_CLASS VARCHAR2(20) NOT NULL,
CHANNEL_CLASS_ID NUMBER NOT NULL,
CHANNEL_TOTAL VARCHAR2(13) NOT NULL,
CHANNEL_TOTAL_ID NUMBER NOT NULL
)
%%sql mysh@domsdb_medium
CREATE TABLE PRODUCTS (
PROD_ID NUMBER(6) NOT NULL,
PROD_NAME VARCHAR2(50) NOT NULL,
PROD_DESC VARCHAR2(4000) NOT NULL,
PROD_SUBCATEGORY VARCHAR2(50) NOT NULL,
PROD_SUBCATEGORY_ID NUMBER NOT NULL,
PROD_SUBCATEGORY_DESC VARCHAR2(2000) NOT NULL,
PROD_CATEGORY VARCHAR2(50) NOT NULL,
PROD_CATEGORY_ID NUMBER NOT NULL,
PROD_CATEGORY_DESC VARCHAR2(2000) NOT NULL,
PROD_WEIGHT_CLASS NUMBER(3) NOT NULL,
PROD_UNIT_OF_MEASURE VARCHAR2(20),
PROD_PACK_SIZE VARCHAR2(30) NOT NULL,
SUPPLIER_ID NUMBER(6) NOT NULL,
PROD_STATUS VARCHAR2(20) NOT NULL,
PROD_LIST_PRICE NUMBER(8) NOT NULL,
PROD_MIN_PRICE NUMBER(8) NOT NULL,
PROD_TOTAL VARCHAR2(13) NOT NULL,
PROD_TOTAL_ID NUMBER NOT NULL,
PROD_SRC_ID NUMBER,
PROD_EFF_FROM DATE,
PROD_EFF_TO DATE,
PROD_VALID VARCHAR2(1)
)
%%sql mysh@domsdb_medium
CREATE TABLE PROMOTIONS (
PROMO_ID NUMBER(6) NOT NULL,
PROMO_NAME VARCHAR2(30) NOT NULL,
PROMO_SUBCATEGORY VARCHAR2(30) NOT NULL,
PROMO_SUBCATEGORY_ID NUMBER NOT NULL,
PROMO_CATEGORY VARCHAR2(30) NOT NULL,
PROMO_CATEGORY_ID NUMBER NOT NULL,
PROMO_COST NUMBER(10) NOT NULL,
PROMO_BEGIN_DATE DATE NOT NULL,
PROMO_END_DATE DATE NOT NULL,
PROMO_TOTAL VARCHAR2(15) NOT NULL,
PROMO_TOTAL_ID NUMBER NOT NULL
)
%%sql mysh@domsdb_medium
CREATE TABLE times AS
SELECT udate time_id,
TO_CHAR(udate,'Day' day_name,
TO_CHAR(udate,'DD' day_number_in_month,
TO_CHAR(udate,'DDD' day_number_in_year,
TO_CHAR(udate,'YYYY' ) calendar_year,
TO_CHAR(udate,'Q' ) calendar_quarter_number,
TO_CHAR(udate,'MM' ) calendar_month_number,
TO_CHAR(udate,'WW' ) calendar_week_number,
TO_CHAR(udate,'YYYY-MM' calendar_month_desc,
TO_CHAR(udate,'YYYY-Q' calendar_quarter_desc
FROM
(SELECT to_date('31/12/1994','DD/MM/YYYY'+rownum udate
FROM all_objects
WHERE to_date('31/12/1994','DD/MM/YYYY'+rownum <= to_date( '31/12/2014','DD/MM/YYYY'
)
%%sql mysh@domsdb_medium
select * from tab
Copying the data from the object store¶
We need to add the authorisation token to the newly created schema to allow it to access the object stores files. We can't do this using the sql magic syntax we've been using till this point so we'll do it using standard cx_Oracle calls.
connection = cx_Oracle.connect('mysh', 'ReallyLongPassw0rd', 'domsdb_medium')
cursor = connection.cursor();
cursor.callproc('DBMS_CLOUD.create_credential', keywordParameters = {'credential_name':'SALES_DATA_AUTH',
'username':'dominic.giles@oracle.com',
'password':token.data.token})
We can access the object storage using a url of the the format
https://swiftobjectstorage.<region>.oraclecloud.com/v1/<tenancy>/<bucket name>/<object name>
We can use this to dynamically generate a url for each of the objects inside of the bucket we've just created and use the DBMS_CLOUD package to copy the data into the ADW instance. The code below gets all of the names of the tables we've just created and the loops through each table copying the associated csv file into the ADW instance.
from tqdm import tqdm
format = '''{"delimiter" : ",",
"skipheaders" : 1,
"ignoremissingcolumns" : "true",
"removequotes" : "true",
"dateformat" : "DD-MON-YYYY HH24:MI:SS",
"blankasnull" : "true"}'''
file_location = '''https://swiftobjectstorage.{region}.oraclecloud.com/v1/{tenancy}/{bucket_name}/{table_name}.csv'''
region = my_config['region']
tenancy= 'oracleonpremjava'
rs = cursor.execute("select table_name from user_tables where table_name not like 'COPY%'")
rows = rs.fetchall()
for row in tqdm(rows):
url = file_location.format(region=region, tenancy=tenancy, bucket_name=bucket_name, table_name=row[0])
cursor.callproc('DBMS_CLOUD.copy_data', keywordParameters= {'table_name':row[0],
'credential_name':'SALES_DATA_AUTH',
'file_uri_list':url,
'format': format
})
We can now take a look and see how many rows we've loaded into the tables
rs = cursor.execute("select table_name from user_tables where table_name not like 'COPY%'")
rows = rs.fetchall()
for row in rows:
rs2 = cursor.execute("select count(*) from {}".format(row[0]))
rows2 = rs2.fetchone()
print('{tn: <35}{rc:>10,}'.format(tn=row[0],rc=rows2[0]))
It's now possible to run standard queries against the newly loaded data. No need to create anything else (indexes etc.)
%%sql mysh@domsdb_medium
SELECT channels.channel_desc,
countries.country_iso_code,
TO_CHAR(SUM(amount_sold), '9,999,999,999' SALES$
FROM sales, customers, times, channels, countries
WHERE sales.time_id=times.time_id
AND sales.cust_id=customers.cust_id
AND customers.country_id = countries.country_id
AND sales.channel_id = channels.channel_id
AND channels.channel_desc IN ('Tele Sales','Internet'
AND times.calendar_year = '2006'
AND countries.country_iso_code IN ('GB','DE','FR','DK'
GROUP BY
ROLLUP(channels.channel_desc,countries.country_iso_code)
ORDER BY 1
Tidying up the object store and database¶
You can run the following steps if you want to remove all of the tables from the schema and purge the object store of files. Lets start by removing the tables. NOTE : The code below will remove all of the tables from the schema. Make sure you've not got anything in the schema that you want to keep before running it.
from tqdm import tqdm
rs = cursor.execute("select table_name from user_tables")
rows = rs.fetchall()
for row in tqdm(rows):
rs2 = cursor.execute("drop table {} purge".format(row[0]))
And then removing the object store files and bucket
object_list = object_storage_client.list_objects(namespace, bucket_name)
for o in object_list.data.objects:
print('Deleting object {}'.format(o.name))
object_storage_client.delete_object(namespace, bucket_name, o.name)
print('Deleting bucket')
response = object_storage_client.delete_bucket(namespace, bucket_name)