Jupyter Notebooks, Python and Oracle Installation on Linux

Installing Jupyter-lab and Docker environment on Linux

The following walkthrough guides you through the steps needed to set up your environment to run Jupyter-lab, Oracle and Docker so that you can build and run Docker images for testing. This should work for either an on-premises install or on Oracle's cloud using IaaS (Compute). This walkthrough will serve primarily as a reminder to myself.

Prerequisites

I'm making the assumption that you're running on Linux (I have a similar walkthrough for Mac). In my example I'm using Oracle Enterprise Linux 7 (OEL). I'm also assuming a few other things:

  • Python 3.6 or higher is installed
  • You have access to root either directly or via sudo. In this example I'm installing everything in the oracle user account, which has sudo privileges
  • Docker is installed. If it isn't, see this excellent guide. If you aren't running as root, make sure you follow this final step as well.

Install

The install is pretty simple. It consists of setting up Python, installing Oracle Instant Client, installing Git and then cloning this repository to the server. Let's start with setting up the Python environment.

Python Setup

By default OEL 7 runs Python 2 rather than Python 3. That is likely to change in the future, but until then we have a few steps to run through. The first is to install pip, followed by virtualenv. We can easily install pip on OEL with yum

sudo yum install python36-pip

If this command fails with a message like "No package python36-pip available." then you'll need to edit the yum config file at /etc/yum.repos.d/public-yum-ol7.repo and enable the software development repos.

The next step is to install virtualenv. Virtualenv enables us to create isolated sandpits in which to develop Python applications without running into module or library conflicts. Once we have pip installed, it's very simple to install

sudo pip3.6 install virtualenv

Next we can create a virtual environment and enable it.

virtualenv -p /usr/bin/python36 myvirtualenv
source myvirtualenv/bin/activate

This will create a directory called myvirtualenv (you can call it what you like) with its own version of the Python interpreter and pip. Once we activate it, any library we install will only be in this directory and won't affect the system as a whole. You should see your command prompt change when you activate it. It should look something like this

[04:59 PM : oracle@ora18server ~]$ virtualenv -p /usr/bin/python36 myvirtualenv
Running virtualenv with interpreter /usr/bin/python36
Using base prefix '/usr'
  No LICENSE.txt / LICENSE found in source
New python executable in /home/oracle/myvirtualenv/bin/python36
Also creating executable in /home/oracle/myvirtualenv/bin/python
Installing setuptools, pip, wheel...
done.
[05:00 PM : oracle@ora18server ~]$ source myvirtualenv/bin/activate
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$ ls
Desktop       ora18server-certificate.crt  sql    swingbench
myvirtualenv  OracleUtils                  sqlcl  wallet
(myvirtualenv) [05:00 PM : oracle@ora18server ~]$

Running the following command will show which Python modules we have installed at this point.

(myvirtualenv) [05:00 PM : oracle@ora18server ~]$ pip list
Package    Version
---------- -------
pip        19.0.3 
setuptools 40.8.0 
wheel      0.33.1

Which shouldn't be very many.
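
If you'd rather confirm from Python itself that the virtual environment is the one in use, something like the following should do it. This is a minimal sketch: the function name in_virtualenv is my own, and it relies on the fact that older virtualenv releases set sys.real_prefix while venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases change sys.prefix but keep
    # sys.base_prefix pointing at the system installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter in use:", sys.executable)
```

Run it with the python on your path after activating the environment; the interpreter path printed should sit under the myvirtualenv directory.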

Git Installation

We now need to install Git, which is useful for managing and versioning code. That might not be a requirement for you, but it also makes it very simple to clone existing repositories. Installing it is very simple.

sudo yum install git

We can now clone my IPython/Jupyter notebooks from GitHub, which provide you with the code for creating your own Oracle Docker images.

git clone https://github.com/domgiles/JuypterLabWork.git

This will create a directory called JuypterLabWork

Installing Oracle Instant Client

One of the recent updates to Oracle's install model is the support for RPMs and yum installations without the need for click-through agreements. This makes it very simple to install a client with a single command

sudo yum install oracle-instantclient18.3-basic

This will typically install the software into /usr/lib/oracle/18.3

Installing Jupyter-Lab

In the JuypterLabWork directory that was created when we ran the git clone command there's a file called requirements.txt. This is a list of modules needed to run the notebooks in that directory. To install them, all you need to do is change into that directory and run the command

pip install -r requirements.txt

This will install all of the needed modules. From there all we need to do is to run the command

jupyter-lab

If you're running directly on a workstation or virtual machine and have a browser installed it should take you directly into the jupyter environment.

Jupyter Lab image

If you're running headless then when jupyter-lab starts it should give you a url that you can connect to. Look for something like

[I 12:52:30.450 NotebookApp] The Jupyter Notebook is running at:
[I 12:52:30.450 NotebookApp] http://oracle18cserver:8888/?token=f71e677e202f5fffc3d20fe458ff973e616e0dc3b8eaf072
[I 12:52:30.450 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

That's it.
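
If jupyter-lab doesn't start, a quick check that the modules actually landed in the virtual environment can save some head scratching. The following is a minimal sketch; the helper name missing_modules is my own, and the module names in the usage note are assumptions, since the contents of requirements.txt aren't listed here.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    # find_spec returns None for a top-level module that isn't installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Standard library modules are always present, so this prints an empty list.
print(missing_modules(["os", "json"]))
```

For this setup you might call missing_modules(["jupyterlab", "cx_Oracle"]) with the virtual environment activated; anything it returns still needs a pip install.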

Connecting to Oracle ADB from Python

Connecting to Oracle ADB

Connect to Oracle ATP or ADW from Python

I realised that in some of my previous posts I didn't really detail connecting to ATP and ADW. Here's a slightly more in-depth walkthrough.

Connecting to Oracle Autonomous Transaction Processing or Autonomous Data Warehouse is pretty simple from Python. It requires only a few things:

  • Oracle Instant Client (Or alternative)
  • A Python environment.
  • The cx_Oracle Python driver module.
  • A valid wallet for an ATP or ADW service

Let's go through each of these in turn

Oracle Instant Client

The first step is pretty straightforward. You can download the Oracle Instant Client from here.

instant client screen shot

You'll only need the basic package. Unzip the downloaded file into a suitable location. It's worth pointing out that on Linux this step is even easier. You can now use yum to install the instant client direct from the command line. You can find details on how to configure it here

Python environment

There are plenty of guides out there that show you how to install Python on Windows or Mac. If you haven't done this already, this guide is a good place to start. I'm assuming that you've also gone through the steps of installing pip. If not, you can follow this simple guide. I'd also advise you to create a virtual environment with virtualenv before doing anything else. It's considered best practice and isolates you from current or future library conflicts.

First let's create our virtual environment

virtualenv adb_virt_env

And then activate it (I'm assuming Linux or Mac)

source adb_virt_env/bin/activate

The next step is to install the Python driver. This is as simple as

pip install cx_Oracle

And that's all we need to do at this stage to set up our Python environment.

Oracle ADW or ATP Wallet

The final thing we need is the wallet containing the credentials and connect string details that enable us to connect to ATP or ADW. You'll need to log on to the Oracle OCI console to do this unless you have been provided with the wallet by a colleague. Simply navigate to your ATP or ADW instance and follow the instructions below.

download wallet screen shot

While it's not necessary we'll download and unzip the wallet into the virtual directory we've created (adb_virt_env).

$> ls
bin                cx_Oracle-doc      include            lib                pip-selfcheck.json wallet_SBATP.zip
$> unzip wallet_SBATP.zip
Archive:  wallet_SBATP.zip
  inflating: cwallet.sso             
  inflating: tnsnames.ora            
  inflating: truststore.jks          
  inflating: ojdbc.properties        
  inflating: sqlnet.ora              
  inflating: ewallet.p12             
  inflating: keystore.jks
$> ls
bin                cx_Oracle-doc      include            lib                pip-selfcheck.json tnsnames.ora       wallet_SBATP.zip
cwallet.sso        ewallet.p12        keystore.jks       ojdbc.properties   sqlnet.ora         truststore.jks

Next we need to edit the sqlnet.ora file so that it points at the directory where the wallet files are located. Currently for my environment it looks like

WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="?/network/admin")))
SSL_SERVER_DN_MATCH=yes

We'll need to change the DIRECTORY parameter to our virtual environment. In my case /Users/dgiles/Downloads/adb_virt_env. So for my environment it will look like

WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/Users/dgiles/Downloads/adb_virt_env")))
SSL_SERVER_DN_MATCH=yes
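
If you'd rather script this edit than make it by hand, the following is a minimal sketch of one way to do it in Python. The function name set_wallet_directory is my own, and it assumes the file contains a DIRECTORY="..." entry in the form shown above.

```python
import re

# Matches DIRECTORY="..." and captures the text either side of the path.
DIRECTORY_PATTERN = re.compile(r'(DIRECTORY\s*=\s*")[^"]*(")', re.IGNORECASE)


def set_wallet_directory(sqlnet_text, wallet_dir):
    """Return the sqlnet.ora text with the DIRECTORY value replaced by wallet_dir."""
    # A function is used as the replacement so that backslashes in
    # wallet_dir are never treated as regex escapes.
    new_text, count = DIRECTORY_PATTERN.subn(
        lambda match: match.group(1) + wallet_dir + match.group(2), sqlnet_text
    )
    if count == 0:
        raise ValueError("no DIRECTORY entry found in sqlnet.ora text")
    return new_text


original = (
    'WALLET_LOCATION = (SOURCE = (METHOD = file) '
    '(METHOD_DATA = (DIRECTORY="?/network/admin")))\n'
    "SSL_SERVER_DN_MATCH=yes\n"
)
print(set_wallet_directory(original, "/Users/dgiles/Downloads/adb_virt_env"))
```

To apply it to the real file you would read sqlnet.ora into a string, pass it through the function and write the result back.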

We should also take a look at the tnsnames.ora file to see which services we'll be using. There are likely to be lots of entries if you have lots of ATP or ADW instances in your OCI compartment. In my instance I'll be using a connect string called sbatp_medium, which has a medium priority, but pick the one appropriate to your environment.

sbatp_medium = (description= (address=(protocol=tcps)(port=1522)(host=adb.us-phoenix-1.oraclecloud.com))(connect_data=(service_name=gebqwccvhjbqbs_sbatp_medium.atp.oraclecloud.com))(security=(ssl_server_cert_dn=
        "CN=adwc.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US"))   )

We'll only need to remember its name for the next step.
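
If the file has a lot of entries, a few lines of Python will list just the names. This is a rough sketch rather than a full tnsnames.ora parser: the function name list_tns_aliases is my own, and it assumes each alias starts at the beginning of a line and is followed by = and an opening bracket, as in the entry above.

```python
import re

# An alias is a name at the very start of a line, followed by "= (".
ALIAS_PATTERN = re.compile(r"^([A-Za-z0-9_.-]+)\s*=\s*\(", re.MULTILINE)


def list_tns_aliases(tnsnames_text):
    """Return the connect string names found in the text of a tnsnames.ora file."""
    return ALIAS_PATTERN.findall(tnsnames_text)


# Made-up entries in the same shape as the wallet's tnsnames.ora.
sample = (
    "sbatp_medium = (description= (address=(protocol=tcps)(port=1522)(host=adb.example.com))\n"
    "        (connect_data=(service_name=x_sbatp_medium.example.com)))\n"
    "sbatp_high = (description= (address=(protocol=tcps)(port=1522)(host=adb.example.com)))\n"
)
print(list_tns_aliases(sample))
```

Continuation lines are indented and nested keys such as address= follow a bracket, so neither is picked up as an alias.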

The Code

Finally we're ready to write some code. The first step is to import the modules we'll need. In this case it's just cx_Oracle and os

In [16]:
import cx_Oracle
import os

We need to set the environment variable TNS_ADMIN to point at our directory (adb_virt_env) where all of the files from our wallet are located.

In [17]:
os.environ['TNS_ADMIN'] = '/Users/dgiles/Downloads/adb_virt_env'
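
A connection failure caused by a wrong TNS_ADMIN path can be hard to read, so it can be worth checking the directory before connecting. The following is a minimal sketch; the helper name missing_wallet_files is my own, and the list of files checked is my assumption about what the Instant Client based driver reads from the wallet.

```python
import os

# Assumption: these are the wallet files the client needs in TNS_ADMIN.
REQUIRED_WALLET_FILES = ("cwallet.sso", "sqlnet.ora", "tnsnames.ora")


def missing_wallet_files(tns_admin):
    """Return the names of the expected wallet files not present in tns_admin."""
    return [
        name
        for name in REQUIRED_WALLET_FILES
        if not os.path.isfile(os.path.join(tns_admin, name))
    ]


# Check whatever TNS_ADMIN currently points at (the current directory if unset).
print(missing_wallet_files(os.environ.get("TNS_ADMIN", ".")))
```

An empty list means all three files were found in the directory.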

And now we can simply connect to the ATP or ADW instance with a standard Python database connect operation, using the connect string we remembered from the tnsnames.ora file. NOTE: I'm assuming you've created a user in the database or you're using the admin user created for this instance.

In [18]:
connection = cx_Oracle.connect('admin', 'ReallyLongPassw0rd', 'sbatp_medium')

And that's it... From here on in we can use the connection as if it were a local database.

In [19]:
cursor = connection.cursor()
rs = cursor.execute("select 'Hello for ADB' from dual")
rs.fetchall()
Out[19]:
[('Hello for ADB',)]
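
The password is hard-coded in the connect call above to keep the example short. One alternative is to read the details from environment variables, sketched below. The helper name adb_credentials and the variable names ADB_USER, ADB_PASSWORD and ADB_DSN are hypothetical choices of mine, not something cx_Oracle defines.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
CREDENTIAL_KEYS = ("ADB_USER", "ADB_PASSWORD", "ADB_DSN")


def adb_credentials(env=None):
    """Return (user, password, dsn) read from environment-style variables."""
    env = os.environ if env is None else env
    missing = [key for key in CREDENTIAL_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return tuple(env[key] for key in CREDENTIAL_KEYS)


# A plain dict stands in for os.environ in this demonstration.
example_env = {"ADB_USER": "admin", "ADB_PASSWORD": "secret", "ADB_DSN": "sbatp_medium"}
print(adb_credentials(example_env)[0])
```

With the three variables exported in your shell, the connect call becomes cx_Oracle.connect(*adb_credentials()).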

Update to MonitorDB

Just a quick one. I've updated MonitorDB to enable it to use wallets, so it can now run against Oracle Autonomous Transaction Processing and Oracle Autonomous Data Warehouse.

You can add the location in the configuration file or

Screenshot of IntelliJ IDEA (22-01-2019, 18-48-21)

On the command line

Screenshot of Terminal (22-01-2019, 18-51-09)

I've also compiled it for Java8 and used the latest jdbc drivers.

You can find it here

Oracle SODA Python Driver and Jupyter Lab

This workbook is divided into two sections. The first is a quick guide to setting up Jupyter Lab (Python notebooks) such that it can connect to a database running inside of OCI, in this case an ATP instance. The second section uses the JSON Python driver to connect to the database and run a few tests. This notebook is largely a reminder to myself of how to set this up, but hopefully it will be useful to others.

Setting up Python 3.6 and Jupyter Lab on our compute instance

I won't go into much detail on setting up ATP or ADW or creating an IaaS server. I covered that process in quite a lot of detail here. We'll be setting up something similar to the following

OCI Cloud

Once you've created the server you'll need to log on to it with the details found on the compute instance's home screen. You just need to grab its IP address to enable you to log on over ssh.

OCI Cloud

The next step is to connect over ssh with a command similar to

ssh opc@132.146.27.111
Enter passphrase for key '/Users/dgiles/.ssh/id_rsa': 
Last login: Wed Jan  9 20:48:46 2019 from host10.10.10.1

In the following steps we'll be using Python, so we need to set up Python on the server and configure the needed modules. Our first step is to use yum to install Python 3.6 (this is personal preference and you could stick with Python 2.7). To do this we first need to enable the developer EPEL repository in yum and then install the environment. Run the following commands

sudo yum -y install yum-utils
sudo yum-config-manager --enable ol7_developer_epel
sudo yum install -y python36
python3.6 -m venv myvirtualenv
source myvirtualenv/bin/activate

This will install Python and enable a virtual environment for use (our own Python sandpit). You can make sure that Python is installed by simply typing python3.6, i.e.

$> python3.6
Python 3.6.3 (default, Feb  1 2018, 22:26:31) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

Make sure you type quit() to leave the REPL environment.

We now need to install the needed modules. You can do this one by one or simply use the following file requirements.txt and run the following command

pip install -r requirements.txt

This will install all of the needed Python modules for the next step, which is to start up Jupyter Lab.

Jupyter Lab is an interactive web-based application that enables you to interactively run code and document the process at the same time. This blog is written in it and the code below can be run once your environment is set up. Visit the website to see more details.

To start Jupyter Lab we run the following command.

nohup jupyter-lab --ip=127.0.0.1 &

Be aware that this will only work if you have activated your virtual environment. In our instance we did this with the command source myvirtualenv/bin/activate. At this point jupyter-lab is running in the background and is listening (by default) on port 8888. You could start up a desktop and use VNC to view the output. However, I prefer to redirect the output to my own desktop over ssh. To do this you'll need to run the following ssh command from your desktop

ssh -N -f -L 5555:localhost:8888 opc@132.146.27.111

Replacing the IP address above with the one for your compute instance

This will forward port 8888 on the compute instance to port 5555 on your desktop. You can then connect to it by simply going to the following url http://localhost:5555. After doing this you should see a screen asking you to input a token (you'll only need to do this once). You can find this token inside the nohup.out file on the compute instance. It should be near the top of the file and should look something like

[I 20:49:12.237 LabApp] http://127.0.0.1:8888/?token=216272ef06c7b7cb3fa8da4e2d7c727dab77c0942fac29c8

Just copy the text after "token=" and paste it in to the dialogue box. After completing that step you should see something like this
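
If you don't fancy hunting through nohup.out by eye, the token can be pulled out with a few lines of Python. This is a minimal sketch; the function name extract_token is my own, and it assumes the log contains a URL with a token= query parameter like the line above.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_token(log_text):
    """Return the first token query parameter found in any URL in the log text."""
    for url in re.findall(r"https?://\S+", log_text):
        tokens = parse_qs(urlparse(url).query).get("token")
        if tokens:
            return tokens[0]
    return None


log_line = (
    "[I 20:49:12.237 LabApp] "
    "http://127.0.0.1:8888/?token=216272ef06c7b7cb3fa8da4e2d7c727dab77c0942fac29c8"
)
print(extract_token(log_line))
```

On the compute instance you would pass it the contents of the file, for example extract_token(open("nohup.out").read()).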

Jupyter Lab

You can now start creating your own notebooks or load this one found here. I'd visit the website to familiarise yourself with how the notebooks work.

Using Python and the JSON SODA API

This section will walk through using the SODA API with Python from within the Jupyter-lab environment we set up in the previous section. The SODA API is a simple object API that enables developers to persist and retrieve JSON documents held inside of the Oracle Database. SODA drivers are available for Java, C, REST, Node and Python.

You can find the documentation for this API here

To get started we'll need to import the following Python modules

In [11]:
import cx_Oracle
import keyring
import os
import pandas as pd

We now need to set the location of the directory containing the wallet to enable us to connect to the ATP instance. Once we've done that we can connect to the Oracle ATP instance and get a SODA object to enable us to work with JSON documents. NOTE : I'm using the module keyring to hide the password for my database. You should replace this call with the password for your user.

In [20]:
os.environ['TNS_ADMIN'] = '/home/opc/Wallet'
connection = cx_Oracle.connect('json', keyring.get_password('ATPLondon','json'), 'sbatp_tpurgent')
soda = connection.getSodaDatabase()

We now need to create a JSON collection and, if needed, add any additional indexes which might accelerate data access.

In [21]:
try:
    collection = soda.createCollection("customers_json_soda")
    collection.createIndex({ "name"   : "customer_index",
                          "fields" : [ { "path"     : "name_last",
                          "datatype" : "string"}]})
except cx_Oracle.DatabaseError as ex:
    print("It looks like the index already exists : {}".format(ex))
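
The index specification passed to createIndex is just a Python dictionary, so if you end up creating several indexes it can be tidier to build it with a small helper. This is only a sketch; the function name soda_index_spec is my own and it covers the single-field case used above.

```python
def soda_index_spec(index_name, path, datatype="string"):
    """Build a single-field index specification in the shape used above."""
    return {
        "name": index_name,
        "fields": [{"path": path, "datatype": datatype}],
    }


# Reproduces the specification used for customer_index.
print(soda_index_spec("customer_index", "name_last"))
```

It would be used as collection.createIndex(soda_index_spec("customer_index", "name_last")).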

We can now add data to the collection. Here I'm inserting the document into the database and retrieving its key. You can find some examples/test cases on how to use collections here

In [22]:
customer_doc = {"id"          : 1,
       "name_last"    : "Giles",
       "name_first"   : "Dom",
       "description"  : "Gold customer, since 1990",
       "account info" : None,
       "dataplan"     : True,
       "phones"       : [{"type" : "mobile", "number" : 9999965499},
                         {"type" : "home",   "number" : 3248723987}]}
doc = collection.insertOneAndGet(customer_doc)
connection.commit()

To fetch documents we could use SQL or Query By Example (QBE) as shown below. You can find further details on QBE here. In this example there should just be a single document. NOTE: I'm simply using pandas DataFrame to render the retrieved data but it does highlight how simple it is to use the framework for additional analysis at a later stage.

In [23]:
documents = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getDocuments()
results = [document.getContent() for document in documents]    
pd.DataFrame(results)
Out[23]:
account info dataplan description id name_first name_last phones
0 None True Gold customer, since 1990 1 Dom Giles [{'type': 'mobile', 'number': 9999965499}, {'t...
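
The phones column above is still a nested list. If you want one row per phone number for later analysis, the documents can be flattened in plain Python before handing them to pandas. This is a minimal sketch; the function name phone_rows and the phone_type and phone_number column names are my own, and it assumes documents shaped like customer_doc.

```python
def phone_rows(customer):
    """Return one flat dict per phone entry, repeating the customer's other fields."""
    base = {key: value for key, value in customer.items() if key != "phones"}
    return [
        dict(base, phone_type=phone.get("type"), phone_number=phone.get("number"))
        for phone in customer.get("phones", [])
    ]


example_customer = {
    "id": 1,
    "name_last": "Giles",
    "name_first": "Dom",
    "phones": [
        {"type": "mobile", "number": 9999965499},
        {"type": "home", "number": 3248723987},
    ],
}
for row in phone_rows(example_customer):
    print(row)
```

The list of rows from each document can be concatenated and passed to pd.DataFrame in the same way as results above.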

To update records we can use the replaceOne method.

In [24]:
document = collection.find().filter({'name_first': {'$eq': 'Dom'}}).getOne()
updated = collection.find().key(doc.key).replaceOne({"id"          : 1,
       "name_last"    : "Giles",
       "name_first"   : "Dominic",
       "description"  : "Gold customer, since 1990",
       "account info" : None,
       "dataplan"     : True,
       "phones"       : [{"type" : "mobile", "number" : 9999965499},
                         {"type" : "home",   "number" : 3248723987}]},)
connection.commit()

And just to make sure the change happened

In [25]:
data = collection.find().key(document.key).getOne().getContent()
pd.DataFrame([data])
Out[25]:
account info dataplan description id name_first name_last phones
0 None True Gold customer, since 1990 1 Dominic Giles [{'type': 'mobile', 'number': 9999965499}, {'t...

And finally we can drop the collection.

In [26]:
try:
    collection.drop()
except cx_Oracle.DatabaseError as ex:
    print("We were unable to drop the collection : {}".format(ex))
In [27]:
connection.close()

Accessing the Oracle Object Store

OCIConnection