Server side failover

I've had a number of emails recently asking for help with the new server side failover functionality in 10g release 2. This functionality is described in the Oracle10g release 2 documentation, but I've been told it's not really obvious.

Let's start by explaining what it is. Server side failover allows the sysadmin/DBA to configure the profile of connection availability on the server using a service. Users are effectively unaware of what will happen in the event of a node in the cluster failing. Previously (in 9i and 10g) users needed an entry in their local tnsnames file to describe which nodes they could fail over to and which nodes their connections were load balanced across. Unless you used a remote naming service to maintain the connection information, every time you added or removed nodes from a cluster it meant an update to potentially hundreds or thousands of tnsnames files.
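For comparison, a client-side entry of the kind described above might look roughly like the following sketch. The host names node1-vip and node2-vip and the port are assumptions for illustration only; the FAILOVER_MODE values simply mirror the ones used later in this post.

```
# tnsnames.ora on every client (illustrative sketch, host names are hypothetical)
ORDERENTRY =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = yes)
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = orderentry)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )
```

Every ADDRESS line here is something that has to change on each client when a node is added or removed, which is exactly the maintenance burden the server side approach removes.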

This was simplified with Easy Connect in 10g release 1, which allowed users to connect to a service defined on the server. For the first time, users only needed to connect to a nominated "listener node" and know the name of the service. For example, imagine we have a nominated server inside our organisation called "oracleservice"; this, of course, is the name used for our virtual IP that will float between our cluster of listeners. In 10g release 1 we could create a service called "orderentry" using either dbca or srvctl, and our users could then connect to it using a connect string of the form:

sqlplus soe/soe@//oracleservice/orderentry

This greatly simplifies Oracle network maintenance. In some cases it could mean the removal of tnsnames files from the client or application server. It has other advantages for the DBA as well. If some business event occurs that requires the provisioning of a new application or a new resource profile for a short period of time the DBA can provision it in seconds and trivially remove it when it is no longer required.
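As a rough sketch of what "provision it in seconds" can look like, the DBMS_SERVICE package can create and later drop a service directly. The service name monthend is purely hypothetical, and on a clustered database you would more likely use srvctl add service and srvctl remove service so that the clusterware knows about the service too.

```sql
-- Hypothetical short-lived service; 'monthend' is an illustrative name only.
begin
  DBMS_SERVICE.CREATE_SERVICE(
    service_name => 'monthend',
    network_name => 'monthend');
  DBMS_SERVICE.START_SERVICE(service_name => 'monthend');
end;
/

-- Later, once the service is no longer required:
begin
  DBMS_SERVICE.STOP_SERVICE(service_name => 'monthend');
  DBMS_SERVICE.DELETE_SERVICE(service_name => 'monthend');
end;
/
```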

Sadly, in Oracle10g release 1 this functionality didn't support Transparent Application Failover (TAF). This meant that DBAs still needed to maintain tnsnames files containing a description of which nodes a service could fail over to. The good news is that in Oracle10g release 2 this all changed. DBAs can now set up a service specifying TAF, and the Oracle OCI layer will use this definition, provided by the server, to describe the load balancing and failover profile.

Implementing this functionality is pretty trivial, but there is a step that might catch you out. So let's go through it step by step.

To set up the service you can use Oracle DBCA, the DBMS_SERVICE package, Enterprise Manager or srvctl. The choice is entirely dependent on what you have running. DBCA or Enterprise Manager provide the simplest mechanism, but you will still have to run the final step using the DBMS_SERVICE package to tell the database about its failover profile.

I'll use the DBMS_SERVICE package and srvctl for the sake of brevity. In the following example I have a database called db10g2 with two instances, db10g21 and db10g22. I'm going to create a service called "orderentry" that will provide transparent application failover between the two instances.

The first step is to create the service using srvctl

srvctl add service -d db10g2 -s orderentry -r "db10g21,db10g22" -a "db10g21,db10g22" -P BASIC

and check on its status

$ > srvctl status service -d db10g2 -s "orderentry"
Service orderentry is not running.

So we'll have to start the service first

$ > srvctl start service -d db10g2 -s "orderentry"

If we now connect using sqlplus as system/sys, we can see the service.

SYSTEM@db10g21 > select SERVICE_ID, NAME, NETWORK_NAME, failover_method from dba_services;

 id Name               Network Name              Failover
--- ------------------ ------------------------- ------------
  1 SYS$BACKGROUND
  2 SYS$USERS
  3 orderentry         orderentry


The thing to note is that the service hasn't got a failover profile associated with it: the Failover column is empty. So we'll have to modify it using the DBMS_SERVICE package:

SYS@db10g21 > get t1.sql
begin
  DBMS_SERVICE.MODIFY_SERVICE(
    service_name     => 'orderentry',
    failover_method  => DBMS_SERVICE.FAILOVER_METHOD_BASIC,
    failover_type    => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
    failover_retries => 180,  -- number of reconnect attempts
    failover_delay   => 5);   -- seconds to wait between attempts
end;
SYS@db10g21 > /

If we now select the service information again, the failover method has been set:

 id Name               Network Name              Failover
--- ------------------ ------------------------- ------------
  1 SYS$BACKGROUND
  2 SYS$USERS
  3 orderentry         orderentry                BASIC
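The other settings passed to MODIFY_SERVICE can be checked in the same view. The query below is a small sketch that assumes the 10.2 DBA_SERVICES columns failover_type, failover_retries and failover_delay; the retry_window_secs alias is my own, added to show how long a client will keep trying to reconnect.

```sql
-- Run as a DBA user. retry_window_secs is an illustrative alias, not a view column.
select name,
       failover_method,
       failover_type,
       failover_retries,
       failover_delay,
       failover_retries * failover_delay as retry_window_secs
from   dba_services
where  name = 'orderentry';
```

With 180 retries and a 5 second delay, that works out to 900 seconds, so a session will keep attempting to fail over for roughly 15 minutes before giving up.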

We can now test the service using sqlplus.

sqlplus soe/soe@//node1/orderentry

SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jan 18 14:11:47 2006

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SOE@//node1/orderentry >

So all we need to do now is to fire up swingbench and use the service we've created.

[oracle@node1 bin]$ ./charbench -cs //node1/orderentry -dt oci -uc 30 -a
Author : Dominic Giles
Version : 2.2

Results will be written to results.xml.
Users : 30 TPM : 272 Nested TPM : 0

If we log onto the database we can see that the connections have been balanced across the two nodes:

SYS@db10g21 > ;

select instance_name, count(1) usercount, nvl(username,'INTERNAL') user_name,
       failover_type, failover_method
from   gv$session s, gv$instance i
where  s.inst_id = i.inst_id
group  by instance_name, username, failover_type, failover_method
order  by username, instance_name
SYS@db10g21 > /

Instance   No. of Users Username   Fail Over Type     Fail Over Method
---------- ------------ ---------- ------------------ ------------------
db10g21              15 SOE        SELECT             BASIC
db10g22              15 SOE        SELECT             BASIC
db10g21               6 SYS        NONE               NONE
db10g22               6 SYS        NONE               NONE
db10g21              23 INTERNAL   NONE               NONE
db10g22              25 INTERNAL   NONE               NONE

So let's shut down one of the instances:

SQL*Plus: Release 10.2.0.1.0 - Production on Wed Jan 18 15:28:22 2006

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SYS@db10g22 > shutdown abort;
ORACLE instance shut down.
SYS@db10g22 >

And re-query the session profile

Instance   No. of Users Username   Fail Over Type     Fail Over Method
---------- ------------ ---------- ------------------ ------------------
db10g21              30 SOE        SELECT             BASIC
                      6 SYS        NONE               NONE
                     25 INTERNAL   NONE               NONE
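To see which of the surviving sessions actually went through a failover, rather than just where they ended up, gv$session also exposes a failed_over column. The query below is a sketch along the same lines as the one above, restricted to the SOE user from this example.

```sql
-- failed_over is YES for sessions that have been reconnected by TAF.
select inst_id,
       username,
       failover_type,
       failover_method,
       failed_over,
       count(*) as sessions
from   gv$session
where  username = 'SOE'
group  by inst_id, username, failover_type, failover_method, failed_over
order  by inst_id, failed_over;
```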


There's a lot more that's possible using the service approach to database connection, but I'll discuss that in another blog entry.




