Yesterday, I was working on a simple Golden Gate replication between two Oracle databases. Everything worked as expected, the source and target schemas were synchronized and the replication lag was about 0 seconds.
Unfortunately; after a couple of hours, I notice a 2 hours lag on the replicat…
After some research on GoldenGate and with some queries on v$session and v$sql, I realize that the lag is due to a 100 million rows table on which there is no primary key or unique key defined.
Important point: I was not allowed to add constraints on the source database!
How does Golden Gate handle this situation?
If no unique key is defined for a table, all columns of the table will be used to represent the key. Let’s illustrate this sentence with one simple example.
On the source database, we have a table “TABLE1” on schema “ARO” without primary key or unique key. The table looks like that:
SQL> desc ARO.TABLE1 Name Null? Type ----------------------------------------- -------- ---------------------------- COL1 VARCHAR2(30 CHAR) COL2 VARCHAR2(10 CHAR) COL3 NUMBER COL4 VARCHAR2(10 CHAR) SQL> select * from ARO.TABLE1; COL1 COL2 COL3 COL4 ---------- ---------- ---------- ---------- hello col2_1 1 col4_1 world col2_2 2 col4_2 hello col2_3 3 col4_3 world col2_4 4 col4_4
Now, imagine that on source database, a user updates the table:
SQL> update ARO.TABLE1 set col1='hello' where col1='world'; 2 rows updated.
In the background, Golden Gate will collect the values from columns “COL2”, “COL3” and “COL4” from the updated rows even if they are not updated.
On the target database, Golden Gate will “translate” the update as follows:
update ARO.TABLE1 set col1='hello' where col1='world' and col2='col2_2' and col3 = 2 and col4 = 'col4_2'; update ARO.TABLE1 set col1='hello' where col1='world' and col2='col2_4' and col3 = 4 and col4 = 'col4_4';
This operation involve two full table scan on target database for only one update on source database!
In addition, in the Golden Gate report file, you will see this type of warning:
WARNING OGG-06439 No unique key is defined for table TABLE1. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key.
How to optimize replication in such a situation?
Identify column(s) that form a unique value
First: you must identify one or more columns that will form a unique value! If your table does not contain such columns, unfortunately, there is no solution for you.
If you don’t know which columns form a unique key, you can use a query like:
SQL> select COL3, COL4, count(*) from TABLE1 having count(*) > 1 group by COL3, COL4; no rows selected
Here, I want to test if COL3 and COL4 form a unique value. If the query does not return any row, the value is unique. Else, you need to try with others columns.
Once you have identified the columns that form a unique value, you have two cases.
If you are allowed to add a constraint on the source database, you only need to add a unique constraint on the identified column(s). Otherwise, you will need to modify your Golden Gate replicat parameters and add an additional index to your target database. That’s what I am going to show you here.
Add KEYCOLS to replicat
In my example, it is obvious that the values of “COL3” are unique. Therefore, I will tell Golden Gate that it can consider this column as unique although no constraints have been defined on the source table. For that purpose, I will use the “KEYCOLS” option of the “MAP” parameter.
Connect to Golden Gate on your target database with GGSCI, and edit your replicat parameters:
GGSCI (labo1) 1> edit param MYREPL
Original parameter file:
(...) MAP ARO.*, target ARO.* ;
Modified parameter file:
(...) MAP ARO.TABLE1, target ARO.TABLE1, KEYCOLS (COL3); MAP ARO.*, target ARO.* ;
If you need to use multiple columns to identify unique row, you can add them in the “KEYCOLS” option:
(...) MAP ARO.TABLE1, target ARO.TABLE1, KEYCOLS (COL3, COL4); MAP ARO.*, target ARO.* ;
The order is important because Golden Gate will use the first rule of the list when it will encounters a query on “ARO.TABLE1”.
Then, save the file, and restart the replicat:
GGSCI (labo1) 2> stop MYREPL GGSCI (labo1) 3> start MYREPL
Now, the first update of my example will be “translated” this way by Golden Gate:
update ARO.TABLE1 set col1='hello' where col3 = 2; update ARO.TABLE1 set col1='hello' where col3 = 4;
Better! However, if you do not have an index on the “COL3” column of the target table, Oracle will still do a full scan of the table.
Add an index on the unique column(s)
Lastly, we need to add an index on the identified columns in order to get rid of the full table scan.
Nothing difficult here, we just have to connect on the target database and add the new index:
SQL> CREATE INDEX "ARO"."OGG_TABLE1" ON "ARO"."TABLE1"(COL3);
Your Golden Gate replicat will now perform an index scan instead of full table scan!
Warnings in report files
You may notice a new warning message in your Golden Gate report file after you added the new MAP condition (if you already have a MAP on the entire schema):
INFO OGG-06506 Wildcard MAP resolved (entry ARO.*): MAP "ARO"."TABLE1", target ARO."TABLE1". WARNING OGG-06439 No unique key is defined for table TABLE1. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key. WARNING OGG-02081 Detected duplicate TABLE/MAP entry for source table ARO.TABLE1 and target table ARO.TABLE1. Using prior TABLE/MAP specification.
This is not a problem! You can safely ignore this message. As you placed the MAP condition with KEYCOLS before the MAP condition for the entire schema in your replicat parameters file, Golden Gate will only consider the first one for table “ARO.TABLE1”.
Stay tuned for more DBA stuff!