OK. Here is a simple task that I am trying to achieve all through out tonight.
I have a big (infact very big) table on one database (SRCDB) and I am trying to pull the data from that table to a table in different database (TGTDB). Both SRCDB and TGTDB reside in different HP Unix servers connected over network. And I have only SELECT privilege on the SRCDB table. The table has no index (And I am not allowed to create an index, or any database object for that matter, in the SRCDB). But the SRCDB table has many partitions, only one of which I am supposed to pull from that.
Let's suppose the SRCDB table has 10 partitions. Each partition has 500 million records. And as I said above, I need to pull data from only one partition to the target. So what will be the best suited strategy here?
When I started to think about this, following options came into my mind:
1. Using Transportable Table Space
2. Using CTAS over DBLink
3. Using direct load path
4. Copy command
5. Data Pump
I started with transportable table space as this is supposed to be the fastest one as it copies the whole table space from one database to the other. But I soon realized that I can not use the transportable table space option as I do not have any DDL permission on the source side. In order to use transportable table space, one needs to alter the tablespace to make it read only like below,
ALTER TABLESPACE src_tbl_space READ ONLY;
Once I ruled out the transportable tablespace option, it didn't take much time for me to rule out the Copy option also. Not only Copy option is deprecated, the same is inherently slower than the other options. So now at this point I had only CTAS and Direct path load options open. To use any one of them, I knew that I must first create a dblink at my end to access the data from remote end. DBLink will help me to SELECT remote database data as easily as I SELECT my local system data. So without wasting much time, I first created a DBlink and wrote a direct path insert script like below,
INSERT /*+ APPEND */
INTO TGT_TABLE
SELECT *
FROM SRC_TABLE;
Voila! This method is faster than the conventional insert (that is, simple INSERT INTO statement) because, this method writes data directly in the datafiles bypassing the buffer cache. This method also ignores any referential integrity constraints and starts the loading after the high-water-mark of the target table - thereby - increasing the performance. So this looked like a perfect solution. But then I paused.
Somewhere I heard CTAS (Create table as select) is faster than direct path method. So I thought of giving it a try. The only issue with CTAS is - it creates the target table - so if my target table is pre-existing, I can not use this method. Fortunately, my target table is not pre-existing, so I can use CTAS. Moreover CTAS statements that reference remote objects can run in parallel. So I fired a statement like this,
CREATE TABLE TGT_TABLE
NOLOGGING PARALLEL 8
AS
SELECT *
FROM SRC_TABLE partition (my_partition);
And guess what? I encountered following,
ORA-14100: partition extended table name cannot refer to a remote object.
(And this point I realized - even a simple task like this can have so much hard press me. But I guess I can't help it - that's the life with databases!)
So I realized the only option for me is to use WHERE clause and hope for partition pruning. So I did,
CREATE TABLE TGT_TABLE
NOLOGGING PARALLEL 8
AS
SELECT *
FROM SRC_TABLE
WHERE record_date between to_date('02-Jan-2010', 'DD-Jan-YYYY') and to_date('10-Jan-2010', 'DD-Jan-YYYY');
And finally I am done. At the end of the exercise - I know how to bring a big table from remote database in the most efficient way.
Loading Huge Table Over DBLINK
I have a big (infact very big) table on one database (SRCDB) and I am trying to pull the data from that table to a table in different database (TGTDB). Both SRCDB and TGTDB reside in different HP Unix servers connected over network. And I have only SELECT privilege on the SRCDB table. The table has no index (And I am not allowed to create an index, or any database object for that matter, in the SRCDB). But the SRCDB table has many partitions, only one of which I am supposed to pull from that.
Let's suppose the SRCDB table has 10 partitions. Each partition has 500 million records. And as I said above, I need to pull data from only one partition to the target. So what will be the best suited strategy here?
My Options
When I started to think about this, following options came into my mind:
1. Using Transportable Table Space
2. Using CTAS over DBLink
3. Using direct load path
4. Copy command
5. Data Pump
Comparing Copy, Direct Path and CTAS
I started with transportable table space as this is supposed to be the fastest one as it copies the whole table space from one database to the other. But I soon realized that I can not use the transportable table space option as I do not have any DDL permission on the source side. In order to use transportable table space, one needs to alter the tablespace to make it read only like below,
ALTER TABLESPACE src_tbl_space READ ONLY;
Once I ruled out the transportable tablespace option, it didn't take much time for me to rule out the Copy option also. Not only Copy option is deprecated, the same is inherently slower than the other options. So now at this point I had only CTAS and Direct path load options open. To use any one of them, I knew that I must first create a dblink at my end to access the data from remote end. DBLink will help me to SELECT remote database data as easily as I SELECT my local system data. So without wasting much time, I first created a DBlink and wrote a direct path insert script like below,
INSERT /*+ APPEND */
INTO TGT_TABLE
SELECT *
FROM SRC_TABLE;
Voila! This method is faster than the conventional insert (that is, simple INSERT INTO statement) because, this method writes data directly in the datafiles bypassing the buffer cache. This method also ignores any referential integrity constraints and starts the loading after the high-water-mark of the target table - thereby - increasing the performance. So this looked like a perfect solution. But then I paused.
Somewhere I heard CTAS (Create table as select) is faster than direct path method. So I thought of giving it a try. The only issue with CTAS is - it creates the target table - so if my target table is pre-existing, I can not use this method. Fortunately, my target table is not pre-existing, so I can use CTAS. Moreover CTAS statements that reference remote objects can run in parallel. So I fired a statement like this,
CREATE TABLE TGT_TABLE
NOLOGGING PARALLEL 8
AS
SELECT *
FROM SRC_TABLE partition (my_partition);
And guess what? I encountered following,
ORA-14100: partition extended table name cannot refer to a remote object.
(And this point I realized - even a simple task like this can have so much hard press me. But I guess I can't help it - that's the life with databases!)
So I realized the only option for me is to use WHERE clause and hope for partition pruning. So I did,
CREATE TABLE TGT_TABLE
NOLOGGING PARALLEL 8
AS
SELECT *
FROM SRC_TABLE
WHERE record_date between to_date('02-Jan-2010', 'DD-Jan-YYYY') and to_date('10-Jan-2010', 'DD-Jan-YYYY');
And finally I am done. At the end of the exercise - I know how to bring a big table from remote database in the most efficient way.
Very useful article. However, I think you might consider trying Oracle data pump (depends on the version of Oracle you are using). Data pump is quite faster as per my experience.
ReplyDeleteTruly interesting information. Good to read.
ReplyDelete- Ananda
Cool and that i have a neat provide: How To Budget House Renovation home kitchen remodel
ReplyDelete