Lucas Scabora

Bulk dynamic insert operations into an index-organized table


Hello, 

I implemented a PL/SQL procedure to create and populate an index-organized table (IOT). The procedure dynamically creates N columns for the target table, where N is given by a parameter.

For the rows, I drew on an existing (huge) table of approximately 1,468,365,182 rows, reading it through a cursor one row at a time. The objective is similar to the PIVOT operation: combining multiple rows into additional columns. The difference is that I combine at most N columns per row, and any extra values spill over into additional rows (as required).

The PL/SQL procedure builds each row following the criterion above and then inserts it into the index-organized table; however, the time the procedure needs is impractical (up to several days). I also tried to adapt the procedure to use FORALL and BULK COLLECT INTO, but since each tuple is built dynamically, that approach failed. Another attempt was to commit less frequently during the inserts, but the time remained impractical.
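For reference, FORALL can drive dynamic SQL as long as the statement text stays fixed for the whole batch and only the values are bound; my dynamically built tuples broke that assumption. A minimal sketch of the pattern, with illustrative names:

DECLARE
  TYPE t_ids  IS TABLE OF NUMBER;
  TYPE t_vals IS TABLE OF VARCHAR2(100);
  l_ids  t_ids  := t_ids(1, 1, 2);        -- illustrative data
  l_vals t_vals := t_vals('A', 'B', 'X');
  -- The INSERT text is built once; only the bound values vary per row.
  l_sql  VARCHAR2(4000) :=
    'INSERT INTO iot_target (id, column1) VALUES (:1, :2)';
BEGIN
  FORALL i IN 1 .. l_ids.COUNT
    EXECUTE IMMEDIATE l_sql USING l_ids(i), l_vals(i);
  COMMIT;
END;
/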

Is there any other approach to perform a huge amount of insert operations into an index-organized table in an efficient way?


Hi Lucas, and welcome to the forum!

Are you loading the IOT from a flat file, or another table?

Please read:

http://www.dba-oracle.com/t_insert_tuning.htm

One and a half billion rows is gonna take a while!

Are you CPU bound during the procedure or I/O bound?

You may benefit from deploying SSDs or adding more CPUs.

Have you explored parallel DML?

http://www.dba-oracle.com/t_parallel_dml_updates_inserts.htm
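A minimal sketch of what that looks like, using illustrative object names (note that parallel and direct-path DML carry restrictions for IOTs that vary by release and by whether the IOT is partitioned, so verify against your version):

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 8) */ INTO iot_target t (id, seq, column1, column2)
SELECT /*+ PARALLEL(s, 8) */ id, seq, column1, column2
FROM   staging_table s;

COMMIT;  -- a direct-path insert must be committed before the table is queried again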

Look at using a hash partitioned IOT:

http://www.dba-oracle.com/t_index_organized_tables.htm
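Hash partitioning spreads the inserts across multiple index structures, which relieves hot-block contention on the leading edge of the primary key index. A sketch with illustrative names:

CREATE TABLE iot_target (
  id       NUMBER,
  seq      NUMBER,
  column1  VARCHAR2(100),
  column2  VARCHAR2(100),
  CONSTRAINT iot_target_pk PRIMARY KEY (id, seq)
)
ORGANIZATION INDEX
PARTITION BY HASH (id)
PARTITIONS 16;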

Also, take an AWR or STATSPACK report over a five-minute interval and post the results.

Please read:

http://www.dba-oracle.com/t_identify_find_bottlenecks_with_awr.htm
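To bracket a five-minute window, take a snapshot before and after the workload and then run the standard report script:

-- Snapshot before the workload window.
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
-- ... let the insert workload run for about five minutes ...
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
-- Then generate the report between the two snapshot IDs from SQL*Plus:
-- @?/rdbms/admin/awrrpt.sql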

Good luck!


Hello, 

Thanks for the advice. I'm loading the IOT from another table that already exists in Oracle.

The PL/SQL procedure transforms the following table:

id , column
---, -------
 1 , A
 1 , B
 1 , C
 2 , X

into the following table (as IOT):

id , column1 , column2
---, --------, --------
 1 , A       , B
 1 , C       , NULL
 2 , X       , NULL

So the procedure opens a cursor that sorts the first table by id and iterates over the rows, filling one row of the resulting table at a time. It does this by dynamically building the string for the VALUES clause of an INSERT statement. When the id changes or the number of columns exceeds N, I insert the current row (via EXECUTE IMMEDIATE) and start a new one. Additionally, I append a sequence number to the id to compose the primary key, since the PK must be unique.

Moreover, according to the value of N (N = 2 in the example), I perform a PIVOT operation limited to N columns. If the number of rows with the same id exceeds N, additional rows are allocated to store the overflow. The only index/constraint is the primary key, which I believe is required for the IOT approach.
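For what it's worth, the same transformation can be expressed as a single set-based INSERT ... SELECT, which avoids the row-by-row loop entirely. A sketch for N = 2, assuming illustrative names source_table(id, col) and iot_target(id, seq, column1, column2):

INSERT INTO iot_target (id, seq, column1, column2)
SELECT id,
       bucket AS seq,
       MAX(CASE WHEN pos = 1 THEN col END) AS column1,
       MAX(CASE WHEN pos = 2 THEN col END) AS column2
FROM (
  SELECT id,
         col,
         -- position of this value within its bucket of N = 2 columns
         MOD(ROW_NUMBER() OVER (PARTITION BY id ORDER BY col) - 1, 2) + 1 AS pos,
         -- which output row (bucket) this value falls into
         CEIL(ROW_NUMBER() OVER (PARTITION BY id ORDER BY col) / 2)       AS bucket
  FROM source_table
)
GROUP BY id, bucket;

A statement like this can also be combined with the parallel and direct-path hints suggested above.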

Following your advice, I will test the PARALLEL and APPEND hints and the NOLOGGING option. Moreover, I will further evaluate the database's possible bottlenecks with an AWR report.

While evaluating the query processing, it is mostly I/O bound. Unfortunately, I cannot alter the hardware of the machine.

A (partial) solution that I adopted is generating a CSV file with the resulting table and loading it with SQL*Loader (SQLLDR). It works, but generating more than 20 GB of data and copying it back into the database seems redundant.
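For reference, the control file for such a load can stay quite small; a sketch with illustrative file and table names (DIRECT=TRUE requests a direct-path load, which carries IOT-related restrictions in some releases, so fall back to a conventional load if it is rejected):

-- iot_load.ctl (illustrative names)
OPTIONS (DIRECT=TRUE)
LOAD DATA
INFILE 'pivoted.csv'
APPEND INTO TABLE iot_target
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(id, seq, column1, column2)

invoked as: sqlldr userid=scott control=iot_load.ctl log=iot_load.log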


Hi Lucas,

Do the math. At an optimistic platter-disk I/O time of ten milliseconds, multiply by one and a half billion rows: that is roughly 15 million seconds, or about 170 days of single-threaded random I/O!

Sadly, time takes time!

The only other idea I have is to try a parallel CTAS with an ORDER BY clause.

http://www.dba-oracle.com/t_create_table_select_ctas.htm
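A sketch of that, with illustrative names; ordering the input by the primary key means the index is built largely sequentially:

CREATE TABLE iot_target (
  id, seq, column1, column2,
  CONSTRAINT iot_target_pk PRIMARY KEY (id, seq)
)
ORGANIZATION INDEX
NOLOGGING
PARALLEL 8
AS
SELECT id, seq, column1, column2
FROM   staging_table
ORDER  BY id, seq;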

There are also third-party sorting tools that are faster than Oracle's native sort:

http://www.dba-oracle.com/t_oracle_sorting.htm

Let us know your fastest option.

Good luck!


Hello, 

Thanks for all the help. I think it will be better to export the tables as CSV files using this (your) tutorial:

http://www.dba-oracle.com/t_export_table_to_csv.htm

Once the data are in files, I usually sort each file's lines directly in bash, following this tutorial (splitting the files according to the available memory):

https://unix.stackexchange.com/questions/350046/sort-large-csv-files-90gb-disk-quota-exceeded

Thanks again for everything. If I come up with a solution inside Oracle, I will let you know.

Best regards.

