Monday, August 30, 2010

Should we keep Index and Data in separate tablespaces?

Index and Data in Separate Tablespaces

Recently I got caught up in a developer-DBA argument about placing database indexes in a tablespace separate from the data tablespace. These developers and DBAs belong to a project where all data and indexes are stored in different tablespaces.

But a few indexes on a daily truncate-and-load type temporary table are still managed through the ETL code; that is, those indexes are dropped before the batch load and recreated after it. The trouble here is that the ETL jobs create these indexes in the data tablespace instead of in the index tablespace.
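Roughly, the pattern in question looks like the sketch below (the table, index, and tablespace names are hypothetical); the change the DBA is asking for amounts to adding a TABLESPACE clause to the CREATE INDEX statement:

    -- Hypothetical names: STG_SALES is the daily truncate-and-load table,
    -- INDEX_TS is the project's dedicated index tablespace.
    DROP INDEX idx_stg_sales_cust;
    TRUNCATE TABLE stg_sales;

    -- ... bulk load into STG_SALES ...

    -- Recreate after the load. Without a TABLESPACE clause the index goes to
    -- the creating user's default tablespace (the data tablespace in this
    -- project); naming the tablespace explicitly puts it where the DBA wants it.
    CREATE INDEX idx_stg_sales_cust ON stg_sales (customer_id)
      TABLESPACE index_ts;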
While the DBA wants the developers to change the code to create the indexes in the proper tablespace, the developers' argument is: why is creating the indexes in a different tablespace so important?

Why do you need a separate Tablespace for Indexes?

The DBA's argument is that they need a separate tablespace for indexes for performance reasons.

And that is, of course, wrong.

Putting all your indexes in a tablespace separate from your data does not increase performance (a tablespace is not a "performance thing"). Table data and table indexes are not read simultaneously: Oracle reads the index first, followed by the data. So what really needs to be achieved for performance is evenly distributed I/O over all available devices, and separating data and indexes does not necessarily achieve a balanced I/O. If one really wants evenly distributed I/O, a far better idea would be to take all the available devices, create one really big stripe set, and put everything in it.
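OS-level or ASM striping is the cleaner way to build such a stripe set, but even within the database you can approximate the effect by giving one big tablespace a datafile on every device, since Oracle tends to spread extents across a tablespace's datafiles. A minimal sketch, with made-up paths and sizes:

    -- Hypothetical paths; each datafile sits on a different physical device.
    CREATE TABLESPACE app_ts
      DATAFILE '/u01/oradata/app_ts_01.dbf' SIZE 10G,
               '/u02/oradata/app_ts_02.dbf' SIZE 10G,
               '/u03/oradata/app_ts_03.dbf' SIZE 10G;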

So the DBA is wrong if s/he wants it for performance reasons.



But then, is there any other benefit to separating the index and data tablespaces?

There can be some administrative plus points. For example, separating indexes and data might help in recovery: if you place all of your indexes in a tablespace on its own disk, you don't have to worry if that disk or its backup becomes corrupt, because you can simply rebuild the indexes instead of restoring the device from backup.
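For instance (the index and tablespace names here are hypothetical), after recreating the lost index tablespace you can rebuild each index from the intact table data rather than restore anything:

    -- Hypothetical names; rebuilds the index into the recreated index tablespace.
    ALTER INDEX idx_sales_cust REBUILD TABLESPACE index_ts;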

Sunday, August 8, 2010

Testing in data warehouse projects

Recently Arun Sundararaman from Accenture posted an article on DWBI testing on Information-management.com. The article can be found here. It brings in a very timely discussion on the state of testing methodologies for data warehousing projects.

DWBI testing is so far the least explored area in the data warehousing domain. The majority of data warehousing projects that fail rarely fail in the implementation phase; they mostly fail in the user acceptance phase. This is largely because end users often find their data warehouse generating unacceptable reports (or reports with numbers outside their "tolerance" limit) when compared to known business scenarios. Whatever the root cause, proper testing is the only way of detecting and fixing those issues.

Unfortunately, in the current data warehousing context, the only viable method of testing is manual SQL scripting. Metadata management tools fail miserably if "SQL Override" or stored procedures are used in the ETL phase. But that is not the only problem with automated testing. The main issue is that we are yet to come up with a generic testing strategy for data warehouse data reconciliation.
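In practice, a manual reconciliation script is usually a handful of queries like the sketch below (table and column names are illustrative), comparing row counts and summed measures between source and target:

    -- Illustrative names: STG_SALES is the source staging table, FACT_SALES the loaded target.
    -- The check passes when both result rows agree.
    SELECT 'SOURCE' AS side, COUNT(*) AS row_cnt, SUM(sale_amount) AS total_amt
    FROM   stg_sales
    UNION ALL
    SELECT 'TARGET', COUNT(*), SUM(sale_amount)
    FROM   fact_sales;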

I believe it is high time that data warehousing practitioners, both individuals and organizations, take data warehouse testing seriously and develop a common methodology for it.