Checking Data Type Consistency in Oracle
In large databases it can be a challenge to maintain data type consistency across many tables and views, especially since SQL does not understand PL/SQL-specific data types.
When designing the overall structure of the tables, tools such as SQL Developer’s Data Modeller can be used to reduce the pain associated with potential data type inconsistencies.
However, as databases grow and evolve, data types may diverge and cause headaches when moving data back and forth.
Here I present a utility to identify and automatically fix many of these issues.
In a previous post I introduced the utils package, which included subprograms relevant to both PL/SQL and SQL.
Since such a mixture ruins serial reusability, which is great for reducing the memory footprint of a package, I decided to split the package into two components: plsql_utils and sql_utils. The latter is mainly interesting for functions that you can call from SQL statements, whereas the former is an extension of what was the utils package, but with the SQL-callable functions moved out.
If you want the gory details, you can read the commit message Create logic for data type checks on the repository and the comments in the code.
Anyway, the package plsql_utils contains a procedure called fix_data_type_issues that depends on a view data_type_issues; both are available on the Bitbucket repository. The view lists which data type is most common for each column, and plsql_utils.fix_data_type_issues uses that information to resolve the inconsistencies automatically.
What the procedure basically does is check whether the table is empty: if it is, the table is altered in place; otherwise DBMS_REDEFINITION is used to redefine the table online. The DBMS_METADATA API is used to generate the DDL statement without constraints but with partition and storage clauses, so that the table after redefinition exactly matches the original. There is also a check whether the table has an enabled primary key: if so, the primary key is used to move the data from the original table to the interim table; if not, the ROWID is used.
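The online-redefinition flow the procedure automates can be sketched with the standard DBMS_REDEFINITION calls; the schema, table, and interim-table names below are illustrative, and error handling is omitted:

```sql
DECLARE
  -- cons_use_pk when an enabled primary key exists, cons_use_rowid otherwise
  l_options PLS_INTEGER := DBMS_REDEFINITION.cons_use_pk;
  l_errors  PLS_INTEGER;
BEGIN
  DBMS_REDEFINITION.can_redef_table('SANDBOX', 'T3', l_options);

  -- The interim table (T3_INTERIM) is assumed to have been created from the
  -- DBMS_METADATA-generated DDL with the corrected data types and no constraints.
  DBMS_REDEFINITION.start_redef_table('SANDBOX', 'T3', 'T3_INTERIM',
                                      options_flag => l_options);

  -- Copy indexes, triggers, constraints, and privileges to the interim table.
  DBMS_REDEFINITION.copy_table_dependents('SANDBOX', 'T3', 'T3_INTERIM',
                                          num_errors => l_errors);

  -- Swap the definitions; T3 now has the new data types.
  DBMS_REDEFINITION.finish_redef_table('SANDBOX', 'T3', 'T3_INTERIM');
END;
/
```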
You can install the required objects by executing the following script from the directory that contains the source files; the script is the same as SCRIPT - deployment.pls on my Git repository:
@'TABLE - error_log.sql'
/
@'PACKAGE - errors.pks'
/
@'PACKAGE - errors.pkb'
/
@'PACKAGE - type_defs.pkg'
/
@'PACKAGE - plsql_utils.pks'
/
@'PACKAGE - plsql_utils.pkb'
/
@'PACKAGE - sql_utils.pks'
/
@'PACKAGE - sql_utils.pkb'
/
@'VIEW - data_type_issues.sql'
/
The forward slashes have been added so that the script runs successfully in SQL*Plus too.
Shall we run the procedure on a couple of dummy tables?
CREATE TABLE t1
( id          NUMBER NOT NULL
, num         NUMBER(4,2)
, text        VARCHAR2(100)
, another_num NUMBER(6,2)
, CONSTRAINT t1_pk PRIMARY KEY ( id )
, CONSTRAINT t1_text_ck CHECK ( LENGTH(text) > 10 )
);

INSERT INTO t1
SELECT ROWNUM
,      DBMS_RANDOM.NORMAL
,      DBMS_RANDOM.STRING('U',15)
,      DBMS_RANDOM.NORMAL
FROM   dual
CONNECT BY LEVEL <= 100;

CREATE TABLE t2 AS SELECT * FROM t1 WHERE 1=2;
ALTER TABLE t2 MODIFY ( num NUMBER(5,3) );
ALTER TABLE t2 MODIFY ( another_num NUMBER(8,2) );

CREATE SMALLFILE TABLESPACE sandbox1_ts DATAFILE 'df_sandbox1.dat' SIZE 10M AUTOEXTEND ON;
CREATE SMALLFILE TABLESPACE sandbox2_ts DATAFILE 'df_sandbox2.dat' SIZE 10M AUTOEXTEND ON;

CREATE TABLE t3
PARTITION BY HASH (id) PARTITIONS 3
STORE IN (sandbox1_ts, sandbox2_ts, sandbox1_ts)
AS SELECT * FROM t1 WHERE 1=2;
ALTER TABLE t3 MODIFY ( text VARCHAR2(20) );
ALTER TABLE t3 MODIFY ( another_num NUMBER(8,2) );

INSERT INTO t3 SELECT * FROM t1;
What these SQL statements do is create three tables with the same column names but with data types that differ between t1, t2, and t3 for some of the columns. What the fix_data_type_issues procedure in plsql_utils is supposed to do is fix these issues based on the most prevalent data type available, so at the end we expect all three tables to have the same structure:
Name        Null     Type
----------- -------- -------------
ID          NOT NULL NUMBER
NUM                  NUMBER(4,2)
TEXT                 VARCHAR2(100)
ANOTHER_NUM          NUMBER(8,2)
Both t1 and t3 contain data, so for those the DBMS_REDEFINITION package will be used; t2 is empty and can simply be altered in place.
Technically, we could get away with a simple ALTER TABLE ... MODIFY statement whenever the data type change does not involve a modification of the precision or scale, even when there is data in the table.
Of course, the data type after the redefinition must be able to squeeze in the data already in the table.
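A minimal illustration with a throwaway table (not part of the demo):

```sql
CREATE TABLE alter_demo ( txt VARCHAR2(10) );
INSERT INTO alter_demo VALUES ('0123456789');

-- Widening succeeds even though the table holds data:
ALTER TABLE alter_demo MODIFY ( txt VARCHAR2(20) );

-- Shrinking below the longest stored value fails with ORA-01441:
-- ALTER TABLE alter_demo MODIFY ( txt VARCHAR2(5) );

DROP TABLE alter_demo;
```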
Now, let’s run the procedure:
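I’m assuming here that the procedure can be called without arguments to sweep the whole schema; check the package specification in the repository for the exact signature:

```sql
SET SERVEROUTPUT ON
EXECUTE plsql_utils.fix_data_type_issues
```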
This is what it spits out on my sandbox database (12c).
SANDBOX.T2: NUM - NUMBER(5,3) [1x] -> NUMBER(4,2) [2x]
Successfully redefined.
SANDBOX.T3: TEXT - VARCHAR2(20) [1x] -> VARCHAR2(100) [2x]
Successfully redefined.
SANDBOX.T1: ANOTHER_NUM - NUMBER(6,2) [1x] -> NUMBER(8,2) [2x]
Successfully redefined.
The numbers in the square brackets show the occurrences of the data type, that is, the number of tables in which the column listed appears with that particular data type.
You can verify that indeed all tables have the same data types and that any constraints, indexes, partitions, and so on are still there.
There are a few exceptional cases that I’d like to discuss in a bit more detail.
What Happens When There Is a Tie?
So, what does the procedure do when two data types occur equally often? Nothing. It shows the data type issue but informs you that ties cannot be resolved automatically and that you should redefine the column manually.
What If The Data Does Not Fit?
When the redefinition causes data not to fit any more, the procedure says so: it encountered an error during the redefinition. These errors can be seen in the error_log table. For example: ORA-12899: value too large for column "SANDBOX"."COPY_TAB_XNJVJ7SRXSND8LX"."TEXT" (actual: 15, maximum: 10). It shows you that the temporary object COPY_TAB_XNJVJ7SRXSND8LX has values that cannot be resized to the data type suggested.
Yes, internally an object with a semi-random name is generated, which should not cause any issues with name clashes unless your developers are forced to follow a naming convention that favours gibberish.
Because the call to the redefine_column sub-procedure is placed inside its own BEGIN ... END block, the main procedure continues unabated.
All redefinition errors are automatically logged in the error_log table, so you can review these later.
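To review the failures afterwards, a plain query on the table suffices (see TABLE - error_log.sql for the actual column layout, which I’m deliberately not assuming here):

```sql
SELECT * FROM error_log;
```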
In case there are errors, the last line of the output will inform you of the number of errors encountered.
Excluding Certain Columns
What if you have the column id in 500 tables with different data types and you don’t want id to be fixed automatically?
Well, the procedure currently does not handle this situation but you can add it yourself fairly easily as most of the functionality is already baked into the package:
- Create a SQL (schema-level) type, so you don’t run into a PLS-00642 error that tells you that local collection types are not allowed in SQL statements. In 12c you can increase the length of a VARCHAR2 to the PL/SQL maximum, but in case you’re still stuck with 11g or below that won’t work; just use CREATE TYPE varchar2_ntt IS TABLE OF VARCHAR2(4000) instead. Please observe that technically we could define the nested table type with fewer characters, as column names are at most 30 characters long. However, the schema-level type may have other uses, so it’s often advantageous to define it as generically as possible.
- Overload the split_objects function to return the schema-level type.
- Add a formal parameter exclude_cols_in that takes as its argument a list of column names, just like the find_table procedure. Remember to modify the procedure’s signature in the package specification accordingly.
- Call the overloaded split_objects function and add WHERE column_name NOT IN (SELECT * FROM TABLE(exclude_cols_in)) to the cursor, which requires you to either call the split_objects function in the declaration section before you define the cursor or move all the assignments to the main block, in the correct sequence. The appearance of TABLE is an example of the so-called table expression.
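Pieced together, the exclusion mechanism could look roughly like this; apart from varchar2_ntt, exclude_cols_in, split_objects, and the data_type_issues view, the names and columns are my own guesses rather than the repository’s actual code:

```sql
-- Schema-level nested table type from the first step:
CREATE OR REPLACE TYPE varchar2_ntt IS TABLE OF VARCHAR2(4000);
/

-- Hypothetical driving query inside fix_data_type_issues, where l_excluded
-- is a varchar2_ntt produced by the overloaded split_objects(exclude_cols_in):
SELECT owner
,      table_name
,      column_name
,      data_type
FROM   data_type_issues
WHERE  column_name NOT IN ( SELECT * FROM TABLE(l_excluded) );
```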
Complex/Custom Data Types
Most tables use the standard data types provided by Oracle.
Sometimes a table may hold composites, such as nested tables, as columns.
In this case the
fix_data_type_issues procedure tells you that custom data types cannot be automatically redefined at present.
This functionality could be added to the package but that is something for another day, as it requires the package to dynamically build the constructor function and provide a custom mapping of the columns required.
Such special cases are best handled manually.
Please check Oracle’s Database Administrator’s Guide for an example of an online redefinition with object types.
In order to compile and run the subprograms in the package, advanced replication must be enabled. You can check this by using the following query:
SELECT value FROM v$option WHERE parameter = 'Advanced replication';
Without it, you’ll run into trouble.
Note that the EXECUTE_CATALOG_ROLE role cannot be used to execute DBMS_REDEFINITION from the package, because roles are disabled when executing (or compiling) stored PL/SQL code. Hence, you have to grant the EXECUTE privilege on DBMS_REDEFINITION directly to the owner of the package.
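Connected as a suitably privileged user, the direct grant for the demo schema (SANDBOX in the output above) would be:

```sql
GRANT EXECUTE ON SYS.DBMS_REDEFINITION TO sandbox;
```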
After you’re done fiddling with the demo tables, you can kick them out of the database:
DROP TABLE t1;
DROP TABLE t2;
DROP TABLE t3;
DROP TABLESPACE sandbox1_ts INCLUDING CONTENTS AND DATAFILES;
DROP TABLESPACE sandbox2_ts INCLUDING CONTENTS AND DATAFILES;
And that’s it.
I hope you like the new features in plsql_utils and sql_utils.