Database technology has seen great advances in the last 20 to 25 years. Despite these advances, a few basic issues still remain to be addressed. Designers need to be aware of these shortcomings and compensate for them by design.
- Lack of Support for Change: "Change is the only constant" is a common refrain in the software field. Because of constant change, we are all faced with the need to modify database table structures quite frequently. Databases support such changes through the ALTER TABLE statement, but they do not support changes by effective date. When we add a new field, all historical rows get this new field, and if the field is mandatory, we must provide a default value for every historical row. Similarly, if a field is deleted, it is removed from all historical rows as well. Clearly, current databases do not support change well.
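The back-filling behaviour can be seen in a few lines; a minimal sketch using SQLite, with an illustrative ACCOUNTS table and a hypothetical new mandatory column:

```python
import sqlite3

# Hypothetical schema: an accounts table created before a new mandatory
# field ("branch_code") was introduced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO accounts (name) VALUES ('historical row')")

# ALTER TABLE applies the new column to every row, old and new alike;
# there is no way to say "this column exists only from 2024-01-01".
conn.execute(
    "ALTER TABLE accounts ADD COLUMN branch_code TEXT NOT NULL DEFAULT 'UNKNOWN'")

row = conn.execute("SELECT branch_code FROM accounts").fetchone()
print(row[0])  # the historical row silently received the default value
```

The historical row now carries a value that never existed when it was written, which is exactly the effective-date gap described above.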
- Transaction Atomicity Problem: Databases support atomic transaction updates very well through BEGIN, COMMIT and ROLLBACK, which ensure the "all or none" principle of transactional updates. However, this atomicity is enforced only until a commit has been done. After a transaction has completed, one can partially remove or undo some of the updates that were part of it, so persisted atomicity is not supported. Put another way, cause and effect are loosely coupled in a database. A cause and its effects may be represented by several updates executed as a single transaction; post commit, we can remove the cause and retain the effects, or remove all or some of the effects and retain the cause. This is the basis for all database-level fraud and tampering. The fundamental reason for this problem is that databases do not distinguish between fundamental and derived fields. Fundamental fields are manually or externally provided, while derived fields are computed either once or continuously. In a banking system, a term deposit's maturity amount is a one-time derived field, whereas a savings account's balance is a continuously derived field. If databases could store explanations for derived fields, cause and effects could be tightly coupled, making databases a little more secure and a little more compliant with business integrity requirements.
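The loose coupling of cause and effect can be demonstrated directly; a minimal sketch with a toy banking schema (the JOURNAL row is the "cause", the BALANCES updates are its "effects"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE journal  (id INTEGER PRIMARY KEY, descr TEXT);
    CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER);
    INSERT INTO balances VALUES ('A', 100), ('B', 0);
""")

# One atomic transaction: record the cause and apply both effects.
with conn:
    conn.execute("INSERT INTO journal (descr) VALUES ('transfer 30 A->B')")
    conn.execute("UPDATE balances SET amount = amount - 30 WHERE account = 'A'")
    conn.execute("UPDATE balances SET amount = amount + 30 WHERE account = 'B'")

# Post-commit, nothing ties the rows together: the cause can be deleted
# while its effects silently survive -- the tampering described above.
with conn:
    conn.execute("DELETE FROM journal")

effects = conn.execute("SELECT amount FROM balances ORDER BY account").fetchall()
print(effects)  # [(70,), (30,)] -- effects remain with no cause on record
```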
- Lack of Support for Locking Granularity: Databases support row-level locking very well but do not support field (table column) level locks, locks on a collection of fields, or a single lock spanning multiple rows across multiple tables. To get a field-level lock, we always have to lock a full row. And when we have to lock rows from multiple tables, deadlocks are possible if different programs (possibly written by different developers) attempt to lock those rows in different orders.
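The standard design workaround is to impose one canonical lock order. A sketch, with in-process locks standing in for row locks (all names are illustrative):

```python
import threading

# Every program acquires locks in one canonical order (table, key),
# so the classic AB/BA deadlock cycle cannot form.
row_locks = {
    ("accounts", 3): threading.Lock(),
    ("orders", 7): threading.Lock(),
}

def lock_rows(rows):
    """Acquire locks for (table, key) pairs in sorted order; return that order."""
    order = sorted(rows)
    for key in order:
        row_locks[key].acquire()
    return order

def unlock_rows(rows):
    for key in sorted(rows, reverse=True):
        row_locks[key].release()

# Two callers listing the same rows in opposite orders still lock them
# identically, so neither can hold one lock while waiting for the other.
a = lock_rows([("orders", 7), ("accounts", 3)])
unlock_rows([("orders", 7), ("accounts", 3)])
b = lock_rows([("accounts", 3), ("orders", 7)])
unlock_rows([("accounts", 3), ("orders", 7)])
print(a == b)  # True -- both acquired in the same canonical order
```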
- Lack of Support for Static Locks: Dynamic locks are held for the duration of a transaction and are released on commit or rollback. A static lock is a persistent lock that can cover a full row, a specific column, or a set of columns. It is explicitly acquired and explicitly released by a user, and it prevents modification of the locked row or field. Static locks can be very useful in collaborative document creation or data management; a good example is an inspection report in a bank, prepared collaboratively by multiple inspectors. Static locks can also be used to support privacy requirements.
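Since databases do not provide this, applications must build it themselves. A sketch: a STATIC_LOCKS table records who locked which field of which row, and an update helper honours it (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, findings TEXT, summary TEXT);
    CREATE TABLE static_locks (tbl TEXT, row_id INTEGER, col TEXT, locked_by TEXT,
                               PRIMARY KEY (tbl, row_id, col));
    INSERT INTO report VALUES (1, 'draft findings', 'draft summary');
""")

def acquire(user, tbl, row_id, col):
    conn.execute("INSERT INTO static_locks VALUES (?, ?, ?, ?)",
                 (tbl, row_id, col, user))

def release(user, tbl, row_id, col):
    conn.execute("DELETE FROM static_locks WHERE tbl=? AND row_id=? "
                 "AND col=? AND locked_by=?", (tbl, row_id, col, user))

def update_field(user, tbl, row_id, col, value):
    held = conn.execute(
        "SELECT locked_by FROM static_locks "
        "WHERE tbl=? AND row_id=? AND col=? AND locked_by<>?",
        (tbl, row_id, col, user)).fetchone()
    if held:
        raise PermissionError(f"{col} is statically locked by {held[0]}")
    # Identifiers cannot be bound as parameters; interpolation here is
    # illustrative only and assumes trusted table/column names.
    conn.execute(f"UPDATE {tbl} SET {col}=? WHERE id=?", (value, row_id))

# Inspector 'alice' freezes the findings column; 'bob' can no longer change it.
acquire("alice", "report", 1, "findings")
try:
    update_field("bob", "report", 1, "findings", "tampered")
except PermissionError as exc:
    print(exc)  # findings is statically locked by alice
```

The lock holder can still edit the field, and an explicit `release` reopens it to everyone, matching the explicit acquire/release lifecycle described above.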
- Lack of Support for Constant or Immutable Fields: Programming languages support constant and immutable fields. A constant field gets its value on row insertion and cannot be changed thereafter, whereas an immutable field can be set and changed until it is locked against further changes; an immutable field stays open for changes for a relatively longer time than a constant field. Tables storing audit trails or log files should be designed with all fields immutable. Fixed-term banking deposits have several immutable fields, such as the deposit amount, interest rate and maturity date.
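Today this must be approximated with triggers. A minimal sketch for an audit-trail table whose rows must never change once written (SQLite syntax; the table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE audit_trail (id INTEGER PRIMARY KEY, event TEXT, at TEXT);
    CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_trail
    BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;
    CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_trail
    BEGIN SELECT RAISE(ABORT, 'audit rows are immutable'); END;
    INSERT INTO audit_trail (event, at) VALUES ('login', '2024-01-01T09:00');
""")

# Any attempt to modify a committed audit row is rejected by the trigger.
try:
    conn.execute("UPDATE audit_trail SET event = 'logout'")
except sqlite3.IntegrityError as exc:
    print(exc)  # audit rows are immutable
```

This enforces constant fields table-wide; per-field immutability with a later "lock" step would need extra bookkeeping, which is exactly the gap the text points out.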
- Improper Bulk Transaction Management Support: A bulk file to be processed may be composed of several elementary transactions, and quite often the entire bulk file needs to be treated as one atomic transaction. If the bulk file is large, say containing millions of records, such a transaction may lock a very large number of rows. Databases do not support nested transactions in which a committed sub-transaction releases the entities it locked and makes its updated data visible outside the enclosing bulk transaction. Supporting this would require the database to execute user-supplied compensation / rollback code in case the bulk file transaction is later rolled back.
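Savepoints are the closest facility mainstream databases offer: a failed record can be undone without abandoning the whole bulk file, though, as described above, locks are still held until the outermost commit. A sketch with an illustrative payments file containing one invalid record:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER CHECK (amount > 0))")

bulk_file = [(1, 500), (2, -20), (3, 125)]   # illustrative records; one is invalid

conn.execute("BEGIN")
for rec_id, amount in bulk_file:
    conn.execute("SAVEPOINT rec")            # sub-transaction per record
    try:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (rec_id, amount))
        conn.execute("RELEASE rec")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO rec")      # undo just this record
        conn.execute("RELEASE rec")
conn.execute("COMMIT")

count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(count)  # 2 -- the bad record was compensated, the rest committed
```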
- Lack of Support for Intelligent Caching: Databases support in-memory caches or in-memory copies of full tables. Caching is usually based on rows accessed and retained in memory for reuse, with entries evicted as the cache grows using a suitable algorithm such as LRU. But most practical enterprise-class systems need caching based on two facts:
- A database contains reference data and transaction data; only reference data needs to be cached.
- Within the same table, reference data may contain some rows that are heavily used and others that are sparingly used or not used at all.
For example, in a banking application, a corporate account row in the Accounts table may be accessed very frequently because the account sees a very large volume of transactions, whereas a retail account row may be accessed only a few times in a month.
Hence, when rows are cached, it makes sense to cache only the heavily used rows and not the sparingly used ones. This can yield a very high cache hit rate and improve performance. The cache can also be built in advance using historical transaction volume data.
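A minimal sketch of such a selective cache: only reference rows whose access count crosses a threshold are kept in memory, while cold rows always fall through to the database. All names are illustrative:

```python
class HotRowCache:
    def __init__(self, fetch_from_db, hot_threshold=3):
        self.fetch_from_db = fetch_from_db   # fallback loader, supplied by caller
        self.hot_threshold = hot_threshold
        self.access_counts = {}
        self.cache = {}

    def get(self, key):
        self.access_counts[key] = self.access_counts.get(key, 0) + 1
        if key in self.cache:
            return self.cache[key]
        row = self.fetch_from_db(key)
        # Cache only rows that have proven to be heavily used.
        if self.access_counts[key] >= self.hot_threshold:
            self.cache[key] = row
        return row

db_reads = []
cache = HotRowCache(fetch_from_db=lambda k: db_reads.append(k) or f"row-{k}")
for _ in range(5):
    cache.get("corporate-acct")     # hot: cached after the 3rd access
cache.get("retail-acct")            # cold: never cached
print(db_reads.count("corporate-acct"), db_reads.count("retail-acct"))  # 3 1
```

Pre-loading `access_counts` from historical transaction volumes would let the cache start "warm", as suggested above.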
- Static Data Model: Some database information elements need different structures, owing to the diversity of the world we live in. For example, in India person names may need initials, a first name and a surname, while in the USA person names may be made up of first and last names; the same person name may have yet other structures in the Middle East. Currently, we must either provide all possible fields to store every elementary information element and then supply logic to use the applicable combination, making things very complex, or simply apply the USA model (or some other model) to all geographies, a big compromise. The problem is that databases do not support a dynamic data model: the database structure and a table are tightly coupled. What is required is a "water marked" field: a string, CLOB or BLOB field with an associated data structure definition, selected from a list of valid data structure definitions.
For example, if we store a single field called PERSON-NAME as a varchar2(100) field and associate a structure id with it to break the field up as required, we have a "water marked" field. In the above example, we can have the following structures:
Structure ID: Person-name-S01
(last-name varchar2(50) + first-name varchar2(50), with a separator = "," OR "|")
Structure ID: Person-name-S02
(initials varchar2(10) + first-name varchar2(45) + sur-name varchar2(45), with a separator = "," OR "|")
SQL queries must logically expose all the sub-fields of the person table structure, implicitly providing nulls for information elements that are not applicable, so that the information elements inside the structure can also participate in queries.
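The split-and-null behaviour can be sketched in application code: one PERSON-NAME string plus a structure id, with the split rules held in a registry keyed by the structure ids from the example above:

```python
STRUCTURES = {
    "Person-name-S01": ["last-name", "first-name"],
    "Person-name-S02": ["initials", "first-name", "sur-name"],
}

def explode(person_name, structure_id, separator="|"):
    """Break the stored string into sub-fields; inapplicable elements become nulls."""
    parts = person_name.split(separator)
    fields = STRUCTURES[structure_id]
    # Logically expose every sub-field of every structure, as the text
    # suggests SQL should, with None for elements that do not apply.
    all_fields = {f: None for cols in STRUCTURES.values() for f in cols}
    all_fields.update(dict(zip(fields, parts)))
    return all_fields

row = explode("Doe|John", "Person-name-S01")
print(row["last-name"], row["initials"])  # Doe None
```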
- Lack of Support for Table Volume Views: Databases support views that can be a subset of columns from a single table or can combine columns from multiple tables. But consider a transaction table that grows very rapidly, gaining millions of rows every day. Such tables degrade application performance over time, so their data is periodically truncated and moved to a warehouse or historical backup database. Banks and insurance companies often have long-term products extending to 25 years or more, so when transaction rows must be combined from a production table and a warehouse / backup table, designers and developers must do this themselves, as databases do not support views over volumes of the same table. A Table Volume View could provide a single logical table combining the production transaction table with one or more copies stored in the warehouse or backup database, eliminating duplicate rows by taking them from the latest table volume. We could call this feature a "Back Volume Plugin".
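A sketch of such a view built by hand, as designers must do today: a view unions the production table with an archived volume, resolving duplicates in favour of the latest volume (table names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn_prod    (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE txn_archive (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO txn_prod    VALUES (2, 250), (3, 300);   -- latest volume
    INSERT INTO txn_archive VALUES (1, 100), (2, 200);   -- id 2 is a duplicate
    CREATE VIEW txn_all AS
        SELECT id, amount FROM txn_prod
        UNION ALL
        SELECT id, amount FROM txn_archive a
        WHERE NOT EXISTS (SELECT 1 FROM txn_prod p WHERE p.id = a.id);
""")

rows = conn.execute("SELECT id, amount FROM txn_all ORDER BY id").fetchall()
print(rows)  # [(1, 100), (2, 250), (3, 300)] -- duplicate id 2 taken from production
```

In practice the archive sits in a different database, so the query also needs a cross-database link, which is precisely why built-in support would help.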
- Lack of Built-in Field-Level Encryption and Redundancy Features: Databases are used to store financial data and critical health care data, where data protection is of utmost importance to prevent fraud or wilful disruption of operations. If databases provided a slew of encryption algorithms, plus redundant storage of selected critical fields with varying encryption provisions, database security could be improved.
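Absent native support, the redundancy idea can be sketched at the application layer: a keyed digest of a critical field is stored alongside it, so silent tampering with the field is detectable. The key here is illustrative only; a real system would use proper key management and actual column-level encryption:

```python
import hashlib
import hmac

SECRET_KEY = b"demo-only-key"   # placeholder; never hard-code keys in practice

def protect(value):
    """Return the field value plus a keyed digest to store as a redundant column."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value, digest

def verify(value, digest):
    """Recompute the digest and compare in constant time."""
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, digest)

amount, amount_digest = protect("15000.00")
print(verify(amount, amount_digest))          # True
print(verify("95000.00", amount_digest))      # False -- tampered field detected
```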
Enterprise application designers have to recognize the above shortcomings and build specific design features to compensate for them.