Recovery of Data Dependencies From Program Source Codes
The approach is based on the program path patterns that implement the most commonly used methods for enforcing data dependencies.
We believe that the approach should be able to recover the majority of the data dependencies designed in database applications.
A prototype system for the proposed approach has been implemented on UNIX using Lex and Yacc.
Many of the world's database applications are built on old generation DBMSs.
Due to the nature of system development, many data dependencies are not discovered in the initial system development; they are only discovered during the system maintenance stage.
Although keys can be used to implement functional dependencies in old generation DBMSs, due to the effort in restructuring databases during the system maintenance stage, many of these dependencies are not defined explicitly as keys in the databases.
Instead, they are enforced in transactions.
Most of the conventional files and relational databases allow only the definition of one key.
As such, most of the candidate keys are enforced in transactions.
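A minimal sketch of what such a transaction might look like is given here. It is illustrative only and is not taken from the prototype: the record layout and the names insert_employee, emp_no, and ssn are hypothetical, and a Python dict stands in for a file that supports a single declared key.

```python
class DuplicateKeyError(Exception):
    """Raised when an insert would violate a key enforced by the transaction."""


def insert_employee(employees, emp_no, ssn, name):
    """Insert an employee record.

    Assumption: emp_no is the one declared key, ssn is a candidate key
    that the storage layer knows nothing about.
    """
    # The declared key: a duplicate is detected by the lookup on the key itself.
    if emp_no in employees:
        raise DuplicateKeyError(f"emp_no {emp_no} already exists")
    # The candidate key: the transaction has to search for an existing record
    # with the same value and reject the insert if one is found.
    for record in employees.values():
        if record["ssn"] == ssn:
            raise DuplicateKeyError(f"ssn {ssn} already exists")
    employees[emp_no] = {"ssn": ssn, "name": name}
```

The path that reaches the insert only after a search on ssn has found no match is the kind of program path that reveals the candidate key to a reader of the source code.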
The facility for declaring inclusion dependencies and referential constraints in a database is available only in some of the latest generations of DBMSs.
As a result, most of the inclusion dependencies and referential constraints in legacy databases are also not defined explicitly in the databases and are enforced in transactions.
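As a hedged illustration, a transaction might enforce an inclusion dependency by checking the referenced file before it inserts. The names insert_order, customers, orders, and cust_no are hypothetical, and dicts stand in for database files.

```python
def insert_order(customers, orders, order_no, cust_no, amount):
    """Insert an order only if its customer exists.

    Assumed dependency: every orders.cust_no must appear in customers.
    Returns True if the order was inserted, False if it was rejected.
    """
    # The check on the referenced file comes before the insert, so the
    # insert is reachable only when the referenced record exists.
    if cust_no not in customers:
        return False
    orders[order_no] = {"cust_no": cust_no, "amount": amount}
    return True
```

Nothing in the schema records this constraint; it is visible only in the transaction's control flow.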
To avoid repeated retrieval of related records for the computation of a total in query and reporting programs, the required total is usually maintained and stored by the transactions that update the database, so that other programs can retrieve it directly from the database.
As such, many sum dependencies are maintained by transactions in database applications.
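The following sketch shows one way a transaction could maintain a stored total. It assumes a hypothetical order file with a total field and an order-line file with one amount per line; none of these names come from the source.

```python
def add_order_line(orders, order_lines, order_no, line_no, amount):
    """Insert an order line and keep the stored order total in step.

    Assumed sum dependency: orders[order_no]["total"] equals the sum of
    the amounts of all lines belonging to that order.
    """
    order_lines[(order_no, line_no)] = amount
    # The same transaction updates the stored total, so the sum dependency
    # is preserved without recomputing it from all the lines.
    orders[order_no]["total"] += amount


def order_total(orders, order_no):
    """Reporting programs read the stored total instead of summing the lines."""
    return orders[order_no]["total"]
```

For example, after lines of 30 and 12 are added to an order whose total starts at 0, order_total returns 42 without reading the order-line file.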
In summary, many of the functional dependencies, key constraints, inclusion dependencies, referential constraints, and sum dependencies in existing database applications are enforced in transactions.
Therefore, transactions are the only source that can accurately reflect them.
The proposed approach can be used to automatically recover these data dependencies designed in database applications during the reverse engineering and system maintenance stages.
These dependencies constitute the majority of data dependencies in database applications.
Where data dependencies are jointly enforced by the schema, the transactions, and their GUI (graphical user interface) definitions, the approach is still applicable.
The data dependencies defined explicitly in the database schema can be found from the schema without much effort.
The GUI definition for a transaction can be interpreted as part of the transaction and analyzed to recover the data dependencies it was designed to enforce.
Together, the data dependencies recovered from these sources form the data dependency design for the whole database application.
Extensive work has been carried out on database integrity constraints, which include data dependencies.
However, this work mainly concerns enforcing integrity constraints separately in a data management system (Blakeley, Coburn, & Larson, 1989; Orman, 1998; Sheard & Stemple, 1989) and discovering the data dependencies that hold in the current database (Agrawal, Imielinski, & Swami, 1993; Andersson, 1994; Anwar, Beck, & Navathe, 1992; Kantola, Mannila, Raiha, & Siirtola, 1992; Petit, Kouloumdjian, Boulicaut, & Toumani, 1994; Piatetsky-Shapiro & Frawley, 1991; Signore, Loffredo, Gregori, & Cima, 1994; Tsur, 1990).
No direct relationship exists between the former work and the proposed approach.
The distinct difference between Tan's work and the latter work is that the proposed approach recovers the data dependencies designed in a database, whereas the latter work discovers the data dependencies that hold in the current database.
A data dependency that is designed in a database may not hold in the current database, either because of updates made by transactions that were developed incorrectly at an earlier stage or because of updates made through a query utility without any validation.
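To make the distinction concrete, the sketch below tests whether a functional dependency holds in a set of records. It illustrates the data-driven check performed by discovery approaches, not the proposed source-code analysis; the function name fd_holds and the sample attributes are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs but differ on rhs violate the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True
```

A designed dependency such as dept_no -> dept_name would fail this check as soon as a single unvalidated update changed the department name in one record, even though every transaction in the application still enforces it.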