Databases Stored Procedure for Automated Change-log

Assume you have a table where each row contains various features about customers. These features can be static (eg. account creation time, gender, etc) and/or dynamic (eg. “number of searches”, “last visited on”, etc). One of the challenges encountered in maintaining such a table is that different processes updated these features at different times. For instance, some of the dynamic features can be real-time whereas others might be coming from Hadoop and might have 24-hour latency. From a debugging perspective having a table that gets updated by different processes can be a nightmare as there is no way to travel in time and determine how feature values have changed over time. An obvious solution is to keep a changelog.

While there are many ways to create such a changelog, this post focuses on leveraging database stored procedures to automatically create changelog when inserting or updating features. Databases stored procedure allows defining custom processing on inserted/updated data in the database and gets triggered during regular CRUD (Create-Read-Update-Delete) operations. I particularly like this approach as it’s way more tightly integrated with the databases and keeps the application layer cleaner and leaner.

There are three steps to get the stored procedure working. The post below provides an example of the three steps. In the example below, the aim is to observe any kind of insert or update operation on one table and create an entry in the table so that we can keep of how features in table changed over time.

Step 1: Define the table to observe
To demonstrate how stored procedures work, let’s create two tables. As defined below, the table contains various features about customers. We also define another table, changelog, to capture changes (when and how) in customer features.

Step 2: Define the task to perform
Next, as shown below, we create a function that will listen to changes in the table and add an entry in the table. The function has access to many useful variables but the two most important are “OLD” and “NEW”. As the name suggests, the “OLD” variable contains a database row that will be updated before changes. The “NEW” variable contains a copy of the row after changes. Below, we convert the updated row into JSON blob and remove “id” as we are separately storing it. This will become clear once we insert rows into our customer feature table below.

Step 3: Define Triggers to connect the observed table and the task to perform
Lastly, we create separate triggers to bind the above function to the and the operations on the table.

Now, the fun begins. Inserting a new record or updating an existing row in the table automatically creates a new record in the changelog table.

That’s pretty awesome. You can get pretty creative with stored functions. For instance, one might notice above changelog contains a copy of all the features, even those that haven’t changed. One can easily modify stored functions as below to only capture features that have changed. However, be mindful that having complex stored procedures will increase your CURD operation time and also consume more CPU.

Senior Machine Learning Engineer, Varo Money; Contributor and Maintainer of sklearn-pandas library

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store