top of page
ClsoeIT Logo

Audit trail handling with MongoDB

A few years ago, we faced the important task of implementing an audit trail to the CRM web application. The term may sound fancy, but it basically means a history of changes. It doesn't really matter if the company is a subject of an audit, it is always useful to have a list of changes in the system with the author name and timestamp attached. I was excited to implement the audit trail because, at that time, I was working in my free time on a different project for which I envisioned a system based on storing just differences.

We have several different entities in the CRM and I didn't want to make a specific implementation for each one. The ideal state was, that the audit trail could be easily connected to any entity in the system without any additional knowledge of tracked entities.  I and Tibor designed a service that takes advantage of MongoDB. This popular NoSQL database stores data as JSON-like documents. We took advantage of this fact by implementing a service that calculates the difference between an old state a new state of the tracked entity. This difference is represented as JSON patch and saved to a chronologically sorted list.

Audit trail scheme with MongoDB

By applying these JSON patches to the current document, we can travel back to any version of the document. This system will also preserve schema changes, which is neat. The older the version is, the more performance heavy the operation is due to the application of patches. To mitigate this problem, only a limited amount of patches is saved together with the full document - we call this entity an Audit Page. This means the number of operations is always limited by the maximum count of patches on the Audit Page. Getting the historical version means finding the correct Audit Page and application of patches to the full document from the page. The example above shows only 4 patches per page, but we use way more in the CRM. The specific number depends on how large (complicated) is the tracked document.

Subtasks Navigation for Jira promotion Banner

Some of our tracked entities are more complex, some of them less. Showing all types of them in the same format in GUI was tricky, but we agreed the simple text representation would be enough. We implemented a transformation function per each document type, which transforms JSON to a suitable text representation. In case of an update, both texts are rendered next to each other, and differences are highlighted similarly to how GitHub highlights the changes.

Audit trail GUI

I recently attended Data + AI Summit 2021 and learned, how Delta Lake storage works internally. Delta Lake is a storage layer for Spark, which saves tables as files. It was nice to realize this interesting technology uses a similar system of differences interlaced with full snapshots under the hood. It allows the same time traveling capabilities, but on a table level.

Related Posts

See All

Validating and generating Atlassian JWT

When developing a add-on to cloud Jira, a usual need is to communicate with the Jira using REST API. The authentication is done using JWT and it took us a while to figure out how to validate and gener

MGS integration with antivirus

One of the MGS features is to manage model-related files and documents. Of course, common and other model non-related files can be uploaded also to the public folder, users' private folder, or shared

Flattening Docker images

Docker images are stored as layers. To be more precise, the filesystem is layered, so each change (RUN, COPY,…) will result in adding a new layer. This approach has many advantages - images can be bui


bottom of page