Technical Due Diligence - Relational Databases

Despite the relative popularity of NoSQL and graph databases, relational databases like MySQL, SQL Server, Oracle, and PostgreSQL continue to be indispensable for storing and managing software application data. Because of this, technical due diligence teams are practically guaranteed to encounter them within almost any project. Novice team members will gravitate towards understanding the schema which is of course important but only paints a small part of the overall risk picture. A complete research and risk assessment will additionally include information about the following database characteristics:

Licensing
Security
Performance
Disaster Recovery

I identify these three characteristics because technical due diligence is all about identifying and quantifying risk assessment, and not about nerding out over the merit of past decisions. The importance of quantifying risk assessment is in most cases no greater than when evaluating the software product's data store, for several reasons:

Poor security practices open up the possibility of application data having already been stolen, or in danger of being imminently stolen, placing the buyer in legal danger.
Poor performance due to inadequate or incorrect indexing, insufficient resourcing, or a combination of the two might result in disgruntled customers who are considering cancelling their subscription. Some of these customers may be major contributors to company revenue, severely damaging the company's outlook should they wind up departing following acquisition.
A lack of disaster recovery planning outs the buyer in greater short-term risk following acquisition due to an outage which may occur precisely at a time when personnel are not available or are not entirely up to speed.

Confirm Database License Details

Be sure to confirm all database licenses are in compliance with the company's use case, and if the database is commercially licensed you'll need to additional confirm the available features and support contract are in line with expectations. To highlight the importance of this verification work I'll point out a few ways in which expectations might not be met:

The buyer requires the data to be encrypted at-rest due to regulatory issues, however the product data is in fact not encrypted-at-rest due to use of the Heroku Essential Postgres tier which does not offer this feature. There could possibly be an easy fix here which involves simply upgrading to a tier which does support encryption-at-rest, however you should receive vendor confirmation (in writing) that encryption is indeed possible as a result of upgrading, and whether any downtime will be required to achieve this requirement.
The buyer's downtime expectations are more strict than what is defined by the cloud service provider's SLA.
TODO TALK ABOUT MEMSQL

Security Considerations

Version

All mainstream databases (MySQL, Oracle, PostgreSQL, etc) will have well-defined end-of-life (EOL) dates associated with each release. The EOL date identifies the last date in which that particular version will receive security patches. Therefore it is critical to determine what database versions are running in production in order to determine whether the database has potentially been running in an unpatched state. For instance MySQL 5.7 has an EOL date of October 25, 2023, and therefore if the seller's product is still running MySQL 5.7 after that date then it is in danger of falling prey to any vulnerabilities identified after that EOL date.

Of course, the EOL date isn't the only issue at play here. If the database version hasn't reached its EOL date then you should still determine whether the database has been patched appropriately. For instance as of the time of this writing MySQL 8.2 was released only 9 months ago (on October 12, 2023) and there are already 11 known vulnerabilities. It's entirely possible that none of these vulnerabilities are exploitable in the context of the seller's product, however its nonetheless important to catalog these possibilities and supply this information to the buyer. In my experience where there is smoke there is fire and unpatched software is often symptomatic of much larger issues associated with technical debt and a lack of developer discipline.

Connectivity and Privileges

Enterprise web applications will typically run in an N-Tier architecture, meaning the web, data, caching, and job processing components can all be separately managed and scaled. This configuration means each tier will often run on separate servers and therefore a network connection between the database and web application tiers will need to be configured. Most databases can be configured to allow for connections from anywhere (almost invariably a bad idea), which is precisely what you don't want to see when that database is only intended to be used by the web application because it means malicious third-parties have a shot at successfully logging in should they gain access to or guess the credentials.

Connecting users will be associated with a set of privileges which define what the user can do once connected to the database. It is considered best practice to assign those users the minimum privileges required to carry out their tasks. Therefore a database user which is intended to furnish information to a data visualization dashboard should be configured with read-only privileges, whereas a customer relationship management (CRM) application would require a database user possessing CRUD (create, retrieve, update, delete) database privileges.

Therefore when examining database connectivity and privileges you should at a minimum answer the following questions:

What users are defined and active on the production databases, and from what IP addresses / hostnames are they accessible?
Is the database server accessible to the wider internet and if so, why?
What privileges do the defined database users possess, and why?
To what corporate applications are production databases connected? This includes the customer-facing application, business intelligence software, backup services, and so forth.
What other non-production databases exist?
Where is production data replicated? Are these destinations located within jurisdictions compliant by the laws and SLA under which the buyer's target IP operates?

Encryption

Satisfying this review requirement is relatively straightforward, and completed in two steps:

From a security standpoint, data is often defined as being encrypted at-rest and in-transit, the former referring to its encryption state when residing in the database or on server, and the latter referring to its encryption state when being transferred from the application to the requesting user or service. You'll want to determine whether these two best practices are implemented.
If the data is not encrypted at-rest (which is typical and not necessarily an issue for many use cases), then how is sensitive data like passwords encrypted (or hashed)? You often won't be able to determine this by looking at the database itself; web frameworks will typically dictate the password hashing scheme, such as Laravel's use of the Bcrypt algorithm for this purpose.

Technical Due Diligence - Relational Databases

Confirm Database License Details

Security Considerations

Version

Connectivity and Privileges

Encryption

Performance

Indexes

Disaster Recovery

Backups