In this episode we’re talking database technology. Specifically, we’re talking about how we have moved to NoSQL databases as our standard because we design for load. There’s a lot of ground to cover, ranging from the underlying computer science to the choices you can make to get started.
The highlights of the podcast are:
- We begin by referencing the 7 Outcomes, especially those that are most relevant to NoSQL technology. Depending on the use case, NoSQL can assist with Rapid Delivery. Usually the drivers for NoSQL are Available & Scalable and Costs Optimised.
- We explore relational databases, their strengths and how they achieve a highly consistent, high-quality data model.
- We look at the problems that come with traditional relational databases, especially the issues of scale and availability.
- Horizontal scale is only achieved through a partitioning strategy. These strategies used to be implemented through system design but over time we have been able to take advantage of technologies that have already solved these problems.
- Availability is achieved through ensuring multiple copies of data are stored on separate machines. In the event of a failure we can fall back on a different copy of the data to ensure continuity of service. There are different strategies for achieving this, and the choice of technology depends on your design goals and use case.
- Data consistency: there is always a trade-off between consistency and performance. Technologies such as Cassandra that write to a log and have a background process to update the current data set give high write performance but lower consistency. Technologies such as MongoDB that ensure repeatable reads are necessarily slower but allow a more consistent experience when reading data.
- There’s a gratuitous Only Fools and Horses joke thrown in for good measure.
- The choices we have made for our NoSQL:
  - Fit for purpose, match the technology to the use case (rapid delivery)
  - Can host in Kubernetes (at least, host the compute then map the storage)
  - Efficient runtime (costs optimised)
  - Highly available and scalable
- We look at the GLU NoSQL stack and the reasons we have chosen each:
  - Elasticsearch: highly flexible and available thanks to the way it manages nodes. We use this for high volumes of data that we need to query in a flexible manner, such as diagnostic and audit information.
  - Prometheus: geared towards time-series data, especially when this data needs to be examined in near real time. We use this for system performance data and build dashboards on it using Grafana.
  - ScyllaDB: this is a Cassandra-like database that is more efficient on compute as there is no Java involved (a JVM needs tuning as part of a performance test cycle). Very fast, especially for writes.
  - ArangoDB: this is used as a multi-model database. We are especially interested in the graph model as we can add and traverse relationships between entities. This is great for product recommendations, social information and other cases where relationships between different entities need to be captured and used in real time.
- We also mention cloud-hosted NoSQL databases, notably that Azure Cosmos DB is an interesting one to consider.
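
The partitioning and replication points above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any of the databases mentioned actually implement it: the `Cluster` class, the node names and the "hash to a start node, then take its neighbours" placement rule are all hypothetical, and each "machine" is just an in-memory dict.

```python
import hashlib


class Cluster:
    """Toy in-memory cluster: hash partitioning plus replication (hypothetical)."""

    def __init__(self, node_names, replication_factor=2):
        if replication_factor > len(node_names):
            raise ValueError("replication factor exceeds node count")
        self.node_names = list(node_names)
        self.replication_factor = replication_factor
        # Each node is a plain dict standing in for a separate machine.
        self.stores = {name: {} for name in self.node_names}
        # Names of nodes we pretend have failed.
        self.down = set()

    def replicas_for(self, key):
        """Partitioning: hash the key to a start node, then take its neighbours."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.node_names)
        return [
            self.node_names[(start + i) % len(self.node_names)]
            for i in range(self.replication_factor)
        ]

    def put(self, key, value):
        """Replication: write to every live replica, fail if none is reachable."""
        written = 0
        for name in self.replicas_for(key):
            if name not in self.down:
                self.stores[name][key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no live replica for key")
        return written

    def get(self, key):
        """Availability: read from the first live replica that holds the key."""
        for name in self.replicas_for(key):
            if name not in self.down and key in self.stores[name]:
                return self.stores[name][key]
        raise KeyError(key)


# Usage: write a record, take its first replica offline, and still read it back.
cluster = Cluster(["node-a", "node-b", "node-c"], replication_factor=2)
cluster.put("order-42", {"status": "paid"})
cluster.down.add(cluster.replicas_for("order-42")[0])
record = cluster.get("order-42")
```

Note that a write made while a replica is down leaves that replica stale when it returns. The sketch does nothing to repair this, which is where the consistency trade-off described above starts to bite.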
You can watch the podcast here:
The audio version is here: