The Objective Blog

Keep up with what we're thinking, reading, and doing.

The Database Dilemma: Part 2

February 19th, 2020 - by Casey Harding - Salt Lake City, Utah

If you missed it, in Part 1 we glossed over some of the benefits and limitations associated with the two prevailing database paradigms, relational and non-relational databases. In Part 2, we will give a more detailed analysis of the two approaches and give some suggestions when considering which to use. 

While the initial hype has diminished over the last few years, at one point there was a big push throughout the industry to replace the relational database with its non-relational counterpart. While relational databases were and still are powerful solutions, this pressure was not completely unfounded. With the ever-growing Internet of things and rampant increase of user activity on collaborative web applications, data in the 21st century have been generated and amassed like never before. “Big Data”, often unstructured or semi-structured in nature, has made it imperative for problem-solvers in certain spaces to employ an alternative, more flexible data paradigm. While there are certainly relevant use cases for non-relational databases, relational databases have retained their place as our go-to data store for most, if not all, of the web applications we develop.  

Why Relational Databases Are the Go-To 

Over the years, we’ve worked with hundreds of clients who come from vastly different industries. Through discovery conversations with our clients, we often are able to forge a pretty solid understanding of what data will be channeled into the application. At the same time, we can also determine, with some certainty, whether or not the data will likely remain relatively consistent over time. If the data appears to be mostly structured in nature and unlikely to dramatically change, we will more than likely suggest using a relational database. Data inputs that are clearly defined and have inherent associations marry well with the relational databases. Relational databases make it easy to optimize the relatedness of data sets and query the information in a rather streamlined, performant manner. 

Since all the information about a particular record lives within a single document in a non-relational database, it’s often difficult and computationally expensive to query for similar information stored across more than one document. With some of our non-relational database projects in the past, we have established joins on documents to mimic some of the behaviors and benefits that come from relational databases. However, this has often felt like a backward approach and we end up wishing we had just started with a relational database in the first place. 

When We Use Non-Relational Databases 

While relational databases seem to be the best fit for managing primary information collected and stored by web applications, more often than not, we will also utilize Redis and Elasticsearch for many custom web applications. Redis is a wicked fast key-value store, making it convenient for storing caching sessions. It can also be helpful in generating and scheduling task queues for background workers. Elasticsearch is another powerful implementation of a non-relational database. Known for its content flexibility, Elasticsearch allows for rapid and performant full-text search. Imposing a similar, comprehensive search of a relational database would require manual joins on all the tables or records you’d like to include, a process that can very easily become quite cumbersome.  Redis and Elasticsearch aside, if we come across a data architecture problem that would merit the use of a non-relational database, we would use one! Neo4j is excellent for graph-type data (think of user networks in social media platforms like Facebook or LinkedIn), MongoDB can be employed for unstructured “big data”, and DynamoDB is a handy key-value store.  

Our Recommendations   

  1. Know your data. We cannot stress this enough. Do everything possible to understand your data inside and out. Clearly defining your data set will not only help you to select the appropriate data paradigm but will also prepare you/your team with details that will likely impact the decisions you make throughout the project. 
  2. Map it out. If you’re caught in the middle, consider drafting a database schema. This brainstorming artifact and the process of creating it will likely provide valuable insight to support one approach or another. It may also highlight edge-cases in which retrieving groups of data might be more or less computationally expensive and/or complex to query. 
  3. Know your client/team. While this is and should be the least important factor in selecting a database, it’s certainly worth considering. In the case of agency work, it’s important that the solution is something the client is capable and willing to maintain in the long term. For in-house projects, selecting a database that your colleagues have working experience with may mitigate or eliminate the initial learning curve and potentially provide you with more resources to support the project in the short-term. 

Conclusion

When deciding on the appropriate data strategy for any given project, it’s extremely important to consider the consequences in terms of performance and data retrieval. While many factors can come into play, sometimes by virtue of a bad decision regarding the database, projects can require significantly more time and thereby impose avoidable frustration as budget and deadlines are stretched.

We’ve found that using both non-relational and relational databases can be extremely beneficial for the majority of applications we engineer: using relational databases as the primary solution for data that is relatively uniform and then employing non-relational databases for things like search and caching. As both paradigms evolve, it’s likely that the distinction between them will diminish. Until then, careful selection and utilization of both paradigms will greatly impact the ease, longevity, and scalability of the applications we build.