In 1962, the first database, Integrated Data Store
(IDS), made its debut. It was a hierarchical type, the only kind available at that time. It wasn't until the early 1970s that relational databases appeared and SQL took over. Some 56 years later, relational databases remain popular, but a few alternatives have emerged, some only recently. Let’s see what’s available, as well as their strengths and weaknesses.
Also known as navigational databases, hierarchical databases are a means of organizing data with one-to-many relationships. Parent structures (e.g., tables) can have many “children,” but each child can only have one parent. Adding and deleting new data is easy in this system; but when you need to find data, the hierarchy can get in your way. Hierarchical was popular when tape storage was the main thing, but the days of tape are firmly in the rear-view mirror; and in addition, hierarchical does not support many-to-many relationships. Because of the disadvantages, you wouldn't expect to find hierarchical databases in wide use, but there's one in every copy of Windows, powering the Registry. They are also used in banking and telecoms; probably the best-known example is IBM IMS
(Information Management System), which turns 50 years old this year.
When you see the word “database,” it almost always means a relational database, in which data is stored in tables with relations between tables. The first commercial one was released by Oracle in 1979. According to the Db Engines website
, relational databases currently make up 77.6 percent of all databases in use. The advantages of relational over other types of databases include the use of SQL, ease of changing the structure of tables, greater consistency in data, and handling of many-to-many relationships. However, relational databases are slower than hierarchical and NoSQL, and joins between tables can become very slow if the tables are large.
A Quick Aside: Key Value Stores
A key value store is similar to the Dictionary data type in C# and other programming languages. It maintains a list of keys and their associated values, and can retrieve them very rapidly. A popular example of this is the open-source Redis
, which holds everything in RAM for speed, and persists changes by either dumping memory to disk or appending to a journal log file. Key-value stores have a simple structure and are highly scalable, which makes them faster than relational databases with ACID (Atomicity, Consistency, Isolation and Durability). In general, non-relational databases don't have ACID, but the price of higher performance is less consistency, reliability and security. Consistency is poorer because after the updated value is written, reads may still fetch older data for a short while. The term for this is “eventually consistent,” and this scheme is known as BASE consistency. For games, blog posts, and the like, BASE may prove suitable; but for high-value financial transactions, you want ACID consistency. It would be bad if your account showed you as having more funds because a previous update hadn't become consistent yet (and you spent money that you didn't have as a result). Key-value is one category of NoSQL databases; others include document and content stores. A document store is a specialized key-value store—instead of storing data across columns in a table or multiple tables, it’s all held in one document. It's also possible to store complex structures in a document, perhaps by using arrays. A content store handles other data types such as media (e.g., videos, images, music files and so on). These typically include full-text search, versioning, hierarchically structured content, and access control. NoSQL databases have found uses in places such as online games. Zynga used a precursor to Couchbase
in their games; it was capable of 500,000 operations per second. In games such as “Farmville,” when you move the cursor to harvest a number of crops, each individual crop harvest is an update to a database. You might do 20 per second; a relational database could not cope with that volume of updates, but NoSQL databases can.
Object-oriented databases have been around since the late 1980s, but have never been as popular as relational databases. They attempt to solve a problem known as object-relational impedance mismatch
. Handling relational tables as objects isn't always easy, especially with multiple linked tables. This mismatch happens because using a relational database with object-oriented code requires workarounds. What if your object has variable length lists? How do you store those in the database? For a number of years, there has been ORM middleware (object relation mapping) such as Hibernate
that solves these problems and allows relational databases to work with object-oriented code. The object-oriented database (OOD) lets you define your object as a class and takes care of persisting it for you. Compared to relational databases, structure changes are harder with OOD. And as there's no standard query language like SQL, you have to code your own queries.
In mathematics, a graph
is a set of points together with unordered (or ordered) pairs of points (known as edges or lines). A graph database uses graph structure to store data. There's no schema as there is with relational databases. You can store complex structures of data in a graph database, which would be hard or impossible in a relational database; the points could be data about people, businesses, accounts, or any other item. It's their relationship that is stored and quickly retrieved in a graph database. For example, imagine a family tree with all the couples, offspring and relationships to each other. A search to find distant cousins would be complex in SQL but a lot easier in a graph database. Relational databases do well when there's only one or two levels of relationships; graph databases handle far greater depths.
Time Series Databases
When I worked in finance, writing code to evaluate instruments against time data curves, I could really have used a time series database. These are written to store data items with an associated time stamp. Sure, you could do that with a relational database, but data tends to accumulate quite quickly. How do you deal with all that data? How long will you need to keep it? Given the volume of data, being able to down-sample is handy; this means using every tenth or perhaps hundredth value and storing that out. If you run queries against very large datasets without some kind of down-sampling, you could run out of memory. Some databases (e.g., OpenTSB
) support down-sampling and storing rollups, as they're called. While time series databases may seem to have limited utility, there are probably more uses than you realize. For instance, monitoring Internet of Things (IoT) devices, asset tracking and trading systems; one way of think of them is as a sophisticated event-logging backend.
You probably wouldn’t go wrong if you chose a relational database as your first choice for an application, but if your use case matches one of these other types, give it a go!