Learn Data Science Today

Relational Database Essentials

What is the first thing that comes to your mind when you hear the word “database”? For many people, this question is more challenging than it might seem at first. An answer like “a big file where much information is stored” is not satisfactory and would not please potential employers.

You should remember there are two main types of databases – relational and non-relational. The former will be the focus of this post, while the latter regards more complex systems. Although understanding non-relational databases requires a serious mathematical and programming background, some of the logic applied in its coding is the same as SQL. Likewise, relational databases have a few advantages on their own. A small bit of theory will explain why they are still the preferred choice in many companies and institutions.

Databases’ main goal is to organize huge amounts of data that can be quickly retrieved upon users’ request. Therefore, they must be compact, well-structured, and efficient in terms of speed of data extraction.

Today, people need such extra efficiency because data occupies memory space and… the bigger its size, the more sluggish the database is and the slower the retrieval process becomes. If we have a database containing one multi-million-row table, with many columns, then every time a request has been received, the server must load all the records, with all fields, and it would take too much time for a task to be completed. Don’t forget every symbol is a container of information and requires bytes of storage space. Hence, loading that much data will not be an easy job for the computer.

So, what allows us to contain so much data on the server, but lets us efficiently use only the portions we need for our analysis? The secret lies behind the use of mathematical logic originating from relational algebra. Please, don’t worry – we will not bother you with math.

Imagine each table with data is represented by a transparent circle that contains all the data values of the table, categorized by columns or, as we will often call them, fields. Now, if our database consisted of only one table, a giant circle would represent the entire database, something like this huge table from our fictional example with the “Sales” database. And when we need a piece of information from the database, for example if we wish to see who has bought something on a certain date, we will have to lift this whole big circle and then search for what we need. This challenge seems vague and the process of data extraction will not be efficient.

See what can happen if we split the circle into 3 smaller circles, just as we did with the “Sales” database. One circle will stand for the “Sales” table, the other for “Customers”, and the last one for “Items”. There are various theoretical combinations between 3 or more circles, but in our database, we have the following model. “Sales” and “Customers” have the same “customer ID” column, and “Sales” and “Items” have the same “item code” column. This way, we can see the circles overlap as they have common fields.

So, if we’d like to extract the same information, the names of the customers who have purchased something on a given date, we will need only the “date of purchase” column from the “Sales” table and the “first” and “last name” from the “Customers” table. So, to satisfy this request, we will not need to lift the third circle from our database, “Items”. This way, we can save energy or, more technically, increase efficiency. Less data, represented as only two of the three circles, will be involved in this operation.

The mathematical trick lies in relating the tables to one another. Relationships were formed namely through these common fields. Anyway, I am sure that now you understand why we use the term relational database.

Some professionals may refer to the tables, or the circles in our plot, as relations because, theoretically, they are the smallest units in the entire system that can carry integral logical meaning. Likewise, the three circles are all part of our “Sales” database.

When we combine the database and its existing relations, we obtain the famous term relational database management system, frequently abbreviated as RDBMS. 😊

We hope this theoretical illustration makes things clearer. SQL is designed for managing relational database management systems and can do that by creating relations between the different tables in a database.

Stay tuned for the next post, where we will outline the most substantial differences between a database and a spreadsheet.

Check out our next post, where we will take a closer look at the advantages and disadvantages of Database vs Spreadsheet.

 

Data Science PR

Add comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Follow us

Don't be shy, get in touch. We love meeting interesting people and making new friends.