An Introduction to Databases for New Web Developers
A database is a collection of related data, organized and stored in a set of files.
Databases are like spreadsheets, where related data are stored in an organized manner, i.e., rows and columns.
A spreadsheet can be considered as a database too.
However, conventional databases are more sophisticated in design and provide advanced features to manage a large amount of data.
Almost every website or web app uses a database solution (directly or via an API) to store and retrieve data.
For example, online newspapers keep their news and articles in a database.
An online retailer keeps its product catalogs, customer details, and order information in a database too.
A WordPress website keeps your blog posts in a database.
Remember your mobile’s Contact app? It also keeps all your contact information in a database.
Most databases define and enforce a fixed structure for the stored data.
This structure is called a Database Schema.
That said, each record we want to insert into the database must match the schema, or the operation would fail.
Below is the schema for a table in a WordPress database.
As you can see, each field (which is the equivalent of a column in a spreadsheet) only allows a particular type of value.
For instance, post_date can only store date & time values.
Apart from schema-based databases, there are also schema-less databases, which I’ll cover in a bit.
Database management systems
Like spreadsheets that need a program like Excel or Google Sheets, we need specialized programs to work with databases.
A Database Management System (DBMS) is a program to store, retrieve, and protect data in a set of tables within a database.
When people refer to the term database, they refer to the database and the DBMS as one unit.
A DBMS allows you to manage a database via a user interface (visual and command line) or programmatically from your code.
Developers usually interact with a database from the code – to create, read, update, and delete data.
The visual or text-based interface is usually used for the initial setup of the database and performing administrative tasks.
Types of DBMS
There are many database management systems in the market, which are different in how they store and manage data.
These are the most popular ones: Relational Databases and No-SQL Databases.
Relational Databases are Schema-based database management systems, which store data across several related data collections called tables.
No-SQL Databases are schemaless databases, which store data as JSON objects (also known as documents) or key/value pairs.
Relational database management systems (RDBMS):
A relational database is a database, which stores data in several “related” tables.
Think of a table as a spreadsheet in an Excel document, where we store data in rows & columns.
For example, an online store keeps the customers’ data, products, and orders across separate tables.
Separating data in multiple tables helps us avoid data redundancy.
Data redundancy happens when we store everything in a single giant table because there will be many repetitions of the same data.
Imagine a customer buys three products in three different orders.
If you use only one table to store this information, the table would be something like:
The problem with the above table is that we’ve repeated customer information with each order.
It makes the table’s size bigger.
Besides, if customers change their emails at some point, all of these instances would have to be updated.
But what if we keep product and customer data into separate tables?
And refer to them in the orders table:
Dedicated table for customers
As you can see, we can store customer data in a dedicated table.
Each customer has a unique ID in this table.
This unique ID is called the Primary Key.
Next, we store the product data in another table:
Dedicated table for products
Each product in the above table (products) has a primary key too.
After putting customers and products in dedicated tables, the order table would look like this:
As you can see, for each order, we only refer to the respective customer and products via their primary keys.
The order table has a primary key too.
Since product_ID and customer_ID are the primary keys of other tables, we call them Foreign Keys inside the orders table.
Avoiding data redundancy in a database is called data normalization.
Related data stored across multiple tables can be related and stitched together with the help of a Primary Key.
You can use these key-based interconnections to extract detailed information about one transaction from multiple tables – without having a single giant table with many repetitions.
Common RDBMS brands in the market are MySQL, PostgreSQL, MariaDB, Oracle, and MSSQL.
Every RDBMS provides a query language that allows you to query and update data in tables.
The most common query language across relational database management systems is Structured Query Language (SQL).
SQL is a DSL language used to do various database operations, such as CRUD, data protection, optimizations, etc.
Here’s an example of a SQL statement, which retrieves the delivery_address of a customer with the email address firstname.lastname@example.org:
SELECT delivery_address from customers where email=“email@example.com”
And this is what we’d get:
If you’re wondering where you write and run queries, you either write it in your code as well as your RDBMS CLI or its query editor.
Unlike relational databases, No-SQL databases store data without adhering to a fixed and strict data schema.
They store data as key/value pairs, graphs, or documents.
One of the most common types of No-SQL databases is document-oriented databases, which store data as documents, such as JSON, XML, or BSON (Binary JSON) files.
Each document is an equivalent of a row in an RDBMS but with a dynamic structure.
It means each document (or record) can contain more or fewer fields than other documents within the same collection.
Thanks to their dynamic nature and high performance, NoSQL databases are used in big data and real-time web applications,
NoSQL and RDBMS can complement each other to fulfill various data storage needs. This architecture is called Polyglot persistence.
Prominent players of the NoSQL world are MongoDB, CouchDB, Amazon DynamoDB, and Redis, to name a few.
How back-end developers use databases
As a back-end developer, you often work with a database programmatically from your code.
Each programming language provides special database connectors called drivers (need to be installed separately for each database) to enable you to work with various databases in your code.
If you read this far, you can tweet to the author to show them you care. Tweet a Thanks
Never miss a guide like this!
Join the DWD mailing list for your weekly dose of knowledge! This mailing list is for doers and thinkers. If you're a curious person, you might find it useful, just as 2,000 other devs do!
Disclaimer: This post may contain affiliate links. I might receive a commission if a purchase is made. However, it doesn’t change the cost you’ll pay.