“Big data” has slowly transitioned from a buzzword to a part of everyday life, but it’s still somewhat mysterious. What exactly is big data? And what are relational and non-relational databases (buzzwords you may also know)?
Our new Big Data Fundamentals course offers a high-level overview of the types of big data, its architecture, and some of the main technologies used to work with it. Here, we’ll take a quick look at the three main types of big data: structured, semi-structured, and unstructured.
Picture a traditional database, and chances are you imagined structured data in a relational database. Structured data conforms to a strict, predefined format: think inventory items, airline reservations, and similar records. Every entry in the database follows the same format and maps to predefined fields, which makes it easy to search. You know what the data should look like, and that’s exactly what it will look like. Surprisingly, structured data makes up only about 10% of all data.
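To make the idea of a strict, predefined format concrete, here is a minimal Python sketch using the standard-library `sqlite3` module with an in-memory database. The `reservations` table and its columns are hypothetical, invented for illustration. The sketch shows that every row fills the same fields, that searching is a query on a named column, and that a record missing required fields is rejected.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reservations (
        id INTEGER PRIMARY KEY,
        passenger TEXT NOT NULL,
        flight TEXT NOT NULL,
        seat TEXT NOT NULL
    )
    """
)

# Every row supplies the same predefined fields, in the same order.
rows = [
    (1, "Ada Lovelace", "BA117", "12A"),
    (2, "Alan Turing", "BA117", "12B"),
    (3, "Grace Hopper", "UA900", "3C"),
]
conn.executemany("INSERT INTO reservations VALUES (?, ?, ?, ?)", rows)

# Because the format is known in advance, a search is a simple query on a named column.
on_ba117 = [
    name
    for (name,) in conn.execute(
        "SELECT passenger FROM reservations WHERE flight = ? ORDER BY id", ("BA117",)
    )
]
print(on_ba117)  # ['Ada Lovelace', 'Alan Turing']

# A record that doesn't fit the schema (missing flight and seat) is rejected.
rejected = False
try:
    conn.execute("INSERT INTO reservations (id, passenger) VALUES (4, 'Katherine Johnson')")
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```

The rejection comes from the `NOT NULL` constraints in this hypothetical schema; the point is simply that a relational table decides the shape of the data before any data arrives.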
As technology and data evolved, semi-structured and unstructured data emerged. CSV, YAML, and JSON documents are some examples of semi-structured data: data that doesn’t follow a rigid table schema but contains semantic tags or other markers that separate and label its elements. Semi-structured data can be stored in a traditional database with some processing, though fitting it within a relational schema can be challenging. It’s best suited to non-relational (aka NoSQL) databases, many of which store semi-structured data as JSON documents. Like its more strictly formatted predecessor, semi-structured data represents only about 10% of all data.
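A minimal Python sketch of why semi-structured data is awkward for fixed rows and columns, using the standard-library `json` module. The customer records are hypothetical: each JSON object carries its own keys (the tags that describe its values), but the records don’t share one set of fields, and one contains a nested object. Flattening them means picking a column set up front and filling the gaps.

```python
import json

# Hypothetical customer records: each object is self-describing (keys act as
# tags), but the records don't all have the same fields, and one is nested.
raw = """
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Alan", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Grace", "address": {"city": "Arlington", "zip": "22201"}}
]
"""
records = json.loads(raw)

# The keys tell us what each value means, record by record.
for rec in records:
    print(sorted(rec))

# Fitting the records into rows and columns means choosing a column set up
# front and filling missing fields with None; nested values stay unflattened.
columns = sorted({key for rec in records for key in rec})
table = [[rec.get(col) for col in columns] for rec in records]
print(columns)  # ['address', 'email', 'id', 'name', 'phones']
```

A document-oriented NoSQL database would typically store objects like these as-is, rather than forcing them into a grid full of empty cells.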
Then there’s the “pretty much everything else” category. Unstructured data can be just about anything, from large collections of different text documents and emails to images and audio. Even if individual pieces have some internal structure, at a macro level there isn’t a uniform structure across them. So it doesn’t fit neatly into a traditional database: you can’t flatten it and stick it into nice, neat rows and columns. It makes up about 80% of all data and is the reason we have the term “big data” in the first place. We’re all creating huge amounts of unstructured data through social media posts, emails, and more, so this type of data continues to grow.
This is just the tip of the big data iceberg. At the lightning-fast rate we’re all creating data, this field is only going to keep growing. To find out what makes up some of this data, how it’s stored, and use cases for each kind, check out my new Big Data Fundamentals course. For all of our Big Data content, you can view other courses here. And as always, feel free to join our Community Slack or leave a comment below.