Big Data is one of the terms we hear regularly these days, along with “Internet of Things” (IoT), “Artificial Intelligence” (AI), and “Cloud Computing”. Interestingly, when you Google the term, you find a wide variety of definitions – which is not surprising considering how new the concept is!
In this article, I would like to try to provide a simple understanding of the concept from a technical point of view.
One of the best definitions I encountered is this one: “any voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information”.
This definition makes more sense when we compare it with how data was traditionally handled. For old-school developers like myself, useful data always meant structured tables organized in relational databases. That structure enabled you to run meaningful searches and display the results in a useful format – like what you see in the CRM or ERP applications you use daily. But as the definition above describes, Big Data is about large volumes of data that are not necessarily structured into database tables.
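The contrast can be sketched in a few lines of Python. Everything below – the table, its columns, and the sample text – is invented purely for illustration: with structured data, one precise query answers the question, while with unstructured text we have to fall back on crude keyword mining.

```python
import sqlite3

# Structured data: a relational table supports precise queries.
# (The "customers" table and its columns are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("Alice", "DE"), ("Bob", "US"), ("Carol", "US")])
us_customers = conn.execute(
    "SELECT name FROM customers WHERE country = 'US'").fetchall()
print(us_customers)  # [('Bob',), ('Carol',)]

# Unstructured data: free text has no schema, so extracting the same
# fact means mining the text – here, a crude keyword scan.
notes = [
    "Alice wrote from her office in Berlin.",
    "Bob emailed us from New York yesterday.",
    "Carol called from the Chicago branch.",
]
us_cities = {"New York", "Chicago"}
mentions_us = [n for n in notes if any(c in n for c in us_cities)]
print(len(mentions_us))  # 2
```

Real-world data mining is of course far more sophisticated than a keyword scan, but the asymmetry is the point: the schema does the hard work in the first case, and the algorithm must do it in the second.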
Why is Big Data mentioned now?
The first question that arises after reading the above is probably: what has changed in the past couple of years to make Big Data so important?
In my opinion, there are three main reasons:
- Advances in Artificial Intelligence: masses of unstructured data have always been available – they are the source of any structured data! One of the main reasons we have been organizing such data into structured databases over the past few decades is that we lacked the search and data-mining algorithms to use it directly. Finding and presenting data in a meaningful, useful way is a complicated process that requires very sophisticated algorithms – and those algorithms are now available.
- Advances in computer hardware: storing high volumes of data, searching through it, and accessing useful data in a timely manner call for advanced hardware with super-fast data access and very high processing power. Such hardware was not widely available a decade ago.
- Advances in the Internet: there is no doubt that the invention of the Internet was one of the most important events of the 20th century. During the 21st century, the Internet has steadily provided connectivity at higher speeds and with more mobility. The result is access to an incredible amount of unstructured data from all over the world in the form of videos, pictures, text, and code.
- Increased amount of data: this is usually summarized as the three ‘V’s – Volume, Velocity, and Variety. The volume of data, the velocity at which it becomes available, and the variety of its forms have simply made traditional methods of structuring it impossible!
As the world becomes more connected, we face not only a huge growth of human-created data, but also an exponentially increasing amount of data created by machines. Some examples of such machines are:
- CCTV cameras: an ever-growing number of cameras capture constant video streams. The volume of data they create simply does not allow timely analysis by humans, and it is also not possible to “structure” the footage in any meaningful way.
- IoT sensors: it is expected that there will be over 20 billion IoT devices by 2020, each constantly creating data. The variety of the devices, and hence of the data they create, again calls for a “Big Data” solution.
- Network equipment logs: network switches, routers, security appliances, servers, and other network equipment each create their own logs – again, a huge amount of useful data, if it can be analyzed efficiently.
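To make “analyzing logs efficiently” concrete at the smallest possible scale, here is a toy sketch that parses a handful of invented syslog-style lines (the hostnames, processes, and message format are all hypothetical) and tallies them by severity level – the kind of aggregation that Big Data tooling performs over billions of such lines.

```python
import re
from collections import Counter

# Hypothetical syslog-style lines; the format is an assumption
# made for this example, not a real device's output.
logs = [
    "Jan 10 10:01:02 fw1 kernel: WARN link down on eth0",
    "Jan 10 10:01:05 sw2 sshd: INFO session opened for admin",
    "Jan 10 10:01:09 fw1 kernel: ERROR packet drop threshold",
    "Jan 10 10:02:11 rt3 bgpd: WARN peer 10.0.0.2 flapping",
]

# Capture the host and severity fields from each line.
pattern = re.compile(r"^\S+ \d+ [\d:]+ (?P<host>\S+) \S+ (?P<level>\w+)")

counts = Counter()
for line in logs:
    m = pattern.match(line)
    if m:
        counts[m.group("level")] += 1

print(counts)  # Counter({'WARN': 2, 'INFO': 1, 'ERROR': 1})
```

At real scale the same idea – extract a few fields, then aggregate – is what distributed log-analysis platforms do across clusters of machines.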
We are clearly on the verge of a new technological revolution, one that will be marked by the emergence of robots, artificial intelligence, IoT devices, virtual reality, self-driving vehicles, and many other great technologies. But at the core of this revolution is the “data” and how efficiently and intelligently it can be analyzed. “Big Data” is the concept for making that happen: it covers technologies for storing such volumes of data and, more importantly, for extracting the required information from it efficiently.