How Big is Big Data?
Big Data is any data that is too big to be stored, managed and analyzed via conventional database technologies. So the “data” in big data can really be anything. It doesn’t have to be social media data, and it is certainly not limited to user-generated content. It can be genomic, financial, environmental, or even astronomical.
Let’s define Big Data, taking into account its characteristics, data capturing & input devices, data resolutions, etc.
One of the most obvious characteristics of big data is that the devices for capturing those data are either already ubiquitous or becoming ubiquitous.
When any data capturing device becomes ubiquitous, there is a high probability that whatever data those devices are capturing will eventually become big data. The reason is straightforward: more data capturing devices translate directly into at least a proportional increase in data production rate.
Besides the increase in capturing units, there is also an increase in the variety of data sensor and input devices.
The variety of data sensors and input devices not only increases the data production rate, it also produces an explosion of metadata for segmentation.
Another major contributor to the bigness of big data is that data resolution is increasing rapidly. Higher-resolution images and videos take up more storage volume, making the data even bigger. Therefore, any data that is experiencing a rapid increase in data resolution (whether it is spatial, temporal or any other dimension) is likely to evolve into big data.
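To make the resolution effect concrete, here is a rough back-of-the-envelope sketch. It is not from the original article. The function name `raw_video_bytes`, the frame sizes, the frame rates, and the 3 bytes per pixel are all illustrative assumptions, and compression is ignored entirely, so real files would be much smaller. The point is only how the volume multiplies when spatial and temporal resolution both go up.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Uncompressed size: pixels per frame * bytes per pixel * number of frames.

    All parameters are illustrative assumptions; real video is compressed.
    """
    return width * height * bytes_per_pixel * fps * seconds


# One minute of video at three assumed resolutions.
sd = raw_video_bytes(640, 480, fps=30, seconds=60)     # lower spatial resolution
hd = raw_video_bytes(1920, 1080, fps=30, seconds=60)   # higher spatial resolution
uhd = raw_video_bytes(3840, 2160, fps=60, seconds=60)  # higher spatial AND temporal

for label, size in [("SD", sd), ("HD", hd), ("UHD", uhd)]:
    print(f"{label}: {size / 1e9:.2f} GB uncompressed per minute")
```

Under these assumptions, going from the second case to the third quadruples the pixel count and doubles the frame rate, so the raw volume grows eightfold even though the same single camera is recording the same minute.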
Since the precise criterion for “big” data is a moving target, it is useful to examine how “big” data were generated and try to identify the common traits that contribute to their “bigness.”
There are at least three major factors that contribute to the bigness of big data.
- Ubiquity and variety of data capturing devices for different types of information
- Increasing data resolution
- Super-linear scaling of data production rate with data producers
Adapted from “Why is Big Data So Big?” by Michael Wu, Ph.D., Lithium’s Principal Scientist of Analytics