While the size of big data sets continues to increase to encompass larger and larger magnitudes of data, the ability to store and manipulate these data sets will inevitably outpace the physical memory and processing power available. Though moving from disk memory storage to much quicker In-Memory approaches to manipulating data is a great advancement, eventually even more slick methods will have to be employed to be able to keep in pace with the demand for big data solutions that data management systems will be faced with down the road.
Perhaps one way to look at the problem is to consider that data, rather than being represented as a string of bits in some memory buffer somewhere, is itself not such a tangible object. Rather, the idea of dealing with data may move from being so straight forward to being more of an implied process. In some sense, programmers have already toyed with this concept in a limited capacity.
Data Compression Used To Imply Data Exists When It Is Not There
The beauty of a process like data compression is that we rely on the implication that data exists, even if the data in question is not readily being expressed in a set of bits. Since we are relying on the implication that data exists, rather than expressing the data outright, this saves us potentially huge amounts of space in memory.
However, improving on how this process of implying data works is a key to saving even more memory. If we think of memory being a box, and data being the sand in our box, manipulating the expression of the amount of sand in our box opens up more room in our box for more sand. In other words, a compressed data set is an implied memory storage process, since the process of decompression leads to re-expanding the set back to its original size.
Another process that leverages the idea of implied data is matrix multiplication. Multiplying a 1 by n row matrix with an n by 1 column matrix produces an n by n square matrix. The result is implied, even before the multiplication is carried out, simply from what we know about matrix multiplication and the mechanics behind such a process.
Image Source: Pixabay
This in turn implies that space can be saved by storing the information of the end product as the two original matricies, where we only need to carry out the multiplication step, if and only if we need to view the implied information being storred.
Data Expansion Implying Data That Is There Does Not Really Exist
Another side of this coin is data expansion. A classic example of expanding information in a system that represents objects that are not really there is found in the physical phenomenon of gravitational lensing. As light bends around massive objects, the same galaxy is seen from multiple angles, even though only one instance of the galaxy truly exists.
In a similar manner, massive blocks of data that appears to be repeating over and over again in a big data set may itself be behaving like the information in a gravitational lensing situation. Compressing these instances of data once again allows us to imply that data exists in these large spaces without having to represent this data in its expanded form. In this sense, processing and manipulating the compressed data blocks increases our processing speed. It is also important to note that because big data sets are so large, the opportunity to take advantage of methods involving implied data increases.