The emergence of Artificial Intelligence means processing huge data sets. These data sets have to be stored on disks or some other persistent storage. The huge volume of data has to be processed by data scientists or big data analysts to come up with an algorithm or model that yields a measurable outcome.
What are the challenges in storing a huge volume of data?
1) The first and foremost challenge is security
2) Bandwidth to move data to and from cloud systems
3) Complexity in managing unstructured data
4) Smart identification of redundant data
A self-driving car can very frequently generate details about fuel consumption, remaining fuel, speed, time to reach the destination, and so on. Many studies say that a self-driving car can easily generate 20 to 30 GB of data every hour. The same is true of any smart system: to be smart, a system must generate data, and that same data is later used by ML tools to make the device learn on its own.
The recent unfortunate accidents with Tesla and Uber made me think about the data processing capacity of smart machines. If the data had been processed faster, could those accidents have been averted? If the right data had been processed at the right time, could they have been averted? These are just some thoughts out of curiosity. In fact, curiosity is the mother of the next advancement.
I think that for smart systems to be smarter, the right data needs to be processed. More data doesn't mean a more exact outcome; only the right data can give you more accurate predictions. One simple solution might be to avoid redundant data and the challenges surrounding it.
How can we reduce redundant data in a smart way?
Let's first understand the core fundamentals of big data – the 3 V's.
Even though there are other V's, let's take into scope only the most fundamental or basic ones.
Volume – The most important V, which refers to the huge amount of data
Velocity – Refers to how often and how fast data is generated and sent to systems
Variety – The number of different types of data, each representing some characteristic
For a big data analyst to derive predictions and analyze possible outcomes, these core fundamentals of big data should not be compromised in the name of reducing redundant data.
Possible solutions for redundant data challenge
Solution 1 – Applying Filters at data collection source
One simple solution is adding the right filter on the devices that generate large amounts of data. A simple analytical component installed on the device can reduce the redundant data flowing into the system while retaining the core fundamentals of big data.
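As a minimal sketch of such a filter, here is a "dead-band" approach in Python: a sensor reading is forwarded only when it differs from the last forwarded value by more than a threshold, so near-duplicate readings are dropped right at the source. The sensor name and threshold are illustrative assumptions, not part of any specific device.

```python
# Dead-band filter: forward a reading only when it changes "enough".
# Sensor names and thresholds below are illustrative assumptions.

def make_deadband_filter(threshold):
    last_sent = {}  # last forwarded value per sensor

    def should_send(sensor, value):
        prev = last_sent.get(sensor)
        if prev is None or abs(value - prev) >= threshold:
            last_sent[sensor] = value
            return True
        return False  # redundant reading, dropped at the source

    return should_send

speed_filter = make_deadband_filter(threshold=0.5)  # e.g. km/h

readings = [60.0, 60.1, 60.2, 60.8, 60.9, 62.0]
kept = [v for v in readings if speed_filter("speed", v)]
# → [60.0, 60.8, 62.0] — only meaningful changes leave the device
```

Velocity and variety are untouched here; only the redundant part of the volume is trimmed before it ever reaches the network.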
Solution 2 – Applying calculus while storing the data
Differential calculus is one piece of mathematics that helps in deriving rates of change. If the rate of change of the data can be expressed as a simple formula using differential calculus, then a huge set of data can be stored as one line item, and at the same time the original data can be replicated from the simple (or complex) formula that was stored.
Solution 3 – Combining solutions 1 and 2
Reducing data at the source with the right filters and storing data by applying differential calculus will address many of the challenges discussed above, such as –
i) Less bandwidth utilization
ii) Less storage space
iii) More meaningful data
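Chaining the two ideas can be sketched in a few lines of Python: filter at the source first, then store the surviving stream as fitted coefficients rather than raw samples. All numbers and thresholds are illustrative assumptions.

```python
import numpy as np

def deadband(values, threshold):
    """Keep a sample only when it changed enough since the last kept one."""
    kept_t, kept_v, prev = [], [], None
    for i, v in enumerate(values):
        if prev is None or abs(v - prev) >= threshold:
            kept_t.append(i)
            kept_v.append(v)
            prev = v
    return kept_t, kept_v

raw = [50.0 - 0.2 * i for i in range(100)]   # 100 raw samples
t, v = deadband(raw, threshold=0.5)           # solution 1: filter at source
coeffs = np.polyfit(t, v, 1)                  # solution 2: store 2 numbers
```

Fewer samples cross the network (less bandwidth), only two coefficients are stored (less storage), and what remains is the trend itself (more meaningful data).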
Many challenges are solved by looking for solutions in a different plane rather than the current one.
Please post an interesting challenge that you are trying to resolve.