Data analytics promises to deliver a clearer vision of the world, a view based on actual metrics unclouded by our preconceptions. It’s a worthwhile goal, but difficult to reach when navigating the waters of technological changes like AI and machine learning. There’s a lot at stake for businesses.
According to Donald Feinberg, Gartner vice president and distinguished analyst, “The continued survival of any business will depend upon an agile, data-centric architecture that responds to the constant rate of change.” Here’s a glimpse into some of the biggest changes ahead: Gartner’s predictions for 2019, presented at its Data & Analytics Summit earlier this year.
Augmented Analytics and Augmented Data Management
Not surprisingly, organizations seek to automate time-consuming data prep tasks so they can free up data scientists for work that drives business improvements. Gartner predicts that within three years close to 50% of data prep processing will be automated.
One solution, augmented analytics, uses machine learning-derived algorithms to clean data and identify outliers and trends. This will enable companies to make better use of the volumes of big data they’ve amassed. And it will enable greater data democratization, because it will give less-technical staff access to data that currently must be pre-processed by data scientists.
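To make the idea concrete, here is a minimal, hypothetical sketch of the kind of outlier screening an augmented analytics platform automates. The data, the `flag_outliers` function, and the two-standard-deviation threshold are invented for illustration; real platforms use far more sophisticated, model-driven methods.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=2.0):
    """Flag readings more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Hypothetical sensor readings with one anomalous spike.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
print(flag_outliers(readings))  # [55.0]
```

The value of automating even this simple screening is that less-technical staff never see the raw, dirty stream; they only work with data that has already been checked.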
According to industry expert Jen Underwood, “These solutions are ideal for common use cases such as prioritizing leads, detecting customer churn, performing win/loss analysis, predicting readmissions in healthcare or identifying insurance fraud.” For example, an insurance company could use augmented analytics to customize pricing based on data from wearables like Fitbits; applying predictive analytics to the data stream could help it better assess risk.
Augmented data management (ADM) is another approach to automating the drudgery of cleaning data. It uses analytics techniques to identify incorrect or missing data, then automatically cleans or corrects it. ADM platforms also manage the metadata layer, documenting the path data has taken as it has been cleaned and processed. And they provide master data management of an organization’s data sources to seamlessly integrate them into one master view.
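One piece of that drudgery, filling in missing values, can be sketched in a few lines. This is a hypothetical illustration only: the `impute_missing` helper and the median-fill strategy are stand-ins for the statistically informed corrections an ADM platform would choose automatically.

```python
from statistics import median

def impute_missing(records, field):
    """Replace None values in `field` with the median of the observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = median(observed)
    for r in records:
        if r[field] is None:
            r[field] = fill
    return records

# Hypothetical customer records with a gap in the "age" field.
rows = [{"age": 34}, {"age": None}, {"age": 41}, {"age": 29}]
impute_missing(rows, "age")  # the None becomes 34, the median of 29, 34, 41
```

An ADM platform would additionally record this correction in the metadata layer, so the lineage of every cleaned value stays auditable.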
Data Fabric

As data volumes grow, it’s not enough to simply connect individual databases. Like master data management solutions, data fabric solutions claim to integrate and unify data across myriad data sources, even providing real-time analytics in some cases. Although “fabric” is perhaps not the most accurate metaphor, the goal of these solutions is seamless, real-time integration of multiple data silos.
Continuous Intelligence

Beyond automation, another priority for organizations is agility: speeding up the data-to-decision cycle. That is the raison d’être for continuous intelligence (CI) solutions. CI tools embed streams of real-time data into analytics, so organizations can react to shifts in the data as they occur. A decade ago, IT teams began moving application changes to production using a continuous delivery model. Now CI technologies promise to deliver real-time data to analytics systems just as continuously.
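The core pattern behind many CI tools is computing analytics incrementally over a sliding window of the stream rather than re-querying a warehouse. A minimal sketch, with an invented `RollingMean` class and made-up readings:

```python
from collections import deque

class RollingMean:
    """Maintain a running average over the last `size` readings of a stream."""
    def __init__(self, size):
        self.window = deque(maxlen=size)  # old readings fall off automatically

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

stream = RollingMean(size=3)
for reading in [10, 20, 30, 40]:
    current = stream.update(reading)
# After 40 arrives the window holds [20, 30, 40], so current == 30.0
```

Each new reading updates the metric in constant time, which is what makes reacting to the stream "as it occurs" feasible at scale.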
Explainable AI (XAI)
One striking consequence of AI-powered platforms is that users will need some understanding of how their algorithms reach conclusions. For example, an insurer may be able to customize pricing for each individual based on a complex model, but it will need to be able to explain how the model arrived at a given result.
It’s likely that organizations and government entities will expect some level of transparency and trust from AI-powered systems. Consumers (like those individuals who end up with higher insurance rates) will also demand some recourse to decisions made with their data. Expect to see significant growth of explainable AI platforms. For example, just this month IBM Research launched an open source XAI collection, AI Explainability 360.
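At its simplest, explainability means being able to decompose a score into per-feature contributions. The sketch below is hypothetical: the `explain_score` function, the feature names, and the weights are invented, and real XAI toolkits like AI Explainability 360 handle far more opaque models than this linear one.

```python
def explain_score(weights, features):
    """Break a linear model's score into per-feature contributions."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    return sum(contributions.values()), contributions

# Hypothetical premium-adjustment model fed by wearable data.
weights = {"daily_steps": -0.002, "resting_hr": 0.05}
applicant = {"daily_steps": 8000, "resting_hr": 70}
score, parts = explain_score(weights, applicant)
# parts shows exactly what moved the score: steps lowered it by 16,
# resting heart rate raised it by 3.5
```

With a breakdown like `parts`, the insurer can answer a consumer's "why?" directly, which is the recourse regulators are likely to demand.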
Graph Analytics

Graph databases like Neo4j and Amazon Neptune are useful for highly connected data like social networks and recommendation and search engines. In a graph database, relationships between data are as important as the data points themselves. Native-graph databases offer performance advantages over traditional RDBMSs: query speeds are faster because relationships are stored directly rather than reconstructed through joins at query time. They typically use specialized query languages like Cypher or Gremlin, but queries over connected data are less complex, and easier to read and maintain, than the equivalent SQL. The agility and flexibility of graph databases make them ideal for use cases like real-time fraud detection and identity management.
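To see why traversal beats joins for fraud detection, consider a toy graph in plain Python. The accounts, devices, and `connected_component` helper below are invented for illustration; a graph database would express the same walk declaratively (in Cypher, something like a variable-length `MATCH` pattern) and index it natively.

```python
from collections import deque

# Toy property graph: accounts linked through shared devices and addresses.
edges = {
    "acct_a":   ["device_1"],
    "device_1": ["acct_a", "acct_b"],
    "acct_b":   ["device_1", "addr_9"],
    "addr_9":   ["acct_b", "acct_c"],
    "acct_c":   ["addr_9"],
}

def connected_component(graph, start):
    """Breadth-first walk: every node reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# acct_a and acct_c never transacted directly, yet the walk links them
# through a shared device and address, surfacing a possible fraud ring.
ring = connected_component(edges, "acct_a")
```

In SQL, each hop would be another self-join; in a graph, adding a hop is just one more step of the same traversal.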
Natural Language Processing Analytics (NLP)
NLP is another example of consumer technology driving user expectations for BI tools. In a world that relies on Google Assistant, Alexa, Siri and others, it is not surprising that (according to Gartner) by next year, half of all analytic queries will be language-based. Soon it will no longer be enough to promise analytics users they don’t need to code SQL. BI products will also need natural language search and voice activation functionality.
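A drastically simplified sketch of the idea, turning a spoken-style question into a query. Everything here is hypothetical (the `to_sql` function, the pattern table, the `sales` schema); production NLP analytics uses trained language models, not regex templates.

```python
import re

# Toy mapping from question shapes to query templates.
PATTERNS = {
    r"total (\w+) by (\w+)": "SELECT {1}, SUM({0}) FROM sales GROUP BY {1}",
    r"average (\w+)":        "SELECT AVG({0}) FROM sales",
}

def to_sql(question):
    """Return a query for the first pattern the question matches, else None."""
    for pattern, template in PATTERNS.items():
        m = re.search(pattern, question.lower())
        if m:
            return template.format(*m.groups())
    return None

print(to_sql("Show me total revenue by region"))
# SELECT region, SUM(revenue) FROM sales GROUP BY region
```

The hard part, which this sketch skips entirely, is handling the endless paraphrases real users produce; that is where the machine learning comes in.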
Commercial AI and Machine Learning

You may have heard this quote about AI: “Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” In fact, a recent survey by O’Reilly found that almost 75% of enterprises were talking about AI, or thinking about it, but not actually doing it. Gartner sees this as an opportunity for organizations to build commercial AI software on top of open source frameworks like Google’s TensorFlow and Microsoft Cognitive Toolkit. These commercial solutions could better meet the unique needs of enterprises (like data lineage and reuse) and would be an easier first foray into AI.
Blockchain

Over the past ten years, blockchain has made plenty of trend lists, but it has not yet matured to the point where it can be deployed widely. Gartner is pragmatic about this, predicting it will be several more years before blockchain solutions emerge that can be cost-effectively integrated with existing systems. Nonetheless, with Facebook’s June announcement of its own cryptocurrency, Libra, the technology may finally become mainstream.
Persistent Memory Servers
None of the software trends above could exist without improvements in hardware. Persistent memory (PM) is a new generation of non-volatile random-access memory (NVRAM) technology that will be hugely disruptive to storage architecture and computing. One of 2019’s hottest startups, Formulus Black, maintains that its new NVRAM technology can run any workload in memory. At a time when business users are pushing for faster access to analytics, PM technologies promise to drastically improve performance for heavy analytics processing. They’ll also prevent loss of data from power outages and save on data duplication costs.
A data fabric isn’t the only way to connect your siloed data. Learn more in our new whitepaper Transforming Your Data for Analytics: Three Options. Download your copy.