Published: 17 Mar 2015
You can never have too much data, and you can never process it too quickly.
That's the essence of the drive toward real-time data analytics and the use of event processing and event-driven architecture. The data center is awash in data, and some of it must be analyzed immediately.
Most businesses try to improve cycle times so they can catch problems and spot opportunities more quickly, said IDC analyst Maureen Fleming. Shifting to a faster, event-driven approach often makes sense.
There are numerous approaches to real-time data analytics, and many definitions of what constitutes real-time, said ESG analyst Nik Rouda. "We [conducted] a survey and found most respondents thought data must be updated in seconds for [it to be considered] real-time, while smaller groups preferred different sub-second times," Rouda said. No matter the specifics, faster always seems to be better.
"In the [Business Intelligence] world, we were always trying to stuff data into data warehouses faster and faster," said Claudia Imhoff, founder of the Boulder Business Intelligence Brain Trust (BBIBT), in Boulder, CO. Meanwhile, there was an entire world of complex event processing that BI practitioners didn't know much about until it hit the big time with stream analytics. "Now they can take data as it streams and throw it against an analytical model for something like fraud detection, or some kind of customer pattern behavior model," she said.
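The streaming pattern Imhoff describes -- scoring each event against a model as it arrives, rather than warehousing it first -- can be sketched in plain Python. The "model" below is a hypothetical rule invented for illustration: flag any transaction far above the customer's running average.

```python
# Sketch of per-event scoring on a stream, with a toy fraud rule.
from collections import defaultdict

class StreamScorer:
    def __init__(self, ratio=5.0):
        self.ratio = ratio                # flag if amount > ratio * average
        self.totals = defaultdict(float)  # running sum per customer
        self.counts = defaultdict(int)    # running count per customer

    def score(self, event):
        """Score one event as it streams in; return True if suspicious."""
        cust, amount = event["customer"], event["amount"]
        avg = self.totals[cust] / self.counts[cust] if self.counts[cust] else None
        suspicious = avg is not None and amount > self.ratio * avg
        self.totals[cust] += amount       # update state after scoring
        self.counts[cust] += 1
        return suspicious

scorer = StreamScorer()
events = [
    {"customer": "a", "amount": 20.0},
    {"customer": "a", "amount": 25.0},
    {"customer": "a", "amount": 500.0},   # spike well above the running average
]
flags = [scorer.score(e) for e in events]
# flags == [False, False, True]
```

A production fraud model would be statistical or machine-learned rather than a single threshold, but the shape -- stateful scorer, one decision per event, no batch step -- is the same.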
In the last year, popular traditional relational databases, like Microsoft SQL Server, IBM DB2 with BLU Acceleration and Oracle Database 12c, have added in-memory options for analytics. This typically provides an order of magnitude or more performance improvement, Rouda said. "Some would argue that their approaches are retrofits that don't properly design for this use case," he said. The alternative would be younger NoSQL databases that are reputedly better suited to in-memory analytics, such as DataStax's distribution of Apache Cassandra.
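The core in-memory idea is easy to demonstrate. The sketch below uses Python's built-in sqlite3 with an in-memory database -- a toy stand-in, not the columnar engines these vendors ship -- to run an analytical aggregate without any disk I/O on the query path. Table and column names are invented for illustration.

```python
import sqlite3

# ":memory:" keeps the entire database in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 150.0), ("west", 90.0)],
)

# An analytical aggregate served entirely from memory.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
# rows == [("east", 250.0), ("west", 90.0)]
```

What the commercial in-memory options add on top of this basic idea is columnar storage, compression and parallel scans tuned for much larger analytical workloads.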
In the Hadoop ecosystem, the shift from batch to real-time shows in the rise of Apache Spark, a fast, general-purpose engine for large-scale data processing that is challenging MapReduce for in-memory data analytics, thanks to its ability to use data on the Hadoop Distributed File System and other data platforms, like Cassandra or Amazon S3. "This space is still maturing, but there is a lot of development, momentum, and interest," Rouda said.
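The programming model both MapReduce and Spark share -- map a function over records, then reduce by key -- can be sketched without either framework. This is pure, single-process Python for illustration only; the frameworks distribute these same steps across a cluster, and Spark's speed advantage comes largely from caching the intermediate data in memory between steps.

```python
from functools import reduce
from itertools import chain
from collections import Counter

lines = ["real time data", "real time analytics"]

# Map step: emit a (word, 1) pair for every word in every line.
pairs = chain.from_iterable(((w, 1) for w in line.split()) for line in lines)

# Reduce step: sum the counts for each key (word).
def merge(acc, pair):
    word, n = pair
    acc[word] += n
    return acc

counts = reduce(merge, pairs, Counter())
# counts["real"] == 2, counts["time"] == 2, counts["analytics"] == 1
```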
Of course, for real-time performance, data must be captured at the speed at which it is generated. Depending on the data type, streaming can be handled with data pipes built on tools such as Apache Flume, Apache Kafka or Amazon Kinesis.
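A data pipe like Kafka or Kinesis is, at heart, a producer/consumer channel that decouples capture from processing, so the source is never blocked waiting on the analytics. The stdlib sketch below imitates only that shape with an in-process queue; the real systems add persistence, partitioning and replay that a queue.Queue cannot provide.

```python
import queue
import threading

pipe = queue.Queue()   # stand-in for a Kafka topic / Kinesis stream
results = []

def consumer():
    # Drain events as they arrive; a None sentinel ends the stream.
    while True:
        event = pipe.get()
        if event is None:
            break
        results.append(event.upper())  # the "processing" step

t = threading.Thread(target=consumer)
t.start()

# Producer: capture events at the speed the source generates them.
for event in ["click", "view", "purchase"]:
    pipe.put(event)
pipe.put(None)
t.join()
# results == ["CLICK", "VIEW", "PURCHASE"]
```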
Amazon Kinesis is important because it is cloud-based, said Eric Dynowski, CEO of Turing Group, an infrastructure-as-a-service provider. However, it is more of an intermediary, enabling more and better processing on demand. For Dynowski, AWS Lambda is changing the data game. "Lambda provides a framework for analytics that is the first opportunity for an organization to simply implement a little bit of code or a whole neural network with JavaScript and artificial intelligence completely on demand. You can have an application program interface and a real-time response without having a big application infrastructure," he said.
Lambda is even more radical because it is completely scalable and demand-driven, Dynowski said. "IT has always been about systems of consuming -- something processes data and then waits for more data," he said. "Lambda is a quick flash of code that adds a new layer to how you analyze data when needed." Now, analytics can scale more realistically, based on events and what is happening.
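The model Dynowski describes -- a small, stateless function invoked once per event, with no long-running server -- looks roughly like this local sketch. It mirrors the (event, context) signature AWS Lambda uses for Python handlers, but the event fields and the eligibility rule are hypothetical, invented for illustration.

```python
def handler(event, context=None):
    """Stateless, per-event function: runs on demand, then disappears.

    `context` is unused in this local sketch; on Lambda it carries
    runtime metadata supplied by the platform.
    """
    amount = event.get("amount", 0)
    return {"customer": event["customer"], "discount_eligible": amount >= 100}

# Locally we just call it per event; on Lambda, the platform invokes it
# for each incoming record and scales instance count with demand.
responses = [handler(e) for e in
             [{"customer": "a", "amount": 150},
              {"customer": "b", "amount": 40}]]
```

Because the function holds no state between invocations, the platform can run as many copies in parallel as the event stream demands -- which is exactly the demand-driven scaling Dynowski points to.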
Cloud: the new paradigm?
The Enterprise Big Data, Business Intelligence, and Analytics Trends survey from ESG reports that 21% of respondents want a public cloud for analytics, like Amazon, while 10% prefer a hybrid approach.
There are other online options, including Google Cloud Dataflow and a new permutation of Microsoft Azure, said Mike Gualtieri, principal analyst at Cambridge, MA-based Forrester Research. There are also powerful open source projects such as Apache Storm.
However, Amazon is head and shoulders above the other options, said Shlomo Swidler, CEO of Orchestratus, a cloud computing consulting firm. "Both Kinesis and Lambda appeal to audiences who value technology as a tool, not as an asset," he said. "In the long term, your technical needs will be served much better by allowing Amazon to worry about the infrastructure and the message passing, and instead focusing on the unique processing that your business requires."
Be fast and agile
Speed is crucial, but organizations are also concerned with agility.
Agile business intelligence (BI) development could allow you to more rapidly prototype and build new real-time data analytics applications, but it isn't necessarily linked to the [real-time] movement, Rouda said.
"Agile is actually one of the biggest challenges because over the last 20 years, the expertise to build robust, scalable, mission critical BI environments has grown, but we still don't know how to build them quickly, reliably, and inexpensively. And when requirements change, we don't know how to change them," said Boris Evelson, vice president and principal analyst at Forrester.
Self-service BI could help deliver some degree of agility, he said. "It should require very little help from IT professionals and should only require IT infrastructure."
The desire for real-time data analytics is related to the need to build speed and flexibility into applications, said IDC's Fleming. "That involves simplification, containers and microservices that can be assembled in a more dynamic fashion," she said.
There are many drivers, including:
- Real-time promotions, where a targeted promotional offer must be identified and made in the moment to improve conversion rates. Immediacy is an important part of the strategy.
- Maintaining a real-time view of inventory across omni-channel points of purchase.
- Preventive maintenance and outage detection associated with revenue and customer satisfaction, particularly in Internet of Things systems.
- Monitoring costs in cloud utility billing.
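The first driver above -- a targeted offer made while the customer is still in session -- ultimately reduces to a rule evaluated per event. A minimal sketch, with thresholds and field names invented purely for illustration:

```python
def promotion_for(event):
    """Return an offer dict if this event qualifies, else None.

    Hypothetical rule: a cart worth $75 or more that has sat idle for
    two minutes gets a free-shipping nudge before it is abandoned.
    """
    if event["cart_total"] >= 75 and event["idle_seconds"] >= 120:
        return {"offer": "free_shipping", "customer": event["customer"]}
    return None

offer = promotion_for({"customer": "a", "cart_total": 80, "idle_seconds": 130})
no_offer = promotion_for({"customer": "b", "cart_total": 20, "idle_seconds": 300})
# offer is a free_shipping dict; no_offer is None
```

Real systems replace the hand-written rule with a scored propensity model, but the event-in, decision-out contract is the same -- and it is why immediacy depends on the streaming and event-driven plumbing described earlier.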
Challenges include skills gaps in building these systems, in building models that detect problems and opportunities, and in "building a responsive culture inside the business," Fleming said.
All of the work being done to improve analytics capabilities has observers excited about the possibilities.
"The most interesting part of this is the tense shift taking place with businesses moving from looking at the past [batch] to looking at the now [real-time]," Rouda said. "This enables all kinds of immediate responses to new information that otherwise weren't possible."
However, as real-time becomes more pervasive, practitioners need to make sure it doesn't have a performance impact on operations, said BBIBT's Imhoff. "That is something that needs to be thoroughly considered ahead of time."
Today, providing faster analytics systems is easier said than done.
Too many initiatives don't have a strong business case. It is better to look for tangible benefits that can be achieved with quicker methods, which you can integrate into the whole enterprise environment later, said Forrester's Evelson. "Instead of aiming for a big bang, take baby steps, reach for tangible benefits, measure them and then build up from there," he said.
"I think the key to success is a relaxation of some of the old rules that we used to go by, like the need for a single enterprise BI architecture and platform. Or the idea that you must have a so-called single version of the truth," he added. "You should have these idealistic goals as the light at the end of the tunnel, but you also need to realize that you are on a journey and it will be hard to achieve 100% along the way."