
Beyond gaming, GPU technology takes on graphs, machine learning

The Blazegraph database runs on graphics processing units to speed graph traversals. Machine learning in the form of Google TensorFlow is also a GPU target.

Graphics processing units are familiar to dedicated gamers and supercomputer programmers, but these specialized chips may find use in big data science applications. Recent developments in NoSQL databases and machine learning services point the way.

The potential of GPU technology to handle large data sets with complex dependencies led Blazegraph to build Blazegraph GPU, a NoSQL-oriented graph database running on NVIDIA general-purpose GPUs. That follows on the heels of Google's endorsement of GPUs for work with its TensorFlow machine learning engine.

Blazegraph, formerly known as Systap, estimated that throughput for the software running on a cluster of 64 NVIDIA K40 GPUs equals 32 billion traversed edges per second -- edges represent the connections between graph nodes.
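
To put that estimate in per-device terms, the arithmetic can be sketched in a few lines of Python. The two figures come from the estimate above; the assumption that throughput divides evenly across the 64 GPUs is ours, and real clusters rarely scale that cleanly.

```python
# Figures quoted in the article's estimate.
TOTAL_EDGES_PER_SECOND = 32_000_000_000  # 32 billion traversed edges per second
GPU_COUNT = 64                           # NVIDIA K40 GPUs in the cluster


def per_gpu_throughput(total_edges_per_second: int, gpu_count: int) -> float:
    """Average traversed edges per second per GPU.

    Assumption: the load is spread evenly across all GPUs.
    """
    return total_edges_per_second / gpu_count


per_gpu = per_gpu_throughput(TOTAL_EDGES_PER_SECOND, GPU_COUNT)
print(f"{per_gpu:,.0f} traversed edges per second per GPU")
```

Under that even-split assumption, each GPU would account for roughly 500 million traversed edges per second.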

Graph databases have been used to map relationships in applications ranging from social media to national security. Very large graphs, however, have often run into performance issues that GPU technology may help to address.

Connected antiquity

The approach has the potential to ease some computational tasks, according to a technologist with the ResearchSpace project at the British Museum, which has been working with graph databases to aid the efforts of cultural heritage researchers.

Barry Norton, development manager for ResearchSpace at the British Museum, said early efforts with Blazegraph have been underway since June 2015. The first objective is to enhance the curation of artifacts from Salamis, an ancient Greek city-state located on Cyprus, as part of a project known as Gravitate.

At the museum, 3D object scans, along with other information from different collections and records, are being combined in a graph database in the form of Resource Description Framework triple stores. The approach differs from relational database technology in that no fixed schema is required to begin work, according to Norton.
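
The schema-free idea can be illustrated with a minimal Python sketch that treats each fact as a (subject, predicate, object) triple. This is not the museum's data model or Blazegraph's API: the `ex:` identifiers, the predicate names and the `match` helper are all hypothetical, and a real triple store would be queried with SPARQL rather than a Python loop.

```python
from typing import Optional, Set, Tuple

# One RDF-style statement: (subject, predicate, object).
Triple = Tuple[str, str, str]

# Hypothetical starting facts about one scanned object.
triples: Set[Triple] = {
    ("ex:object1", "ex:foundAt", "ex:Salamis"),
    ("ex:object1", "ex:hasScan", "ex:scan1"),
}

# A new kind of relationship is just another triple; no table or
# column has to be defined before it can be stored.
triples.add(("ex:object2", "ex:sameWorkshopAs", "ex:object1"))


def match(store: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Every statement whose object is ex:object1, whatever the predicate.
print(match(triples, o="ex:object1"))
```

The point of the sketch is only that the connection `ex:sameWorkshopAs` did not need to be anticipated when the first two facts were recorded.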

"We don't know in advance what the connections will be," he said. ResearchSpace has been employing software from an open source version of Blazegraph for the work. "There are tasks that we are especially excited to try [GPU technology] out on," he said. "They are offline tasks that take a good amount of compute time."

According to Norton, GPU technology could be useful for coreferencing, which runs first-cut comparisons of graph entities and objects. Such objects amount to "hundreds of thousands of millions" in the British Museum collection. Another candidate is "alignment computations" that resolve different terms used to describe the same object, person or place.
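
A rough sense of why such first-cut comparisons are compute-heavy comes from a naive pairwise sketch in Python. This is an illustration only, not the method ResearchSpace uses: the record identifiers, the labels and the simple normalization rule are assumptions made for the example.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(label: str) -> str:
    """Lowercase the label and collapse whitespace (a deliberately crude rule)."""
    return " ".join(label.casefold().split())


def candidate_matches(labels: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare every pair of records and keep those whose labels normalize alike.

    The pair count grows as n * (n - 1) / 2, which is why this kind of
    job is a candidate for parallel hardware.
    """
    pairs = []
    for (id_a, label_a), (id_b, label_b) in combinations(sorted(labels.items()), 2):
        if normalize(label_a) == normalize(label_b):
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical records drawn from different collections.
records = {
    "rec:1": "Salamis",
    "rec:2": "  salamis ",
    "rec:3": "Paphos",
}
print(candidate_matches(records))
```

With three records there are only three pairs to check; at the scale Norton describes, the number of pairs becomes the dominant cost.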

Graph cache thrash slash

Why do GPUs have potential over CPUs for some graph database tasks? The answer may be that, although graph databases provide a useful structure for representing relationships, they also must "hop and touch" an unpredictable number of data points in establishing those relationships.

As graphs get larger, the problem of making those hops and touches becomes greater, according to Brad Bebee, CEO at Blazegraph. In these instances, the memory bandwidth available to general-purpose CPU cores can be relatively limited, he cautioned.
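
The "hop and touch" pattern can be sketched as a plain breadth-first traversal in Python. This is a CPU-only illustration, not Blazegraph's GPU implementation, and the tiny graph and function name are made up for the example. Each neighbor lookup is a memory access whose location depends on the data, which is what makes the access pattern hard to predict.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def bfs_edge_count(adjacency: Dict[str, List[str]], start: str) -> Tuple[Set[str], int]:
    """Breadth-first traversal that counts every edge it follows."""
    visited = {start}
    queue = deque([start])
    edges_traversed = 0
    while queue:
        node = queue.popleft()
        # "Hop": follow each outgoing edge. "Touch": read the neighbor's
        # state, wherever in memory it happens to live.
        for neighbor in adjacency.get(node, []):
            edges_traversed += 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited, edges_traversed


# Hypothetical four-node graph stored as adjacency lists.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
visited, edges = bfs_edge_count(graph, "a")
print(sorted(visited), edges)
```

The edge counter is the same kind of measure behind throughput figures quoted in traversed edges per second.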


As jobs traverse larger and larger graphs, memory must be paged constantly, and the CPU ends up waiting. This is what Bebee called "graph cache thrash."

"The benefit of the GPU is better bandwidth to main memory, allowing better parallel operations on graphs," Bebee said.

That GPU advantage has been exploited by a number of players in a field that, in Bebee's words, "sometimes now blurs with graph databases." That field is machine learning -- one of the hottest areas within big data science at the moment.

Watching the TensorFlow

Use of GPU technology is front and center in some important machine learning applications, according to David Schubmehl, an analyst at IT market research company IDC. Facebook, Baidu, Amazon and others are using clusters of GPUs in machine learning applications that come under the aegis of deep neural networks. These applications include image recognition, categorization and more, he said.

Late last year, Google's limited open sourcing of its TensorFlow machine learning library highlighted GPUs. Schubmehl said TensorFlow uses GPUs both in learning and production modes.

"The requirements for deep learning and cognitively enabled applications and analytics are increasingly driving use of large-scale [high-performance computing] server clusters," according to Schubmehl. These types of cluster rely more on GPUs, and require fewer servers in comparison to traditional server clusters, he said.

He pointed to Facebook's open sourcing of its Big Sur AI hardware design -- which uses GPUs -- as another example of broader endorsement of GPUs for machine learning. That move, which came last month, may augur more GPU sitings in big data science in the year ahead.
