Information about the overall goals of the Linked Data Benchmark Council


Organizations increasingly use data analysis to improve their effectiveness. So-called "Big Data" analysis problems have three characteristics: volume (the "Big" in Big Data), variety and velocity. The data being analyzed is often messy and unstructured and comes from multiple sources (variety), and it arrives in a continuous stream (velocity). Distilling information from such data often requires finding the relationships between the different data items. Because these relationships together form a graph, many Big Data problems are in fact graph data management problems.

People have been trying to use existing technologies, such as relational database systems, for such graph data management problems. It is perfectly possible to represent and store a graph in a relational table, for instance as a table in which every row contains an edge, and the start and end vertices of every edge are foreign-key references (in SQL terms). However, what makes a data management problem a graph data management problem is that the analysis is not only about the values of the data items in such a table, but about the connection patterns between them. Relational database systems and the SQL language were not designed for such tasks.
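As an illustration of the edge-table representation described above, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database. The table names vertex and edge, their columns, and the sample data are hypothetical. It shows that even a fixed two-hop pattern needs a self-join, and that following paths of arbitrary length requires a recursive query.

```python
import sqlite3

# Hypothetical schema: a vertex table, plus an edge table in which every row
# is one edge and src/dst are foreign keys referencing vertex(id).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vertex (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE edge (
    src INTEGER NOT NULL REFERENCES vertex(id),
    dst INTEGER NOT NULL REFERENCES vertex(id)
);
""")
conn.executemany("INSERT INTO vertex VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave")])
conn.executemany("INSERT INTO edge VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4)])

# A fixed two-hop pattern (a -> b -> c) already needs one self-join;
# every additional hop adds another join.
two_hop = conn.execute("""
SELECT e1.src, e2.dst
FROM edge e1 JOIN edge e2 ON e1.dst = e2.src
ORDER BY e1.src, e2.dst
""").fetchall()

# Reachability over paths of any length needs a recursive CTE.
# UNION (not UNION ALL) discards duplicates, so cycles cannot loop forever.
reachable_from_alice = [row[0] for row in conn.execute("""
WITH RECURSIVE reach(id) AS (
    SELECT dst FROM edge WHERE src = 1
    UNION
    SELECT e.dst FROM edge e JOIN reach r ON e.src = r.id
)
SELECT id FROM reach ORDER BY id
""")]

print(two_hop)               # pairs connected by exactly two edges
print(reachable_from_alice)  # vertex ids reachable from Alice
```

The data fits naturally into the table, but every question about connection patterns turns into joins over that table, which is why such queries quickly become awkward to express in SQL.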

The need to perform graph data analysis has led to a proliferation of new graph-oriented data management technologies. Roughly speaking, there are two families of approaches. The first consists of pure graph database systems, which elevate graphs to first-class citizens in their data model, query languages, and APIs. These systems often provide specific features such as breadth-first search and shortest-path algorithms, but also allow users to insert, delete and modify data in a graph database with transactional semantics. The second family targets the need to compute complex graph algorithms, such as community finding, clustering and PageRank, on huge graphs that may not fit in the memory of a single machine, by making use of cluster computing. The ecosystem is still very much in motion, and new systems are being developed as we write.
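To make breadth-first search concrete, here is a minimal sketch in plain Python, independent of any particular graph database system. The function name bfs_shortest_path and the adjacency-list input format are assumptions for illustration. The function returns a path with the fewest edges between two vertices of a directed, unweighted graph, or None if no such path exists.

```python
from collections import deque


def bfs_shortest_path(adj, source, target):
    """Return a shortest path (fewest edges) from source to target, or None.

    adj is a hypothetical adjacency list: a dict mapping each vertex to a
    list of its out-neighbours.
    """
    parent = {source: None}   # records how each vertex was first reached
    queue = deque([source])   # frontier, processed in FIFO order
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk the parent chain back to the source, then reverse it.
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in parent:   # first visit is via a shortest route
                parent[w] = v
                queue.append(w)
    return None


# Build an adjacency list from an edge list, as stored in an edge table.
edges = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 4)]
adj = {}
for s, d in edges:
    adj.setdefault(s, []).append(d)

print(bfs_shortest_path(adj, 1, 4))
```

Because BFS visits vertices in order of increasing hop distance, the first time the target is dequeued, its parent chain is a shortest path; here the route through vertex 5 wins over the longer one through vertices 2 and 3.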

The mission of LDBC is to provide the first comprehensive benchmarks for graph database management, helping IT practitioners who face a graph data management problem to better understand both the nature of their problem and the pros and cons of the various technological options (including good old relational database technology). LDBC graph benchmarks are also designed to help guide future innovations in new graph database management systems.