The Good, The Bad, and the Hype about Graph Databases for MDM
- By Ben Rund
- March 14, 2017
Excitement about new technology entering the market is not uncommon. New trends constantly come and go, sometimes without even being noticed. For information leaders, business strategists, and emerging technology teams, it is critical to keep an eye on developing trends so they can apply best practices for their company and stakeholders.
However, some trends tend to be more hype than practicality.
Gartner came up with the concept of a hype cycle for emerging technologies to show how technologies move from an innovation trigger to a peak of inflated expectations, through a trough of disillusionment and a slope of enlightenment, and finally to a plateau of productivity.
The master data management (MDM) space is no exception when it comes to such hype -- and the latest MDM buzz is graph databases.
Small startups are pushing graph databases as the be-all and end-all for MDM because that's all they can offer. While graph offers some attractive benefits for an MDM solution, it's important to take a step back and consider the drawbacks as well.
This article offers practical and technical insights so you can make informed decisions about your MDM implementation. Let's start by examining the hype and then look at the strengths of graph databases as well as the drawbacks that could negatively impact MDM efforts.
Graph databases are often pitched as the perfect solution for MDM. Graph does offer advantages for data consumption use cases that rely on relationship traversal. However, those use cases are limited. When compared to MDM solutions with a fixed, prebuilt data model (such as Oracle UCM or IBM's Advanced Edition), graph databases certainly provide some functional improvements (listed below). However, the flexibility of the technology itself is overhyped, given the nature of the problems MDM solves.
Many emerging vendors highlight a graph database persistence layer that lets them do Facebook- and LinkedIn-like relationship management. However, anyone who has ever been involved with an MDM project knows that maintaining data relationships in a persistence layer is not the objective, because it is not a major roadblock or pain point.
Let's zoom in on some of the good and bad aspects of graph databases.
Graph databases, such as Neo4j and Titan, claim these advantages:
- Flexibility: The data captured can be easily changed and extended for additional attributes and objects
- Search: You can run fast relationship-based searches such as "Which supplier provided the products owned by this group of customers?"
- Indexing: Graph databases are naturally indexed by relationships (the strength of the underlying model), providing faster access compared to relational data
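To make the relationship-based search above concrete, here is a minimal sketch in plain Python rather than in any particular graph database. The customers, products, suppliers, and function names are all hypothetical. It only illustrates the idea that, once relationships are indexed by their starting node, the supplier question becomes a short chain of lookups.

```python
from collections import defaultdict

# Hypothetical master data: every name below is illustrative only.
OWNS = [("alice", "laptop"), ("alice", "phone"), ("bob", "phone"), ("carol", "desk")]
SUPPLIED_BY = [("laptop", "Acme"), ("phone", "Globex"), ("desk", "Initech")]


def build_index(edges):
    """Index edges by source node so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for source, target in edges:
        index[source].add(target)
    return index


def suppliers_for(customers, owns, supplied_by):
    """Follow customer -> product -> supplier edges for a group of customers."""
    suppliers = set()
    for customer in customers:
        for product in owns.get(customer, ()):
            suppliers |= supplied_by.get(product, set())
    return suppliers


owns = build_index(OWNS)
supplied_by = build_index(SUPPLIED_BY)
result = suppliers_for(["alice", "bob"], owns, supplied_by)
print(sorted(result))  # ['Acme', 'Globex']
```

The work done depends on how many products the chosen customers own, not on how many customers exist in total, which is the property graph vendors emphasize.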
However, there is room for improvement of graph databases within the context of MDM.
Here's what you need to know about graph database limitations.
Graph databases are not as useful for operational use cases because they are not efficient at processing high volumes of transactions and they are not good at handling queries that span the entire database. Because they are not optimized to store and retrieve business entities such as customers or suppliers, you would need to combine a graph database with a relational or NoSQL database.
Using a graph database alone is not an MDM solution. It does not give you MDM functionality. A graph database is just a data store and doesn't give you a business-facing user interface to query or manage relationships. Also, it will not provide advanced match and survivorship functionality or data quality capabilities.
Graph databases do not create better relationships. They simply provide speedy data retrieval for connected data. Improved search is great but not if the relationship wasn't captured effectively in the first place.
With the most common graph databases, you have to store all the data on one server. Some, for example, are limited to a single node and can't scale beyond a certain point.
Graph databases are not optimized for large-volume analytics queries typical of data warehousing. For instance, you wouldn't be able to efficiently answer a simple but multi-faceted question such as, "Who were all the customers with income over $100K between the ages of 35 and 50?"
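For contrast, the income-and-age question is a set-oriented filter over every customer record and involves no relationship traversal. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a relational store. The table name, column names, and sample rows are assumptions made for illustration, and this is not a benchmark.

```python
import sqlite3

# Hypothetical customer records; the table and column names are assumptions.
CUSTOMERS = [
    ("Ann", 42, 120_000),
    ("Raj", 29, 150_000),
    ("Lee", 48, 95_000),
    ("Mia", 35, 101_000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, age INTEGER, income INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", CUSTOMERS)

# The filter touches every customer row and follows no relationships.
rows = conn.execute(
    "SELECT name FROM customer "
    "WHERE income > 100000 AND age BETWEEN 35 AND 50 "
    "ORDER BY name"
).fetchall()
matches = [name for (name,) in rows]
print(matches)  # ['Ann', 'Mia']
conn.close()
```

Queries of this shape scan or index attributes across the whole data set, which is the workload relational and warehouse engines are built around.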
Jim Webber, co-author of Graph Databases, writes, "It is important to note the consequence of using graph databases. The query latency in a graph is proportional to how much of the graph you choose to explore in a query, and is not proportional to the amount of data stored."
Simply put, graph databases allow you to search through data related to an individual record (person, product, place, etc.) quickly. However, there's a catch. You won't be able to perform mass analytics queries across all the relationships and records.
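The point about query cost can be illustrated with a small, hypothetical traversal in plain Python. The graph layout and the function names are assumptions chosen for this sketch. A two-hop exploration from one record visits the same handful of nodes whether the graph holds a thousand records or a hundred thousand, while a query that must reach every record grows with the size of the graph.

```python
from collections import deque


def build_graph(total_nodes):
    """Hypothetical graph: a small neighborhood around node 0, then a long chain."""
    graph = {node: set() for node in range(total_nodes)}
    # Local neighborhood around node 0: 0 - 1, 0 - 2, 1 - 3.
    for a, b in [(0, 1), (0, 2), (1, 3)]:
        graph[a].add(b)
        graph[b].add(a)
    # The remaining nodes hang off node 3 in one long chain.
    for node in range(3, total_nodes - 1):
        graph[node].add(node + 1)
        graph[node + 1].add(node)
    return graph


def explore(graph, start, max_hops):
    """Breadth-first traversal that stops expanding after max_hops."""
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return visited


for size in (1_000, 100_000):
    graph = build_graph(size)
    # Prints 4 visited nodes for both sizes.
    print(size, len(explore(graph, start=0, max_hops=2)))
```

Raising `max_hops` until the traversal covers the whole graph makes the visited count equal to the total number of nodes, which is the mass analytics case described above.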
In speaking with leading industry analysts, we also hear companies raise concerns about the security of open source graph database technologies. I expect this discussion to only grow in priority in the near future.
Getting Started: The Bottom Line
Use a comprehensive, end-to-end master data management (MDM) solution. If you want to consume relationships at high speed, absolutely put those relationships in a graph.
Ben Rund leads product marketing for information quality solutions at Informatica, which includes master data management, catalog procurement, data quality, and data as a service. His experience is built around all disciplines of communication, including journalism, PR consultancy, corporate marketing, field marketing, and product marketing. Prior to Informatica, Ben served as CMO of Heiler Software where he helped build the MDM for product data market and positioned Heiler Software as a leading PIM vendor. Ben studied economics and PR, and his passion is focused on the return of information. Projects such as pim-roi.com and his listing as top omnichannel influencer complete his expertise in the enterprise information management world.