Top 3 Ways to Determine If Your Machine-Learning Implementation is Debt Free

Debt of any kind, if not addressed, only gets worse over time, and machine-learning systems are no exception. There is a crucial difference between technical debt and hidden debt. Technical debt can be addressed by refactoring code, removing dead code, reducing dependencies, introducing abstractions for easier maintenance, and so on. Hidden debt is more dangerous because it compounds silently.

The following are broad categories under which hidden debt has been identified in machine-learning implementations:

  • Boundary erosion
  • Data dependencies
  • Anti-patterns
  • Impact of dealing with changes in the real world

Does this sound like an issue your team is running into? Here are the top three ways you might find yourself in machine-learning debt. The fourth category, dealing with changes in the real world, is taken up in the final section.

1. Boundary erosion

In software engineering, encapsulation and modular design create strong abstraction boundaries that help keep code maintainable, so code can be extended with enhancements without modifying what already exists. Unfortunately, it is difficult to enforce such boundaries in machine-learning systems by prescribing a specific intended behavior. The boundaries erode through entanglement (changing one input changes how the model uses all the others), correction cascades (models layered on top of other models' outputs), and undeclared consumers (systems that use a model's output without anyone tracking that they do).
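Entanglement is easiest to see on a toy problem. The following is a minimal sketch, not taken from the article: it fits a least-squares model with two correlated inputs and then again with one input removed, and the weight on the remaining input shifts even though nothing about that input changed. The function names and the data are illustrative assumptions.

```python
def fit_two_features(x1, x2, y):
    """Least-squares weights (no intercept) for y ~ w1*x1 + w2*x2."""
    # Entries of the 2x2 normal equations.
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    c = sum(v * v for v in x2)
    d = sum(u * t for u, t in zip(x1, y))
    e = sum(v * t for v, t in zip(x2, y))
    det = a * c - b * b
    return (d * c - b * e) / det, (a * e - b * d) / det


def fit_one_feature(x, y):
    """Least-squares weight (no intercept) for y ~ w*x."""
    return sum(u * t for u, t in zip(x, y)) / sum(u * u for u in x)


# Illustrative data: x2 is correlated with x1, and y = 1*x1 + 2*x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 2.0, 2.0]
y = [u + 2.0 * v for u, v in zip(x1, x2)]

w1_with_x2, w2 = fit_two_features(x1, x2, y)
w1_alone = fit_one_feature(x1, y)

# The weight on x1 depends on whether x2 is present in the model.
print(f"weight on x1 with x2 present: {w1_with_x2:.3f}")
print(f"weight on x1 after dropping x2: {w1_alone:.3f}")
```

Because the weight on one input depends on which other inputs are present, no input can be treated as an isolated module, which is why boundaries that work for ordinary code do not hold here.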

2. Data dependencies

According to Morgenthaler et al. in their paper on managing technical debt at Google (see note at end of this article), dependency debt is an important factor contributing to code complexity and technical debt in software development. Thankfully, modern-day compilers and linkers are able to detect such code dependencies and help fix them. Data dependencies have a similar impact in machine-learning systems but are much harder to detect. Unstable data dependencies, underutilized data dependencies, and the difficulty of static analysis of data dependencies are some of the data-related reasons why hidden debt builds up in machine-learning systems.
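One way an unstable data dependency shows up is as an input signal whose values drift away from what the model was trained on. The sketch below is one possible guard, not a method described in the article: it compares the mean of an incoming batch with a recorded baseline and flags the signal when the shift exceeds a threshold. The function name `is_signal_stable`, the threshold, and the sample values are all assumptions made for illustration.

```python
import statistics
from typing import Sequence


def is_signal_stable(
    baseline: Sequence[float],
    current: Sequence[float],
    max_shift_in_stdevs: float = 0.5,
) -> bool:
    """Return True if the mean of `current` stays close to the baseline mean.

    Closeness is measured in units of the baseline's sample standard
    deviation, so `baseline` needs at least two values.
    """
    shift = abs(statistics.mean(current) - statistics.mean(baseline))
    scale = statistics.stdev(baseline)
    if scale == 0:
        # A constant baseline tolerates no shift at all.
        return shift == 0
    return shift <= max_shift_in_stdevs * scale


# Hypothetical values of one upstream feature.
training_values = [1.0, 2.0, 3.0]
todays_values = [10.0, 11.0, 12.0]

if not is_signal_stable(training_values, todays_values):
    print("upstream signal has shifted; review before scoring")
```

A check like this does not remove the dependency, but it turns a silent change in an upstream signal into a visible one.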

3. Anti-Patterns

In a machine-learning system, the code dedicated to training a model and making predictions is significantly smaller than the supporting code around it, and it is this supporting code that can leave your system "in debt." Examples include glue code (where several otherwise incompatible components are quickly stitched together into a single implementation) and dead experimental code paths (where code written for rapid prototyping, to gain quick turnaround times, is left behind in the implementation).
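Glue code tends to do the least damage when it is confined to one place. The following is a minimal sketch under stated assumptions: `ThirdPartyScorer` is a hypothetical stand-in for an external package with its own calling convention, and `ScorerAdapter` is an illustrative wrapper that hides that convention behind the one interface the rest of the system uses. Neither name comes from the article or from any real library.

```python
from typing import Dict, List, Protocol, Sequence


class Predictor(Protocol):
    """The single interface the rest of the system is allowed to call."""

    def predict(self, features: Dict[str, float]) -> float: ...


class ThirdPartyScorer:
    """Hypothetical stand-in for an external package: scores rows of floats."""

    def run(self, rows: List[List[float]]) -> List[float]:
        # Toy behavior for the example: the score is the sum of each row.
        return [sum(row) for row in rows]


class ScorerAdapter:
    """Keeps all glue between our feature dicts and the package in one class."""

    def __init__(self, scorer: ThirdPartyScorer, feature_order: Sequence[str]):
        self._scorer = scorer
        self._feature_order = list(feature_order)

    def predict(self, features: Dict[str, float]) -> float:
        # Convert the named features into the positional row the package expects.
        row = [features[name] for name in self._feature_order]
        return self._scorer.run([row])[0]


model: Predictor = ScorerAdapter(ThirdPartyScorer(), ["age", "income"])
print(model.predict({"age": 2.0, "income": 3.0}))
```

If the external package is later replaced, only the adapter changes; callers that depend on `Predictor` are untouched.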

How to Get out of Machine-Learning Debt

Models created using machine-learning algorithms are consumed in business applications that interact directly with the real world. As a result, they inherit the unstable nature of the real world, which creates hidden debt within your machine-learning systems. Such situations warrant a trusted data partner who can provide the capabilities to help prevent the non-obvious, hidden debt that is created unintentionally on a predictive journey. This way, a data-driven organization can embark unencumbered on the path of digital transformation.

Note: David Morgenthaler, Misha Gridnev, Raluca Sauciuc, Sanjay Bhansali, "Searching for Build Debt: Experiences Managing Technical Debt at Google," Proceedings of the Third International Workshop on Managing Technical Debt, pp. 1-6, June 2012, Zurich, Switzerland; downloadable from http://research.google.com/pubs/pub37755.html. See also "Hidden Technical Debt in Machine Learning Systems" at https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

About the Author

Paul Pallath, Ph.D. is chief data scientist and director, advanced analytics at SAP.


Powered by TDWI. Advancing All Things Data
A Division of 1105 Media, Inc.