Benefits and drawbacks of representing and analyzing source code and software engineering artifacts with graph databases

Authors Rudolf Ramler
Georg Buchgeher
Claus Klammer
Michael Pfeiffer
Christian Salomon
Hannes Thaller
Lukas Linsbauer
Editors Dietmar Winkler
Stefan Biffl
Johannes Bergsmann
Title Benefits and drawbacks of representing and analyzing source code and software engineering artifacts with graph databases
Booktitle Software Quality: The Complexity and Challenges of Software Engineering and Software Quality in the Cloud - Proc. SWQD 2019
Type in proceedings
Publisher Springer
Series Lecture Notes in Business Information Processing
Volume 338
ISBN 978-3-030-05766-4
DOI 10.1007/978-3-030-05767-1_9
Month January
Year 2019
Pages 125-148
SCCH ID# 18065
Abstract

Source code and related artifacts of software systems encode valuable expert knowledge accumulated over many person-years of development. Analyzing software systems and extracting this knowledge requires processing the source code and reconstructing structure and dependency information. In analysis projects over the last years, we have created tools and services using graph databases for representing and analyzing source code and other software engineering artifacts as well as their dependencies. Graph databases such as Neo4j are optimized for storing, traversing, and manipulating data in the form of nodes and relationships. They are scalable, extendable, and can quickly be adapted for different application scenarios. In this paper, we share our insights and experience from five different cases where graph databases have been used as a common solution concept for analyzing source code and related artifacts. They cover a broad spectrum of use cases from industry and research, ranging from lightweight dependency analysis to analyzing the architecture of a large-scale software system with 44 million lines of code. We discuss the benefits and drawbacks of using graph databases in the reported cases. The benefits are related to representing dependencies between source code elements and other artifacts, the support for rapid prototyping of analysis solutions, and the power and flexibility of the graph query language. The drawbacks concern the generic frontends of graph databases and the lack of support for time series data. A summary of application scenarios for using graph databases concludes the paper.