CAIT stands for Code Analysis and Information Toolkit. It is a new product that we have been working on here at Synergex: a tool that analyzes DBL code and can be used to present useful information about that code, or perhaps even to make changes within the code for some pre-defined purpose.
We already have a working prototype, and we’d like to get your feedback. So when you’re finished reading below, please head on over to the survey and tell us what you think.
CAIT uses Synergy compiler technology to analyze the DBL source code. Specifically, CAIT uses the same “Analysis Engine” that is used by the DBL compiler when you build your software. CAIT is a command-line executable application (cait.exe) that will be available on the Windows platform.
When a developer executes CAIT, it launches MSBUILD under the hood, which manages the actual analysis of the code via CAIT-specific build targets.
CAIT is currently implemented with the traditional Synergy compiler and can be used to analyze traditional Synergy code. We anticipate also developing a .NET version which will have the ability to analyze Synergy .NET code, but this will most likely happen after the initial release of the traditional Synergy tool.
Much of the work already completed in the traditional Synergy version of CAIT applies directly to the .NET environment. Still, significant additional work is required to extend the tooling to know about .NET specific things. For example, the .NET version must know about generics and lambdas and requires special processing to be able to understand hoisted variables, and so on.
For now, we are considering the Windows platform for CAIT, although in many cases, it is possible to bring source code from other platforms to a Windows system for CAIT analysis.
It may be possible to make CAIT available on Linux in the future. The main reasons for the initial platform restriction are that the tool requires an up-to-date C++ compiler, and that the primary source of input data is a Visual Studio solution and its project files. However, we do support an alternate input format via a JSON file.
Currently, running CAIT requires that a developer work with command-line tools manually. Ultimately, we plan to add tooling into Visual Studio to allow you to launch a CAIT analysis of the current solution via the Visual Studio UI.
CAIT writes its output files in a folder configurable via an environment variable. By default, the folder is named $(SolutionDir)caitout.
The output from CAIT is a collection of text files containing detailed information about different areas of the codebase. These text files are CSV files and are organized into sub-folders based on the projects in the original Visual Studio solution.
Once you have used CAIT to analyze your codebase, you use a second custom tool designed for some specific purpose. This custom tool processes the CAIT output files and takes some action, perhaps modifying the codebase in some way, or producing useful output.
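As a minimal sketch of what such a custom tool might look like, the following Python snippet walks a caitout-style output folder (one sub-folder per project, CSV files within) and summarizes what it finds. Note that the folder layout and file names here are assumptions for illustration; the real CAIT output schemas may differ.

```python
# Sketch of a "custom tool" that consumes CAIT output files.
# Assumption: output root contains one sub-folder per Visual Studio
# project, each holding CSV files with a single header row.
import csv
from pathlib import Path

def summarize_cait_output(root):
    """Return {project_name: {csv_file_name: data_row_count}} for
    every CSV file found under a caitout-style folder."""
    summary = {}
    for project_dir in sorted(Path(root).iterdir()):
        if not project_dir.is_dir():
            continue
        counts = {}
        for csv_file in sorted(project_dir.glob("*.csv")):
            with open(csv_file, newline="") as f:
                reader = csv.reader(f)
                next(reader, None)  # skip the header row
                counts[csv_file.name] = sum(1 for _ in reader)
        summary[project_dir.name] = counts
    return summary
```

A real tool would of course go further, parsing the individual columns and acting on them, but the pattern of walking the per-project output folders would be the same.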
The code to be analyzed is presented to CAIT in the form of a Visual Studio solution that includes the entire codebase for the application. That solution should ideally be in a buildable state, with any appropriate operating system .DEFINES in place if necessary.
The fact that the code must be in a Visual Studio solution does mean that strong prototyping is in play, and ideally, the code needs to build successfully in the strongly prototyped environment.
To ease the process, the compiler settings in the Visual Studio project property pages let you set all of the available -qrelax switches, which can make it significantly easier to build your code with strong prototyping enabled.
Depending on the state of a codebase, the requirement to build in a strongly prototyped environment may be a barrier to entry for some.
If the Visual Studio solution is fully configured with all source files but is not in a buildable state, CAIT will still analyze the code, but the completeness and accuracy of the output files will be somewhat reduced, perhaps severely.
Currently, the use case we are working towards is to provide customers with an advanced “Code Explorer” experience by making it possible to visualize your codebase in a product called Sourcetrail, which is an open-source, cross-platform source code explorer.
We have produced a custom tool called the “Sourcetrail indexer” that transforms the data from the CAIT CSV files into a Sourcetrail project file. You can then use the Sourcetrail application to open the project file and explore various details of your codebase.
For example, the Sourcetrail UI exposes things like programs, subroutines, functions, namespaces, classes, methods, parameters, structures, and more. You can navigate around the source code hierarchy, following links such as drilling from a namespace into a class, then into a method, then perhaps explore all the places that reference the method from throughout the application.
We know that some customers already have homegrown custom tools that provide their developers with similar functionality. Still, we expect this CAIT-based tool to be far more comprehensive in the information it exposes.
The utility could, for example, be useful when onboarding new developers, helping them understand the hierarchy of the code, and what calls what. And in an appropriately configured environment, Sourcetrail also displays your actual source code within its UI as you navigate around, with some limited language awareness, such as basic color coding.
Here is a screenshot of the Sourcetrail UI having just opened a project based on the analysis of a typical traditional Synergy application:
Clicking on the “namespaces” icon above, the UI changes to display a summary of all namespaces that are defined in the codebase:
Clicking once again on a particular namespace moves to a view that presents the members of the namespace, things like classes, enumerations, structures, etc.:
Clicking on a namespace member, here the AutoLine class, presents information about that class:
Here we can see that the class has a constructor method and four other members: two private fields and two public properties. Also, notice that the class inherits from another class called AutoItem, so the parent-child relationship is called out in the UI.
Clicking on the StrokeColor property exposes information about the property:
Here we can see that the property has a get method and a set method.
Like a web browser, the Sourcetrail UI provides back and forward buttons, so if we use the back button to return to the AutoLine class, then click on the parent class AutoItem:
We can see that two other classes called AutoImage and AutoText also inherit from the same base class.
Just to prove that you get similar functionality when working with non-OO code, if we use the home button to return to the original screen, then click on the functions link, we see an alphabetized display of all of the external subroutines and functions that are present in the codebase:
And of course, clicking on a routine shows us information about that routine:
Here we can see that the routine has a named record called WORK, two unnamed records, and several labels defined within its code. We can drill into the records to view information about fields, and click on the labels to view different areas of the code.
Of course, you don’t need to navigate around by clicking. You can search for things by name, or type in the name of something to go directly to it. And notice the application also has a tabbed UI, so if you see something that you are interested in, you can open that item in a new tab and then continue exploring in the original tab.
You can also use the tool to discover everything that uses some particular entity:
Here we are examining a class named File, and we can see that it is referenced by three other classes, Directory, FileTools, and ScrollWindow, and also by three external routines, CheckSubAccountBatch, Email, and GoToUserFolder.
This is just a brief overview of the capabilities of code visualization via Sourcetrail.
A second use case that we are also considering is to create a custom tool that would help identify dead code. More specifically, external subroutines and functions that are no longer being called.
From many years of experience working with Synergy developers and their codebases, particularly through the developers in our Professional Services Group (PSG), we know that a typical Synergy codebase, often developed over several decades, contains a significant number of routines that are no longer in use. In some extreme cases, we have seen numbers as high as 50% of all external routines falling into this category. This is not surprising: when you remove an XCALL from a piece of code, you have no easy way of knowing whether you just removed the very last XCALL to that routine, so the routine remains.
This is not a huge problem for the application itself; dead routines just use a little more disk space in an ELB. But the issue really does matter if you set about doing some large-scale rework to the application. Maybe you’re going to strip off an old character-based UI and replace it with something more modern, or perhaps you decide to expose all of the functionality of your application via web services. In these scenarios, the fact that you have dead code absolutely matters, because you don’t want to waste time refactoring the UI out of code, or exposing its business logic, if the code is never used!
This is a perfect example of where a CAIT tool could answer a complex question in a relatively short space of time. You might think that identifying dead code is something that can easily be done manually. Simply search for calls to each routine, and if you don’t find any, remove the routine.
But it’s not as simple as that. In addition to XCALL statements and % function calls, external routines can also be called in other ways, including by xfServerPlus and, more problematically, via mechanisms like XSUBR, XADDR, and the RCB API, where the routine being called is identified by a name or address in runtime data, not by a compile-time identifier.
A tool using CAIT data could look for these patterns, identifying all XSUBR calls, and tracing back through parameters, assignment statements, and literal values, looking for signs of a particular routine being used in this way.
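To make the idea concrete, here is a hedged sketch of the conservative core of such a dead-code check: a routine is a candidate for removal only if it is never the target of a direct call and its name never appears as a string literal anywhere in the code (a crude proxy for dynamic dispatch via XSUBR, XADDR, or the RCB API). The input shapes are assumptions for illustration; CAIT’s real CSV schemas may differ, and a real tool would trace literals through parameters and assignments rather than matching them globally.

```python
# Conservative dead-routine detection sketch (assumed input shapes):
#   defined:         iterable of routine names
#   direct_calls:    iterable of (caller, callee) pairs
#   string_literals: iterable of literal string values seen in the code
def find_dead_routines(defined, direct_calls, string_literals):
    referenced = {callee.upper() for _, callee in direct_calls}
    # Any literal matching a routine name might be a dynamic call
    # target, so treat it as a reference too (conservative).
    literals = {s.strip().upper() for s in string_literals}
    return sorted(name for name in defined
                  if name.upper() not in referenced
                  and name.upper() not in literals)
```

The conservative bias matters: the tool should only flag a routine as dead when no evidence of any call mechanism, static or dynamic, can be found.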
In addition to the two use cases described above, we have also been discussing other possible use cases for a tool like CAIT. Here is a list of some of the other potential use cases that we have already identified:
Each of these scenarios requires different amounts of time and effort, based on the complexity of each task. There is no generic engine; once a new use case is identified, someone must design and build custom tooling to perform appropriate analysis of the data and produce the necessary output.
We think that writing a custom tool to deliver some of the simpler use cases may require perhaps a developer-month of effort, but some of the more complex (and likely more useful) use cases may require multiple developer months of effort to implement.
This is currently undefined. It is likely that we will deliver the code analysis engine via a Windows installer experience, perhaps even as part of one of the existing installers, such as the SDI installer.
The initial code visualization use case revolves around the open-source Sourcetrail product, which already has a WiX installer. We could fork their repository and build a custom version of their existing installer that also delivers the additional CAIT components. We’d like your feedback: would you expect a fully integrated experience, or would you prefer to download and install Sourcetrail yourself and have us simply plug into it?
When it comes to the long-term distribution of CAIT and its related tools, again, we’re not quite sure and we’re looking for input.
It is possible that the tooling for some of the more complex use cases may require very large amounts of computing resources. One thought is that we may expose those tools via a cloud-based web portal: you would upload the files from your initial codebase analysis, select the operation that you would like to execute on that data, and at some point later, seconds, minutes, or hours, you would receive the results. This is just one possibility, and we’re definitely looking for input here also.
The CAIT analysis tool is already written, and we have been testing it against several actual customer codebases that we already have access to at Synergex. We have learned a lot from this process: we have found and fixed bugs, and we have already been able to optimize performance in some areas.
At this stage, having used CAIT to produce Sourcetrail projects from several customer codebases, our confidence level that the tool is detecting and recording the information that it needs to is relatively high. But we know there are likely to be some edge cases that we have not encountered yet, so the more testing that can be done, the better shape we will be in.
We are looking forward to receiving your input on CAIT, on what has already been done, and what we already think CAIT can be used for, but even more importantly, on other use cases that we may not have even thought of yet. Please submit your feedback as soon as possible.