Defining a DSL for Extracting Software Architecture as Living Documentation
Wouldn’t it be great if you could automatically generate any architecture diagram you wanted from an existing codebase and the diagram was guaranteed to be 100% accurate?
That’s what I’ve been exploring in this series of posts, and in a professional context at the same time. One of the most fundamental parts of this challenge is accurately extracting information from a codebase.
Previously I explored how to do this in TypeScript with ts-morph. That example was simple, but I did verify the approach on a real codebase.
One barrier with this approach is having to create a custom extraction script for every codebase. If every team has to build and maintain their own, it’s unlikely to succeed. So what if we use a DSL to minimize the effort?
This is one post in a series. You can find the other posts here: https://medium.com/nick-tune-tech-strategy-blog/software-architecture-as-living-documentation-series-index-post-9f5ff1d3dc07
Architecture model elements DSL
I use the term ModelElement to describe elements of a system that we want to include in the model that we extract from the system (the graph of dependencies).
For teams following a domain-driven design approach, one modelling concept they might use is a repository. What kind of rule can we define to extract all repositories? One option would be a rule that the name of the type ends in Repository:
export const repositoryDefinition = defineModelElement({
type: 'Repository',
detection: typeSuffix('Repository'),
...
})
Coding agents can be very powerful here: they can analyse flows and identify rules to define with the DSL.
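To make the idea concrete, here is a minimal sketch of what these DSL primitives might look like under the hood. All names here (defineModelElement, typeSuffix, DetectionRule, Candidate) are assumptions, not a real library; in a real implementation the candidates would come from an AST tool such as ts-morph rather than a hard-coded list.

```typescript
// Hypothetical sketch of the DSL primitives; names are assumptions.
type Candidate = { name: string; filePath: string };
type DetectionRule = (candidate: Candidate) => boolean;

interface ModelElementDefinition {
  type: string;
  detection: DetectionRule;
}

// Identity function that exists purely to give definitions a typed shape.
function defineModelElement(def: ModelElementDefinition): ModelElementDefinition {
  return def;
}

// Detection rule: the type's name ends with the given suffix.
function typeSuffix(suffix: string): DetectionRule {
  return (candidate) => candidate.name.endsWith(suffix);
}

export const repositoryDefinition = defineModelElement({
  type: 'Repository',
  detection: typeSuffix('Repository'),
});

// Running the rule over candidates extracted from the codebase:
const candidates: Candidate[] = [
  { name: 'OrderRepository', filePath: 'src/orders/order-repository.ts' },
  { name: 'OrderService', filePath: 'src/orders/order-service.ts' },
];
const repositories = candidates.filter(repositoryDefinition.detection);
console.log(repositories.map((c) => c.name)); // [ 'OrderRepository' ]
```

The detection rule is just a predicate, so rules compose naturally and the definition stays decoupled from the AST traversal that feeds it candidates.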
For other concepts, it might be possible to rely more on the type and less on string matching (which I don’t like because it’s brittle). For example, all Domain Events implement the DomainEvent interface.
export const eventPublishedDefinition = defineModelElement({
type: 'DomainEvent',
detection: extendsOrImplements(DomainEvent),
...
})
Type safety is always preferable, but we don’t always have that luxury when starting out in an existing codebase.
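A sketch of how an interface-based rule like extendsOrImplements might work, again with assumed names. In a real implementation the heritage information would come from the AST (e.g. ts-morph’s ClassDeclaration.getImplements and getExtends) rather than a plain data structure:

```typescript
// Hypothetical sketch; TypeInfo and extendsOrImplements are assumptions.
type TypeInfo = { name: string; heritage: string[] }; // interfaces + base classes

// Detection rule: the type extends or implements the named type.
function extendsOrImplements(typeName: string) {
  return (t: TypeInfo) => t.heritage.includes(typeName);
}

const isDomainEvent = extendsOrImplements('DomainEvent');

console.log(isDomainEvent({ name: 'OrderPlaced', heritage: ['DomainEvent'] })); // true
console.log(isDomainEvent({ name: 'OrderDto', heritage: [] })); // false
```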
What if we don’t even have a common type or a consistent naming convention? We might have to start writing more brittle rules like the following, based on the file directory and the existence of certain methods.
export const useCaseDefinition = defineModelElement({
type: 'UseCase',
detection: directoryPattern('**/use-cases/**', {
hasMethod: { anyOf: ['apply', 'handle', 'execute', 'invoke', 'trigger'] },
}),
})
This gives us feedback that we should improve consistency in our codebase. But for an old legacy system that doesn’t change, or as an initial starting point, it might be something we can live with.
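A simplified sketch of what a directory-plus-method rule could look like. The names are assumptions, and the glob handling here is deliberately naive (a real implementation would use proper glob matching, e.g. minimatch, and read method names from the AST):

```typescript
// Hypothetical sketch; ClassInfo and directoryPattern are assumptions.
type ClassInfo = { filePath: string; methods: string[] };

function directoryPattern(
  pattern: string,
  opts: { hasMethod: { anyOf: string[] } },
) {
  // Naive translation of '**/use-cases/**' into a path-segment check.
  const segment = pattern.replace(/\*\*/g, '').replace(/^\/|\/$/g, '');
  return (c: ClassInfo) =>
    c.filePath.includes(`/${segment}/`) &&
    c.methods.some((m) => opts.hasMethod.anyOf.includes(m));
}

const isUseCase = directoryPattern('**/use-cases/**', {
  hasMethod: { anyOf: ['apply', 'handle', 'execute', 'invoke', 'trigger'] },
});

console.log(isUseCase({ filePath: 'src/use-cases/place-order.ts', methods: ['execute'] })); // true
console.log(isUseCase({ filePath: 'src/services/pricing.ts', methods: ['execute'] })); // false
```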
Attributes/decorators can also be useful for identifying elements. The following would match any method that has a @HTTPEndpoint(path, method, ...) decorator.
export const httpEndpointDefinition = defineModelElement({
type: 'HttpEndpoint',
detection: decorator(HTTPEndpoint, 'method'),
customFields: {
path: fromDecoratorArgs('path'),
method: fromDecoratorArgs('method'),
...
},
})
In this example we also see customFields — these are rules for extracting information from the element that we store on the node in our graph. This example shows that every HttpEndpoint in our graph should have a path and a method property.
A scalable approach
The type of DSL library presented above would be a reasonable investment for a single codebase, but it becomes far more justifiable when shared across multiple codebases, each of which can easily define its own model elements and conventions.
But as I discussed in Enterprise-wide Software Architecture as DDD Living Documentation, if we want DDD-style architectural living documentation at whole-system scale, it’s probably going to be better to bake in certain concepts like HTTP endpoints and domain events so that we can build more powerful downstream tooling and UIs based on these standards.
To achieve that, we will need to decouple the definition of our model elements — these are standardised and provided as a platform component — from the conventions for identifying them in a given codebase. This way, different codebases with different conventions can produce output in the same format (i.e. an abstraction layer).
This is something I haven’t started working on yet in a professional context, so it’s more speculative than the previous examples. But it’s not too difficult to imagine what this might look like:
const rules = {
Repository: {
detection: typeSuffix('Repository')
},
HTTPEndpoint: {
detection: decorator(HTTPEndpoint, 'method'),
customFields: {
path: fromDecoratorArgs('path'),
method: fromDecoratorArgs('method'),
}
},
...
} satisfies AwesomePlatformModelElementDetectionRules
awesomePlatformLibrary.configureModelElementDetectionAndExtraction(rules)
Here, the type system demands that values for Repository, HTTPEndpoint and other standard concepts must be provided. They are mandatory properties on the type AwesomePlatformModelElementDetectionRules (please don’t use that name in your real code).
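A sketch of how such a rules type could make the standard concepts mandatory. The type and field names are assumptions, and the detection rules are deliberately trivial; the point is that omitting any of the required keys would be a compile error:

```typescript
// Hypothetical sketch; type and field names are assumptions.
type DetectionRule = (candidate: { name: string }) => boolean;

interface ElementRules {
  detection: DetectionRule;
  customFields?: Record<string, (candidate: { name: string }) => string | undefined>;
}

// Every standard concept is a required key, so a codebase that forgets
// to configure one fails to compile rather than silently producing
// an incomplete model.
interface AwesomePlatformModelElementDetectionRules {
  Repository: ElementRules;
  HTTPEndpoint: ElementRules;
  DomainEvent: ElementRules;
}

const rules: AwesomePlatformModelElementDetectionRules = {
  Repository: { detection: (c) => c.name.endsWith('Repository') },
  HTTPEndpoint: { detection: (c) => c.name.endsWith('Controller') },
  DomainEvent: { detection: (c) => c.name.endsWith('Event') },
};

console.log(rules.Repository.detection({ name: 'OrderRepository' })); // true
```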
In fact, we can do even better than this. In organizations that adopt a paved road/path approach with standardized tech stacks and application templates, we could even have default rules.
awesomePlatformLibrary.configureModelElementDetectionWithDefaults()
And what might those defaults be? One example is the JMolecules approach. The platform provides a standard library containing types like @Repository and @HttpEndpoint which the default rules will look for.
It doesn’t have to be decorators; you can decide.
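A sketch of what the platform-provided defaults might build on: a shared library of marker decorators, in the spirit of JMolecules (all names here are assumptions). The default detection rules then simply look for the standard markers rather than per-codebase conventions:

```typescript
// Hypothetical sketch; the marker functions and registry are assumptions.
const markedTypes = new Map<string, string>();

// Standard markers shipped by the platform's shared library.
function Repository(target: { name: string }): void {
  markedTypes.set(target.name, 'Repository');
}
function DomainEvent(target: { name: string }): void {
  markedTypes.set(target.name, 'DomainEvent');
}

// Application code marks its types. With decorator syntax enabled this
// would be written as @Repository / @DomainEvent on the class.
class OrderRepository {}
Repository(OrderRepository);

class OrderPlaced {}
DomainEvent(OrderPlaced);

// The default rules need no per-codebase configuration: detection is
// simply a lookup of the standard markers.
console.log(markedTypes.get('OrderRepository')); // Repository
console.log(markedTypes.get('OrderPlaced')); // DomainEvent
```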
What if we forget to follow our standards?
What if we forget to put the decorators on a method or to inherit the standard base class? Our living documentation will be inaccurate. So this approach still relies on good discipline and on developers remembering to do things — more of a risk than I’m comfortable living with.
I see two main ways to solve this: prevention (like using scaffolding to add new model elements) and verification.
Verification is where I see potential for architectural unit testing libraries that ensure our conventions are applied, so we can have confidence that our living documentation is truly 100% correct.
For example, we can try to set an architectural rule that all objects in our domain model must be annotated with @Aggregate, @ValueObject, and so on. Again, the default rules could be provided for free by the platform to make the paved road experience frictionless.
I haven’t tried this yet, but it’s on my todo list. If you have achieved this — or crashed and burned — I’d be keen to hear about your experiences either way.
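Since I haven’t built this, the following is purely speculative: a sketch of what such a verification rule could look like, without any real architectural testing library (ArchUnit-style tools exist for this, but the API below is invented). The rule reports every domain type missing a standard marker:

```typescript
// Hypothetical sketch; DomainType and verifyDomainModel are assumptions.
type DomainType = { name: string; markers: string[] };

const allowedMarkers = ['Aggregate', 'ValueObject', 'Entity', 'DomainEvent'];

// Return the names of domain types that violate the rule, so a CI step
// can fail the build (and the living documentation stays trustworthy).
function verifyDomainModel(types: DomainType[]): string[] {
  return types
    .filter((t) => !t.markers.some((m) => allowedMarkers.includes(m)))
    .map((t) => t.name);
}

const violations = verifyDomainModel([
  { name: 'Order', markers: ['Aggregate'] },
  { name: 'Money', markers: ['ValueObject'] },
  { name: 'Discount', markers: [] }, // forgot the marker
]);
console.log(violations); // [ 'Discount' ]
```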
Key Points
A DSL is a cool way to define the conventions in a codebase. It decouples the definitions of the conventions from the extraction mechanics. It’s cleaner in a single codebase but can be even more worthwhile when provided as a reusable library across multiple codebases.
But a super flexible DSL might not actually be the most effective or reliable solution. At a minimum, it’s probably better to define some standard concepts.
And to build highly reliable and accurate, system-wide architectural living documentation, it’s worth providing defaults where possible. A familiar challenge for platform engineers: where to allow freedom and flexibility, and where to standardize for the greater good.