Architects really care about taxonomies, i.e. unambiguously classifying, decomposing and grouping things. It is at the heart of how architects go about carving up problems, and solutions to these problems, into repeatable patterns and reference architectures. And yet, ironically, there are a couple of things that are often poorly understood in the architecture world:
- The role and responsibilities of an architect. (I’ll cover this another day.)
- What belongs in architectural documentation.
I’m going to cover the latter here!
If your job title is Systems Architect, Solution Architect, Cloud Architect or Technical Architect, you will no doubt have been asked to produce some sort of architecture design documentation, scoped to the delivery of some sort of system or solution. This architecture design document has many names, including:
- High Level Design (HLD)
- System / Software / Solution Architecture Document (SAD)
- Technical Design Document (TDD)
- Architecture Design Document (ADD)
- Technical Architecture Document (TAD)
You get the idea!
It is generally well accepted that the above documentation is distinct from so-called “Low Level Design (LLD)” or “Detailed Design”. It is also generally accepted that these lower-level design documents are not architectural.
All well and good.
But here’s the problem: organisations – and architects – are somewhat inconsistent about what concerns are architectural, and consequently, what sort of content belongs in architectural documentation.
I’ve lost count of how many times I’ve seen this problem. But over many years, through many roles containing the word “architect” (including Technical Architect, Systems Architect, Solutions Architect, Infrastructure Architect, Cloud Architect, and Enterprise Architect), and through many courses and books, I have assembled enough knowledge and wisdom to confidently tell you what you should include in architectural documentation. And what you shouldn’t!
The Purpose of Architectural Documentation
Here I’m going to describe the purpose and value of architectural documentation relating to systems or solutions. I’m intentionally avoiding the myriad of artefacts that one might produce as an Enterprise Architect. While the principles and approaches to Enterprise Architecture are similar, the documentation typically has a very different focus. And furthermore, TOGAF has a lot to say about Enterprise Architecture artefacts. So, don’t expect any TOGAF in this blog!
The Primary Goal of Architecture Documentation
To document the architecture so that others can successfully understand it, build it, use it, maintain it, and evolve it.
First, some key tenets:
- Every system (beyond the most trivial) has an “architecture”.
- The architecture is the organised collection of components. These components have relationships and behaviour.
- The documented architecture is intentionally abstracted. It hides implementation details that are only pertinent to the inner workings of a component. But it exposes the details that are external to any given component at a sensible level of granularity. I.e. the observable properties of each component and interface: what they do, the value they bring, how they are consumed, how they behave, the resources they require, the dependencies they have, and the quality attributes they influence.
- Most systems are too complex to depict in any single view. Trying to capture all of the key attributes of a system into one view would generally result in a massive depiction, with too many layers of granularity to be comprehensible.
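As a rough illustration of the tenets above, the externally observable properties of a component can be modelled as a small data structure. This is only a sketch: the class names, fields and example components (`Component`, `Interface`, `basket-service` and so on) are hypothetical, and are not taken from any standard notation. Note what is missing: nothing about the inner workings of a component appears in the model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interface:
    """An externally visible interface of a component (hypothetical model)."""
    name: str
    style: str    # e.g. "synchronous REST", "asynchronous events"
    payload: str  # e.g. "JSON", "protocol buffers"


@dataclass
class Component:
    """Observable properties of a component; internals are deliberately absent."""
    name: str
    purpose: str
    interfaces: list[Interface] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)
    quality_attributes: list[str] = field(default_factory=list)


def dependents_of(components: list[Component], target: str) -> list[str]:
    """Return the names of components that depend directly on `target`."""
    return [c.name for c in components if target in c.depends_on]


# Illustrative components only.
basket = Component(
    name="basket-service",
    purpose="Holds the items a customer intends to buy",
    interfaces=[Interface("basket-api", "synchronous REST", "JSON")],
    depends_on=["catalogue-service", "basket-db"],
    quality_attributes=["performance", "availability"],
)
catalogue = Component(name="catalogue-service", purpose="Serves product data")
basket_db = Component(name="basket-db", purpose="Persists basket state")
system = [basket, catalogue, basket_db]

print(dependents_of(system, "basket-db"))
```

Even a model this small supports more than one view: a component view lists the names and purposes, while a dependency view is derived from `depends_on`.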
What Are Quality Attributes?
Above, I mentioned that we should document how the architecture influences the quality attributes of the system. But what are quality attributes? Simply put:
They are the fundamental quality characteristics that determine whether a system is fit for purpose, beyond simply being able to perform specific functions according to business rules.
Common quality attributes include:
|Modifiability||The ease with which a system can change, whilst minimising cost and impact on other systems. The lifetime TCO of the system is therefore strongly correlated to its modifiability.|
|Integrability||The ease with which different components or systems can communicate with one another, both now and in the future.|
|Conceptual Integrity||Overall consistency in the design. The extent to which a common approach is used to do a given thing, throughout the design. This contributes to operability, maintainability, ease of understanding, modifiability, etc.|
|Portability||The ease with which the system can run on different platforms. Could be considered an aspect of modifiability. E.g. avoiding lock-in; making use of open source and open standards.|
|Testability||The ability to detect defects through testing. I.e. the extent to which our tests are likely to reveal a defect before it is detected in production. Modular, loosely-coupled architectures with well-defined public interfaces are easier to test; particularly for automated unit and system testing.|
|Deployability||The characteristic of being able to make changes and deploy at the frequency desired, with minimal disruption, and with as little effort as possible.|
|Energy Efficiency||The quality of minimising energy consumption whilst also meeting specification.|
|Reliability / Availability||The extent to which the system is able to function as designed, within defined operating windows, and with an agreed maximum amount of non-availability. Reliability is the extent to which a service is resistant to failure events. Availability is the quantification of reliability over a time frame.|
|Performance||The ability of a system to perform its function within an acceptable time frame. This is the attribute that is typically most compromised by design decisions that improve other quality attributes. For example, performance is typically negatively impacted by any sort of layering, brokering, etc. Thus, we want to make this trade-off consciously.|
|Scalability||The extent to which the system can handle increasing demand, whilst still performing adequately, and without needing significant manually introduced changes to the infrastructure or architecture. Often, tactics that improve scalability will also improve availability and performance.|
Elasticity is related to scalability. It is the ability of the system to expand or shrink as required, to meet demand at a given point in time. Elasticity allows significant TCO reduction, when running applications on infrastructure that charges based on consumption – like public cloud!
|Security||The extent to which the system is able to protect its data from unauthorised access while still providing access to people and systems that are authorised.|
|Usability||The ease with which a user can accomplish a desired – and valid – task.|
|Operability||The extent to which we can monitor and respond to issues with the system, and the extent to which we can minimise human operator overhead. For example:|
– system health transparency, SLIs, SLOs, alerting, monitoring.
– automatic fault detection and response, such as through autohealing of microservices.
– automatic scaling in response to demand.
|TCO / Value||The lifetime value of the system, relative to its cost. TCO is often closely linked to all the other attributes. For example:|
– A commercially licensed product that is licensed by CPU may prevent scalability and elasticity.
– A system that has poor usability will have little value.
– A system that has poor operability will be expensive to maintain.
– A system that is insecure may suffer from an expensive breach.
– A system with poor availability will again be costly to maintain, but will also be costly in terms of poor customer experience.
– A system that is not easily tested will have a high number of defects late in the SDLC, or even into production. And as we all know, it’s more costly to fix defects later, rather than earlier.
– A system that is unmodifiable is likely to have poor lifetime value; or, will be very expensive to update to meet some sort of compliance requirement.
I’m not going to get into these quality attributes in detail. I’ll save that for another blog.
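To make one of these attributes concrete: an availability target translates directly into an "agreed maximum amount of non-availability". The sketch below does that arithmetic. It assumes a 24x7 operating window and a 30-day measurement period, both of which are my assumptions for illustration; the function name `allowed_downtime_minutes` is also hypothetical.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: float = 30.0) -> float:
    """Maximum downtime, in minutes, permitted by an availability target.

    Assumes the service is expected to run continuously for the whole window.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - availability_pct / 100.0)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% over 30 days allows {minutes:.1f} minutes of downtime")
```

Over 30 days, 99% allows roughly 432 minutes of downtime, 99.9% roughly 43 minutes, and 99.99% just over 4 minutes. Each extra "nine" shrinks the budget tenfold, which is why the availability target is such a strong driver of architecture and cost.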
But for now, it is sufficient to say that these quality attributes dictate whether a system is fit for purpose, beyond simply performing functional tasks correctly. For this reason, they are often referred to as non-functional requirements (NFRs). However, many would argue that NFR is a misnomer, since if quality attribute requirements are not met, then arguably the system is not functional. For example, consider a user adding an item to a basket. If it takes one minute to add the item, then the function worked correctly. But the performance is clearly inadequate. Arguably, the system is also not available. Thus: the system is not functional.
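The basket example can be sketched in a few lines. The names here (`Observation`, `fit_for_purpose`) and the 2-second threshold are assumptions chosen purely for illustration; a real requirement would come from the ASRs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One observed 'add to basket' request (hypothetical shape)."""
    item_added: bool         # did the function produce the right result?
    duration_seconds: float  # how long the user waited


def functionally_correct(obs: Observation) -> bool:
    return obs.item_added


def meets_latency_requirement(obs: Observation, max_seconds: float = 2.0) -> bool:
    # The 2-second default is an assumed requirement, for illustration only.
    return obs.duration_seconds <= max_seconds


def fit_for_purpose(obs: Observation) -> bool:
    """A request only counts if it is both correct and fast enough."""
    return functionally_correct(obs) and meets_latency_requirement(obs)


slow_request = Observation(item_added=True, duration_seconds=60.0)
print(functionally_correct(slow_request))  # True
print(fit_for_purpose(slow_request))       # False
```

The functional check passes, yet the request fails the combined check, which is the sense in which a "non-functional" shortfall leaves the system not functional.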
Consequently, quality attributes are the primary concern of the architect. If any of the system qualities are inadequate, then the system is arguably a failure. And it is the system design – the architecture – that dictates what qualities the system will have. Thus, the desired quality attributes are often referred to as the architecturally significant requirements (ASRs). Generally speaking, you can consider ASRs and quality attribute requirements to be the same thing.
If the architecture is wrong, then the system will be a failure. Incorrect architecture decisions will often prohibit achieving the ASRs. And, unlike functional defects, bad architectural choices are often very difficult – and very expensive – to fix.
Broader Rationale for Architecture Documentation
Let’s get into the reasons why you should document your architecture:
|Documentation is architecture. Architecture is documentation.||Architecture documentation is not a retrospective exercise. The process of building the documentation forms a big part of architecting the system. Using a prescribed document template ensures a repeatable approach to architecting the system. It helps ensure there are no gaps in considerations or design. If the architecture documentation is complete (to the extent that anything is ever “complete”), then the architecture design is complete too.|
|System / Solution Specification||The architecture documentation serves as the design that is used for all subsequent non-architectural design, implementation and deployment activities. The architecture document describes the target state and how to get there.|
|Knowledge Capture||The documentation captures the design, the external properties of the system, the behaviour, and every significant decision point. It describes how the architecture will achieve the business goals. (There is no other place where this will happen!) |
Additionally, it captures the rationale for every decision that was made. Having this rationale is crucial. Without it, decisions will be revisited again and again, wasting time and resources. And worse, without the rationale, a decision may be reversed later, without fully understanding the implications. This could be very costly!
Finally, this documentation facilitates analysis, forensics, architecture reviews, and problem resolution.
|Education and Onboarding||The architecture documentation serves as the single best source of information for anyone that needs to know about the system. And this could be across a wide array of different stakeholder types. For example:|
– A new developer joining the project
– A tester
– An operator or SRE who needs to support the system
– A customer or sponsor of the system
– An architect who needs to evolve the system
Furthermore, having quality architectural documentation offloads work from the architect. The architect can point stakeholders to the documentation as the initial response to questions with known answers.
What Concerns Are Architectural?
Drawing the line between architectural concerns and non-architectural concerns can be difficult. But let me help by giving you a load of examples! For the avoidance of doubt, if a consideration is NOT architectural, then it doesn’t belong in (say) a High Level Design (HLD), but it might be appropriate for a Low Level Design (LLD).
|Quality attribute requirements / ASRs||Y||E.g. performance, availability, scalability, operability, modifiability, integrability.|
These are the primary concerns of the architect. The architecture fundamentally influences whether the ASRs can be met.
|Functional requirements||N||Functional requirements are not typically architectural. The business logic – implemented in the code – that achieves these requirements, is not architectural. These requirements should be captured separately.|
|Cost / TCO||Y||This is a fundamental quality attribute. It is dependent on the architecture.|
|Hosting venue||Y||E.g. on-prem? Public cloud? Which cloud?|
The architect must consider many factors, such as whether the workload is viable in cloud, the strategy for any existing data centres, software licensing constraints, ability to leverage scalability and elasticity, capability and skills of the organisation, etc.
|Hosting services and products||Y||E.g. selection of the most suitable products for a type of workload, and the best reference architecture for connecting those components.|
|Off-the-shelf application selection||Y||Buy vs build? Bespoke or commodity? Commercial or open source? Leveraging SaaS? Software choices constrain the architecture. Licensing is a significant factor.|
|Integration patterns||Y||E.g. synchronous vs asynchronous. Proxied? Circuit-breaker pattern? User-facing? Batch? Latency? Ability to decouple.|
|Interfaces||Y||The architect is concerned with the purpose of the interface, use cases, interaction styles, technology choices, the effect of a message over the interface, payload representation (e.g. XML, JSON, protocol buffers, binary), authentication and authorisation requirements, compatibility, etc.|
|Interface message format / specification||N||Whilst the architect will be concerned with the payload representation, the architect is not concerned with the specific message schema or specification.|
|High availability patterns||Y||HA requirements are driven by ASRs such as availability, RTO and RPO. Furthermore, availability is closely related to scalability and performance.|
|Instance counts||N||The architect cares whether a component is redundant, highly available, and whether it can scale. The architect should be concerned about system limitations, such as stateful applications which cannot scale horizontally.|
However, the architect should not be concerned with the specific number of instances of a scalable component.
|DR patterns||Y||The architect is concerned whether the RTOs and RPOs can be met; whether geographical separation is sufficient; the latency implications of such geographic separation; the licensing implications of multiple instances; how synchronous data replication over distance could impact performance; and so on.|
|Operational runbooks||N||Runbook execution steps are not architectural. They are operational.|
|IaC technology||Y||The architect is concerned with the extent to which infrastructure can be deployed in an automated and repeatable fashion. The architect will always be concerned with consistency of tooling; cost of tooling; availability of skills. The technology choice fundamentally constrains implementation choices.|
|IaC implementation / code||N||The actual IaC code is not architectural; it is implementation. Implementation could be swapped out for another implementation.|
|Installation steps||N||The general approach to installation and deployment is architectural (e.g. CI/CD pipeline, Canary testing, A/B deployment, etc). |
However, the specific steps to install the solution are operational, not architectural.
|Application component organisation (e.g. monolith vs microservices)||Y||This is architectural. It influences considerations like decoupling; ability to scale components independently; availability; cost; data ownership; ability to perform rolling updates; A/B testing; Canary deployments; etc.|
|Code||N||Code is implementation, not architecture. However, doing some basic due diligence (possibly even PoC work) to validate that your architecture is within the realms of feasibility… That is arguably architecture.|
|CPU architecture||Y||CPU architecture choices can have profound implications on quality attributes such as performance; but also on cost, as well as licensing implications.|
|Specific CPU specification (e.g. clock speed)||N||This is an implementation detail that generally has very little impact on licensing.|
|Operating system flavour||Y||The OS choice will be aligned to the application software choices. It will also need to consider licensing, and the extent to which licensing prohibits using cloud capabilities like elasticity. Support models are a consideration, as well as lock-in, and availability of skills.|
|Operating system version, patch level, detailed configuration||N||Specific OS version and configuration are implementation details. They make very little difference to the quality attributes of the system.|
|Identity and Access Management approach and technologies||Y||Alignment to overall security requirements and security compliance, as well as considerations for broader identity management, and ability to integrate and consume.|
|Persona / Role identification||Y||Persona and role are coupled to use cases. Furthermore, usability and operability quality attributes are related to personas.|
|Role / permission mapping||N||Actual mapping to permissions is an implementation detail. It is the responsibility of the engineer.|
|Data schema strategy||Y||An architect will be concerned with qualities such as performance, availability and cost. Consequently, an architect will be concerned with whether the database will be for OLTP or OLAP; whether it is normalised or denormalised; whether it will be consistent or eventually consistent; whether it will be scalable; etc.|
|Data schema design||N||Considerations such as table design and field definitions should be left to data SMEs.|
|Technical debt||Y||Technical debt is the implied cost of additional work or rework that is incurred by choosing a suboptimal solution in the short term. Clearly, this is fundamentally about cost and other quality attributes. Whether to allow a tactical approach is fundamentally an architectural decision. The architect should understand and clearly articulate the implications of such decisions.|
Note: just because something isn’t architectural doesn’t mean it’s not important. Far from it! But it means that there are better alternatives for capturing non-architectural concerns, and more appropriate roles who should be capturing them. It’s about using an organisation’s resources wisely and appropriately.
Attributes of Good Documentation
Create it for the Reader!
- Think about who you are writing for. Who are your stakeholders? How will they consume it?
- It’s about the reader; it’s not about you! The documentation needs to be useful to the reader. It’s not a vehicle to show off how smart you (think you) are.
- Don’t make assumptions about the level of knowledge of your readers. If you are making any assumptions, then explicitly state them.
- As an aside: not all stakeholders want a document. Whilst an artefact called (for example) HLD / SAD / TDD will generally be a full document, consider when you should be creating something shorter; or something suitable for presentation. For example, it is not typically feasible to review a 30/50/100 page architecture design document in an architecture governance session. For this purpose, it is typically much more effective to create an overview as slides. Furthermore, many stakeholders (such as a solution sponsor or CxO) will want a short presentation that provides an overview of the solution. Architects need to think of the stakeholders they need to influence, and the outcomes they want to achieve. If a senior stakeholder asks for a presentation, simply saying “Sorry, presentations are beneath me” is not really an option!
- Capture conclusions, decisions and rationale.
- Never simply record: “We could do x or we could do y.” If a decision has yet to be made, be explicit about it. You should record why the decision has not yet been made, and what needs to happen in order to allow the decision to be made. Be clear about who is responsible for progressing the decision, and when it needs to be made by.
- All diagrams should have a key.
- Be explicit about the meaning of arrow direction. E.g. does it mean direction of data flow? Does it mean direction of request?
- If you refer to anything that the reader might not understand, then define it.
- Provide a glossary for your acronyms.
- Save yourself some bother by externalising your definitions and acronyms into a separate artefact. This artefact can be crowdsourced, and you can refer to it in all your architecture documents. It will save you re-writing an almost-but-not-quite-identical glossary in dozens of documents!
- One way to be consistent – both across your own documents, but also across your architecture function – is to have architecture document templates.
- Furthermore, if you use a template, it will act as a guide for the things you need to capture. It will help to ensure you don’t miss something important.
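The guidance above on capturing decisions, rationale and open decisions can be sketched as a simple record with a completeness check. This is a minimal illustration, not a prescribed format: the `DecisionRecord` class, its field names and the example decisions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DecisionRecord:
    """A single design decision, either made or explicitly pending (hypothetical model)."""
    title: str
    options: list[str]
    chosen: Optional[str] = None  # None means the decision is still open
    rationale: str = ""           # why the chosen option was selected
    blocked_by: str = ""          # what must happen before an open decision can be made
    owner: str = ""               # who is responsible for progressing it
    due: Optional[date] = None    # when the decision is needed by

    def problems(self) -> list[str]:
        """Return the gaps that would leave a reader guessing."""
        issues: list[str] = []
        if self.chosen is not None:
            if self.chosen not in self.options:
                issues.append("chosen option is not among the listed options")
            if not self.rationale:
                issues.append("no rationale recorded for the decision")
        else:
            if not self.blocked_by:
                issues.append("open decision does not say what is blocking it")
            if not self.owner:
                issues.append("open decision has no owner")
            if self.due is None:
                issues.append("open decision has no due date")
        return issues


# "We could do x or we could do y", with nothing else recorded.
vague = DecisionRecord(title="Hosting venue", options=["on-prem", "public cloud"])
decided = DecisionRecord(
    title="Integration style",
    options=["synchronous", "asynchronous"],
    chosen="asynchronous",
    rationale="Decouples the producer from slow downstream consumers",
)
print(vague.problems())    # three gaps reported
print(decided.problems())  # []
```

Whether the records live in a document table, a wiki or a repository matters less than the fields being filled in consistently.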
An Architecture Design Outline
We’ve now covered:
- Why architecture documentation is important.
- The value it brings.
- The types of considerations that are architectural.
- The attributes of good documentation.
- That having a template is a good idea.
Here’s my generic template for an architecture document that can be used for various purposes, including:
- Solution design
- Systems design
- Cloud architecture
- Migration design
|Completion Guidance||This is guidance for the author; i.e. the person who will create an instance of a document using this template. Typically, this will be some guidance text in each section, which should be removed ahead of document release.|
|Document control||Change history; sign-off / review status.|
|Purpose and Stakeholders||Which stakeholders should be interested in this document? What will it tell them? Why should they care?|
|Solution Overview||– Business goals and vision.|
– Scope of this solution. Be mindful that the scope of the solution may be smaller, or larger, than a particular project.
– High level overview of the solution, including some sort of context diagram.
– Upstream and downstream dependencies.
|ASRs and Strategic Alignment||– What are the strategic goals?|
– What are the architecturally significant requirements? (E.g. quality attributes.)
– What architecture principles are we aligning to, and how?
– What key design decisions have been made, with rationale.
– What critical user journeys (CUJs) have we identified that can be used as a qualitative and quantitative measure of the solution from the perspective of quality attributes?
|Architecture Phasing||As-is, to-be, and transition states|
|Technical architecture||A typical solution will cover many of the following:|
– Overall architecture
– Component view
– Application tier
– Persistence tier
– Interfaces and integration
– Data pipelines and orchestration
– Availability and DR
– Backup and archive
|Security||An overview of how security and compliance requirements are being met. E.g. |
– Approach to identity management, authentication and authorisation
– Role-based access control
– Privileged access
– Encryption and key management
|Environments and Deployment||– Deployment view|
– Environment strategy and pipeline for this solution
|Operations||– Monitoring, logging and alerting|
– CUJs -> SLIs -> SLOs
|Decommissioning||– What will be decommissioned? What can’t be decommissioned, and why?|
– Dependencies for decomm?
|Technical Debt||– What technical debt has been repaid?|
– What technical debt has been created?
|Glossary||It could be a link to an external glossary|
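The "CUJs -> SLIs -> SLOs" item in the Operations section of the template can be illustrated with a small sketch: a critical user journey is measured by an indicator (here, the proportion of "good" events), and the objective is a target for that indicator. The `Slo` class, the journey, the indicator wording and the 99% target are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    journey: str   # the critical user journey (CUJ) being measured
    sli: str       # description of the service level indicator
    target: float  # e.g. 0.99 means 99% of events must be good


def sli_value(good_events: int, total_events: int) -> float:
    """Proportion of events that met the indicator's definition of 'good'."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def slo_met(slo: Slo, good_events: int, total_events: int) -> bool:
    return sli_value(good_events, total_events) >= slo.target


basket_slo = Slo(
    journey="Add item to basket",
    sli="proportion of requests completing within 2 seconds",
    target=0.99,
)
print(slo_met(basket_slo, good_events=9_950, total_events=10_000))  # True
print(slo_met(basket_slo, good_events=9_800, total_events=10_000))  # False
```

The architecture document would typically name the journeys, indicators and targets; the measurement itself belongs to whatever monitoring tooling the solution uses.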
I hope you’ve enjoyed this article. Please do send me any feedback or questions.