Presentation and Video
Tip
Introduction
Provides a welcome message and a brief overview of our way of working. It also introduces the Business Case and includes a link to the original requirements.
Current System Overview
Presents our interpretation of the system architecture, identifying key challenges and opportunities for improvement. A crucial part of this section is the assumptions we made due to gaps in available information.
Proposed Architecture
Outlines our solutions to the identified challenges and opportunities. It documents the fundamental decisions necessary to successfully implement these changes. The section continues with a High-Level Architecture that integrates AI solutions into the existing system and detailed descriptions of the AI solutions themselves.
Final Words speak for themselves ;)
- Introduction
- Current System Overview
- Proposed Architecture
- Decisions
- High-Level Architecture
- Aptitude Test: Solution 1a - Text search
- Aptitude Test: Solution 1b - Retrieval-augmented generation
- Aptitude Test: Solution 2 - In-Context Learning
- Architecture Exam: Solution 3a - Direct Prompting
- Architecture Exam: Solution 3b - Automatic Prompt Optimization
- Architecture Exam: Solution 4 - LLM-Powered Structured Parsing
- Appeal Process, Anomaly Detection and Analytics
- Final Words
Welcome to the Software Architecture Guild Architectural Kata run by O'Reilly in February 2025.
We are the Software Architecture Guild — a group of seasoned software architects who have spent years working together, honing our craft, and developing a shared approach to software architecture. During this time, we leveled up our skills and learned invaluable lessons about what it takes to succeed in this field.
Our mission is to design and shape the foundation upon which software architectures evolve, AI-driven insights emerge, and intelligent solutions are built.
As architects, we understand AI's power and potential to transform industries, drive efficiencies, and unlock new possibilities. We are at the forefront of designing, securing, and optimizing software ecosystems to fully leverage AI capabilities and create adaptive, scalable, and intelligent systems.
In our work, we mainly utilize the following techniques:
Certifiable, Inc. is a recognized leader in software architecture certification, providing accredited certification to software architects primarily in the U.S. Due to recent regulatory changes, international markets, including the U.K., Europe, and Asia, now require software architects to be certified, significantly increasing the demand for certification services. With this anticipated expansion, Certifiable, Inc. is facing a substantial surge in certification requests—estimated to grow 5-10 times their current volume. The existing manual processes for test grading and certification management are proving to be inefficient and unsustainable at this scale.
Given this challenge, the company is exploring how Generative AI can be integrated into its current system to optimize operations, improve efficiency, and maintain high certification standards while managing cost constraints.
The global demand for certified software architects is accelerating due to government regulations and industry requirements.
- Market Size & Growth: The U.S. alone has over 176,000 software architects, with 300,000 job openings. Internationally, the number of software architects is estimated to be around 600,000, and the industry is projected to grow by 21% over the next four years.
- Revenue Potential: Certifiable, Inc. currently processes 200 candidates per week at a fixed certification cost of $800. With the expected increase in demand, this number could rise to 1,000-2,000 candidates weekly, raising annual certification-fee revenue from roughly $8.3M today to approximately $41.6M-$83.2M.
- Competitive Advantage: As the market leader, Certifiable, Inc. holds over 80% market acceptance in the U.S. and a dominant presence in international markets. Implementing AI-driven automation will strengthen its position globally as the most efficient and reliable certification provider.
The primary objective is to modernize the SoftArchCert system by leveraging Generative AI to streamline the certification process while maintaining accuracy, reliability, and cost-effectiveness.
- Enhance Efficiency: Reduce manual workload by automating grading processes, candidate feedback, and test modifications.
- Maintain Accuracy & Quality: Ensure AI-driven grading maintains the high standards required for certification validity.
- Scale Operations: Support a five to tenfold increase in certification candidates without overwhelming human graders.
- Ensure SLA Adherence & Global Expansion: Maintain certification processing times and guarantee SLA adherence as candidate volume scales and operations expand internationally.
- Cost Optimization: Implement AI solutions that align with budget constraints while maximizing efficiency.
This initiative will impact several key stakeholders:
- Certifiable, Inc. Leadership: Executive management responsible for strategic direction and investment in AI-driven improvements.
- Software Architect Graders: Expert software architects who grade aptitude tests and case studies; their workload and processes will be affected by AI-driven automation.
- System Administrators: Responsible for maintaining certification systems and implementing AI-enhanced features.
- Technology Team: Architects and developers responsible for integrating AI into the SoftArchCert system.
- Certification Candidates: Software architects seeking certification who will experience a potentially faster, AI-enhanced grading process.
- Accreditation Bodies (SALB & International Licensing Boards): Organizations responsible for maintaining certification integrity and compliance. Although not directly impacted by the AI-driven changes, they may exercise increased oversight to ensure that certification integrity and compliance standards are upheld. Any perceived decline in quality could trigger additional audits or adjustments to certification requirements.
- Employers & Hiring Managers: Companies that rely on Certifiable, Inc. certifications for hiring and verifying software architects. While not directly impacted by system changes, the accelerated certification process is expected to increase the number of qualified architects in the job market, facilitating recruitment and hiring. However, any decline in certification standards or trust could have the opposite effect—diminishing the value of certification as a reliable indicator of candidate competence.
By addressing these stakeholder needs and aligning with market opportunities, Certifiable, Inc. can ensure its continued dominance in the certification industry while meeting the demands of an expanding global market.
For the original requirements, please follow the Original Requirements link.
Describes the system’s functional elements, their responsibilities, interfaces, and primary interactions
Since we already have an established system, we believe the best way to describe its functionality is through User Journey Maps and a System Blueprint.
We have identified several personas who interact with the system and actively participate in the business process:
- Candidate
A software architect seeking certification through Certifiable, Inc. Candidates must pass an aptitude test and an architecture submission. They rely on timely grading, accurate feedback, and certification validation to advance their careers.
- Expert
An expert software architect responsible for grading certification exams and providing detailed feedback to candidates. Experts are freelance contractors paid per hour and play a crucial role in ensuring the integrity of the certification process.
- Designated Expert
A senior expert software architect with additional responsibilities beyond grading. They can modify certification tests, create or update case studies, and ensure that certification standards evolve with industry practices.
- Administrator
A Certifiable, Inc. staff member who manages expert software architects, maintains system access, and ensures smooth certification operations. They oversee expert profiles and system credentials and handle operational issues that may arise during the certification process.
- External HR
Hiring managers and recruiters from various companies who rely on Certifiable, Inc.'s certifications to verify the qualifications of software architects. They use the certification database to validate credentials and make informed hiring decisions.
Workflow:
- Registration & Payment — To access the aptitude test, the candidate registers on the Certifiable, Inc. platform, fills out the registration form, confirms their email, and pays for the certification test.
- Aptitude Test (Test 1) — The candidate takes a timed multiple-choice and short-answer aptitude test. The multiple-choice questions are auto-graded, while expert software architects review the short-answer responses.
- Test Results & Eligibility – If candidates score 80% or higher, they receive an invitation to the architecture submission test. If they fail, they receive detailed feedback and must start the process from the beginning to reattempt.
- Architecture Submission (Test 2) – The candidate downloads a case study, designs a software architecture solution, and submits their work within two weeks.
- Evaluation & Feedback – Expert software architects review the submission, grade it based on set criteria, and provide feedback.
- Certification & Verification – If candidates pass both tests, they receive official certification stored in the database for employer verification. If they fail, they can reapply for Test 2.
Workflow:
- Profile Setup & Access – The expert software architect is onboarded by Certifiable, Inc. and gains access to the grading system. They can update their profile and set availability.
- Test 1 Grading (Aptitude Test) – Experts manually review and grade short-answer responses. They provide detailed feedback and ensure grading accuracy based on established evaluation criteria.
- Test 2 Grading (Architecture Submission) – Experts assess architecture submissions based on predefined rubrics. They spend an average of 8 hours per submission, ensuring a fair and precise evaluation. To help candidates understand their results, experts offer detailed explanations for incorrect answers, areas of improvement, and scoring justifications.
- Test & Case Study Improvements – Designated experts periodically analyze test performance, identify problematic questions, and propose modifications or new case studies to keep the certification process relevant.
A Designated Expert has all the responsibilities of a regular Expert Software Architect, including grading aptitude tests, reviewing architecture submissions, providing feedback, and contributing to certification system updates.
Workflow:
- Getting Access – The designated expert receives an elevated access role from the system administrator.
- Review Suggested Improvements from Experts – They analyze feedback and improvement suggestions submitted by expert graders regarding test questions, case studies, and grading inconsistencies. They assess recurring issues in test performance data, such as frequently failed questions.
- Maintain Tests - Designated experts update aptitude test questions based on expert feedback and industry advancements. They remove outdated or problematic questions and introduce new ones to reflect emerging software architecture trends. Changes are tested to ensure balance and difficulty consistency across certification exams.
- Maintain Case Studies - Designated experts develop new architecture case studies to prevent content leaks and ensure the certification process remains challenging and relevant. They modify existing case studies to incorporate modern design patterns, industry best practices, and evolving regulatory requirements. Outdated or redundant case studies are deleted to maintain a streamlined and effective certification process.
A Service Blueprint is a detailed visual representation of a service process, illustrating interactions between users, system components, and backend processes. It provides a structured framework for understanding how a service functions by mapping out key elements such as customer actions, employee roles, supporting systems, and process flows.
Key takeaways:
- Grading the Aptitude Test takes approximately 3 hours. This process involves two primary tasks: evaluating answers and providing comments for incorrect responses. Since no specific data is available, we assume an equal time distribution of 50% for grading and 50% for feedback.
- Grading the Architecture Submission takes approximately 8 hours. This involves three key tasks: reviewing the candidate's submitted architecture, assessing it against predefined criteria, and writing detailed feedback. Without precise data, we assume an equal time distribution of 33% for understanding the submission, 33% for grading, and 33% for providing feedback.
- Reducing candidate wait times is possible but entirely depends on the time required for test validation and grading. Therefore, wait time improvements cannot be addressed in isolation.
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
A detailed description of the system's logical structure can be found in a separate document. Here, we will point out the most critical aspects of the current structure.
The Container diagram shows the high-level shape of the software architecture and how responsibilities are distributed across it. It also shows the major technology choices and how the containers communicate.
We propose a revised logical organization for the system, with slight modifications from the current structure. We aim to re-group components into the following categories to make future system changes easier to describe.
Note
If the existing structure differs from our assumptions, a prerequisite implementation step will be introduced before any proposed AI-related changes can be implemented. This step would involve aligning the existing logical structure with the structure required to support AI integration.
Structure:
- Candidate Space
Responsible for interaction with Candidates. Includes the Candidate Testing UI, Candidate Registration, Candidate Status, and Notification services. Here, Candidates can sign up, take tests, and receive notifications when test validation results are available.
- Expert and Admin Space
Responsible for handling interactions with Experts, Designated Experts, and Administrators. Here, Experts and Designated Experts collaborate to create and modify Tests and Case Studies and to grade submitted tests and architecture solutions. Administrators and Experts can also manage Expert user profiles here.
- Aptitude Test
Service responsible for organizing the Aptitude Test process. It delivers the test to Candidates and accepts their answers. It automatically grades multiple-choice questions and presents short answers for manual grading. It also accepts grades and feedback submitted by Experts.
- Architecture Solution Exam
Service responsible for organizing the Case Study Test process. It randomly selects a Case Study for the Candidate and accepts their solution. It presents the submitted solution to the Expert for evaluation and accepts grades and feedback.
- Certified Architects Public Space
Service responsible for generating, storing, and distributing Certificates to Candidates and external HRs. It also generates a notification with the results of the Architecture Solution Exam and Certificate information.
In addition to the new structure, we have made one key assumption. There is no information on how experts' time is tracked, who reviews it, or how their salaries are managed. While accounting is not our primary focus, understanding the time experts spend per test is crucial for the future changes we plan to introduce.
We assume that experts submit the time spent with the validated test or architecture submission.
Describes how the architecture stores, manipulates, manages, and distributes information.
Understanding what data is stored and where it is stored is critical for any system, and it is especially relevant as we consider future AI-driven enhancements. Unfortunately, we have limited information about the data structures used in the system. The diagrams offer insights into the names of data objects and their relationships, but their exact contents remain unknown, requiring us to make assumptions.
Note
If the existing data model differs from our assumptions, it will introduce a prerequisite implementation step before any proposed AI-related changes can be implemented. This step would involve aligning the data model with the necessary structure to support AI integration.
The diagram below represents our best estimation of what the data model should look like.
We will not describe every object in the diagram, but we will focus on two key ones:
- Graded Aptitude Test Submission: a historical dataset that contains the following information:
  - Aptitude test questions as they were at the time of test validation
  - Multiple-choice answers and grades
  - Short answers, grades, and expert feedback
    - We assume that grades and feedback are stored per question/answer
  - Expert ID and the time it took to validate the test and provide feedback
- Graded Architecture Submission: a historical dataset that contains the following information:
  - Case study and grading criteria as they were at the time of test validation
  - Grades per criterion and expert feedback
    - We assume that grade and feedback are stored per criterion
  - Expert ID and the time it took to validate the test and provide feedback
We assume that data from these datasets is never deleted and contains submissions, grades, and feedback for 120,000 candidates who have already completed the certification process.
Evaluates the financial impact of architectural decisions, balancing implementation, operation, and scalability costs with business value.
Candidates pay $800 per certification. The validation process alone takes 11 hours of an Expert’s time, compensated at $50 per hour, resulting in a $550 validation cost, or roughly 69% of the test fee.
Additionally, there are other costs to consider, including hosting expenses, designated experts’ time for maintaining tests and case studies, and administrator salaries. While not all candidates will pass Test 1, potentially reducing average validation costs, relying on failure rates as a cost-saving strategy is impractical. Therefore, we assume the full validation cost applies to every candidate to ensure accurate financial planning.
Focuses on ensuring the system meets defined Service Level Agreements (SLAs) by maintaining reliability, performance, accuracy, and user expectations.
Here are the requirements provided:
- As a recognized leader in certification, accuracy of tests, case studies, and grading is fundamental, and inaccurate grading can result in a candidate not getting or maintaining a job, which can impact a candidate's career.
- Inaccurate or misleading certification exams and case studies can undermine the credibility of the company’s current standing in the marketplace, so the accuracy of the certification process is vital for the company's success.
Given the lack of additional information, we must make the following assumptions:
- There are no existing quality control measures to ensure grading accuracy.
- There is no formal appeals process that allows candidates to challenge grading errors made by Experts.
- For the 3 hours an Expert spends on Aptitude Test validation, we assume the time is evenly split: 50% for grading and 50% for providing feedback.
- For the 8 hours an Expert spends on Case Study validation, we assume the time is evenly distributed: 33% for understanding the submission, 33% for grading, and 33% for providing feedback.
- We assume the system automatically tracks the time an Expert spends on validating tests and stores this information in a designated location.
- We assume there is no established retention period, and the database stores graded answers and architecture submissions of 120,000 candidates who have already completed the certification process.
- We assume that each created Case Study includes a comprehensive evaluation rubric containing a detailed set of assessment criteria.
- We assume that for the Aptitude Test, the Grade and Feedback are recorded for each Question/Answer, whereas for the Case Study Test, they are recorded for each Criterion.
- We assume the full validation cost of $550 applies to every candidate, regardless of pass or fail rates.
- We assume there are no established quality control measures to verify grading accuracy.
- We assume there is no formal appeals process that allows candidates to challenge grading errors made by Experts.
The company currently employs 300 Experts to validate tests for 200 candidates per week. Scaling up to 1,000 candidates per week would require either longer wait times (which is unacceptable) or hiring significantly more Experts. Hiring more Experts would also necessitate additional managerial roles and support staff (e.g., Administrators, Accountants, HR personnel), further increasing operational costs. As a result, the cost per test would continue to rise, negatively impacting profitability.
Opportunity: Investing in automation is essential to ensure the company's long-term viability
Currently, the company spends $550 per test validation, which accounts for roughly 69% of the $800 certification fee. This is a significant expense, and the primary cost driver is the time Experts spend on validation. Reducing validation time is key to lowering costs, and AI can play a significant role in optimizing productivity.
Opportunity: AI-driven productivity enhancements can significantly reduce validation time, leading to lower costs per test and increased operational efficiency
Experts are paid per hour, meaning there is no incentive for them to work faster or process more tests. AI assistance can only succeed if Experts are motivated to use it effectively.
A better approach would be a per-test payment model instead of hourly pay.
- Currently, grading an Aptitude Test takes 3 hours, earning an Expert $150 ($50 per hour).
- If AI-assisted grading reduces validation time to 1.5 hours, and we pay $100 per test, an Expert could validate two tests in the same 3-hour period, earning $200 instead of $150.
- At the same time, the company’s cost per test would decrease from $150 to $100, improving efficiency and profitability.
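A quick sanity check of the arithmetic above, as a minimal sketch (the hourly rate, grading times, and proposed per-test fee are the assumptions from the bullets):

```python
# Hourly vs. per-test compensation for Aptitude Test grading,
# using the assumptions stated above.

HOURLY_RATE = 50          # USD per hour (current model)
CURRENT_HOURS = 3.0       # hours per Aptitude Test today
AI_ASSISTED_HOURS = 1.5   # assumed hours per test with AI assistance
PER_TEST_FEE = 100        # proposed flat fee per graded test (USD)

cost_per_test_today = HOURLY_RATE * CURRENT_HOURS                          # $150
expert_income_per_3h_today = cost_per_test_today                           # one test graded
expert_income_per_3h_proposed = (3.0 / AI_ASSISTED_HOURS) * PER_TEST_FEE   # two tests graded

print(f"Company cost per test: ${cost_per_test_today:.0f} -> ${PER_TEST_FEE}")
print(f"Expert income per 3 hours: ${expert_income_per_3h_today:.0f} -> ${expert_income_per_3h_proposed:.0f}")
```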
Opportunity: Transitioning to a per-test payment model would incentivize Experts to work faster and maximize efficiency, benefiting both Experts and the company
Given the strict accuracy and reliability requirements, fully automating the grading process is not viable. A human must remain in control to make final grading decisions. Instead of replacing Experts, AI should function as an assistant, helping them validate tests faster and more accurately.
Requirement: AI should be used as an expert assistant, speeding up grading rather than replacing human decision-making
Despite high expectations for grading quality, there is no formalized process to measure it. Establishing a quality baseline is crucial before making system changes. Experts already make mistakes, and incorporating candidate feedback loops is essential for assessing grading accuracy. A human-only baseline must be established to track improvements as AI-assisted grading is introduced.
Requirement: A quality control process must be implemented before system improvements, ensuring that grading accuracy can be measured and improved over time
There is no mention of how validation time is currently tracked, yet it is a key efficiency metric for AI-assisted improvements. A proper measurement system must be implemented to ensure progress in reducing validation time.
Requirement: Tracking validation time is critical for evaluating AI effectiveness and must be established before automation is introduced
Before we dive into the architectural proposals, we need to align on the fundamental decisions we are making:
AI/ML development within our architecture must adhere to the following principles:
- Multiple AI/ML solutions must be developed in parallel rather than assuming a single model will be optimal.
- Real-world testing must validate reliability, using empirical performance data to select the most effective solution.
- Iterative evaluation and refinement must continuously ensure that models evolve based on actual usage conditions.
- Decisions on model selection should be data-driven, prioritizing solutions that demonstrate superior real-world effectiveness.
- Fallback mechanisms should be in place, allowing for seamless transitions if a model underperforms or becomes unreliable over time.
Embedding these principles into our AI/ML development lifecycle ensures we make data-driven, real-world-validated decisions while maintaining adaptability and resilience against unforeseen challenges.
To encourage efficiency and align expert compensation with performance, the following changes are proposed:
- Transition from an hourly compensation model to a per-evaluation payment structure, ensuring experts are rewarded based on completed reviews rather than time spent.
- Introduce performance-based incentives, providing bonuses for experts who consistently deliver high-quality, accurate, and timely evaluations.
- Encourage AI-assisted grading, allowing experts to use automated tools to accelerate their work while maintaining oversight and accuracy.
Implementing this new compensation model will significantly improve efficiency, scalability, and sustainability while ensuring grading remains accurate and fair.
AI/ML usage within our system will adhere to the following principles:
- AI/ML will only function as an assistant, never as a fully automated decision-maker.
- Experts will retain complete control over all grading and certification decisions, with AI/ML acting in a supportive role.
- AI-generated recommendations will require human validation before being acted upon.
- Transparency in AI suggestions must be maintained, ensuring that experts understand how recommendations are derived.
- AI-assisted tools should focus on increasing efficiency rather than replacing human expertise.
By embedding these principles, we ensure that AI remains a trusted assistant rather than an autonomous decision-maker, preserving the credibility and quality of our certification process.
To maintain high standards of accuracy and reliability, the following quality control measures will be implemented:
- Anomaly Detection Process: Validate grades against similar past submissions to identify and correct deviations, ensuring consistency in grading standards.
- Introduce Appeal Process: Implement a structured process allowing candidates to formally contest grades, capturing potential false negative errors and ensuring fairness.
- Baseline Performance Metrics: Establish a benchmark for current expert grading accuracy and consistency to compare against future performance.
- Quality Audits: Regular reviews of expert grading accuracy to spot inconsistencies, ensure adherence to the defined evaluation criteria, and identify experts who need additional training or recalibration.
By establishing a rigorous quality control process, we ensure that all future system enhancements are backed by empirical data, improve expert grading accuracy, and uphold the credibility of our certification program.
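To make the Anomaly Detection Process above concrete, here is a minimal sketch of one possible check: flag a grade that deviates strongly from the grades given to similar past submissions (the z-score rule, the threshold, and the notion of "similar submissions" are our assumptions, not an agreed design):

```python
from statistics import mean, stdev

def is_grade_anomaly(new_grade: float, similar_grades: list[float],
                     z_threshold: float = 2.0) -> bool:
    """Flag a grade that deviates strongly from grades of similar past submissions.

    `similar_grades` would come from the historical dataset, e.g. answers to the
    same question or architecture submissions for the same case study.
    """
    if len(similar_grades) < 5:  # too little history to judge
        return False
    mu, sigma = mean(similar_grades), stdev(similar_grades)
    if sigma == 0:
        return new_grade != mu
    return abs(new_grade - mu) / sigma > z_threshold

# Example: comparable answers earned 7.5-9 points, this one received 3.
print(is_grade_anomaly(3.0, [8.0, 9.0, 8.5, 9.0, 8.0, 7.5]))  # True -> route to a Designated Expert
```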
To assess AI effectiveness in validation processes, the following measures will be implemented:
- Track Expert Validation Time: We assume expert validation time tracking is already in place. If not, it should be implemented to measure how long experts spend grading each submission.
- Establish Baseline Metrics: Capture pre-automation validation times to compare against AI-assisted workflows.
- Monitor AI Impact on Efficiency: Regularly evaluate whether AI reduces validation time while maintaining grading quality.
- Optimize Processes Based on Data: Use time measurement insights to refine AI models and improve efficiency.
By measuring validation time, we establish a concrete framework for assessing AI effectiveness, ensuring automation leads to real efficiency gains without compromising grading quality.
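As a small illustration of the kind of metric this decision calls for, a minimal sketch comparing average validation time before and after AI assistance (the sample durations are hypothetical):

```python
from statistics import mean

def validation_time_reduction(baseline_hours: list[float], assisted_hours: list[float]) -> float:
    """Percentage reduction in average validation time once AI assistance is introduced."""
    baseline, assisted = mean(baseline_hours), mean(assisted_hours)
    return (baseline - assisted) / baseline * 100

# Hypothetical case study gradings: ~8h baseline vs. ~4h with AI assistance.
print(f"{validation_time_reduction([8.0, 7.5, 8.5], [4.0, 4.5, 3.5]):.0f}% faster")  # 50% faster
```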
We have decided to target the following architectural characteristics:
- Cost: Lowering costs is a primary factor for introducing the AI Assistant. The cost should be lower than the current manual grading approach.
- Evolvability: AI is continuously evolving, and the architecture must support easy integration of new components or replacement of existing ones.
- Simplicity: AI systems are inherently complex, so our architecture should not introduce unnecessary additional complexity.
Based on our targeted architectural characteristics, the Microkernel architecture was chosen:
- Cost: The Microkernel architecture enables modular AI components, reducing overall system complexity and minimizing operational costs.
- Evolvability: The plug-in architecture supports easily integrating new AI models or replacing existing ones as AI technologies evolve.
- Simplicity: The Microkernel approach maintains a clear separation between the core system and AI extensions, ensuring that complexity remains manageable.
Adopting a Microkernel architecture ensures that the AI Assistant remains adaptable, scalable, and maintainable, aligning with long-term business and technical goals while minimizing risks associated with AI deployment.
As stated in the ADRs, enabling AI implementation requires adding several new processes and functionalities to the system:
- Detect Grade Anomalies: Identifies inconsistencies in grading by comparing new grades against historical data to detect deviations that require expert review.
- Candidate Appeal Process: Allows candidates to formally challenge their grades, ensuring a structured review process to identify and correct potential grading errors.
- Suggestions Generation Process: Uses AI or predefined rules to provide recommendations to experts, assisting in grading decisions, feedback generation, and test modifications.
- Quality Measurement Process: Evaluates the accuracy and consistency of grading by tracking expert performance, candidate feedback, and statistical deviations.
- Effectiveness Measurement Process: Assesses the impact of AI and process improvements by tracking validation time, expert workload reduction, and overall grading accuracy.
Implementation details for those processes can be found in the architectural viewpoints below.
Describes the system’s functional elements, their responsibilities, interfaces, and primary interactions
Changes:
- File an Appeal Process (New Step)
  - Candidate accesses the website.
  - Candidate fills out the appeal form.
  - Candidate submits the appeal request.
  - System receives and stores the appeal request.
  - System sends a confirmation email with an estimated appeal review time.
- Get the Appeal Result Process (New Step)
  - Designated Expert is notified of the appeal.
  - Designated Expert reviews the submitted test.
  - Designated Expert re-evaluates the test based on appeal criteria.
  - System collects corrected grades and feedback.
  - System recalculates the total score based on the review.
  - System updates candidate records if necessary.
  - System sends an email to the candidate with the appeal result.
  - If the appeal is successful, the system grants access to the next test or issues a certificate.
Changes:
- AI Assistance in Grading (Added Feature)
- AI-generated grade and feedback suggestions are now provided for both the Aptitude Test and Case Study Test validation processes.
- Experts can review and either accept or reject AI-suggested grades and feedback.
- The system tracks expert decisions on AI-suggested grades and feedback.
Changes:
- Review Appeals (New Step)
  - Designated Expert accesses the appeals review system.
  - System serves candidate appeals for review.
  - Designated Expert reviews appeal details.
  - Designated Expert corrects grades and provides feedback.
  - System updates candidate records with the revised grade and feedback.
  - System notifies the candidate about the appeal result.
- Review Anomalies (New Step)
  - System generates and stores grading anomalies.
  - System serves detected grading anomalies to the Designated Expert.
  - Designated Expert reviews anomalies in test grading.
  - Designated Expert corrects grades and provides feedback if necessary.
  - System updates records with corrected grades.
A Service Blueprint is a detailed visual representation of a service process, illustrating interactions between users, system components, and backend processes. It provides a structured framework for understanding how a service functions by mapping out key elements such as customer actions, employee roles, supporting systems, and process flows.
Changes:
- Appeal Process (New Steps)
  - Fill Out Appeal Form
    - Candidate accesses the Appeal UI.
    - Candidate fills out and submits the appeal form.
    - System stores the appeal request.
  - Get Appeal Result
    - System processes the appeal request.
    - Designated Expert manually reviews the appeal.
    - System updates the grade and feedback based on the review.
    - Candidate receives an email with the appeal result.
- AI-Assisted Grading Enhancements
  - Generate Suggestions
    - AI generates grading and feedback suggestions for Aptitude and Case Study tests.
  - Review / Adjust Suggestions
    - Experts review AI-generated grading suggestions.
    - Experts accept or modify suggestions before submission.
  - Detect Grade Anomalies
    - System identifies potential inconsistencies in grading.
    - Anomalies are flagged for expert review.
  - Review Grade Anomalies
    - Designated Experts review flagged grading anomalies.
    - Experts adjust grades if necessary before final submission.
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
From this viewpoint, we will only highlight changes made to Current State architecture. Detailed descriptions of all changes to the system can be found in a separate document.
The Container diagram shows the high-level shape of the software architecture and how responsibilities are distributed across it. It also shows the major technology choices and how the containers communicate.
Changes:
- Introduced AI Assistant with several functions:
  - Store historical records for all tests, candidate submissions, their grades and feedback, and the time experts spend grading.
  - Generate suggestions that experts can accept or reject when grading submissions.
  - Collect suggestion statuses to calculate suggestion performance and adjust the AI Assistant accordingly.
  - Detect anomalies and notify the designated expert for review.
  - Collect anomaly and appeal statuses to calculate grading quality.
- Introduced a new appeal process:
  - Candidates can raise an appeal.
  - Designated experts review appeals and make necessary corrections.
The Component diagram shows how a container comprises several "components," what each component is, their responsibilities, and the technology/implementation details.
We propose implementing the AI Assistant using a Micro-Kernel Architecture.
The Core Component will be the central aggregator and integrator for AI-based suggestion generators. It will include:
- A User Interface for AI Engineers to manage and refine AI solutions.
- A Suggestions Database to store AI-generated grading recommendations.
- An API to serve AI-generated suggestions to the Expert UI during grading.
Each AI-powered grading solution will:
- Access historical grading data from a shared common storage.
- Provide an API for AI Engineers to adjust and refine the grading model.
- Populate the Suggestions Database in a standardized format, ensuring consistency across different solutions.
AI Analytics will:
- Analyze graded submissions to detect anomalies
- Calculate Validation Quality and Performance metrics
This approach enables multiple suggestion generators simultaneously and allows for seamless replacement if any solution proves ineffective.
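Since every plug-in solution must populate the Suggestions Database in the same standardized format, the Micro-Kernel core effectively defines a small plug-in contract. A minimal sketch of what that contract could look like (class and field names are illustrative, not a finalized schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Protocol

@dataclass
class Suggestion:
    """Standardized record every grading solution writes to the Suggestions Database."""
    submission_id: str     # aptitude test answer or architecture submission being graded
    solution_id: str       # e.g. "solution-1a", "solution-1b", "solution-2"
    suggested_grade: float
    suggested_feedback: str
    confidence: float      # 0.0-1.0, used later for suggestion performance analysis
    rationale: str         # why the solution proposed this grade
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class SuggestionGenerator(Protocol):
    """Plug-in interface the AI Core expects every solution to implement."""
    def generate(self, submission_id: str) -> Suggestion: ...
```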
Workflow:
- Data Loading:
  - The system loads candidate submissions, grading criteria, and historical grading data from:
    - Aptitude Test (multiple-choice and short-answer responses)
    - Architecture Solution Exam (case study submissions, grading rubrics)
  - It retrieves past expert grading decisions, feedback records, and grading time logs for AI model refinement.
- Generating AI-Based Grading Suggestions:
  - Each Aptitude Test Solution and Architecture Exam Solution processes:
    - Candidate responses (short answers, architecture submissions)
    - Predefined grading criteria and rubrics
    - Past grading patterns from experts
  - AI-driven models generate suggested grades and feedback.
  - Suggestions, confidence scores, and AI-extracted rationales are stored in AI Core.
- Serving AI Suggestions to Experts:
  - AI Core delivers grading suggestions to the Expert Grading Space.
  - Experts see AI-generated grades, explanations, and confidence scores.
  - Experts can review, accept, or modify AI-generated grades before submission.
- Expert Feedback on AI Suggestions:
  - Experts approve or override AI-generated suggestions.
  - The system logs expert feedback, including:
    - Accepted/rejected suggestions
    - Adjustments made to grades
    - Time spent reviewing AI-generated inputs
  - This feedback is stored in AI Core as suggestion statuses.
- AI Performance Tracking & Continuous Improvement:
  - AI Engineers track suggestion accuracy, expert modifications, and confidence vs. rejection rates.
  - AI models are fine-tuned based on real expert corrections.
- Anomaly Detection & Expert Review:
  - The AI Analytics App analyzes historical grading trends and real-time grading activities.
  - It detects grading inconsistencies, outliers, and suspicious patterns.
  - It flags cases where:
    - AI suggestions deviate significantly from expert decisions.
    - Experts inconsistently apply grading rubrics.
  - Anomalies are sent to the Expert Admin Space for manual review and intervention.
- Quality & Time Performance Metrics Calculation:
  - The AI Analytics App stores:
    - Anomaly and appeal statuses
    - Accuracy metrics for AI-suggested grades
    - Turnaround time for expert grading and appeals
  - Quality statistics (grading consistency, rubric adherence) and performance metrics (average grading time, bottlenecks) are calculated.
  - The AI Analytics App serves all metrics to the Core App.
- Continuous AI Model Enhancement:
  - AI Engineers analyze:
    - Grading quality trends
    - Time performance efficiency
    - Expert rejection/modification trends on AI suggestions
  - Necessary refinements are made to AI models:
    - Updating AI-generated grading criteria
    - Fine-tuning confidence thresholds for suggestions
    - Adjusting AI learning patterns based on expert corrections
  - AI models are retrained periodically for improved accuracy and efficiency.
Describes how the architecture stores, manipulates, manages, and distributes information.
One of the key advantages of this proposed solution is that the original data model can remain unchanged, provided it matches or closely resembles the structure outlined in the current informational viewpoint.
Below is a diagram illustrating the additional informational elements we plan to incorporate into the system:
- Push Data from Operational to Analytical Storage System
  The Aptitude Test, Aptitude Test Submission, Graded Aptitude Test Submission, Case Study, Architecture Submission, and Graded Architecture Submission tables will be synchronized with their corresponding operational tables.
  Advantages of this approach:
  - It enables future flexibility by allowing different data models for operational and AI systems.
  - It allows data to be removed from operational systems, improving their performance.
- Suggestions Model Structure
  A centralized Suggestions Table will store common suggestions, while each solution will maintain its own set of dedicated tables for specific operations. This structure ensures flexibility while preserving connections between related datasets.
- Anomalies and Appeals Tracking
  Anomalies and Appeals tables will be introduced in the Expert Admin Space to efficiently track and manage grading quality issues.
- Validation Analysis
  A Validation Analysis Table will store weekly quality and performance data snapshots for each test and expert. This dataset will enable various analytical assessments to monitor and improve grading accuracy and efficiency.
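A minimal sketch of the push from operational to analytical storage described above, assuming a simple incremental copy keyed on grading time (table and column names are placeholders, not the actual schema):

```python
import sqlite3

def sync_graded_aptitude_submissions(operational: sqlite3.Connection,
                                     analytical: sqlite3.Connection) -> int:
    """Copy newly graded submissions from the operational store into the analytical store."""
    last_synced = analytical.execute(
        "SELECT COALESCE(MAX(graded_at), '1970-01-01') FROM graded_aptitude_submission"
    ).fetchone()[0]
    rows = operational.execute(
        "SELECT submission_id, question_id, answer, grade, feedback, expert_id, graded_at "
        "FROM graded_aptitude_submission WHERE graded_at > ?",
        (last_synced,),
    ).fetchall()
    analytical.executemany(
        "INSERT INTO graded_aptitude_submission VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    analytical.commit()
    return len(rows)  # number of records synchronized in this run
```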
Describes the architecture that supports the software development process.
One of the most compelling aspects of the Microkernel architecture we have chosen is its ability to support multiple solutions running simultaneously in production with real experts performing grading. This same approach extends naturally to various solution versions, enabling real-world validation beyond traditional pre-production testing.
While we will maintain non-production environments for functional testing and regression validation, the proper evaluation of AI-generated suggestions will occur in production. The Expert UI will display all generated suggestions, regardless of whether they come from different solutions or versions of the same solution, and the Expert will choose one of them (the best one).
To enhance flexibility and independence between solutions, we propose a separate repository for each solution, allowing multiple versions to coexist within that repository. Each version will follow GitHub Flow, maintaining a standing `release/versionX` branch alongside multiple feature branches. Additionally, multiple `release/version{X,Y,Z}` branches may exist at the same time, reflecting multiple versions deployed in production simultaneously.
Once a version is live in production, a continuous feedback loop ensures that expert evaluations inform AI Engineers. Experts' interactions with AI-generated suggestions will feed back performance metrics, helping refine future iterations. Each solution will calculate its suggestion performance, allowing for experimentation with different evaluation algorithms. Beyond this, AI Engineers will continuously monitor overall grading quality and validation speed to assess whether the AI Assistant is effectively fulfilling its role.
Let's conduct a thought experiment to estimate potential cost savings when validating a case study.
- Today, an expert spends 8 hours on validation, costing the company $400 per test.
- The expert's time is divided as follows:
- 33% – Understanding the architecture solution
- 33% – Validating it against grading criteria
- 33% – Writing feedback
We assume the first 33% (understanding the solution) remains unchanged, but we can significantly reduce the other two tasks, resulting in an overall 50% reduction in time.
- New target: 4 hours per test instead of 8 hours.
- If the expert remains on an hourly wage, the cost per test drops to $200.
- Instead, we propose switching to a per-test payment model, offering the expert $300 per test.
- This results in $100 in savings per test, which can be allocated to an AI Assistant.
To estimate the AI cost per test, we assume the most compute-heavy scenario by uploading the entire submission to the LLM.
- Average submission size: 5,000 words (~7,000 tokens)
- AI interaction: We ask 50 questions about the submission.
- Each question: ~20 tokens
- Each response: ~150 tokens (abstract summary)
- Total question-related tokens: 170 × 50 = 8,500 tokens
- Total token usage per test:
- Input tokens: 7,000
- Output tokens: 8,500
GPT-4o Pricing:
- $5.00 per 1M input tokens
- $15.00 per 1M output tokens
((7,000 × 5) + (8,500 × 15)) / 1,000,000 = (35,000 + 127,500) / 1,000,000 = 0.1625
Cost per test: $0.16
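The same back-of-the-envelope estimate as a minimal script (the token counts and GPT-4o prices are the assumptions listed above):

```python
# LLM cost per architecture submission, mirroring the estimate above.

INPUT_PRICE_PER_1M = 5.00    # USD per 1M input tokens (assumed GPT-4o pricing)
OUTPUT_PRICE_PER_1M = 15.00  # USD per 1M output tokens

input_tokens = 7_000                        # full submission, ~5,000 words
question_related_tokens = 50 * (20 + 150)   # 50 questions x (~20 prompt + ~150 response) = 8,500

# As above, all question-related tokens are conservatively priced at the higher output rate.
cost = (input_tokens * INPUT_PRICE_PER_1M
        + question_related_tokens * OUTPUT_PRICE_PER_1M) / 1_000_000

print(f"Estimated LLM cost per test: ${cost:.2f}")  # ~$0.16
```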
Even if our estimate is off by 10×, the cost remains only about $1.60 per test, and running 10 solutions simultaneously would still cost just $16 per test.
The proposed solution is highly cost-effective in terms of LLM usage. We can significantly reduce expert validation time with AI assistance while keeping the LLM cost negligible.
Short-answer responses are typically concise and follow predictable patterns, making them ideal for full-text search-based grading automation. The system compares a candidate's response to a database of historically graded answers to find the most similar ones. Based on these matches, a grade and relevant feedback are suggested. Experts can then review or adjust the suggestion, and their input continuously improves the system’s accuracy, efficiency, and consistency over time.
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
This section describes an AI-powered grading workflow for short-answer responses, integrating various services, databases, and user interactions. Here's a step-by-step breakdown:
- Answer Indexing
  - Short-answer responses from the Aptitude Test, together with feedback and grades, are stored in the Aptitude Test Historical DB.
  - The Answers Indexing Microservice prepares these responses for indexing.
  - The Previous Answers and Feedback Search DB indexes past graded answers in the full-text search index.
- Generating Suggestions
  - The Suggestions Generator Microservice captures submitted ungraded short answers from the Aptitude Test Historical DB, searches for similar past responses, and retrieves them for grading reference.
  - These suggestions are stored in the Suggestions DB.
- Expert Review & Refinement
  - Suggestions are requested by the Expert Grading Space for human validation. The Solution 1 API Microservice fetches the suggestions from the Suggestions DB.
  - Experts review and adjust the suggestions, improving grading accuracy.
  - The Suggestions API updates suggestion statuses based on expert feedback.
- Continuous Improvement
  - The AI Admin UI Web App allows ML Engineers to monitor suggestion performance and request regeneration of suggestions if needed.
- Feedback Loop for Optimization
  - Validated grading decisions continuously improve the search ranking and grading model, enhancing the accuracy of future suggestions.
This workflow automates grading while incorporating expert oversight, ensuring efficiency, accuracy, and continuous improvement.
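A minimal sketch of the suggestion step, using TF-IDF cosine similarity as a simple stand-in for the production full-text search index (the data shapes, the averaging of the top matches, and the example grades are assumptions):

```python
from dataclasses import dataclass
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

@dataclass
class GradedAnswer:
    answer: str
    grade: float
    feedback: str

def suggest_grade(new_answer: str, history: list[GradedAnswer], top_k: int = 3):
    """Suggest a grade and feedback from the most similar historically graded answers."""
    corpus = [h.answer for h in history] + [new_answer]
    vectors = TfidfVectorizer().fit_transform(corpus)
    scores = cosine_similarity(vectors[len(history)], vectors[:len(history)]).ravel()
    best = scores.argsort()[::-1][:top_k]
    matches = [history[i] for i in best]
    suggested_grade = sum(m.grade for m in matches) / len(matches)
    return suggested_grade, [m.feedback for m in matches], float(scores[best[0]])

history = [
    GradedAnswer("Vertical scaling adds resources to a single node.", 10, "Correct and concise."),
    GradedAnswer("Scaling up means moving to a bigger server.", 8, "Correct, could mention trade-offs."),
    GradedAnswer("It means adding more servers.", 3, "That describes horizontal scaling."),
]
print(suggest_grade("Vertical scaling means using a bigger machine.", history))
```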
Describes how the architecture stores, manipulates, manages, and distributes information.
- Aptitude test feedback (Solution 1a Database): This database stores information about historical short answers, including their grades and feedback. The records are indexed for full-text search of short answer content.
  - Aptitude test submission ID (PK, FK) – Identifies the short answer's relation to an aptitude test submission.
  - Question ID (PK, FK) – Identifies which question of the aptitude test the short answer was submitted for.
  - Question – Stores the question for which the answer was submitted.
  - Short answer – Stores the answer text submitted by a candidate, on which the search index is built.
  - Grade, Feedback – Track expert decisions for the answer.
Describes how the system will operate to fulfill the required functionality.
The diagram presents a structured workflow for AI-assisted grading of short-answer responses.
- Suggestions Generation
  - The AI Suggestions Generator Microservice captures the submitted answer.
  - The answer is preprocessed (cleaned up for better indexing).
  - The system retrieves similar past answers from historical data.
  - The system suggests a grade and feedback based on the assessment history.
- Expert Grading Based on Suggestions
  - An expert architect reviews the submitted answer along with the AI-generated suggestions.
  - The expert grades the answer, accepting or adjusting the suggestion, and provides feedback.
  - The graded aptitude test is recorded.
  - The suggestion’s accuracy is assessed, feeding back into the system for improvement.
- Registering New Grade and Feedback
  - The assigned grade and feedback are captured.
  - The answer is preprocessed for indexing.
  - The system stores the indexed feedback, enhancing future AI grading accuracy.
This continuous feedback loop ensures that the AI grading system improves over time, refining its ability to suggest accurate grades and relevant feedback.
While Solution 1a efficiently automates grading using full-text search, it is limited by keyword-based matching, which may overlook nuanced differences in candidate responses. Solution 1b addresses this by leveraging LLMs and vector search to improve grading accuracy, feedback relevance, and adaptability over time.
Since short answers often express similar ideas in varied ways, relying on exact word matches can lead to inconsistent grading. Responses are standardized and semantically enriched using an LLM for preprocessing, ensuring that meaning—not just wording—is captured. A vector database further improves retrieval by matching conceptually similar responses, even when phrased differently.
Beyond improving search, Solution 1b enhances feedback quality. Instead of simply reusing past feedback, an LLM synthesizes suggestions, adapting them to the specific nuances of each answer. This ensures that candidates receive more precise, context-aware feedback rather than generic comments.
By continuously refining grading suggestions based on expert validation, this approach creates a learning system where retrieval accuracy and feedback generation improve over time. As a result, grading becomes more consistent, efficient, and precise, and feedback becomes more relevant to each submission.
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
The diagram illustrates an AI-assisted grading workflow using vector search and LLMs to enhance the grading process for short-answer responses.
- Answer Indexing
  - All short answer feedback (historical and recently graded) is stored in a vector database, indexed by an embedding of the candidate's preprocessed short answer.
  - Before vectorization, the short answer is preprocessed to improve search quality.
  - The candidate’s submitted answer is preprocessed by the Answers Preprocessing Microservice, which prepares the text (using an LLM if needed) and converts the answer into a vectorized format (embedding) suitable for comparison in the vector database. This step ensures that answers are represented semantically rather than just as text.
- Feedback and Grade Generation
  - The Suggestions Generator Microservice captures ungraded short answers.
  - The captured answers are first preprocessed via the Answers Preprocessing Microservice. The preprocessed answer is then vectorized and used to search for similar answers stored in the Previous Answers and Feedback Vector DB.
  - The system retrieves similar answers from previous test submissions along with their corresponding grades and feedback. This semantic search allows the system to find conceptually similar responses, even if wordings differ.
  - The Suggestions Generator Microservice analyzes the retrieved similar answers and their feedback to generate grade and feedback suggestions for the current answer. These suggestions are tailored based on the closest matches from the historical data.
  - The similar short answers with grades and feedback are passed to an LLM, which combines them into a final grade and synthesized feedback. This process leverages the model’s ability to generate a coherent and relevant response, ensuring the feedback is contextually appropriate and insightful.
- Expert Grading and Review
  - The generated grade suggestion and feedback are then sent to the Expert Grading Space for review. Experts (or designated experts) can approve or adjust the suggestions provided by the system, improving grading accuracy and ensuring that the feedback meets the required standards.
- Continuous Improvement
  - The system tracks the performance of suggestions, feeding back into the process to improve future suggestions. The ML Engineer monitors suggestion accuracy, ensuring the system continues to learn and refine its outputs over time.
This workflow combines semantic vector search and LLM-based synthesis to automate and optimize grading while maintaining expert oversight to ensure quality and accuracy. The result is a more efficient, consistent, and context-aware grading process.
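A minimal sketch of the retrieve-then-synthesize step, assuming the OpenAI Python SDK for embeddings and generation (the model names, prompt wording, and in-memory similarity search are illustrative placeholders for the preprocessing, vector DB, and Suggestions Generator microservices):

```python
import numpy as np
from openai import OpenAI  # assumed SDK; any embedding/LLM provider would work

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def top_k_similar(query_vec: np.ndarray, indexed: list[tuple], k: int = 3) -> list[tuple]:
    """`indexed` holds (embedding, {"answer", "grade", "feedback"}) records from the vector DB."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(indexed, key=lambda item: cos(query_vec, item[0]), reverse=True)[:k]

def suggest(question: str, candidate_answer: str, indexed: list[tuple]) -> str:
    examples = top_k_similar(embed(candidate_answer), indexed)
    context = "\n\n".join(
        f"Answer: {rec['answer']}\nGrade: {rec['grade']}\nFeedback: {rec['feedback']}"
        for _, rec in examples
    )
    prompt = (
        f"Question: {question}\n"
        f"Candidate answer: {candidate_answer}\n\n"
        f"Similar previously graded answers:\n{context}\n\n"
        "Suggest a grade and feedback consistent with the examples above."
    )
    chat = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return chat.choices[0].message.content  # suggested grade and synthesized feedback
```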
Describes how the architecture stores, manipulates, manages, and distributes information.
- Aptitude test feedbacks (Solution 1b Database): Stores information about historical short answers, including their grades and feedback. The records are indexed for semantic (vector) search over the short answer embeddings.
  - Aptitude test submission ID (PK, FK) – Identifies the short answer's relation to an aptitude test submission.
  - Question ID (PK, FK) – Identifies which question of the aptitude test the short answer was submitted for.
  - Answer embedding – Vector representation of the short answer, allowing semantically similar short answers to be found easily.
  - Question – Stores the question for which the answer was submitted.
  - Short answer – Stores the answer text submitted by a candidate, from which the embedding is built.
  - Grade, Feedback – Track expert decisions for the answer.
Describes how the system will operate to fulfill the required functionality.
This diagram describes the workflow of indexing short answer feedback and grades based on the short answer text.
This diagram illustrates the workflow of generating grading and feedback suggestions based on grades and feedback of semantically similar short answers.
- Indexing Grades and Feedback
  - Capture the grade and feedback: The final grade and feedback provided by the expert are ingested from the Aptitude Test Historical database.
  - Preprocess the answer: After grading, the short answer is preprocessed to improve the vectorization results. An LLM can be used at this step to summarize answers that are too long, classify the answer, etc.
  - Vectorize the preprocessed answer: The final preprocessed answer is then vectorized for indexing in the system.
  - Index grades and feedback: The feedback is stored in the vector database, making it available for future suggestions and improving the quality of feedback generation for subsequent responses.
- Suggestions Generation
  - Capture the answer: The candidate's answer is captured by the AI Suggestions Generator component.
  - Preprocess the answer: The answer is passed to the Preprocessing Microservice to prepare it for vectorization.
  - Vectorize the preprocessed answer: The preprocessed answer is converted into a vectorized format (embedding) suitable for comparison in the search process.
  - Look up similar answers: The system then searches for similar answers from previously graded responses. This search is based on the semantic similarity of the vectorized answer.
  - Generate suggestions: The AI Suggestions Generator uses feedback and grades from similar answers to generate a suggested grade and feedback based on historical assessments. These suggestions are intended to guide the expert grader.
- Expert Grading Based on Suggestions
  - Review the answer and suggestions: The Expert / Designated Expert reviews the candidate's answer and the generated suggestions (grade and feedback).
  - Grade the answer: The expert assigns a grade to the answer based on their review of both the answer and the suggested grade and feedback.
  - Graded Aptitude Test: The graded answer is recorded, capturing the expert's evaluation.
This workflow integrates AI and human expertise, ensuring the grading process is efficient while maintaining high-quality, personalized feedback for each candidate. Combining vectorized search for semantic similarity and LLM-based feedback synthesis helps enhance grading consistency and precision.
We assume that short-answer questions primarily test factual knowledge rather than deep analysis or architectural design skills. Given this scope, the LLM’s static knowledge base and reasoning ability are expected to be sufficient for determining correctness — whether an answer is correct, partially correct, or incorrect — while providing explanatory feedback.
So, we ask the model to grade and generate feedback for candidates’ answers. To enhance its accuracy, we provide a representative set of answers from other candidates and expert evaluations. By including these historical answers, grades, and feedback in the prompt, the model can better align its assessments with expert grading patterns while allowing flexibility in evaluating new cases.
Solution 2 can better handle “cold start” scenarios and evaluate responses to new questions without requiring prior examples. This enables broader applicability while still benefiting from historical grading data. However, Experts continue to review AI-generated suggestions to improve grading accuracy over time.
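The prompt-construction step can be sketched as follows; the example record structure and the commented-out call_llm() helper are hypothetical placeholders for the actual LLM integration.

```python
# Illustrative sketch of building an in-context-learning grading prompt.
import json

def build_grading_prompt(question: str, examples: list[dict], candidate_answer: str) -> str:
    """Compose a prompt that shows the LLM expert-graded examples before the new answer."""
    lines = [
        "You are assisting an expert grader for a software architecture aptitude test.",
        f"Question: {question}",
        "Here are previously graded answers with expert grades and feedback:",
    ]
    for ex in examples:  # representative answers selected by the preprocessing step
        lines.append(json.dumps({
            "answer": ex["short_answer"],
            "grade": ex["grade"],
            "feedback": ex["feedback"],
        }))
    lines += [
        "Grade the following answer as correct, partially correct, or incorrect,",
        "and provide short explanatory feedback. Respond as JSON with keys 'grade' and 'feedback'.",
        f"Candidate answer: {candidate_answer}",
    ]
    return "\n".join(lines)

# suggestion = call_llm(build_grading_prompt(question, representative_examples, answer))
```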
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities it interacts with).
- Submission & Preprocessing
- Aptitude test short answers, grades, and feedback are stored in the Aptitude Test Historical Database
- The Answers Preprocessing Microservice converts the answers to embeddings and clusters similar ones to identify representative examples
- The Prompt Context Database keeps short answers, their grades, and feedback that are used as a learning set for the actual questions
- AI-Powered Suggestions Generation
- The Suggestions Generator retrieves clustered examples, expert grades, and feedback, integrating them into a grading prompt
- The LLM processes the Candidate's response, analyzing it in context with the question, past answers, and their expert evaluations
- The model provides both a grade and structured feedback as a response
- Expert Review & Validation
- AI-generated grade suggestions are forwarded to the Expert Grading Space
- Experts review, refine, or override the AI-generated grades and feedback
- The Suggestions API Microservice updates suggestion statuses based on expert validation
- AI Oversight & Continuous Improvement
- AI Engineers use the AI Admin UI to monitor AI performance and optimize prompt templates, configs, etc.
- Experts and AI Engineers regenerate suggestions when necessary to improve accuracy
- Feedback Loop & Optimization
- AI-generated suggestions are continuously evaluated for accuracy
- The system initiates an update of the Prompt Context datasets if prompt performance for a short-answer question is low
The Clusterization algorithm of the Answers Preprocessing Microservice is expected to combine a Transformer Encoder (for embeddings) with a framework or service that supports distance-based grouping (e.g., Elasticsearch or K-Means). However, this approach should be validated through a detailed requirements analysis and a proof of concept (PoC).
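As an illustration of this idea (not a validated design), the following sketch embeds historical answers with a Transformer Encoder, clusters them with K-Means, and keeps the answer nearest each centroid as a representative example; library and model choices are assumptions to be confirmed by the PoC.

```python
# Sketch of selecting representative examples for the Prompt Context Database.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

def select_representatives(answers: list[str], n_clusters: int = 5) -> list[str]:
    model = SentenceTransformer("all-MiniLM-L6-v2")      # Transformer Encoder (assumption)
    embeddings = model.encode(answers)                   # shape: (n_answers, dim)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    representatives = []
    for cluster_id in range(n_clusters):
        members = np.where(kmeans.labels_ == cluster_id)[0]
        centroid = kmeans.cluster_centers_[cluster_id]
        # pick the member nearest to the cluster centroid as the representative
        nearest = members[np.argmin(np.linalg.norm(embeddings[members] - centroid, axis=1))]
        representatives.append(answers[nearest])
    return representatives
```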
Describes how the architecture stores, manipulates, manages, and distributes information.
Prompts Log (Solution 2 database) stores records of the prompts used for generating grading suggestions:
- Prompt ID (PK) – a unique identifier for each prompt
- Question ID (FK) – links the prompt to a specific question
- Prompt – the actual text of the prompt used for grading assistance
- Created By / At – metadata on who created the prompt and when
Aptitude Test Feedbacks (Prompt Craft database) stores graded examples used to augment grading queries (see the data-model sketch after this list):
- Aptitude test submission ID (PK) – links the feedback to a specific aptitude test submission
- Question ID (PK) – a unique identifier for each question
- Answer embedding (Index) – a numerical representation (embedding) of the candidate’s answer, allowing for similarity comparison and clustering
- Question – the actual text of the question being answered
- Short answer – the candidate's response to the question
- Grade – the score or assessment assigned to the candidate’s response
- Feedback – comments, suggestions, or explanations provided regarding the candidate’s answer
- Cluster ID – an identifier grouping similar answers based on their embeddings
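For clarity, the record could be modelled roughly as follows; the field types and the use of a Python dataclass are illustrative assumptions, not a schema decision.

```python
# Illustrative data model for the Aptitude Test Feedbacks records listed above.
from dataclasses import dataclass

@dataclass
class AptitudeTestFeedback:
    aptitude_test_submission_id: str   # PK
    question_id: str                   # PK
    answer_embedding: list[float]      # indexed for similarity comparison and clustering
    question: str
    short_answer: str
    grade: str
    feedback: str
    cluster_id: int                    # groups semantically similar answers
```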
Describes how the system will operate to fulfill the required functionality.
- Answer Clustering & Preprocessing
- The Answers Preprocessing Microservice clusters historical responses to provide contextual examples for grading
- AI-Powered Grading
- The Suggestions Generator captures a submitted candidate's answer and injects a structured grading prompt into an LLM
- The LLM analyzes the answer in the context of:
- The question
- Past graded responses to this question
- Feedback provided by experts for those graded responses
- The LLM generates a grade and detailed feedback, using its reasoning capability, static knowledge, and long context understanding
- Expert Review & Validation
- Human experts evaluate the AI-generated grades and feedback
- The system tracks acceptance rates to improve AI-generated suggestions
- Continuous Learning & Optimization
- The system evaluates AI performance
- If a large percentage of suggestions are rejected, new context data is introduced
This approach also assumes AI Engineer involvement in case the accuracy of suggestions cannot be improved with different historical examples.
Adding or modifying a Case Study is infrequent. We assume that, on average, a new Case Study is introduced every three months, and the grading criteria for existing Cases are updated every two to four months. This allows us to implement static prompts for each grading criterion, which can be reused to generate suggestions efficiently.
An AI Engineer will monitor the performance of these prompts and adjust them whenever the acceptance rate drops below 80%.
Another key factor is the speed of the grading process. If the system scales to 1,000 candidates per week across five Case Studies, it will generate approximately 200 validated suggestions per Case Study weekly. As a result, adjustments based on expert feedback will naturally occur more slowly.
This slow rate of change means that a small team of engineers will be sufficient to maintain the system while ensuring high-quality AI-generated suggestions. Additionally, we anticipate that these engineers will leverage LLMs to refine and create new prompts. While this aspect is not reflected in the current design diagrams, it will be incorporated into the next version of the solution (3b).
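The monitoring rule itself is simple; below is a minimal sketch, assuming suggestion records carry a prompt ID and an accepted/rejected status (both names are illustrative).

```python
# Sketch of the acceptance-rate check that would trigger prompt review;
# the 80% threshold comes from the text above, everything else is illustrative.
ACCEPTANCE_THRESHOLD = 0.80

def prompts_needing_review(suggestions: list[dict]) -> set[str]:
    """Return prompt IDs whose expert acceptance rate fell below the threshold."""
    totals: dict[str, int] = {}
    accepted: dict[str, int] = {}
    for s in suggestions:                       # e.g. {"prompt_id": "...", "status": "accepted"}
        totals[s["prompt_id"]] = totals.get(s["prompt_id"], 0) + 1
        if s["status"] == "accepted":
            accepted[s["prompt_id"]] = accepted.get(s["prompt_id"], 0) + 1
    return {
        pid for pid, total in totals.items()
        if accepted.get(pid, 0) / total < ACCEPTANCE_THRESHOLD
    }
```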
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
This workflow describes an AI-assisted grading system for the Architecture Exam, integrating AI-generated suggestions with expert validation to enhance the grading process.
- The AI Engineer creates grading criteria Prompts via the AI Admin UI (Web App).
- Candidates submit their architecture solutions, which are stored in the Architecture Exam Database.
- The Suggestions Generator (Microservice) retrieves the architecture submission from the Architecture Exam Historical Database.
- It sends the submission, grading criteria prompts, and prompt configurations (retrieved from the Solution 3 Database) to the LLM (Large Language Model).
- The LLM processes the input and returns grading suggestions.
- The Suggestions Generator stores the generated grading suggestions in the Suggestions Database.
- The Solution 3 API (Microservice) retrieves grading criteria prompts and prompt configurations from the Solution 3 Database.
- The Suggestions API (Microservice) provides access to stored suggestions.
- The Expert Grading Space retrieves grading suggestions from the Suggestions API.
- Experts review, validate, and modify AI-generated suggestions before finalizing grading.
- The Suggestions API updates the grading status in the Suggestions Database.
- The AI Engineer monitors AI-generated suggestions via the AI Admin UI (Web App).
- The AI Admin UI (embedding the Solution 3 MFE UI) enables the AI Engineer to:
- View AI-generated grading suggestions.
- Review feedback on AI-generated prompts.
- Review prompt performance to improve future grading suggestions.
- Modify Grading criteria prompts.
- Modify Prompt configurations.
Describes how the architecture stores, manipulates, manages, and distributes information.
- Prompts Configuration (Solution 3 Database): Stores information about grading prompts for AI-generated suggestions.
- Prompt ID (PK) – Unique identifier for each prompt.
- Status – Indicates if the prompt is active, deprecated, or under review.
- Selection Rate – Tracks how often this prompt is selected to generate suggestions.
- Grading Criteria Prompts (Solution 3 Database): Defines grading criteria for case studies linked to prompts.
- ID (PK) – Unique identifier for each grading criterion.
- Case Study ID (FK) – Links the grading criteria to a specific case study.
- Name – Name of the grading criterion.
- Prompt – The text of the grading prompt.
- Version (Immutable) – Each change to a prompt results in a new version.
- Created By / At – Stores information about the creator and timestamp.
Describes how the system will operate to fulfill the required functionality.
The diagram illustrates how AI-generated grading suggestions are produced and how their quality is enhanced through an iterative approach. This process ensures that the AI-generated suggestions used in grading maintain high accuracy, relevance, and acceptance by human experts.
- Generating Initial Suggestions
- A grading criterion is introduced or updated based on the case study requirements.
- The Prompt Engineer reviews the new or updated grading criterion.
- A new grading criterion prompt defines how AI should assess candidate submissions.
- The newly created prompt is enabled for 100% of submissions, meaning all grading suggestions will be generated using this prompt.
- The prompt configuration is stored in the system, and the Suggestions Generator (Microservice) uses it to produce grading suggestions.
- Evaluating Suggestion Quality
- AI-generated suggestions are stored in the Suggestions Database (DB), along with their acceptance status.
- The system monitors suggestion performance by tracking how often human experts accept or reject the AI-generated suggestions.
- If experts accept 80% or more of the suggestions, the prompt is considered adequate, and no changes are required.
- If the acceptance rate is below 80%, the prompt needs optimization.
- Enhancing Suggestion Quality
- If the prompt’s performance is below 80% acceptance, a new version of the grading criterion prompt is created.
- Instead of deploying the new prompt to all submissions immediately, it is gradually enabled for X% of submissions to assess its effectiveness.
- The new prompt version (vN) is stored and configured within the system.
- The Suggestions Generator now produces AI-generated grading suggestions based on the updated prompt.
- Continuous Performance Monitoring and Improvement
- The system tracks the performance of the new prompt version by analyzing the acceptance rate of AI-generated suggestions.
- If 80% or more of the suggestions are accepted, the new version is fully enabled for 100% of submissions, replacing the previous version.
- If the new prompt still underperforms, it goes through further iterations until it meets the quality threshold.
This iterative improvement cycle ensures that AI-generated grading suggestions remain high-quality, reliable, and aligned with expert expectations, leading to efficient and accurate candidate evaluations.
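The gradual rollout described above can be implemented with deterministic bucketing, so each submission consistently sees the same prompt version; the sketch below is illustrative and the function and version names are assumptions.

```python
# Sketch of routing a configurable percentage of submissions to a new prompt version.
import hashlib

def pick_prompt_version(submission_id: str, new_version: str, old_version: str,
                        rollout_percent: int) -> str:
    """Deterministically assign a submission to the new or previous prompt version."""
    digest = hashlib.sha256(submission_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100               # stable bucket in [0, 100)
    return new_version if bucket < rollout_percent else old_version

# Example: enable prompt v2 for 20% of submissions, keep v1 for the rest.
# version = pick_prompt_version("sub-123", "criterion-7:v2", "criterion-7:v1", 20)
```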
The main idea is that Solution 3a is already functional, and our goal is to automate the manual tasks performed by the AI Engineer to maintain suggestion quality at the desired level.
We are introducing a new component, the Prompt Optimizer, designed to implement the Automatic Prompt Optimization pattern. This component continuously monitors the performance of grading prompts and utilizes a Large Language Model (LLM) to refine them iteratively.
The optimization process follows a structured feedback loop:
- Performance Tracking – The system collects expert feedback on AI-generated grading suggestions, categorizing them as accepted or rejected.
- Pattern Analysis – Rejected suggestions are analyzed alongside the corresponding expert-provided corrections to identify weaknesses in the current prompt structure.
- Adaptive Refinement – The LLM generates improved prompt variations that aim to reduce rejected suggestions while preserving accepted outputs.
- Controlled Deployment – New prompt versions are gradually introduced, allowing for A/B testing against previous versions to ensure effectiveness.
- Continuous Improvement – The system dynamically adjusts prompt parameters over time based on real-world grading outcomes, creating an evolving, self-optimizing feedback mechanism.
This approach ensures that grading prompts remain effective and adaptive, leading to more accurate and efficient evaluation processes.
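A single optimization iteration could look roughly like this; the data shapes and the injected llm callable are placeholders for the actual integration, not a committed interface.

```python
# Sketch of one Automatic Prompt Optimization iteration, following the loop above.
from typing import Callable

def optimize_prompt(current_prompt: str, rejected_cases: list[dict],
                    llm: Callable[[str], str]) -> str:
    """Ask the LLM to critique and rewrite a grading prompt based on rejected
    suggestions and the corrections the experts made."""
    case_lines = []
    for case in rejected_cases:   # e.g. {"submission_excerpt", "ai_suggestion", "expert_correction"}
        case_lines.append(
            f"Submission excerpt: {case['submission_excerpt']}\n"
            f"AI suggestion (rejected): {case['ai_suggestion']}\n"
            f"Expert correction: {case['expert_correction']}\n"
        )
    meta_prompt = (
        "You improve grading prompts for architecture exam submissions.\n"
        f"Current prompt:\n{current_prompt}\n\n"
        "The prompt produced the following rejected suggestions; experts corrected them as shown:\n"
        + "\n".join(case_lines)
        + "\nFirst, critique the current prompt. Then return an improved prompt that would have "
          "produced the experts' corrections while keeping previously accepted behaviour."
    )
    return llm(meta_prompt)   # placeholder for the external LLM call
```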
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
This updated diagram introduces new elements to automate and optimize the grading prompt process. The highlighted additions focus on improving the performance tracking and automatic refinement of grading prompts using an LLM-based optimization system.
- Prompt Optimizer (New Component: Microservice)
- This new component tracks prompt performance and optimizes grading criteria prompts when necessary.
- It collects case studies, grading criteria, graded submissions, used prompts, and received suggestions.
- It sends optimized prompts to improve grading consistency and accuracy.
- Optimization Prompts Flow
- The Prompt Optimizer monitors prompt performance and retrieves declined suggestions through the Solution 3 API.
- For declined suggestions, the Prompt Optimizer reads grading criteria, case studies, and graded submissions from the Architecture Exam Historical Database.
- The Prompt Optimizer interacts with the LLM (External Component) to refine grading prompts.
- The LLM is provided with rejected AI-generated suggestions alongside the expert's correct grading response.
- Based on this comparison, the LLM proposes improvements to the grading prompt to enhance future AI-generated suggestions.
- The Prompt Optimizer generates Optimization Prompts based on past performance and sends them to Solution 3 API for testing.
Describes how the system will operate to fulfill the required functionality.
This diagram illustrates the workflow for generating and optimizing AI-generated grading suggestions to ensure high accuracy and expert alignment. The process continuously refines grading prompts based on performance data and expert feedback.
- Creating a New Grading Criterion Prompt
- A new or updated grading criterion is identified from the Case Study Database.
- The Prompt Engineer reviews the grading criterion and creates a new grading prompt.
- The new prompt is enabled for 100% of submissions and stored in the system.
- The Suggestions Generator begins using the prompt to generate AI-powered grading suggestions.
- The generated suggestions are stored in the Suggestions Database, and graded submissions are recorded for future evaluation.
- Evaluating Suggestion Quality
- AI-generated suggestions are monitored through prompt performance tracking.
- The Prompt Optimizer analyzes how often experts accept or reject AI-generated suggestions.
- If 80% or more of the suggestions are accepted, the prompt is considered effective.
- If the acceptance rate falls below 80%, the prompt needs optimization.
- Enhancing the Quality of AI-Generated Suggestions
- If a prompt underperforms, a new version of the grading criterion prompt is created.
- The updated prompt is gradually enabled for X% of submissions to test its effectiveness.
- The prompt configuration is updated, and the Suggestions Generator starts using the new version.
- Once the updated prompt achieves 80% or more accepted suggestions, it is fully enabled for 100% of submissions.
- NOTE: Multiple versions of new grading prompts could be created in parallel and evaluated simultaneously.
- Continuous Performance Monitoring and Improvement
- The system tracks the performance of AI-generated suggestions over time.
- When necessary, new versions of grading prompts are introduced to ensure consistent quality.
- The process is iterative, continuously refining and improving prompts to maintain high-quality grading standards.
This iterative improvement cycle ensures that AI-generated grading suggestions remain accurate, reliable, and aligned with expert expectations, leading to efficient and high-quality candidate evaluations.
This diagram illustrates a workflow for automatically creating optimized grading criterion prompts using expert feedback.
- Prepare Training Data
- Retrieve graded submission feedback from Architecture Exam Historical DB.
- Combine rejected AI suggestions with expert-corrected feedback.
- Cluster rejected suggestions and select the top 5 clusters.
- Store the top 5 rejected suggestions with corrected feedback.
- Create a New Version of the Grading Criterion Prompt
- Generate text-based critiques using LLM.
- Compile an optimization prompt with critiques.
- Execute in LLM to generate 1–3 new prompt versions.
- Store new prompt versions (v2–4) for evaluation.
The idea is to enhance the grading process for architecture submissions by extracting relevant information from the submitted document and presenting it to the expert in a structured format. Instead of requiring experts to analyze long, unstructured texts, we propose converting grading criteria into a set of targeted questions. These questions will be used as prompts for a Large Language Model (LLM) to extract only the most relevant parts of the submission, presenting them as a structured set of question-answer pairs.
The solution will leverage LLM-Powered Structured Parsing, which transforms unstructured architecture submissions into a structured dictionary of questions and extracted answers. This approach mirrors the grading of Aptitude Test short answers, shifting the grading process from evaluating entire criteria to assessing specific aspects of the submission, each represented as a question-answer pair. Experts will no longer grade criteria as a whole at once but will instead validate extracted answers in a more focused and efficient manner.
This solution offers a significant advantage in reusing existing AI-driven techniques developed for Aptitude Test grading. By structuring architecture submissions in the same way, we can build a knowledge base from previously graded submissions. When a new submission arrives, the extracted answers can be compared to past evaluations, allowing the system to suggest grades and feedback based on similar cases. This knowledge base will continuously grow, improving grading efficiency and accuracy over time.
The solution also introduces a challenge in maintaining an up-to-date knowledge base. If a grading criterion or case study changes, existing structured records may become outdated. However, since we already have an automated process for converting documents into a structured Q&A format, the system can automatically reprocess outdated records.
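A minimal sketch of the extraction step, assuming each grading criterion has already been rephrased as a question; the llm callable and prompt wording are illustrative placeholders.

```python
# Sketch of LLM-Powered Structured Parsing: each grading criterion is phrased as a
# question, and the LLM extracts only the relevant part of the submission.
from typing import Callable

def extract_short_answers(submission_text: str,
                          criterion_questions: dict[str, str],
                          llm: Callable[[str], str]) -> dict[str, str]:
    """Return {criterion_id: extracted short answer} for a single submission."""
    answers: dict[str, str] = {}
    for criterion_id, question in criterion_questions.items():
        prompt = (
            "Using only the architecture submission below, answer the question "
            "concisely. If the submission does not address it, reply 'not addressed'.\n\n"
            f"Question: {question}\n\nSubmission:\n{submission_text}"
        )
        answers[criterion_id] = llm(prompt)
    return answers
```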
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
This solution follows the overall architecture but introduces key enhancements:
- Answers Extractor (New Component)
- Extracts short answers for each grading criterion from submitted architecture solutions using an LLM.
- Ensures structured evaluation by generating specific responses based on the context of the candidate’s submission.
- LLM (Large Language Model) Integration (New Component)
- Processes extracted responses to refine and structure the short answers.
- Ensures consistent formatting and alignment with predefined grading criteria.
- Similar Answers Finder (New Component)
- Searches the historical database to identify the most relevant past responses for comparison.
- Uses vector-based similarity algorithms to ensure accurate matching between current and past submissions.
- Extracted Short Answers Historical DB (New Component)
- Stores previously extracted and expert-graded short answers for reference in grading recommendations.
These enhancements improve the grading process by enabling automated extraction, structuring, and evaluation of short answers, leading to a more consistent, scalable, and efficient approach to assessing architecture submissions.
Describes how the architecture stores, manipulates, manages, and distributes information.
This solution introduces two key informational enhancements:
- Enhanced AI Suggestions with Extracted Short Answers
- Unlike the other solutions, this system not only generates a grade and feedback for each submission but also includes extras in the AI-generated suggestion.
- These extras contain extracted short answers pulled directly from the candidate’s submission.
- By providing structured short answers along with the suggested grade and feedback, experts receive better context, improving the efficiency and accuracy of manual validation.
- Version Tracking for Solution Analytics and Performance Monitoring
- Each AI-generated suggestion will be tagged with the version of the solution that produced it.
- This version tracking ensures that Analysts can correlate suggestion performance with a specific release of the extraction methodology.
- If a version performs poorly, this data allows for:
- Identifying underperforming versions based on grading outcomes and expert modifications.
- Pinpointing the need for re-extraction of short answers in the historical database for improved accuracy.
- This mechanism ensures that continuous optimization is based on data-driven insights rather than assumptions.
Implementing these enhancements provides greater transparency, improved traceability, and actionable insights for refining AI-generated grading suggestions over time.
Describes how the system will operate to fulfill the required functionality.
This workflow describes the AI-assisted grading process for architecture submissions. It includes extracting short answers from submissions, finding relevant historical responses, and generating AI-based grading suggestions.
- Submission Processing
- A candidate's architecture submission is retrieved from the Architecture Exam Historical Database.
- The corresponding case study and its grading criteria are identified.
- Extracting Short Answers
- The Answers Extractor microservice initiates a new chat session with the LLM.
- All files related to the architecture submission are uploaded as context.
- The Answers Extractor sends specific questions to the LLM for each grading criterion, prompting it to generate relevant Short answers.
- The extracted Short answers are sent back to the Answers Extractor.
- Finding Similar Historical Responses
- The Similar Answers Finder Microservice receives the extracted Short answers.
- Each Short answer is vectorized to enable similarity-based searching.
- The system queries the Extracted Short Answers Historical Database to find the most relevant past responses.
- The nearest historical responses are retrieved based on similarity scoring.
- Generating AI-Based Suggestions
- The system compiles AI-assisted grading suggestions by combining:
- The extracted short-answer responses from the submission.
- The most relevant historical responses from graded architecture solutions.
- Each suggestion includes:
- A short answer for each grading criterion.
- The closest historical response along with its grade and expert feedback.
Outcome
- The AI-generated grading suggestions are passed to the next stage, where experts can review and validate them.
The Extracted Short Answers Database Maintenance process consists of two key parts:
- Regular Maintenance – An automated process that ensures all expert-graded submissions have their short answers stored in the historical database, making them available for further suggestions generation.
- Historical Restatements – The AI-assisted restatement process is triggered when a case study is updated with new grading criteria.
These processes ensure a consistent, scalable, and reliable dataset for AI-generated grading suggestions, maintaining accuracy and adaptability as grading criteria evolve.
Regular Maintenance
- Graded Submission Processing
- All graded architecture solutions are retrieved from the Architecture Exam Historical Database.
- The system ensures that each submission has corresponding short answers linked to their expert grade and feedback.
- The extracted short answers are stored in the Extracted Short Answers Historical Database.
- These answers are linked to their respective grading criteria, expert-validated grade, and feedback.
Historical Restatements
- Submission Processing
- Past graded architecture solutions associated with the updated case study are retrieved from the Architecture Exam Historical Database.
- The corresponding new grading criteria are identified.
- Extracting Short Answers for Updated Grading Criteria
- The Answers Extractor Microservice starts a new chat session with the LLM.
- All files related to the full historical submission are uploaded as context to the LLM.
- The Answers Extractor sends specific questions to the LLM for each updated grading criterion, prompting it to generate relevant Short answers.
- The LLM processes the request and returns Short answers for each grading criterion.
- Storing Extracted Short Answers
- The extracted short answers are stored in the Extracted Short Answers Historical Database.
- These responses are linked to the corresponding historical submission and its overall grade and feedback.
- Iteration for All Historical Submissions
- The system verifies whether all historical submissions for the case study have been processed.
- If not, the workflow is repeated for the next historical submission.
Outcome
- The system backfills extracted short answers for updated grading criteria in historical submissions.
- These extracted answers are assigned a lower weight in AI-generated suggestions.
- They are used when no expert-graded similar response is found for a given evaluation criterion and the unreviewed extracted answer meets similarity verification requirements.
- These unreviewed extracted answers are displayed differently in the UI to distinguish them from validated responses.
- Their grading and feedback are derived from the overall solution grade and feedback to which they are linked.
- The process ensures that new grading criteria are applied consistently across both new and historical submissions.
Integrating AI into high-stakes certification grading introduces significant challenges in maintaining accuracy, consistency, and trust. Manual oversight becomes impractical at scale, yet even partial automation risks errors that undermine certifications and careers. AI systems lack inherent alignment with human expertise and historical standards, and applying them carelessly risks inconsistencies.
Central to this challenge is the need for a structured quality assurance framework to validate AI's role in grading. Firstly, candidates must have a formal channel to dispute grades, as even a manual process can lead to falsely rejected certifications. Secondly, falsely approved certifications must also be detected. Automated anomaly detection can proactively identify deviations: graded submissions can be cross-checked against historically similar cases, flagging discrepancies (e.g., conflicting scores for comparable answers) for expert re-evaluation.
Equally critical is continuous quality monitoring to measure AI’s impact. Metrics such as anomaly resolution rates, appeal outcomes, and grading time trends must be tracked to verify whether AI achieves its intended benefits — higher consistency and reduced review times. Embedding these mechanisms enables the automation of the grading process via AI assistance.
Describes the relationships, dependencies, and interactions between the system and its environment (the people, systems, and external entities with which it interacts).
This workflow describes an AI-assisted detection of anomalies in Expert-provided grades for the Aptitude Test.
It heavily relies on the ability to search for similar historical submissions to detect discrepancies between new and previous submissions. The feasibility of such an approach has been shown in Aptitude Test: Solution 1a - Text search and Aptitude Test: Solution 1b - Retrieval-augmented generation.
With the help of Architecture Exam: Solution 4 - LLM-Powered Structured Parsing, it's possible to extend the applicability of the detection process to the Case Study Exam as well.
Since the data needed for search and the queries performed are the same as in Solutions 1 and 4, it is possible to reuse those solutions' resources for Anomaly Detection. However, the search services are depicted as standalone in the diagrams provided.
This workflow only covers the interactions with or within the AI Analytics app.
- Operational DB stores various types of raw events for a limited time window before the Analytics Job processes them.
- Submissions Capture microservice persists submissions in Operational DB for later use.
- Submissions Search services (analogous to the Aptitude Test: Solution 1 and Architecture Exam: Solution 4 Submissions Search services)
- Preprocess the provided submission for text search
- Perform text search in internal Vector DB of previously graded submissions
- Return similar submissions with their grades and feedback.
- Anomaly Detection microservice
- Searches for similar submissions using Submissions Search services
- Detects significant discrepancies between average grades of similar historical graded submissions and the current one
- Flags such submissions as anomalies and forwards them to Expert Admin Space for later review
- Designated Expert re-validates the anomalies.
- The submission's grade and feedback are corrected
- Anomaly Status (Confirmed / Ignored) is sent to the Corrections Capture service to be analyzed later.
- Corrections Capture persists the anomaly resolution to Operational DB for later processing.
As an outcome, the internal Operational DB contains information about submissions and the corrections made through Anomaly Detection.
This workflow describes the process of reviewing submissions whose grades were disputed by the Candidates.
- Designated Expert re-validates the appeals.
- The submission's grade and feedback are corrected
- Appeal Status (Approved / Rejected) is sent to the Corrections Capture service to be analyzed later.
- Corrections Capture persists the appeal resolution to Operational DB for later processing.
As an outcome, the internal Operational DB contains information about submissions and the corrections made through the Appeal Process.
This workflow describes the process of building the dataset used to monitor the accuracy and performance of Experts and the performance of AI solutions that assist in the grading and review process.
- Corrections Capture persists AI suggestions to Operational DB for later processing.
- Analytics Job aggregates raw events and transforms them into a final metrics dataset, which is then stored in Validation Analysis analytical DB.
- Validation Analysis Analytical DB stores historical computed performance metrics.
- AI Engineers can access the historical dataset and make informed decisions regarding Experts and AI Solutions performance.
Describes how the system will operate to fulfill the required functionality.
Anomaly Detection
This workflow describes an AI-assisted detection of anomalies in Expert-provided grades for the Aptitude Test.
- Grade Submissions
- Expert provides grade and feedback for Candidate's submission
- Find Similar Submissions
- Submissions Search service vectorizes the submission
- Submissions Search service searches for similar submissions using vector similarity
- Submissions Search service returns similar submissions with grades and feedback
- Anomaly Detection
- Anomaly Detection microservice compares the submission’s grade with historical grades of retrieved similar submissions.
- Anomaly Detection finds anomalies using predefined criteria (e.g., >15% deviation from historical averages). It identifies both inflated scores (false approvals) and unduly low scores (false rejections), as sketched after this workflow.
- Anomaly Detection forwards the anomalies for a re-validation.
- Anomaly Review
- Designated Expert is notified about the anomalies in Anomalies App.
- Designated Expert reviews submission details, grades, and historical comparisons.
- Designated Expert corrects (update grade/feedback) or ignores (no action) the anomaly.
- Anomaly Status Update
- Corrections Capture microservice records the final status (Corrected/Ignored) and persists it to the Performance Metrics DB for auditing and analysis.
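The deviation check at the heart of this workflow can be sketched as follows; the 15% threshold comes from the criteria above, while the numeric grade representation and function name are assumptions.

```python
# Sketch of the deviation check used by the Anomaly Detection microservice.
def detect_anomaly(current_grade: float, similar_grades: list[float],
                   max_deviation: float = 0.15) -> bool:
    """Flag a submission whose grade deviates too far from similar historical grades."""
    if not similar_grades:
        return False                       # nothing to compare against
    historical_avg = sum(similar_grades) / len(similar_grades)
    if historical_avg == 0:
        return current_grade != 0
    deviation = abs(current_grade - historical_avg) / historical_avg
    return deviation > max_deviation       # covers both inflated and unduly low scores
```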
Appeal Process
- File an Appeal
- Candidate receives graded results and feedback and decides to appeal.
- Candidate fills out an Appeal form via the Candidate Testing UI
- Appeal form is stored for later review by Appeals App
- Appeal Review
- Appeals App notifies Designated Expert about new Appeals.
- Designated Expert reviews the submission, grade, feedback, and candidate’s justification.
- Designated Expert makes a decision:
- Approved (Full/Partial): Updates grade/feedback, potentially allowing the candidate to move to the next step in the certification process.
- Rejected: No changes; original grade retained.
- Correction Capture service persists reviewed appeal details to Operational DB.
- Notify Candidate about Appeal Result
- Candidate status is updated.
- Candidate is notified about the appeal result via email.
- Certification Process Adjustments
- Approved Appeals may change the grade so that the candidate can pass to the next step:
- Aptitude Test grade is updated, Candidate is able to take Case Study Exam.
- Case Study Exam grade is updated, Candidate receives certificate for passing the Certification.
- Rejected Appeals:
- No changes to the flow; Candidate is notified about the verdict with updated feedback.
Analytics
- Capture Data
- Information about submissions, appeals, anomalies, and suggestions is persisted by the corresponding Capture services in the Operational DB staging area. This area stores raw events for a limited time.
- Calculate Metrics
- The Analytics Job aggregates raw data from the Operational DB, computes performance metrics, and updates the Validation Analysis DB with new data.
- Analyze data
- AI Engineers analyze the metrics to make informed decisions regarding the performance of Experts and AI Solutions.
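Below is a minimal sketch of the kind of aggregation the Analytics Job could perform; the event shapes and metric names are illustrative assumptions.

```python
# Sketch of deriving simple quality metrics from raw Operational DB events.
from collections import Counter

def compute_metrics(events: list[dict]) -> dict[str, float]:
    """Aggregate raw suggestion/appeal/anomaly events into quality metrics."""
    counts = Counter((e["type"], e["status"]) for e in events)

    def rate(event_type: str, positive: str) -> float:
        total = sum(v for (t, _), v in counts.items() if t == event_type)
        return counts[(event_type, positive)] / total if total else 0.0

    return {
        "suggestion_acceptance_rate": rate("suggestion", "accepted"),
        "appeal_approval_rate": rate("appeal", "approved"),
        "anomaly_confirmation_rate": rate("anomaly", "confirmed"),
    }
```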
The Software Architecture Guild embarked on this architectural kata committed to scalability, efficiency, and quality. Throughout this process, we analyzed the existing SoftArchCert system, identified its challenges and opportunities, and proposed a robust, AI-assisted architecture that enhances grading accuracy, maintains certification integrity, and optimizes operational costs.
Our proposed architecture is not just an enhancement but a strategic transformation. We ensure that certification remains fair, transparent, and scalable by introducing AI-assisted grading, automated anomaly detection, and a structured appeals process. The microkernel architecture provides the flexibility needed to evolve AI solutions while maintaining expert oversight. The proposed compensation model shift aligns incentives with efficiency, making AI adoption a win-win for Certifiable, Inc. and its expert graders.
However, AI is not a silver bullet. Its integration requires careful validation, human oversight, and a continuous feedback loop to maintain grading accuracy and trust. With well-defined quality control measures, performance tracking, and iterative improvement, this architecture sets the foundation for sustained innovation in certification management.
The future of software architecture certification is AI-augmented, not AI-replaced. With the right balance of automation and human expertise, Certifiable, Inc. can meet the demands of a rapidly growing market while upholding its reputation as a trusted leader in the industry.
While the proposed architecture lays a solid foundation, several enhancements could further optimize and scale the system:
- Full Automation: If the accuracy and acceptance metrics show exceptionally high results, we may consider evolving the solution into fully autonomous AI agent(s) capable of making decisions without expert involvement.
- Expert Performance Assessment & Promotion Mechanism: The system already collects valuable data on expert performance. This data can be used to assess and promote high-performing experts while identifying underperformers for potential training or re-evaluation. This mechanism would improve grading consistency and incentivize experts to maintain high-quality evaluations.
- LLM Interface with Caching and Cost Optimization: We deliberately did not implement caching for LLM interactions, as the economic benefit appeared negligible in the current setup. However, a shared LLM interface could be introduced if LLM cost optimization becomes a priority. This would allow multiple solutions to reuse responses, leverage deduplication techniques, and apply other cost-saving mechanisms when necessary.
- Intelligent Work Distribution Between AI Solutions: Giving experts more than three suggestions for review is unlikely to improve efficiency. If we want to test multiple AI solutions or versions in production, an intelligent work distributor/orchestrator should be added. Integrated into the Core AI Assistant, this component would dynamically assign specific solutions to submissions, ensuring balanced testing while preventing expert overload.
- Event-Driven Architecture Recommendation: We do not have confirmation of whether the existing system is event-driven. However, transitioning to an event-driven model (at least for communication with the AI Assistant) would significantly improve system scalability, maintainability, and responsiveness.
- Plagiarism Detection: Using similarity data generated across multiple solutions, we can also implement a process to check architecture submissions for plagiarism.
Implementing these improvements can make the system even more scalable, cost-effective, and adaptive to evolving business needs.