Case Study · Architecture
Repo Intelligence Platform
A platform that turns software repositories into queryable knowledge graphs, enabling AI systems to understand architecture, dependencies, and code relationships.
- Status: ARCHITECT
- Year: 2026
- Role: Solo build
- Stack: .NET, PostgreSQL, GraphRAG
Problem
Modern codebases have outpaced human comprehension. A single service may depend on dozens of others, span multiple repositories, and involve teams across time zones. The questions engineers actually ask — what breaks if I change this API, who owns the database this service writes to, what architectural decisions shaped this module — are difficult to answer even with full source access.
Traditional tools were not built for these questions. Search engines index text but not intent. Documentation drifts from implementation within months. Architecture diagrams, when they exist, capture a moment in time. Institutional knowledge accumulates in the heads of long-tenured engineers and leaves with them.
The result is an industry-wide tax on every architectural change. Hidden risk accumulates because the blast radius of a change is not visible. Onboarding takes months because the conceptual map is unwritten. Code reviews focus on local correctness because reviewers cannot easily evaluate systemic effects.
We need tools that reason about code the way engineers do — not as text, but as a connected system of intent, ownership, and consequence.
Architecture Overview
The platform is structured as a pipeline. Source repositories are ingested by an indexer that extracts entities and relationships. A graph builder assembles these into a persistent knowledge graph stored in PostgreSQL. On top of the graph, a retrieval layer combines semantic similarity with structural traversal. AI agents use this retrieval layer to reason over architectural context, and a developer-facing interface exposes queries through a natural-language surface.
The architecture separates concerns deliberately. Indexing, graph construction, retrieval, and reasoning are independent stages that can be improved, replaced, or scaled independently. The knowledge graph is the canonical artifact; everything else is a layer that consumes or produces graph data.
The tradeoffs favor observability and simplicity over framework complexity. A single canonical store means a single source of truth. A pipeline shape means each stage is independently debuggable. A separation between retrieval and reasoning means retrieval improvements propagate to every agent without code changes.
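The stage separation can be illustrated with a minimal sketch in Python (chosen for brevity; it is not the platform's .NET code). All function names and data shapes here are hypothetical. The point is only that each stage sees nothing but the previous stage's output, so a retriever can be swapped without touching the reasoning stage.

```python
from typing import Callable

Graph = dict  # hypothetical shape: node name -> entity record


def run_pipeline(repo: str, question: str,
                 index: Callable[[str], list],
                 build: Callable[[list], Graph],
                 retrieve: Callable[[Graph, str], list],
                 reason: Callable[[str, list], str]) -> str:
    """Run the four stages in order; each stage is an independent callable."""
    entities = index(repo)
    graph = build(entities)
    context = retrieve(graph, question)
    return reason(question, context)


# Stub stages standing in for the real indexer, builder, retriever, and agent.
def stub_index(repo):
    return [{"kind": "Service", "name": "billing"},
            {"kind": "Database", "name": "ledger"}]


def stub_build(entities):
    return {e["name"]: e for e in entities}


def keyword_retrieve(graph, question):
    # Keep only nodes whose name appears in the question.
    return [name for name in graph if name in question.lower()]


def everything_retrieve(graph, question):
    # An alternative retriever that returns every node.
    return sorted(graph)


def stub_reason(question, context):
    return f"{len(context)} entities in context: {', '.join(context)}"


print(run_pipeline("repo", "Who owns billing?",
                   stub_index, stub_build, keyword_retrieve, stub_reason))
```

Passing `everything_retrieve` instead of `keyword_retrieve` changes the context the reasoning stage receives while `stub_reason` stays unchanged.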
The cost is rigidity. Adding new entity types or relationship semantics requires changes that propagate through the pipeline. The platform is not optimized for the simple case — a single-file lookup still pays the cost of full graph traversal. We accept this tradeoff because the value of the platform lies in the questions that no existing tool can answer, not in the questions that grep already handles.
Architecture Diagram
The pipeline diagram shows the seven stages from source code to developer-facing answer. Each stage transforms the representation of the repository in a specific way: from files on disk to entities, from entities to a graph, from a graph to retrieved context, and from retrieved context to a model response.
Reading the diagram left to right, the platform trades a small number of well-defined transformations for the ability to reason about software at the architectural level.
The Repository Indexer is the entry point. It walks the repository, identifies source files, configuration, infrastructure definitions, and documentation, and emits a stream of structured entities. This stage is intentionally narrow: it extracts facts but does not interpret meaning.
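A narrow indexer of this kind might look roughly like the following Python sketch. The suffix-to-category table, the `FileEntity` type, and the function name are assumptions made for illustration. The sketch classifies files and emits a stream of facts without interpreting them.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator

# Hypothetical mapping from file suffix to a coarse category.
CATEGORIES = {
    ".cs": "source",
    ".json": "configuration",
    ".yaml": "configuration",
    ".yml": "configuration",
    ".tf": "infrastructure",
    ".md": "documentation",
}


@dataclass(frozen=True)
class FileEntity:
    path: str      # path relative to the repository root, POSIX-style
    category: str  # one of the values in CATEGORIES


def index_repository(root: Path) -> Iterator[FileEntity]:
    """Yield one entity per recognized file and skip everything else."""
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if not path.is_file() or ".git" in relative.parts:
            continue
        category = CATEGORIES.get(path.suffix.lower())
        if category is not None:
            yield FileEntity(relative.as_posix(), category)
```

Because the function is a generator, downstream stages can consume entities as they are produced rather than waiting for the whole repository to be walked.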
The Knowledge Graph Builder consumes the entity stream and assembles a graph. Entities become nodes; relationships between entities become edges. The builder resolves duplicates, infers implicit relationships from co-location and naming conventions, and applies domain-specific heuristics.
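The builder's three jobs (nodes and edges, duplicate resolution, and inference from co-location) can be sketched in Python as follows. The key normalization, the `CONTAINS` relation, and the rule that a class in the same directory as a service belongs to it are illustrative assumptions, not the platform's actual heuristics.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Entity:
    kind: str
    name: str
    path: str  # file the entity was extracted from


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # (kind, normalized name) -> Entity
    edges: set = field(default_factory=set)    # (source key, relation, target key)


def node_key(entity):
    # Duplicates are resolved by kind plus a case-insensitive, trimmed name.
    return (entity.kind, entity.name.strip().lower())


def build_graph(entities, explicit_edges=()):
    graph = Graph()
    for entity in entities:
        graph.nodes.setdefault(node_key(entity), entity)  # first occurrence wins
    for source, relation, target in explicit_edges:
        graph.edges.add((node_key(source), relation, node_key(target)))

    # Co-location heuristic (an assumption): a class that lives in the same
    # directory as a service definition is treated as part of that service.
    by_directory = defaultdict(list)
    for key, entity in graph.nodes.items():
        by_directory[PurePosixPath(entity.path).parent].append(key)
    for keys in by_directory.values():
        services = [k for k in keys if k[0] == "Service"]
        classes = [k for k in keys if k[0] == "Class"]
        for service in services:
            for cls in classes:
                graph.edges.add((service, "CONTAINS", cls))
    return graph


entities = [
    Entity("Service", "Billing", "billing/service.yaml"),
    Entity("Service", "billing ", "billing/service.yaml"),  # duplicate spelling
    Entity("Class", "InvoiceWriter", "billing/InvoiceWriter.cs"),
    Entity("Class", "Mailer", "mail/Mailer.cs"),
]
graph = build_graph(entities)
```

Here the two spellings of the billing service collapse into one node, and only `InvoiceWriter` receives an inferred edge because `Mailer` sits in a different directory.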
The GraphRAG Layer sits between the graph and the agents. It accepts a natural-language query, converts it to an embedding, and runs both vector similarity search and structural traversal in parallel. The two result sets are merged, ranked, and assembled into a context window. AI Agents consume this context and produce answers grounded in the graph.
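The source does not say how the two result sets are merged. One common option is reciprocal rank fusion, shown here in Python as a stand-in rather than as the platform's method. The node names are made up, and `k=60` is the conventional default for this technique.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists; each item scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, node in enumerate(ranking, start=1):
            scores[node] = scores.get(node, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by name for deterministic output.
    return sorted(scores, key=lambda node: (-scores[node], node))


vector_hits = ["OrdersApi", "OrdersService", "PricingDoc"]
traversal_hits = ["OrdersService", "OrdersDb", "OrdersApi"]
merged = reciprocal_rank_fusion([vector_hits, traversal_hits])
print(merged)  # ['OrdersService', 'OrdersApi', 'OrdersDb', 'PricingDoc']
```

Nodes found by both searches rise to the top, which matches the intent of combining semantic and structural evidence.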
Knowledge Graph Model
The Repository acts as the root entity within the graph.
Every repository is decomposed into a collection of connected entities that describe both implementation details and architectural intent.
The graph currently models:
- Services
- APIs
- Databases
- Classes
- Methods
- Configuration
- Teams
- Documents
These entities are extracted from source code, infrastructure definitions, repository metadata, documentation, and ownership information.
Relationships provide the real value.
Instead of simply knowing that a service exists, the graph captures how that service interacts with APIs, databases, classes, and operational ownership structures.
As the graph evolves, additional entity types can be introduced without changing the retrieval layer, although the indexer and graph builder still need to learn to extract them. This lets the model expand alongside the software ecosystem.
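A minimal Python sketch of this model is shown below. The relation names (`CONTAINS`, `EXPOSES`, `WRITES_TO`, `OWNS`) and the `Queue` kind are hypothetical; the source lists the entity types but not the relationship vocabulary. The lookup function never inspects `kind`, which is one way a new entity type can appear without retrieval-side changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "Service", "API", "Database", "Team"
    name: str


@dataclass(frozen=True)
class Edge:
    source: Node
    relation: str  # hypothetical relation names
    target: Node


def neighbors(edges, node):
    """Return (relation, other node) pairs touching `node` in either direction."""
    result = []
    for edge in edges:
        if edge.source == node:
            result.append((edge.relation, edge.target))
        elif edge.target == node:
            result.append((edge.relation, edge.source))
    return result


repo = Node("Repository", "shop")
orders = Node("Service", "orders")
orders_api = Node("API", "POST /orders")
orders_db = Node("Database", "orders-db")
team = Node("Team", "checkout-team")
queue = Node("Queue", "order-events")  # a kind added later

edges = [
    Edge(repo, "CONTAINS", orders),
    Edge(orders, "EXPOSES", orders_api),
    Edge(orders, "WRITES_TO", orders_db),
    Edge(team, "OWNS", orders),
    Edge(orders, "PUBLISHES_TO", queue),
]
```

Calling `neighbors(edges, orders)` returns the repository, API, database, team, and queue in one pass, even though `Queue` was not part of the original kind list.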
GraphRAG Retrieval Flow
Traditional vector search retrieves documents that are semantically similar to a query.
This works well for isolated pieces of information but struggles when answers depend on relationships across multiple parts of a system.
GraphRAG combines semantic retrieval with graph traversal.
When a developer submits a question, the query is converted into an embedding and used to retrieve relevant graph nodes.
The retrieval process then expands through connected entities, collecting architectural context that would not be discovered through vector similarity alone.
For example, a question about an API may surface the owning service, dependent databases, related documentation, and responsible team.
The resulting context package is ranked, assembled, and delivered to the language model for reasoning.
This approach produces responses that are grounded in system structure rather than isolated text fragments.
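The flow described in this section, seeding with embedding similarity and then expanding through connected entities, can be sketched in Python as follows. The two-dimensional vectors, node names, and the `top_k` and `max_hops` parameters are toy assumptions; a real system would use a proper embedding model and a vector index.

```python
import math
from collections import deque


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, embeddings, adjacency, top_k=1, max_hops=1):
    """Pick the top_k most similar nodes, then expand breadth-first."""
    seeds = sorted(embeddings,
                   key=lambda n: (-cosine(query_vec, embeddings[n]), n))[:top_k]
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    # Seeds first, then nodes ordered by distance from a seed.
    return sorted(depth, key=lambda n: (depth[n], n))


embeddings = {
    "OrdersApi": [1.0, 0.0],
    "PricingDoc": [0.6, 0.8],
    "OrdersDb": [0.0, 1.0],
}
adjacency = {
    "OrdersApi": ["OrdersService"],
    "OrdersService": ["OrdersApi", "OrdersDb", "OrdersTeam"],
}
context = retrieve([1.0, 0.1], embeddings, adjacency, top_k=1, max_hops=2)
print(context)  # ['OrdersApi', 'OrdersService', 'OrdersDb', 'OrdersTeam']
```

The query only resembles `OrdersApi`, yet the owning service, its database, and its team end up in the context because they are reachable within two hops.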
AI Agent Workflow
The agent layer provides specialized reasoning capabilities on top of the repository knowledge graph.
Instead of using a single general-purpose agent, the platform routes requests through specialized agents optimized for different engineering workflows.
The Code Agent focuses on implementation details, source code navigation, and dependency analysis.
The Architecture Agent focuses on system boundaries, service relationships, and architectural reasoning.
The Documentation Agent focuses on knowledge discovery, ownership information, and historical engineering decisions.
Each agent shares access to the same underlying graph and retrieval infrastructure while maintaining workflow-specific reasoning strategies.
This separation improves response quality while keeping the platform extensible as new engineering workflows emerge.
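The source does not describe how requests are routed. As a purely illustrative sketch in Python, a router could score each agent by keyword overlap with the question; the keyword lists and the fallback to the architecture agent are assumptions, and a production router might well use a classifier or the language model itself.

```python
import re

# Hypothetical keyword lists, loosely derived from each agent's stated focus.
AGENT_KEYWORDS = {
    "code": ("method", "class", "function", "implementation", "dependency"),
    "architecture": ("service", "boundary", "architecture", "system", "breaks"),
    "documentation": ("owner", "owns", "team", "decision", "history"),
}


def route(question, default="architecture"):
    """Return the agent whose keywords overlap the question most."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    scores = {agent: len(words.intersection(keywords))
              for agent, keywords in AGENT_KEYWORDS.items()}
    # On a tie, max() keeps the agent listed first in AGENT_KEYWORDS.
    best = max(scores, key=lambda agent: scores[agent])
    return best if scores[best] > 0 else default


print(route("Which team owns the orders database?"))  # documentation
```

Whatever the routing mechanism, every agent would receive context from the same retrieval call, so the router only decides which reasoning strategy to apply.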
Technology Decisions
The platform prioritizes simplicity, observability, and extensibility over framework complexity.
| Decision | Choice | Rationale |
|---|---|---|
| Backend Platform | .NET | Strong tooling, performance, and maintainability |
| Data Store | PostgreSQL | Mature ecosystem, relational storage, extensibility |
| Retrieval Layer | GraphRAG | Combines semantic search with structural reasoning |
| Knowledge Representation | Knowledge Graph | Models relationships explicitly |
| LLM Integration | Claude | Strong reasoning and architecture understanding |
The overall philosophy is to use mature infrastructure wherever possible and introduce AI-specific components only where they provide measurable value.
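To illustrate why a relational store can serve as the graph backend, the Python sketch below stores nodes and edges in two tables and answers a "what is affected if this changes" question with a recursive query. It uses the standard-library `sqlite3` module as a stand-in so the example runs without a server; PostgreSQL also supports `WITH RECURSIVE`, though placeholder syntax depends on the driver. The table layout and the `DEPENDS_ON` relation are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE edges (
    source_id INTEGER NOT NULL REFERENCES nodes(id),
    relation  TEXT NOT NULL,
    target_id INTEGER NOT NULL REFERENCES nodes(id)
);
""")
conn.executemany("INSERT INTO nodes (id, kind, name) VALUES (?, ?, ?)", [
    (1, "Database", "orders-db"),
    (2, "Service", "orders"),
    (3, "Service", "checkout"),
    (4, "Service", "search"),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (2, "DEPENDS_ON", 1),  # orders -> orders-db
    (3, "DEPENDS_ON", 2),  # checkout -> orders
    (2, "DEPENDS_ON", 3),  # orders -> checkout (a cycle)
])


def dependents(conn, name):
    """Names of everything that transitively depends on `name`."""
    rows = conn.execute("""
        WITH RECURSIVE affected(id) AS (
            SELECT id FROM nodes WHERE name = ?
            UNION  -- UNION (not UNION ALL) removes repeats, so cycles terminate
            SELECT e.source_id
            FROM edges e JOIN affected a ON e.target_id = a.id
            WHERE e.relation = 'DEPENDS_ON'
        )
        SELECT n.name FROM nodes n JOIN affected a ON n.id = a.id
        WHERE n.name <> ?
        ORDER BY n.name
    """, (name, name)).fetchall()
    return [row[0] for row in rows]


print(dependents(conn, "orders-db"))  # ['checkout', 'orders']
```

The query stays correct in the presence of the dependency cycle between `orders` and `checkout`, and `search` is excluded because nothing links it to the database.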
Challenges
Building a repository intelligence platform introduces challenges that are not typically encountered in traditional software systems.
Knowledge Extraction
Extracting entities from source code is straightforward.
Extracting meaningful relationships is significantly harder.
Many architectural relationships exist only implicitly through conventions, configuration files, deployment patterns, or institutional knowledge.
Creating a graph that accurately reflects real-world architecture requires combining static analysis, metadata extraction, and domain-specific heuristics.
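As one example of a relationship that exists only in configuration, a service's database dependency can often be read from its connection strings. The Python sketch below assumes a .NET-style `appsettings.json` with a `ConnectionStrings` section and semicolon-separated `key=value` pairs; the `CONNECTS_TO` relation name and the sample values are hypothetical.

```python
import json


def infer_database_edges(service_name, appsettings_text):
    """Infer (service, relation, database) edges from connection strings."""
    settings = json.loads(appsettings_text)
    edges = []
    for value in settings.get("ConnectionStrings", {}).values():
        # Split "Host=...;Database=...;..." into a case-insensitive lookup.
        pairs = (part.split("=", 1) for part in value.split(";") if "=" in part)
        fields = {key.strip().lower(): val.strip() for key, val in pairs}
        database = fields.get("database")
        if database:
            edges.append((service_name, "CONNECTS_TO", database))
    return edges


sample = json.dumps({
    "ConnectionStrings": {
        "Orders": "Host=db.internal;Database=orders;Username=app",
        "Cache": "localhost:6379",  # no Database field, so it is ignored
    }
})
print(infer_database_edges("orders-service", sample))
```

A heuristic like this shows only that a connection is configured. It cannot tell whether the service reads, writes, or ever uses the database, which is why such edges benefit from validation against other signals.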
Context Quality
Retrieval quality determines response quality.
A language model can only reason over the context it receives.
The challenge is not generating answers but assembling the right architectural context before reasoning begins.
This led to a strong focus on retrieval design, graph traversal strategies, and context ranking mechanisms.
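Context assembly under a size limit can be sketched as a greedy selection in Python. The scores, the snippets, and the use of whitespace-separated words as a rough token count are assumptions; the source does not specify the ranking mechanism.

```python
def assemble_context(candidates, budget):
    """Greedily keep the highest-scoring snippets that fit the budget.

    candidates: list of (score, text) pairs
    budget: maximum size, approximated here as whitespace-separated words
    """
    chosen, used = [], 0
    for score, text in sorted(candidates, key=lambda c: -c[0]):
        cost = len(text.split())
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen


candidates = [
    (0.9, "OrdersService writes to OrdersDb"),
    (0.4, "PricingDoc describes discount rules in detail"),
    (0.7, "OrdersTeam owns OrdersService"),
]
context = assemble_context(candidates, budget=8)
```

With a budget of eight words, the two highest-scoring snippets fit and the lower-ranked pricing note is dropped, so the model sees the most relevant structure first.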
Future Roadmap
The current platform focuses on repository-level intelligence.
The long-term vision is broader.
Cross-Repository Intelligence
Future versions will connect multiple repositories into a unified engineering knowledge graph.
This will enable architectural reasoning across services, platforms, teams, and organizational boundaries.
Continuous Graph Evolution
The graph should evolve automatically as repositories change.
Future work includes event-driven indexing, incremental graph updates, and automated relationship validation to keep architectural knowledge continuously synchronized with the underlying codebase.
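Incremental updates typically start by working out which files changed since the last index run. The Python sketch below compares content hashes to produce that plan. The function names and the idea of keying graph updates by file path are assumptions about how such a feature could work, not a description of planned behavior.

```python
import hashlib


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(previous_hashes, current_files):
    """Compare stored hashes with current contents.

    previous_hashes: path -> hash recorded at the last index run
    current_files:   path -> current file contents
    Returns (changed_or_added, removed, new_hashes).
    """
    new_hashes = {path: content_hash(text)
                  for path, text in current_files.items()}
    changed = sorted(path for path, digest in new_hashes.items()
                     if previous_hashes.get(path) != digest)
    removed = sorted(path for path in previous_hashes
                     if path not in new_hashes)
    return changed, removed, new_hashes


previous = {
    "a.cs": content_hash("old"),
    "b.cs": content_hash("same"),
    "c.cs": content_hash("gone"),
}
current = {"a.cs": "new", "b.cs": "same", "d.cs": "added"}
changed, removed, new_hashes = plan_update(previous, current)
```

Only entities derived from the changed and removed paths would need re-extraction or deletion, which avoids rebuilding the whole graph on every commit.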