From Analyst Agents to Energy Caching: How the Pipeline Evolved
The current architecture of Astrology Bot isn't the first version. Before it, there was a different approach, simpler and cheaper, that had to be abandoned. Not because of cost, but because of instability.
The Old Approach: GPT-5-mini and Interpretive Freedom
The first version ran on GPT-5-mini. The pipeline was straightforward:
- User asks a question
- Planner builds a plan: which chart elements need to be analyzed (4–10 points)
- For each point, a separate call: spec generation, data extraction from the chart, analysis
- Synthesizer assembles everything into a final response
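The old per-point loop can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `call_llm`, `plan_points`, and `extract_chart_data` are hypothetical stand-ins for the real LLM call and chart-handling logic.

```python
def call_llm(model: str, prompt: str) -> str:
    # Placeholder for a real chat-completion call.
    return f"[{model}] {prompt[:40]}..."

def plan_points(question: str) -> list[str]:
    # Stub: the real planner is itself an LLM call producing 4-10 points.
    return ["7th house", "Venus", "Moon", "5th house"]

def extract_chart_data(chart: dict, point: str) -> str:
    return str(chart.get(point, {}))

def answer(question: str, chart: dict) -> str:
    plan = plan_points(question)
    analyses = []
    for point in plan:
        # One spec + one analysis call per point, every single time.
        spec = call_llm("gpt-5-mini", f"Write an analysis spec for {point}")
        data = extract_chart_data(chart, point)
        analyses.append(call_llm("gpt-5-mini", f"Analyze {point}: {spec} {data}"))
    # Every question regenerates every text from scratch; nothing is cached.
    return call_llm("gpt-5-mini", "Synthesize: " + " ".join(analyses))
```

The key property of this design is visible in the loop: the cost of a question is linear in the plan size and identical for the first and the hundredth question about the same chart.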
GPT-5-mini didn't need rigid structure: it consumed generous specs and produced something. In places, it was decent. It was cheap.
The problem was elsewhere.
Why It Had to Go
The same chart element was interpreted differently depending on the question. Ask about relationships, and the model emphasizes the 7th house. Ask about personality, and the same relationships are suddenly described through Venus, with different emphases and different conclusions. Both answers could be correct in their own way, but they contradicted each other.
For an astrological service, this is unacceptable. If someone asks two questions and gets two different descriptions of the same element in their chart, trust in the system collapses. Not because the answers are bad, but because they're inconsistent.
The second problem was the lack of accumulation. Every question generated everything from scratch. The tenth question cost the same as the first. No "learning" about a specific chart was happening.
The Key Insight: Chart Energy Is Static
At some point it became obvious: the energy of the 3rd house in a natal chart is the same regardless of the question. Rulers, aspects, planets in the house, tensions and supports: none of this changes depending on whether the person is asking about money or relationships.
Which means it can be described once, in detail, structured, covering both modes of manifestation (as external pressure and as a personal instrument), and reused in any analysis. I write about these energy descriptions separately; they're the most labor-intensive part of the project.
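The mechanical consequence of this insight is a cache keyed only by the chart element, never by the question. A minimal sketch of that idea, with illustrative names rather than the project's actual code:

```python
# An energy description depends on (chart, element) only, not on the
# user's question, so it is generated once and reused forever after.
energy_cache: dict[tuple[str, str], str] = {}  # (chart_id, element) -> text

def get_energy_text(chart_id: str, element: str, generate) -> str:
    key = (chart_id, element)
    if key not in energy_cache:
        # First request for this element: pay the generation cost once.
        energy_cache[key] = generate(element)
    # Every later question, on any topic, reuses the exact same text.
    return energy_cache[key]
```

Because the key contains no trace of the question, a question about money and a question about relationships that both touch the 3rd house read the same description, which is what makes the answers consistent.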
This insight turned the architecture upside down.
The Current System: 4 Processing Flows
Now every question passes through an intent classifier, a lightweight Sonnet call that determines the request type. Then it's routed to one of four flows:
GENERAL_OVERVIEW: "tell me about the personality," "what's interesting in this chart?" A fixed plan of 12 houses, all texts already cached (generated during chart initialization). They load instantly, leaving only synthesis, the most challenging part in terms of prompts.
FULL_ANALYSIS: a specific question on a topic, such as "how's my career?" or "tell me about relationships." The planner builds a plan of 4–10 points (houses and planets), the system checks the cache, generates only the missing texts, then synthesizes the response. Every text generated for this analysis is saved and will be available for future questions.
FOLLOW_UP: clarification or continuation within the current topic. Doesn't require a new plan, but carries in context all the session's energy texts, the core personality, and the entire conversation. Can pull in additional data if the discussion drifts into an adjacent topic.
SIMPLE_CHAT: greetings, general questions, "what can you do?" A single model call, no astrology.
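The routing layer can be sketched like this. The classifier is stubbed with keyword checks purely for illustration; in the actual system it is an LLM call, and FOLLOW_UP detection would also need conversation state, which is omitted here. All names are hypothetical.

```python
from enum import Enum

class Intent(Enum):
    GENERAL_OVERVIEW = "general_overview"
    FULL_ANALYSIS = "full_analysis"
    FOLLOW_UP = "follow_up"       # needs session state; not reachable in this stub
    SIMPLE_CHAT = "simple_chat"

def classify_intent(question: str) -> Intent:
    # Stub: the real classifier is a lightweight Sonnet call, not keywords.
    q = question.lower().strip("?!. ")
    if q in ("hi", "hello") or "what can you do" in q:
        return Intent.SIMPLE_CHAT
    if "tell me about the personality" in q:
        return Intent.GENERAL_OVERVIEW
    return Intent.FULL_ANALYSIS

def route(question: str) -> str:
    handlers = {
        Intent.GENERAL_OVERVIEW: lambda: "load 12 cached house texts, synthesize",
        Intent.FULL_ANALYSIS: lambda: "plan 4-10 points, fill cache gaps, synthesize",
        Intent.FOLLOW_UP: lambda: "reuse session texts + history, answer",
        Intent.SIMPLE_CHAT: lambda: "single model call, no astrology",
    }
    return handlers[classify_intent(question)]()
```

The point of the enum is that the expensive machinery (planning, generation) only runs on the flows that need it; SIMPLE_CHAT never touches the chart at all.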
Two Models, Divided Roles
Preparatory agents (the classifier, planner, and energy text generators) always run on Sonnet. It's a fast, inexpensive model that handles structured tasks well.
The final response to the user, the synthesis, goes through the model chosen by the user themselves: Sonnet (faster and cheaper) or Opus (deeper and more expensive). It's at the synthesis stage that the quality difference shows: Opus is better at spotting non-obvious connections between themes and builds a more cohesive portrait.
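The role split boils down to a very small piece of configuration. A sketch under assumptions: the model identifiers below are illustrative placeholders, not real API model names.

```python
# Everything upstream of synthesis is pinned to Sonnet.
PIPELINE_MODEL = "claude-sonnet"  # classifier, planner, energy generation

def synthesis_model(user_choice: str) -> str:
    # Only the final synthesis call honors the user's model choice.
    return {"sonnet": "claude-sonnet", "opus": "claude-opus"}[user_choice]
```

Pinning the preparatory stages means the cached energy texts stay uniform in format regardless of which model the user picks for the final answer.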
What This Achieved
Consistency. One chart element, one interpretation. The 3rd house is described once and doesn't change from question to question. If the model produced a quality energy description, that text works in any context.
Accumulation. Every question "warms up" the chart. The first analysis is the most expensive because the system generates new texts. The second is cheaper. The tenth, cheaper still. The chart grows richer with every question.
The switch to Claude. The old approach ran on GPT-5-mini, which didn't need rigid structure. The new one requires precise adherence to the energy description format, and this is exactly where it turned out that GPT can't handle it, while Sonnet does just fine.
I've thought about someday bringing back a lightweight mode with less structured analysis. But for now I'd rather not overload the interface; it's already a bit complex in places.