
Schema Design

This document captures the thinking process for the core data model and workflow design in RetroCast. It is a more elaborate version of the concepts overview page and is the primary place to build a mental model of the codebase.

Goals

RetroCast is designed to handle two broad use cases:

  • casting arbitrary planner output into canonical Routes
  • evaluating those outputs for one target or many targets

Route

In RetroCast, a Route is an AND/OR tree of Molecule and Reaction nodes.

Route -> Molecule -> Reaction -> Molecule -> Reaction -> ...
graph TD
    r["Route"] --> m0["Molecule (target)"]
    m0 --> rx0["Reaction"]
    rx0 --> m1["Molecule"]
    rx0 --> m2["Molecule"]
    m1 --> rx1["Reaction"]
    rx1 --> m3["Molecule"]
    rx1 --> m4["Molecule"]

Basic schema

class Molecule(BaseModel):
    smiles: SmilesStr
    inchikey: InChIKeyStr
    product_of: Reaction | None = None
    annotations: dict[str, Any] = Field(default_factory=dict)


class Reaction(BaseModel):
    reactants: list[Molecule]
    mapped_reaction_smiles: ReactionSmilesStr | None = None
    template: str | None = None
    reagents: list[SmilesStr] | None = None
    solvents: list[SmilesStr] | None = None
    annotations: dict[str, Any] = Field(default_factory=dict)


class Route(BaseModel):
    target: Molecule
    annotations: dict[str, Any] = Field(default_factory=dict)
    schema_version: str = "2"

Identical molecules in different positions (e.g. the same building block used in two branches) are distinct nodes; hence a Route is a tree, not a DAG. This is primarily because enforcing one node per molecule would introduce operational complexity (serialization, signatures) without any clear benefit beyond marginally smaller disk usage.

Route path

RetroCast uses deterministic paths to refer to molecules and reactions inside a Route. The full grammar lives in Route Node IDs, but here is a useful cheat sheet:

  • rc:m:/ root target molecule
  • rc:r:/ root reaction
  • rc:m:/0 first reactant under rc:r:/
  • rc:r:/0 reaction producing rc:m:/0
  • rc:m:/1/0 first reactant of rc:r:/1, the reaction producing rc:m:/1

These IDs are derived in memory and are not serialized. Internally, they should be represented as typed addresses rather than repeatedly parsed strings:

class RoutePath(BaseModel):
    kind: Literal["m", "r"]
    indices: tuple[int, ...] = ()

    @classmethod
    def parse(cls, value: str) -> RoutePath: ...

    @classmethod
    def target(cls) -> RoutePath: ...

    @classmethod
    def root_reaction(cls) -> RoutePath: ...

    def id(self) -> str: ...
    def depth(self) -> int: ...

    def is_molecule(self) -> bool: ...
    def is_reaction(self) -> bool: ...

    def produced_by(self) -> RoutePath: ...
    def product(self) -> RoutePath: ...
    def reactant(self, index: int) -> RoutePath: ...

Semantics:

  • RoutePath.target() is rc:m:/
  • RoutePath.root_reaction() is rc:r:/
  • RoutePath.parse("rc:m:/1/0").produced_by() is rc:r:/1/0
  • RoutePath.parse("rc:r:/1/0").product() is rc:m:/1/0
  • RoutePath.parse("rc:r:/1/0").reactant(2) is rc:m:/1/0/2
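The semantics above can be sketched with a frozen standard-library dataclass instead of a pydantic model. This is a minimal sketch, assuming the rc:<kind>:/<i>/<j>/... grammar from the cheat sheet; it omits depth() because this page does not pin down how reaction depth is counted, and the error messages are our own choice.

```python
import re
from dataclasses import dataclass
from typing import Literal

# Assumed grammar: "rc:" + kind ("m" or "r") + ":/" + optional "/"-separated indices.
_PATH_RE = re.compile(r"rc:(?P<kind>[mr]):/(?P<rest>\d+(?:/\d+)*)?")


@dataclass(frozen=True)
class RoutePath:
    kind: Literal["m", "r"]
    indices: tuple[int, ...] = ()

    @classmethod
    def parse(cls, value: str) -> "RoutePath":
        match = _PATH_RE.fullmatch(value)
        if match is None:
            raise ValueError(f"not a route path: {value!r}")
        rest = match.group("rest")
        indices = tuple(int(part) for part in rest.split("/")) if rest else ()
        return cls(match.group("kind"), indices)

    @classmethod
    def target(cls) -> "RoutePath":
        return cls("m")

    @classmethod
    def root_reaction(cls) -> "RoutePath":
        return cls("r")

    def id(self) -> str:
        return f"rc:{self.kind}:/" + "/".join(str(i) for i in self.indices)

    def is_molecule(self) -> bool:
        return self.kind == "m"

    def is_reaction(self) -> bool:
        return self.kind == "r"

    def produced_by(self) -> "RoutePath":
        # A molecule and the reaction that produces it share the same indices.
        if not self.is_molecule():
            raise ValueError("produced_by() is defined only for molecule paths")
        return RoutePath("r", self.indices)

    def product(self) -> "RoutePath":
        if not self.is_reaction():
            raise ValueError("product() is defined only for reaction paths")
        return RoutePath("m", self.indices)

    def reactant(self, index: int) -> "RoutePath":
        # The i-th reactant appends i to the reaction's indices.
        if not self.is_reaction():
            raise ValueError("reactant() is defined only for reaction paths")
        if index < 0:
            raise ValueError("reactant index must be non-negative")
        return RoutePath("m", self.indices + (index,))
```

Because id() and parse() mirror each other, RoutePath.parse(p.id()) == p holds for any valid path in this sketch.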

For convenience, we define the following validated ID types:

from typing import Annotated

from pydantic import AfterValidator

def validate_reaction_id(value: str) -> str:
    path = RoutePath.parse(value)
    if not path.is_reaction():
        raise ValueError("reaction id must identify a reaction node, e.g. 'rc:r:/1/0'")
    return value

def validate_molecule_id(value: str) -> str:
    path = RoutePath.parse(value)
    if not path.is_molecule():
        raise ValueError("molecule id must identify a molecule node, e.g. 'rc:m:/1/0'")
    return value

ReactionId = Annotated[str, AfterValidator(validate_reaction_id)]
MoleculeId = Annotated[str, AfterValidator(validate_molecule_id)]

Route Signatures

Route signatures give us a canonical way to talk about route structure without carrying around the whole tree or comparing nested objects by hand. They are the basis for route comparison: full-route equality, reaction equality, prefix matching to depth k, and subtree containment. The core idea is Merkle-like: the signature of a parent is built from its own identity plus the signatures of its children. Signatures:

  • are invariant to reactant ordering
  • preserve multiplicity when the same reactant appears more than once
  • can be parameterized by match level when needed.
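The Merkle construction and the first two properties can be illustrated on a toy tree in which each node carries only a molecule key. This is a simplified sketch, not RetroCast's implementation: Node, signature, and this particular stable_hash (SHA-256 over repr) are hypothetical, and the reaction-level key is folded into the parent for brevity.

```python
import hashlib
from dataclasses import dataclass, field


def stable_hash(key: object) -> str:
    # Hypothetical helper: SHA-256 over the repr of a nested tuple of strings,
    # which is deterministic across processes (unlike the built-in hash()).
    return hashlib.sha256(repr(key).encode("utf-8")).hexdigest()


@dataclass
class Node:
    key: str  # stand-in for a molecule's InChIKey at the chosen match level
    reactants: list["Node"] = field(default_factory=list)


def signature(node: Node) -> str:
    if not node.reactants:
        return stable_hash(("mol", node.key))
    # Sorting child signatures makes the result independent of reactant order;
    # keeping a sorted list (not a set) preserves duplicate reactants.
    child_sigs = sorted(signature(child) for child in node.reactants)
    return stable_hash(("mol", node.key, tuple(child_sigs)))
```

Swapping the two reactants of a node leaves its signature unchanged, while listing the same reactant twice changes it.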

Molecule Identity

We use InChIKeys as molecular identity. RetroCast currently supports three levels of InChIKey specificity:

  • retrocast.chem.InChIKeyLevel.FULL - full 27-char InChIKey
  • retrocast.chem.InChIKeyLevel.NO_STEREO - 27-char InChIKey generated without stereochemical information
  • retrocast.chem.InChIKeyLevel.CONNECTIVITY - first 14 chars, connectivity layer only

Most users should use the default FULL level. Sometimes, however, a model developer might wish to distinguish a planner's failure to account for stereochemistry from a more fundamental failure to find the right connectivity (for which NO_STEREO is useful), or to ignore isotope/protonation differences (for which CONNECTIVITY is useful).

class Molecule(BaseModel):
    ...

    def key(self, match_level: InChIKeyLevel = InChIKeyLevel.FULL) -> str:
        return reduce_inchikey(self.inchikey, match_level)

    def signature(self, match_level: InChIKeyLevel = InChIKeyLevel.FULL) -> str:
        return stable_hash(self.key(match_level))

Reaction Identity

At the most basic structural level, a reaction's identity is defined by the structures of its reactants and product. Defining key and signature methods on the Reaction class is not possible without a pointer to the reaction's product. There are two options:

  • treat Reaction as a Route-specific occurrence object (with an explicit pointer to its product). That requires custom serialization logic and a guarantee that loaded Reaction objects are always "hydrated" with proper parent references. It also violates the spirit of SRP, and, being a bit WebDev-brained, I can't help thinking of the Data/View split, so instead
  • we create a ReactionView model that supplies the route context a Reaction needs.

class ReactionView:
    route: Route
    path: RoutePath
    value: Reaction

    def product(self) -> MoleculeView: ...
    def reactants(self) -> list[MoleculeView]: ...

    def key(self, match_level: InChIKeyLevel = InChIKeyLevel.FULL) -> tuple:
        return (
            "rxn",
            self.product().key(match_level),
            tuple(sorted(r.key(match_level) for r in self.reactants())),
        )

    def signature(self, match_level: InChIKeyLevel = InChIKeyLevel.FULL) -> str:
        return stable_hash(self.key(match_level))
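A Reaction alone cannot compute its key because its product lives one level up; a route-plus-path pair recovers the product by walking down from the target. The sketch below uses hypothetical Mol/Rxn stand-ins with plain string keys and a hypothetical standalone molecule_at helper. It mirrors what ReactionView.product() and key() need, not RetroCast's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Mol:
    key: str  # stand-in for an InChIKey
    product_of: Optional["Rxn"] = None


@dataclass
class Rxn:
    reactants: list[Mol] = field(default_factory=list)


def molecule_at(target: Mol, indices: tuple[int, ...]) -> Mol:
    # Each index selects a reactant of the reaction producing the current molecule.
    node = target
    for i in indices:
        if node.product_of is None:
            raise KeyError(f"no reaction below index path {indices}")
        node = node.product_of.reactants[i]
    return node


def reaction_key(target: Mol, indices: tuple[int, ...]) -> tuple:
    # The reaction at rc:r:/<indices> produces the molecule at rc:m:/<indices>.
    product = molecule_at(target, indices)
    if product.product_of is None:
        raise KeyError(f"molecule at {indices} is a leaf")
    reactant_keys = tuple(sorted(r.key for r in product.product_of.reactants))
    return ("rxn", product.key, reactant_keys)
```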

For consistency in API design and ease of subtree comparison, we also define a similar view model for Molecule.

class MoleculeView:
    route: Route
    path: RoutePath
    value: Molecule

    def key(self, match_level: InChIKeyLevel = InChIKeyLevel.FULL) -> str:
        return self.value.key(match_level)

    def produced_by(self) -> ReactionView | None: ...

    def subtree_key(self, match_level=InChIKeyLevel.FULL, *, depth=None):
        if self.value.product_of is None or depth == 0:
            return ("mol", self.key(match_level))

        next_depth = None if depth is None else depth - 1
        reaction = self.produced_by()
        child_sigs = sorted(
            reactant.subtree_signature(match_level, depth=next_depth)
            for reactant in reaction.reactants()
        )

        return (
            "mol",
            self.key(match_level),
            reaction.key(match_level),
            tuple(child_sigs),
        )

    def subtree_signature(self, match_level=InChIKeyLevel.FULL, *, depth=None):
        return stable_hash(self.subtree_key(match_level, depth=depth))

Route Identity

With the primitives above, full route equality is established by the subtree signature of the target with unlimited depth; that is, route.signature() is an alias for route.molecule_at("rc:m:/").subtree_signature(). Generic exact subtree equality at any node is route.molecule_at(path).subtree_signature().

We are often interested in how far along the plan, starting from the target, two Routes agree. Such prefix matching is simply the subtree signature of fixed depth k rooted at the target.

class Route(BaseModel):
    ...

    def key(
        self,
        match_level: InChIKeyLevel = InChIKeyLevel.FULL,
        *,
        depth: int | None = None,
    ) -> tuple:
        return self.molecule_at("rc:m:/").subtree_key(match_level, depth=depth)

    def signature(
        self,
        match_level: InChIKeyLevel = InChIKeyLevel.FULL,
        *,
        depth: int | None = None,
    ) -> str:
        return stable_hash(self.key(match_level, depth=depth))
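Prefix signatures answer "how far do two routes agree?" directly: increase k until the depth-k signatures diverge. A minimal sketch on a toy Node tree; agreement_depth and max_depth are hypothetical, and the reaction key is left out of the tuple because, with one reaction per molecule, it is already determined by the product key and the reactant keys.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def stable_hash(key: object) -> str:
    return hashlib.sha256(repr(key).encode("utf-8")).hexdigest()


@dataclass
class Node:
    key: str
    reactants: list["Node"] = field(default_factory=list)


def subtree_key(node: Node, depth: Optional[int] = None) -> tuple:
    # depth=None means unlimited; depth=0 truncates the tree at this molecule.
    if not node.reactants or depth == 0:
        return ("mol", node.key)
    next_depth = None if depth is None else depth - 1
    child_sigs = sorted(stable_hash(subtree_key(c, next_depth)) for c in node.reactants)
    return ("mol", node.key, tuple(child_sigs))


def agreement_depth(a: Node, b: Node, max_depth: int = 10) -> int:
    # Largest k <= max_depth whose depth-k prefixes match; -1 if the targets differ.
    # Identical routes saturate at max_depth.
    agreed = -1
    for k in range(max_depth + 1):
        if stable_hash(subtree_key(a, k)) != stable_hash(subtree_key(b, k)):
            break
        agreed = k
    return agreed
```

Breaking at the first mismatch is safe because a depth-k prefix is a coarsening of the depth-(k+1) prefix: once two routes diverge, deeper prefixes cannot agree again.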

Content Signatures

The structural key / signature methods answer the basic route question: same molecules, same reaction graph. They do not read mapped reaction SMILES, templates, reagents, solvents, condition labels, or annotations.

Sometimes we want a stricter comparison: same structure, plus selected reaction content. For that, routes and route-bound views expose content_key / content_signature methods. The caller chooses which reaction fields matter:

route.content_signature(fields=("mapped_reaction_smiles",))
route.content_signature(fields=("template", "reagents", "solvents"))

Content signatures follow the same Merkle shape as structural signatures: molecule identity, reaction identity, selected reaction content, and unordered child signatures. They also support match_level and route-prefix depth.
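A toy version shows where the selected fields slot into the Merkle key alongside molecule identity. This is a hypothetical sketch: reaction content is stored on the product node as a plain dict, field names are sorted so the caller's ordering does not matter, and list-valued fields such as reagents are compared in order (this page does not say whether RetroCast normalizes them).

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Sequence


def stable_hash(key: object) -> str:
    return hashlib.sha256(repr(key).encode("utf-8")).hexdigest()


@dataclass
class Node:
    key: str
    reactants: list["Node"] = field(default_factory=list)
    # Content of the reaction producing this molecule (e.g. template, reagents).
    reaction_content: dict[str, Any] = field(default_factory=dict)


def content_signature(node: Node, fields: Sequence[str]) -> str:
    if not node.reactants:
        return stable_hash(("mol", node.key))
    # Only caller-selected fields enter the key; absent fields contribute None.
    selected = tuple((name, node.reaction_content.get(name)) for name in sorted(fields))
    child_sigs = sorted(content_signature(c, fields) for c in node.reactants)
    return stable_hash(("mol", node.key, selected, tuple(child_sigs)))
```

With an empty field selection, this reduces to a purely structural signature.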

Route Embedding

Route embedding asks whether one route occurs inside another route.

Route.signature() and MoleculeView.subtree_signature() answer exact equality for a chosen root. Embedding is looser in two ways: the query target may match an internal molecule in the container route, and the container may continue below a query leaf when the audit allows leaf extension.

retrocast.curation.embedding uses route views, paths, molecule keys, reaction signatures, and subtree signatures to produce audit traces:

class EmbeddingMatch:
    query_path: RoutePath
    container_path: RoutePath
    matched_reactions: int
    leaf_extensions: tuple[LeafExtension, ...]

    @property
    def root_shifted(self) -> bool: ...
    @property
    def leaf_extended(self) -> bool: ...

def find_route_embeddings(
    query: Route,
    container: Route,
    match_level: InChIKeyLevel = InChIKeyLevel.FULL,
    *,
    allow_leaf_extension: bool = True,
) -> tuple[EmbeddingMatch, ...]: ...

def route_embeds_at(
    query: MoleculeView,
    container: MoleculeView,
    match_level: InChIKeyLevel = InChIKeyLevel.FULL,
    *,
    allow_leaf_extension: bool = True,
) -> EmbeddingMatch | None: ...

The detailed matching rules live in Route Embedding. The important schema-design point is small: exact subtree equality is a signature check; embedding is a route-to-route matching check.
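With root shifting but no leaf extension, embedding reduces to a search for container molecules whose exact subtree signature equals the query's. The hypothetical exact_embedding_paths below covers only that special case. It is not find_route_embeddings, which also handles leaf extension and produces audit traces.

```python
import hashlib
from dataclasses import dataclass, field


def stable_hash(key: object) -> str:
    return hashlib.sha256(repr(key).encode("utf-8")).hexdigest()


@dataclass
class Node:
    key: str
    reactants: list["Node"] = field(default_factory=list)


def subtree_signature(node: Node) -> str:
    if not node.reactants:
        return stable_hash(("mol", node.key))
    child_sigs = sorted(subtree_signature(c) for c in node.reactants)
    return stable_hash(("mol", node.key, tuple(child_sigs)))


def exact_embedding_paths(query: Node, container: Node) -> list[tuple[int, ...]]:
    # Index paths of container molecules whose full subtree equals the query.
    target_sig = subtree_signature(query)
    hits: list[tuple[int, ...]] = []
    stack: list[tuple[Node, tuple[int, ...]]] = [(container, ())]
    while stack:
        node, indices = stack.pop()
        if subtree_signature(node) == target_sig:
            hits.append(indices)
        for i, child in enumerate(node.reactants):
            stack.append((child, indices + (i,)))
    return sorted(hits)
```

A hit at () is plain route equality; a hit at any other path is a root-shifted embedding.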

Workflows

Raw planner payloads can be adapted into Route objects, which are always Tier-0 valid, or into Candidate objects, which preserve failed slots for proper accounting of Solv-0. Canonical Candidates can be collected into predictions for each Target of a Task. A Benchmark is itself a Task.

Direct adapting is useful for single-target pipelines. For benchmarking, one can use ingest, which adapts outputs and collects them into Benchmark predictions.

Benchmark predictions are keyed by target id. These predictions can be scored with Tier-N validity checks against TaskConstraint records, resulting in Solv-N values. Predictions can also be compared against Target.acceptable_routes to obtain Top-K reconstruction accuracy.

1. Adapt

By default, adapt tries to turn raw planner output into canonical Route objects and returns None if it fails. While that is a reasonable default for regular planning, it inflates the Tier-0 validity of the planner's predictions under strict benchmarking. As such, adapt can be configured to return Candidate objects, each holding either a Route or a FailureRecord that specifies target_{id,smiles,inchikey} for proper accounting of failed predictions.

class FailureRecord(BaseModel):
    code: ErrorCode
    message: str | None = None
    target_id: str | None = None
    target_smiles: SmilesStr | None = None
    target_inchikey: InChIKeyStr | None = None
    context: dict[str, Any] = Field(default_factory=dict)


class Candidate(BaseModel):
    rank: int
    route: Route | None = None
    failure: FailureRecord | None = None

Adapt modes

By default (strict mode), adaptation fails, yielding a FailureRecord, if even a single SMILES is invalid. Model developers might sometimes be interested in the validity of predictions up to the corrupted SMILES; AdaptMode lets them relax adapters to return Route/Candidate objects containing the longest possible valid prefix Route, with the corrupted SMILES node and all its children pruned out.

AdaptMode = Literal["strict", "prune"]

Adapt API

adapt_route(raw_route_payload, adapter, *, mode: AdaptMode = "strict") -> Route | None

# for route-first inspection and ad hoc use
adapt_routes(
    raw_payload,
    adapter,
    *,
    mode: AdaptMode = "strict",
    max_routes: int | None = None,
) -> list[Route]

# for benchmarking and honest solv/tier-N metrics
adapt_candidates(
    raw_payload,
    adapter,
    *,
    mode: AdaptMode = "strict",
    max_candidates: int | None = None,
) -> list[Candidate]

max_routes is a route-first convenience limit: failures are skipped and only successful Routes count toward the limit. max_candidates is the benchmark-safe limit: it processes the first N raw candidate slots and preserves failures, so Tier-0 validity and MRR remain honest.
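The difference between the two limits is easiest to see on a payload whose second slot fails to adapt. This is a hypothetical sketch: first_n_routes, first_n_candidates, and adapt_one are stand-ins, strings stand in for raw slots and Routes, None stands in for a failure (a real Candidate would carry a FailureRecord), and ranks are assumed to be 1-based.

```python
from typing import Callable, Optional, Sequence

AdaptOne = Callable[[str], Optional[str]]  # raw slot -> route, or None on failure


def first_n_routes(slots: Sequence[str], adapt_one: AdaptOne, max_routes: Optional[int]) -> list[str]:
    # Route-first: failures are skipped and only successes count toward the limit.
    routes: list[str] = []
    for slot in slots:
        if max_routes is not None and len(routes) >= max_routes:
            break
        route = adapt_one(slot)
        if route is not None:
            routes.append(route)
    return routes


def first_n_candidates(
    slots: Sequence[str], adapt_one: AdaptOne, max_candidates: Optional[int]
) -> list[tuple[int, Optional[str]]]:
    # Benchmark-safe: take the first N raw slots and keep failures in place.
    window = slots if max_candidates is None else slots[:max_candidates]
    return [(rank, adapt_one(slot)) for rank, slot in enumerate(window, start=1)]
```

With a limit of 2, the route-first path reaches past the failure to the third slot, while the candidate path keeps the failed slot at rank 2, so Tier-0 validity comes out as 1/2 rather than 2/2.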

Benchmark CLI ingestion uses the candidate-preserving path. Route-only adaptation remains available as a library convenience for ad hoc inspection.

2. Collect

Collection maps adapted outputs onto known targets.

CollectedCandidates = dict[str, list[Candidate]] # where str is target_id
CollectedRoutes = dict[str, list[Route]]

collect_candidates(
    candidates: Iterable[Candidate],
    task: Task,
) -> CollectedCandidates

collect_routes(
    routes: Iterable[Route],
    task: Task,
) -> CollectedRoutes

Collection rules:

  • if candidate.route exists, place it by route.target
  • otherwise place it by candidate.failure.target_id / candidate.failure.target_inchikey

where Task is defined through Targets and TaskConstraints:

class Target(BaseModel):
    id: str
    smiles: SmilesStr
    inchikey: InChIKeyStr
    acceptable_routes: list[Route] = Field(default_factory=list)
    annotations: dict[str, Any] = Field(default_factory=dict)


class TaskConstraint(BaseModel):
    kind: str
    model_config = ConfigDict(extra="allow")


class StockTerminationConstraint(TaskConstraint):
    kind: Literal["retrocast.stock_termination"] = "retrocast.stock_termination"
    stock: str


class RequiredLeavesConstraint(TaskConstraint):
    kind: Literal["retrocast.required_leaves"] = "retrocast.required_leaves"
    smiles: list[SmilesStr]


class RouteDepthConstraint(TaskConstraint):
    kind: Literal["retrocast.route_depth"] = "retrocast.route_depth"
    max_depth: int | Literal["short", "medium", "long"]


class Task(BaseModel):
    name: str
    targets: dict[str, Target]
    default_constraints: list[TaskConstraint] = Field(default_factory=list)
    constraints: dict[str, list[TaskConstraint]] = Field(default_factory=dict)
    metric_label: str | None = None
    annotations: dict[str, Any] = Field(default_factory=dict)
    schema_version: str = "2"

    def effective_constraints(self, target_id: str) -> list[TaskConstraint]:
        by_kind = {constraint.kind: constraint for constraint in self.default_constraints}
        by_kind.update({constraint.kind: constraint for constraint in self.constraints.get(target_id, [])})
        return list(by_kind.values())


class Benchmark(Task):
    name: str
    description: str

route_depth integers are inclusive maximum depths. Named depth constraints are ranges:

  • short: depth 1-3
  • medium: depth 4-6
  • long: depth 7+
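A checker might interpret max_depth as sketched below, assuming the route's depth has already been computed as an integer (this page does not define how). depth_satisfies and NAMED_DEPTH_RANGES are hypothetical names.

```python
from typing import Literal, Optional, Union

# Named ranges from the list above; None means "no upper bound".
NAMED_DEPTH_RANGES: dict[str, tuple[int, Optional[int]]] = {
    "short": (1, 3),
    "medium": (4, 6),
    "long": (7, None),
}


def depth_satisfies(depth: int, max_depth: Union[int, Literal["short", "medium", "long"]]) -> bool:
    if isinstance(max_depth, int):
        return depth <= max_depth  # integers are inclusive maxima
    low, high = NAMED_DEPTH_RANGES[max_depth]
    # Named constraints are ranges, so they also impose a lower bound.
    return depth >= low and (high is None or depth <= high)
```

Note the asymmetry: max_depth=6 accepts a depth-2 route, but "medium" does not.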

One benchmark is a Task. One daedalus query is also a Task, usually with one target.

Within one target, constraints are keyed by kind. A target-specific constraint with the same kind overrides the default constraint. RetroCast-owned constraints use the retrocast. namespace. Custom constraints should use a project namespace, e.g. ariadne.reaction_count.
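The override rule is the same dict merge as Task.effective_constraints above, shown here with a hypothetical frozen Constraint dataclass in place of the pydantic models. An overriding constraint keeps the position of the default it replaces, because updating an existing dict key does not move it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    kind: str
    value: object = None  # stand-in for kind-specific fields


def effective_constraints(
    defaults: list[Constraint],
    per_target: dict[str, list[Constraint]],
    target_id: str,
) -> list[Constraint]:
    # Later entries with the same kind replace earlier ones.
    by_kind = {c.kind: c for c in defaults}
    by_kind.update({c.kind: c for c in per_target.get(target_id, [])})
    return list(by_kind.values())
```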

metric_label controls the bracketed task name in Solv-N metrics, e.g. Solv-0[buyables]. If unset, the label is derived from effective target constraints in fixed order: stock label, leaf, depth (e.g. buyables+depth or buyables+leaf).
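The collection rules listed earlier in this section amount to a lookup by target InChIKey, with the failure's target id used when there is no route. This is a hypothetical sketch with simplified Cand/Failure stand-ins, where a successful candidate is represented only by its route target's InChIKey. How RetroCast handles candidates that match no target is not specified here, so the sketch simply drops them.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Failure:
    target_id: Optional[str] = None
    target_inchikey: Optional[str] = None


@dataclass
class Cand:
    rank: int
    route_target_inchikey: Optional[str] = None  # set when adaptation succeeded
    failure: Optional[Failure] = None


def collect(cands: Iterable[Cand], targets_by_id: dict[str, str]) -> dict[str, list[Cand]]:
    # targets_by_id maps target id -> target InChIKey.
    id_by_key = {key: tid for tid, key in targets_by_id.items()}
    collected: dict[str, list[Cand]] = {tid: [] for tid in targets_by_id}
    for cand in cands:
        if cand.route_target_inchikey is not None:
            tid = id_by_key.get(cand.route_target_inchikey)
        elif cand.failure is not None:
            tid = cand.failure.target_id or id_by_key.get(cand.failure.target_inchikey or "")
        else:
            tid = None
        if tid in collected:  # unmatched candidates are dropped in this sketch
            collected[tid].append(cand)
    return collected
```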

3. Ingest

Ingest is simply a convenience alias for adapt + collect.

# when the user wants only valid canonical routes
ingest_routes(raw_payload, adapter, task, *, max_routes: int | None = None) -> CollectedRoutes
# when the user wants an honest evaluation artifact
ingest_candidates(raw_payload, adapter, task, *, max_candidates: int | None = None) -> CollectedCandidates

max_candidates is applied per target during benchmark ingestion. It is intentionally first-N by raw planner rank; random candidate sampling is not part of benchmark ingestion because planner rank is part of the measured behavior.

4. Score

The ultimate test of any synthesis plan is the Solv-N filter, defined as:

Solv-i[task] = Tier-i validity and satisfaction of the task constraints.

Tier-N validity is stored as TierResult. RouteValidity stores the validity of a Route as a whole and of all individual Reactions that compose it.

class CheckStatus(StrEnum):
    PASS = "pass"
    FAIL = "fail"
    NOT_EVALUATED = "not_evaluated"


class Tier(IntEnum):
    ZERO = 0
    ONE = 1
    TWO = 2
    THREE = 3

class CheckResult(BaseModel):
    code: str
    status: CheckStatus = CheckStatus.FAIL
    message: str | None = None
    details: dict[str, Any] = Field(default_factory=dict)


class TierResult(BaseModel):
    status: CheckStatus
    checks: list[CheckResult] = Field(default_factory=list)


class ReactionValidity(BaseModel):
    reaction_id: ReactionId  # rc:r:/1/0
    tiers: dict[Tier, TierResult] = Field(default_factory=dict)


class RouteValidity(BaseModel):
    tiers: dict[Tier, TierResult] = Field(default_factory=dict)
    reactions: list[ReactionValidity] = Field(default_factory=list)

Task constraint satisfaction is stored separately:

class ConstraintResult(BaseModel):
    status: CheckStatus
    checks: list[CheckResult] = Field(default_factory=list)

A Candidate that undergoes scoring becomes a ScoredCandidate.

class ScoredCandidate(BaseModel):
    rank: int
    route: Route | None = None
    failure: FailureRecord | None = None

    validity: RouteValidity = Field(default_factory=RouteValidity)
    constraints: ConstraintResult = Field(
        default_factory=lambda: ConstraintResult(status=CheckStatus.NOT_EVALUATED)
    )

    matches_acceptable: bool = False
    matched_acceptable_index: int | None = None

    def has_route(self) -> bool: ...

    def failed_adaptation(self) -> bool: ...

    def tier_result(self, tier: Tier | int) -> TierResult: ...

    def reaction_tier_result(
        self,
        reaction_id: ReactionId,
        tier: Tier | int,
    ) -> TierResult | None: ...

    def satisfies_validity(self, tier: Tier | int) -> bool: ...

    def satisfies_task(self) -> bool: ...

    def satisfies_solv(self, tier: Tier | int) -> bool: ...
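One plausible reading of the Solv-N definition, written as a hypothetical Scored stand-in rather than the real ScoredCandidate. Two assumptions are ours and are flagged in the comments: Tier-i validity is treated as cumulative (tiers 0 through i must all pass), and NOT_EVALUATED never counts as satisfied. The sketch uses a str-valued Enum because StrEnum requires Python 3.11.

```python
from dataclasses import dataclass, field
from enum import Enum


class CheckStatus(str, Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_EVALUATED = "not_evaluated"


@dataclass
class Scored:
    tier_status: dict[int, CheckStatus] = field(default_factory=dict)
    constraint_status: CheckStatus = CheckStatus.NOT_EVALUATED
    has_route: bool = True

    def satisfies_validity(self, tier: int) -> bool:
        # Assumption: Tier-i validity requires every tier 0..i to have passed.
        return self.has_route and all(
            self.tier_status.get(t) is CheckStatus.PASS for t in range(tier + 1)
        )

    def satisfies_task(self) -> bool:
        # Assumption: only an explicit PASS counts; NOT_EVALUATED is unsolved.
        return self.has_route and self.constraint_status is CheckStatus.PASS

    def satisfies_solv(self, tier: int) -> bool:
        return self.satisfies_validity(tier) and self.satisfies_task()
```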

The ScoredCandidates for each target are collected into a TargetResult, and TargetResults are collected into an Evaluation.

class TargetResult(BaseModel):
    target: Target
    effective_constraints: list[TaskConstraint]
    candidates: list[ScoredCandidate] = Field(default_factory=list)


class Evaluation(BaseModel):
    task: Task
    tiers: list[Tier] = Field(default_factory=list)
    acceptable_match_level: InChIKeyLevel = InChIKeyLevel.FULL
    acceptable_route_match: AcceptableRouteMatch = AcceptableRouteMatch.EXACT
    targets: dict[str, TargetResult] = Field(default_factory=dict)
    schema_version: str = "2"

An intentional violation of the single-responsibility principle

In principle, Tier-0 validity should be assessed at the score workflow stage. The cleanest design would then be for adapt to always return an equivalent of Candidates (or for Route to be extended to hold the information of a Candidate), but that would result in subpar UX for every use case outside of benchmarking. As a result, we intentionally allow a slight leakage of responsibility between adapt and score (i.e., adapt without `--preserve-failed-candidates` returns only Tier-0 valid Routes).

Score API

Because we're far from having a universal Tier-2 validity checker, it is natural to expect multiple solutions emerging from different research groups. To enable modularity of scoring, we define a TierChecker protocol.

class TierChecker(Protocol):
    tier: Tier
    name: str

    def check_route(self, route: Route) -> RouteValidity: ...

Similarly, we define a protocol for task constraint satisfaction:

class TaskConstraintChecker(Protocol):
    kind: str

    def check_route(
        self,
        route: Route,
        constraint: TaskConstraint,
    ) -> list[CheckResult]: ...

This results in the following API:

def score_candidate(
    candidate: Candidate,
    *,
    target: Target,
    constraints: Sequence[TaskConstraint],
    tier_checkers: Sequence[TierChecker],
    constraint_checkers: Sequence[TaskConstraintChecker],
    acceptable_match_level: InChIKeyLevel | None = None,
    acceptable_route_match: AcceptableRouteMatch = AcceptableRouteMatch.PREFIX,
) -> ScoredCandidate: ...


def score_target(
    candidates: Sequence[Candidate],
    *,
    target: Target,
    constraints: Sequence[TaskConstraint],
    tier_checkers: Sequence[TierChecker],
    constraint_checkers: Sequence[TaskConstraintChecker],
    acceptable_match_level: InChIKeyLevel | None = None,
    acceptable_route_match: AcceptableRouteMatch = AcceptableRouteMatch.PREFIX,
) -> TargetResult: ...


def score(
    predictions: Mapping[str, Sequence[Candidate]],
    task: Task,
    *,
    tier_checkers: Sequence[TierChecker] = (),
    constraint_checkers: Sequence[TaskConstraintChecker] = (),
    acceptable_match_level: InChIKeyLevel | None = None,
    acceptable_route_match: AcceptableRouteMatch = AcceptableRouteMatch.PREFIX,
) -> Evaluation: ...

How do task constraint checkers work?

There are two components. Task.effective_constraints(target_id) determines which constraints a target's routes must satisfy, and a TaskConstraintChecker for each constraint kind evaluates them: within the score workflow, each constraint is passed to its checker's check_route. For example:

constraints = benchmark.effective_constraints("target-1")
# [
#   StockTerminationConstraint(stock="buyables-stock"),
#   RequiredLeavesConstraint(smiles=["CCO"]),
# ]

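registry = {checker.kind: checker for checker in constraint_checkers}
checks: list[CheckResult] = []
for constraint in constraints:
    checker = registry[constraint.kind]
    checks.extend(checker.check_route(route, constraint))  # route: the candidate Route being scored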

For a regular retrosynthesis benchmark, the only task constraint is stock termination:

benchmark = Task(
    name="mkt-cnv-160",
    targets=targets,
    default_constraints=[
        StockTerminationConstraint(stock="buyables-stock"),
    ],
)

score(
    predictions,
    task=benchmark,
    constraint_checkers=[
        StockTerminationChecker(
            stocks={"buyables-stock": buyables},  # buyables is set[InChIKeyStr]
            match_level=InChIKeyLevel.FULL,
        ),
    ],
)

For a bidirectional search problem (stock-plus-required-leaf task):

benchmark = Task(
    name="mkt-cnv-160-leaf",
    targets=targets,
    default_constraints=[
        StockTerminationConstraint(stock="buyables-stock"),
    ],
    constraints={
        "target-1": [
            RequiredLeavesConstraint(smiles=["CCO"]),
        ],
    },
)

Scoring then needs checkers for both kinds:

score(
    predictions,
    task=benchmark,
    constraint_checkers=[
        StockTerminationChecker(
            stocks={"buyables-stock": buyables},
            match_level=InChIKeyLevel.FULL,
        ),
        RequiredLeavesChecker(match_level=InChIKeyLevel.FULL),
    ],
)

This architecture allows you to define a custom TaskConstraint for a given Task: so long as you also define a complementary TaskConstraintChecker, you can use the general score workflow.

Notably, the score workflow enforces that requirement. If you pass a Task with a constraint of kind ariadne.reaction_count but do not provide a checker for that kind, you get a runtime error. RetroCast treats unverifiable constraints as unsolved by default.
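The registry lookup from the loop above, together with an up-front check for missing kinds, can be sketched as follows. build_registry and the HasKind protocol are hypothetical; raising RuntimeError, and validating every constraint kind before any route is scored, are assumptions about where and how the error surfaces.

```python
from typing import Iterable, Protocol, TypeVar


class HasKind(Protocol):
    kind: str


C = TypeVar("C", bound=HasKind)


def build_registry(constraints: Iterable[HasKind], checkers: Iterable[C]) -> dict[str, C]:
    # Map each constraint kind to the checker that declares it.
    registry: dict[str, C] = {checker.kind: checker for checker in checkers}
    # Fail loudly instead of silently passing a constraint nobody can verify.
    missing = sorted({c.kind for c in constraints} - set(registry))
    if missing:
        raise RuntimeError("no checker registered for constraint kind(s): " + ", ".join(missing))
    return registry
```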

5. Analyze

analyze should derive benchmark-style metrics from Evaluation.

class MetricSummary(BaseModel):
    value: float
    count: int
    ci_low: float | None = None
    ci_high: float | None = None


class AnalysisReport(BaseModel):
    metrics: dict[str, MetricSummary] = Field(default_factory=dict)
    by_stratum: dict[str, dict[str, MetricSummary]] = Field(default_factory=dict)

analyze(
    evaluation: Evaluation,
    *,
    ks: Sequence[int] = (1, 5, 10, 50),
    prefix_depths: Sequence[int] = (1, 2, 3),
    stratify_by: Callable[[TargetResult], str | None] | None = None,
) -> AnalysisReport

Top-K reconstruction metrics are emitted only for targets with acceptable_routes; if no target has acceptable_routes, reconstruction metrics are omitted. Reconstruction diagnostics use Evaluation.acceptable_match_level, so route, root-reaction, and prefix comparisons stay on the same molecular identity basis. Evaluation.acceptable_route_match records whether headline acceptable-route reconstruction used target-rooted prefix matching or exact full-route identity. Its model default is EXACT so legacy artifacts without the field are interpreted according to their original scoring semantics; new scoring writes the selected mode explicitly.
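Top-K reconstruction accuracy is conventionally the fraction of eligible targets whose first acceptable-route match appears at rank k or better. A minimal sketch under that conventional definition, with hypothetical function names; it assumes 1-based ranks and that targets without acceptable_routes were already excluded, per the rule above. The confidence intervals (ci_low, ci_high) of MetricSummary are not reproduced.

```python
from typing import Optional, Sequence


def top_k_accuracy(match_ranks: Sequence[Optional[int]], k: int) -> float:
    # match_ranks: per eligible target, the 1-based rank of the first candidate
    # matching an acceptable route, or None if no candidate matched.
    if not match_ranks:
        raise ValueError("no targets with acceptable routes")
    hits = sum(1 for rank in match_ranks if rank is not None and rank <= k)
    return hits / len(match_ranks)


def mean_reciprocal_rank(match_ranks: Sequence[Optional[int]]) -> float:
    # The same first-hit ranks also give a mean reciprocal rank (misses count as 0).
    if not match_ranks:
        raise ValueError("no targets with acceptable routes")
    return sum(1.0 / rank for rank in match_ranks if rank is not None) / len(match_ranks)
```

Because the ranks come from the candidate-preserving path, failed slots still occupy their rank positions, which keeps these numbers from being inflated by skipped failures.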