Writing a Custom Adapter¶
The adapter is the "air gap" between a model's internal representation and RetroCast's canonical Route schema. If you are integrating a new model, you will likely need to write a new adapter.
When do I need a custom adapter?
You need a custom adapter when integrating a new retrosynthesis model whose output format differs from existing adapters. RetroCast already supports 10+ models—check the supported adapters first!
The Adapter Contract¶
A RetroCast adapter is route-first. It inherits from BaseAdapter and implements
two explicit responsibilities:
iter_raw_entries(...): split a provider artifact into raw route-like entriescast(...): convert one raw route-like payload into one canonicalRoute
from collections.abc import Iterator
from typing import Any
from retrocast.adapters.base_adapter import BaseAdapter, RawRouteEntry
from retrocast.models.chem import Route, TargetIdentity
class MyModelAdapter(BaseAdapter):
def iter_raw_entries(
self,
raw_data: Any,
*,
source_key: str | None = None,
) -> Iterator[RawRouteEntry]:
for row_index, raw_route in enumerate(raw_data, start=1):
yield RawRouteEntry(
payload=raw_route,
source_key=source_key,
source_row_index=row_index,
source_order=row_index,
)
def cast(
self,
raw_route: Any,
*,
expected_target: TargetIdentity | None = None,
ignore_stereo: bool = False,
) -> Route:
"""
Args:
raw_route: One raw route-like payload from the provider artifact.
expected_target: Optional benchmark or workflow target to validate against.
ignore_stereo: Whether SMILES canonicalization should strip stereochemistry.
Returns:
One valid canonical Route object.
"""
target_molecule = ...
return Route(target=target_molecule, metadata={})
iter_raw_entries(...) is where corpus traversal lives. cast(...) should not
know or care whether the caller is adapting a flat corpus, a target-local payload,
or a benchmark-keyed prediction map.
Common Architecture Patterns¶
Most retrosynthesis models output data in one of three patterns. RetroCast provides helper functions (retrocast.adapters.common) to handle the heavy lifting for these patterns, including recursion and cycle detection.
Use the helpers whenever possible
The built-in helpers (build_molecule_from_bipartite_node, build_molecule_from_precursor_map) handle canonicalization, cycle detection, and validation automatically. Don't reinvent the wheel!
Pattern A: Bipartite Graph Recursion¶
Used by: AiZynthFinder, SynPlanner, Syntheseus
Helper: build_molecule_from_bipartite_node
In this pattern, the output is a nested JSON tree where Molecule nodes point to Reaction nodes, which point to reactant Molecule nodes.
To use this, your raw data structure must conform (via duck typing or Protocol) to the BipartiteMolNode interface: it must have smiles (str), type ("mol"), and children (list of reaction nodes).
from collections.abc import Iterator
from retrocast.adapters.base_adapter import BaseAdapter, RawRouteEntry
from retrocast.adapters.errors import adapter_target_mismatch
from retrocast.adapters.common import build_molecule_from_bipartite_node
from retrocast.models.chem import Route
class MyAdapter(BaseAdapter):
def iter_raw_entries(self, raw_data, *, source_key=None) -> Iterator[RawRouteEntry]:
validated = MyRawOutput.model_validate(raw_data)
for row_index, tree_root in enumerate(validated.trees, start=1):
yield RawRouteEntry(payload=tree_root, source_key=source_key, source_order=row_index)
def cast(self, raw_route, *, expected_target=None, ignore_stereo=False):
# The helper handles the recursive tree construction # (1)!
target_mol = build_molecule_from_bipartite_node(raw_route)
# Verify the root matches the expected target when one is provided
if expected_target is not None and target_mol.smiles != expected_target.smiles:
raise adapter_target_mismatch(...)
return Route(target=target_mol, metadata={})
- The helper automatically handles canonicalization and cycle detection
Pattern B: Precursor Map¶
Used by: Retro*, DreamRetro, SynLlama
Helper: build_molecule_from_precursor_map
In this pattern, the output is a flat list of reactions or a string representation (e.g., P >> R1.R2 | R1 >> R3). The tree structure is implicit in the connectivity.
You simply need to parse the raw format into a Python dictionary mapping Product SMILES -> [Reactant SMILES, ...].
from retrocast.adapters.base_adapter import BaseAdapter
from retrocast.adapters.common import build_molecule_from_precursor_map
from retrocast.models.chem import Route
class MyAdapter(BaseAdapter):
def cast(self, raw_route, *, expected_target=None, ignore_stereo=False):
# 1. Parse your model's specific string format
# Input: "target >> int_1.int_2 | int_1 >> sm_1.sm_2"
# Output: {"target": ["int_1", "int_2"], "int_1": ["sm_1", "sm_2"]}
precursor_map = self._parse_custom_string(raw_route["route_string"]) # (1)!
if expected_target is None:
raise ValueError("expected_target is required for this adapter")
# 2. Build the tree
# The helper walks the map recursively starting from the target SMILES # (2)!
target_mol = build_molecule_from_precursor_map(expected_target.smiles, precursor_map)
return Route(target=target_mol, metadata={})
- Implement this parsing method specific to your model's format
- The helper handles the recursive tree walking and validation
Pattern C: Custom / Mixed¶
Used by: DirectMultiStep (DMS), ASKCOS
Helper: None (roll your own)
Some models have unique structures that don't fit the above patterns (e.g., graphs defined by edge lists, or recursive trees that don't strictly alternate molecule/reaction nodes).
Reference implementation
See retrocast.adapters.dms_adapter for a complete example of implementing a custom recursive builder with cycle detection.
Flat Completion Corpora: Ursa LLM¶
UrsaLlmAdapter is the reference example of a route-first adapter for flat raw
artifacts. Its input is a completion corpus containing <synthesis_step> XML
blocks, usually as .jsonl or .jsonl.gz, not a benchmark-keyed results.json.gz
mapping.
- canonical adapter key:
ursa-llm - raw artifact shape: list of records with
completiontext andmeta.product_smiles - adapter seam:
iter_raw_entries(...)yields oneRawRouteEntryper completion rowcast(...)parses one completion string into one canonicalRoute
When meta.product_smiles is present, Ursa raw completions can be adapted into a
canonical route corpus directly, without a benchmark and without a preprocessing
step that re-keys the file by target. benchmark alignment happens later in the
explicit collect workflow.
Implementation Guidelines¶
Critical requirements
Your adapter must handle:
- Canonicalization - Use
retrocast.chem.canonicalize_smilesfor all SMILES - Cycle detection - Ensure no molecule appears twice in a path
- Target validation - Verify the root matches
expected_target.smileswhen an expected target is provided
Adaptation Errors¶
Adapter error code values are lowercase machine contracts. Keep adapter names in context["adapter"] lowercase so callers can aggregate by ADAPTER_MAP key. Human-facing messages should use proper model capitalization, such as DreamRetro, DMS, or MolBuilder.
For the shared exception shape, wrapping rules, and workflow-level failure accounting, see Error Handling.
An adapter may raise these expected errors from cast:
| exception | code | when it is raised | caller policy |
|---|---|---|---|
AdapterSchemaError |
adapter.schema_invalid |
the raw route payload or entry does not match the adapter's declared input schema | workflow records the failure and continues |
AdapterLogicError |
adapter.target_mismatch |
a transformed route root does not match the benchmark target | adapter logs/skips that route when other routes may still be usable |
AdapterLogicError |
adapter.cycle_detected |
route graph revisits the same molecule in one path | adapter logs/skips that route when the failure is route-local |
AdapterLogicError |
adapter.node_type_invalid |
a bipartite graph node has the wrong role, such as molecule where reaction is required | adapter logs/skips that route when the failure is route-local |
AdapterLogicError |
adapter.node_missing |
graph edges reference a node absent from the raw lookup tables | adapter logs/skips that route when the failure is route-local |
AdapterLogicError |
adapter.route_string_empty |
a string-based route payload is empty after parsing | adapter logs/skips that route when the failure is route-local |
AdapterLogicError |
adapter.route_string_invalid |
a string-based route payload has malformed step boundaries or missing reactants/products | adapter logs/skips that route when the failure is route-local |
AdapterLogicError |
adapter.route_transform_failed |
fallback for route-local transform failures that do not fit a narrower code | adapter logs/skips that route when the failure is route-local |
UnsupportedAdapterFeatureError |
adapter.unsupported_feature |
a valid request asks for a feature the adapter does not support | caller should treat this as a fatal configuration/request failure |
ChemError |
chem.invalid_smiles, chem.runtime_error |
raw route molecules cannot be canonicalized or processed by RDKit | adapter logs/skips that route when the failure is route-local |
Adapter resolution happens before cast and raises AdapterResolutionError (adapter.unknown or adapter.resolution_missing) when the CLI or manifest cannot select an adapter.
Use retrocast.adapters.errors.adapter_schema_error() and adapter_target_mismatch() where they fit; they keep messages, codes, and context consistent.
During ingest, workflow boundaries aggregate target-level AdapterError and ChemError failures by error.code. In the ingest CLI path, that summary is persisted in manifest statistics.failures_by_code.
Adapter Error Examples¶
These examples use tiny fake payloads to show what each route-local code means.
adapter.cycle_detected: the route graph loops back to a molecule already in the same path.
{
"smiles": "CCO",
"children": [
{
"smiles": "CC=O",
"children": [{"smiles": "CCO", "children": []}]
}
]
}
adapter.node_type_invalid: a graph slot contains the wrong kind of node. Bipartite adapters expect molecule -> reaction -> molecule.
adapter.node_missing: an edge/reference points to an id or SMILES key that is absent from the raw lookup tables.
{
"uuid2smiles": {"root": "CCO", "rxn1": "CC=O>>CCO"},
"node_dict": {"CCO": {"type": "chemical", "smiles": "CCO"}},
"pathways": [[{"source": "root", "target": "rxn1"}]]
}
Here rxn1 resolves to CC=O>>CCO, but node_dict["CC=O>>CCO"] is missing.
adapter.route_string_empty: a string-based route field is empty after trimming/splitting.
adapter.route_string_invalid: the route string exists, but its grammar is malformed.
RetroStar and DreamRetro-style route steps need product, metadata/reagents, and reactants, e.g. CCO>0.9>CC=O.[H][H].
adapter.target_mismatch: the adapter produced a valid route, but the root is not the benchmark target.
adapter.schema_invalid: the raw payload does not match the adapter's top-level expected schema.
If routes must be a string, this is schema-invalid.
adapter.route_transform_failed: fallback for route-local transformation failures that do not fit a sharper bucket. If this shows up often in manifests or logs, promote the recurring case to a dedicated code.
1. Define Pydantic Schemas¶
Always define Pydantic models for the raw input format. This separates validation logic from transformation logic and ensures bad data is rejected early.
2. Canonicalization¶
RetroCast relies on exact SMILES matching.
- Always canonicalize raw SMILES using
retrocast.chem.canonicalize_smiles - The standard helpers (
build_molecule_...) do this automatically - If writing a custom builder, you must call it explicitly
- Always check that the root of your built tree matches
target.smiles
3. Cycle Detection¶
Retrosynthetic graphs must be acyclic trees.
- The standard helpers include cycle detection (raising
AdapterLogicErrorwithadapter.cycle_detectedif a node appears twice in a path) - If writing a custom builder, maintain a
visitedset during recursion
4. Metadata¶
Do not discard model-specific data (scores, template IDs, etc.). Store it in the metadata dictionary of the Molecule or ReactionStep. This data is preserved throughout the pipeline and can be used for custom analysis later.
Registration¶
Once your adapter logic is written, you must register it so the CLI can find it.
Testing Your Adapter¶
RetroCast provides a strict test harness to ensure your adapter behaves correctly. Create a test file inheriting from BaseAdapterTest.
from tests.adapters.test_base_adapter import BaseAdapterTest
from retrocast.adapters.my_adapter import MyAdapter
class TestMyAdapter(BaseAdapterTest):
@pytest.fixture
def adapter_instance(self):
return MyAdapter()
@pytest.fixture
def raw_valid_route_data(self):
# Return a sample JSON blob that represents a valid output
return {"smiles": "CCO", "tree": ...}
@pytest.fixture
def raw_unsuccessful_run_data(self):
# Return data representing a failed prediction (empty list, success=False, etc.)
return {"success": False}
@pytest.fixture
def raw_invalid_schema_data(self):
# Return malformed data that should fail Pydantic validation
return {"malformed": True}
# ... implement target identity fixtures ...
Run the tests using pytest:
What the test suite validates
The BaseAdapterTest automatically verifies that your adapter:
- Correctly parses valid data
- Rejects invalid schemas with
AdapterSchemaError - Correctly identifies target mismatches
- Handles failed predictions gracefully