GenBio Toolkit

Python wrappers for GenBio AIDO Inference Engine, Molecular Design Platform, and 3rd-party APIs.

Quickstart

Prerequisites

  • GCP access to the virtual-lab-01 project
  • gcloud CLI and kubectl installed
  • Python 3.10+

1. Install

git clone git@gitlab.genbio.ai:virtual-cell-system/genbio-sdk.git
cd genbio-sdk
uv venv
source .venv/bin/activate
uv sync --extra dev

Or with pip:

git clone git@gitlab.genbio.ai:virtual-cell-system/genbio-sdk.git
cd genbio-sdk
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

2. Connect to backend services

Set your kubectl context to the dev/staging cluster:

gcloud container clusters get-credentials virtual-lab-test --region us-central1 --project virtual-lab-01

Open four terminal tabs and start one port-forward in each (each command keeps running while its tunnel is open):

# Tab 1 — Inference engine (infra namespace)
kubectl port-forward svc/inference-engine-staging-head-ilb -n infra 8000:8000

# Tab 2 — Structure utils (mdp-dev namespace)
kubectl port-forward svc/structure-utils-api-serve-svc -n mdp-dev 8001:8000

# Tab 3 — Structure generation (mdp-dev namespace)
kubectl port-forward svc/structure-generation-api-serve-svc -n mdp-dev 8002:8000

# Tab 4 — Datalake (datalake-dev namespace)
kubectl port-forward svc/aido-datalake-service -n datalake-dev 8003:80
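To confirm the tunnels are up before moving on, a small standard-library check can attempt a TCP connection to each local port. This is a sketch, not part of the SDK: `check_port_forwards` is a hypothetical helper, and the port map simply mirrors the four commands above. A successful connection only shows that the tunnel is listening, not that the service behind it is healthy.

```python
import socket

# Local ports opened by the four kubectl port-forward commands above.
PORT_FORWARDS = {
    "inference-engine": 8000,
    "structure-utils": 8001,
    "structure-generation": 8002,
    "datalake": 8003,
}


def check_port_forwards(ports=PORT_FORWARDS, host="localhost", timeout=0.5):
    """Return {service_name: bool}, True if a TCP connect to the port succeeds.

    Hypothetical helper: it only proves something is listening locally,
    not that the backend behind the tunnel is healthy.
    """
    status = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:  # refused, timed out, or unreachable
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_port_forwards().items():
        print(f"{name:22s} {'up' if ok else 'DOWN'}")
```

Run it from a fifth tab; any service reported as DOWN usually means its port-forward command has exited.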

3. Set environment variables

Copy the example environment file and fill in your keys:

cp .env.example .env

Variable                        Required  Used by
AIDO_INFERENCE_ENGINE_URL       Yes       All AIDO model APIs except structure generation
AIDO_INFERENCE_ENGINE_API_KEY   Yes       All AIDO model APIs except structure generation
STRUCTURE_GENERATION_URL        Yes       Structure generation API
STRUCTURE_UTILS_URL             Yes       Structure generation API
AIDO_DATALAKE_URL               Yes       All AIDO Datalake APIs (genbio.data)
OPENAI_API_KEY                  Yes       dna2_track_search
S2_API_KEY                      No        Semantic Scholar APIs (increases rate limits)

With the port-forwards from step 2 running, point the URLs at localhost:

export AIDO_INFERENCE_ENGINE_URL=http://localhost:8000
export AIDO_INFERENCE_ENGINE_API_KEY=not-needed-for-port-forward
export STRUCTURE_UTILS_URL=http://localhost:8001
export STRUCTURE_GENERATION_URL=http://localhost:8002
export AIDO_DATALAKE_URL=http://localhost:8003

Note: AIDO_INFERENCE_ENGINE_API_KEY is only needed when you reach the inference engine through the public gateway. Port-forwarded connections bypass the gateway, but the SDK still reads the variable, so set it to any non-empty value (the example above uses not-needed-for-port-forward).

Export the variables from .env into your current shell (set -a auto-exports every variable assigned while the file is sourced):

set -a; source .env; set +a
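A quick way to catch a missing or empty variable before the first API call is to check from Python. The sketch below assumes the required set is exactly the rows marked "Yes" in the table; `missing_env_vars` is a hypothetical helper, not an SDK function. Because the table lists OPENAI_API_KEY as used only by dna2_track_search, you may choose to drop it from the tuple if you never call that function.

```python
import os

# Variables marked "Required: Yes" in the table above.
REQUIRED_VARS = (
    "AIDO_INFERENCE_ENGINE_URL",
    "AIDO_INFERENCE_ENGINE_API_KEY",
    "STRUCTURE_GENERATION_URL",
    "STRUCTURE_UTILS_URL",
    "AIDO_DATALAKE_URL",
    "OPENAI_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return required names that are unset or empty (whitespace counts as empty)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Empty values are flagged on purpose, since the SDK expects AIDO_INFERENCE_ENGINE_API_KEY to be non-empty even over a port-forward.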

Usage Examples

Protein embedding

from pprint import pprint
from genbio.toolkit import protein_embedding

result = protein_embedding("AYTNSFTRGVYYPDKVFRSSVLHS")
pprint(result)
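The call above embeds a single sequence. To embed several, one option is a plain loop that validates each sequence first. This sketch assumes protein_embedding takes one sequence string per call (as in the example) and treats the 20 standard amino acids plus X as the valid alphabet, which is an assumption about what the model accepts. `embed_proteins` is a hypothetical helper; the embedding function is passed in so the loop can be exercised without a backend.

```python
# Standard 20 amino acids plus X for unknown residues. Which letters the
# model really accepts is an assumption; adjust the set if needed.
VALID_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWYX")


def embed_proteins(sequences, embed_fn):
    """Validate and embed each distinct sequence once; return {sequence: result}.

    embed_fn stands in for genbio.toolkit.protein_embedding, so the loop
    can be tested without a running inference engine.
    """
    results = {}
    for raw in sequences:
        seq = raw.strip().upper()
        invalid = sorted(set(seq) - VALID_RESIDUES)
        if not seq or invalid:
            raise ValueError(f"invalid protein sequence {raw!r}: {invalid}")
        if seq not in results:  # skip duplicates after normalisation
            results[seq] = embed_fn(seq)
    return results


# With the backend available:
# from genbio.toolkit import protein_embedding
# results = embed_proteins(["AYTNSFTRGVYYPDKVFRSSVLHS"], protein_embedding)
```

Validating up front means a typo fails locally instead of costing a round trip to the inference engine.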

Cell embedding

from pprint import pprint
from genbio.toolkit import cell_embedding_small

result = cell_embedding_small("tests/test_assets/temp_adata.h5ad")
pprint(result)

Structure prediction

import py3Dmol
from genbio.toolkit import structure_prediction

result = structure_prediction(
    query_1=(
        "MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG"
        "QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDL"
        "PSRTVDTKQAQDLARSYGIPFIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGC"
        "VKIKKCIIM"
    ),
    query_1_type="proteinChain",
    query_2="C[C@H]1CN(CCN1C2=NC(=O)N(C3=NC(=C(C=C32)F)C4=C(C=CC=C4F)O)C5=C(C=CN=C5C(C)C)C)C(=O)C=C",
    query_2_type="ligand",
)
print(result["model_name"])  # AIDO.StructurePrediction

# Visualize the predicted structure
view = py3Dmol.view(width=600, height=400)
view.addModel(result["output"]["cif_data"], "cif")
view.setStyle({"cartoon": {"color": "spectrum"}})
view.addStyle({"hetflag": True}, {"stick": {"colorscheme": "greenCarbon"}})
view.zoomTo()
view.write_html("structure_prediction.html")
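The ligand here is the second query, and the visualization styles it through hetflag, i.e. as hetero-atom records. A line-based count of ATOM versus HETATM rows in the returned CIF text is a quick way to confirm that both the protein and the ligand made it into the output. This is a heuristic sketch, not an mmCIF parser: `count_atom_records` is hypothetical and assumes each _atom_site row starts with ATOM or HETATM on its own line, which is typical of model-written mmCIF but not guaranteed.

```python
def count_atom_records(cif_text):
    """Count ATOM and HETATM rows in an mmCIF string (line-based heuristic)."""
    counts = {"ATOM": 0, "HETATM": 0}
    for line in cif_text.splitlines():
        fields = line.split(maxsplit=1)
        # Only lines whose first token is exactly ATOM or HETATM are counted.
        if fields and fields[0] in counts:
            counts[fields[0]] += 1
    return counts


# e.g. count_atom_records(result["output"]["cif_data"])
```

A HETATM count of zero would suggest the ligand was dropped or written differently, which is worth checking before trusting the rendered view.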

Structure generation

import py3Dmol
from genbio.toolkit import structure_generation

result = structure_generation(
    target_seq=(
        "MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG"
        "QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDL"
        "PSRTVDTKQAQDLARSYGIPFIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGC"
        "VKIKKCIIM"
    ),
    gcp_output_dir="gs://mdp-dev-bucket/structure_diffusion/outputs/test",
)
print(result["successful_samples"])  # 1

# Visualize the generated structure
view = py3Dmol.view(width=600, height=400)
view.addModel(result["cif_string"][0], "cif")
view.setStyle({"cartoon": {"color": "spectrum"}})
view.zoomTo()
view.write_html("structure_generation.html")
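The example indexes result["cif_string"][0], which suggests cif_string holds one CIF text per generated sample. Assuming that, a sketch for writing every sample to local files (for loading into other structure viewers) might look like this; `save_cif_samples` and the sample_<i>.cif naming are hypothetical.

```python
from pathlib import Path


def save_cif_samples(cif_strings, out_dir, prefix="sample"):
    """Write each CIF string to <out_dir>/<prefix>_<i>.cif and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    paths = []
    for i, cif in enumerate(cif_strings):
        path = out / f"{prefix}_{i}.cif"
        path.write_text(cif)
        paths.append(path)
    return paths


# e.g. save_cif_samples(result["cif_string"], "generated_binders")
```

Comparing len(result["cif_string"]) with result["successful_samples"] is a cheap sanity check before saving.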

Histone signal over a genomic region

from pprint import pprint
from genbio.data import gene_histone

intervals = gene_histone(
    cell_type="K-562",
    mark="H3K27ac",
    chrom="chr7",
    start=5_530_000,
    end=5_560_000,
)
pprint(intervals[:3])
# [{'end': 5530150, 'start': 5530000, 'value': 0.12},
#  {'end': 5530300, 'start': 5530150, 'value': 0.45},
#  ...]
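Each returned interval carries start, end, and value keys. To summarize a region with one number, one option is a length-weighted mean, so a 300 bp interval counts twice as much as a 150 bp one. The sketch assumes the intervals are non-overlapping and half-open ([start, end)), as in bigWig; positions not covered by any interval are ignored rather than treated as zero. `mean_signal` and `peak_interval` are hypothetical helpers.

```python
def mean_signal(intervals):
    """Length-weighted mean of 'value' over non-overlapping [start, end) intervals."""
    total_bp = sum(iv["end"] - iv["start"] for iv in intervals)
    if total_bp == 0:
        return 0.0
    weighted = sum(iv["value"] * (iv["end"] - iv["start"]) for iv in intervals)
    return weighted / total_bp


def peak_interval(intervals):
    """Return the interval with the highest value (raises ValueError if empty)."""
    return max(intervals, key=lambda iv: iv["value"])


# e.g. mean_signal(intervals), peak_interval(intervals)
```

For the first two intervals printed above, the weighted mean is (0.12 × 150 + 0.45 × 150) / 300 = 0.285.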

Gene mapping

from pprint import pprint
from genbio.toolkit.gene_mapping_api import biomart_gene_mapping

result = biomart_gene_mapping(gene_ids=["TP53", "BRCA1"])
pprint(result)

Paper search (Semantic Scholar)

from pprint import pprint
from genbio.toolkit.semantic_scholar_apis import search_papers

result = search_papers(query="protein language models", limit=5)
pprint(result)
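Without S2_API_KEY, Semantic Scholar requests run under lower rate limits (per the environment table), so batches of queries can fail intermittently. A retry wrapper with exponential backoff is one way to smooth this over. The sketch is generic: `call_with_backoff` is hypothetical, and because the exception the SDK raises on an HTTP 429 is not documented here, retry_on defaults to any Exception; narrow it once you know the actual error type.

```python
import random
import time


def call_with_backoff(fn, *args, retries=4, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on retry_on with exponential backoff.

    The delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10%
    random jitter; the last exception is re-raised once retries run out.
    """
    for attempt in range(retries + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))


# e.g. call_with_backoff(search_papers, query="protein language models", limit=5)
```

The jitter keeps several workers from retrying in lockstep, and the injectable sleep makes the wrapper easy to test.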

Available Functions

AIDO Model APIs (genbio.toolkit.aido_models_apis)

Category      Function                                Description
Protein       protein_embedding                       Protein sequence embeddings (AIDO.Protein-16B)
              protein_stability                       Stability prediction for protein sequences
              protein_protein_interaction             Predict if proteins interact, identify binding sites, and compute residue-residue interaction scores
              reactome_pathway_query                  Reactome pathway prediction from protein sequences
Cell          cell_embedding_small                    Cell embeddings (AIDO.Cell-3M)
              cell_embedding_medium                   Cell embeddings (AIDO.Cell-10M)
              cell_embedding_large                    Cell embeddings (AIDO.Cell-100M)
              cell_type_annotation                    Cell type annotation by tissue
              cell_type_annotation_supported_tissues  List supported tissues for annotation
              cell_age_predictor                      Predict cell age from gene expression
              embedding_gene_vocab                    Gene vocabulary for cell embeddings
              age_predictor_gene_vocab                Gene vocabulary for age predictor
DNA           dna_embedding_small                     DNA sequence embeddings (small model)
              dna_embedding_large                     DNA sequence embeddings (large model)
              dna2_flashzoi_rep1                      DNA2 FlashZoi track predictions
              dna2_track_search                       Semantic search over DNA2 track descriptions
              predict_tracks_v3                       DNA3-AG-524K genomic track predictions (human or mouse)
              dna_v3_embeddings                       DNA3-AG-524K sequence embeddings
Tissue        tissue_embedding_small                  Spatial tissue embeddings (AIDO.Tissue-3M)
              tissue_embedding_large                  Spatial tissue embeddings (AIDO.Tissue-60M)
RNA           ncrna_embedding                         Non-coding RNA embeddings
              mrna_embedding                          mRNA/CDS embeddings
              rna_translation_efficiency_muscle       Translation efficiency prediction (muscle)
              rna_secondary_structure                 RNA secondary structure prediction
              rna_protein_abundance_hsapiens          Protein abundance prediction (human)
              rna_splice_site_query                   Splice site prediction (600 bp sequences)
              gsrna_activity_query                    gsRNA activity prediction (21 bp sequences)
Structure     structure_prediction                    3D structure prediction (protein, protein-ligand)
              structure_generation                    Generate binders for protein target sequences
Interactome   interactome_query                       Gene interactome neighbor search
              interactome_query_gene_vocab            Gene vocabulary for interactome
Perturbation  perturbation_effect_query               Perturbation effect prediction
              perturbation_effect_query_cell_lines    List supported cell lines
              knockout_effect_query                   Gene knockout effect prediction
              knockout_effect_query_gene_vocab        Gene vocabulary for knockout
              knockout_effect_query_readout_genes     Readout genes for knockout
              gene_knockdown_query                    In-silico gene knockdown expression prediction
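Several functions above expose a model's gene vocabulary (embedding_gene_vocab, age_predictor_gene_vocab, interactome_query_gene_vocab, knockout_effect_query_gene_vocab). Before submitting a dataset, it can be worth checking how many of your genes a model recognizes. This sketch assumes the vocabulary can be turned into an iterable of gene identifiers of the same type as your data (symbols vs. Ensembl IDs); the exact return format of those functions is not documented here, and `vocab_coverage` is a hypothetical helper.

```python
def vocab_coverage(dataset_genes, vocab):
    """Report which of your genes appear in a model gene vocabulary.

    Returns {"fraction": float, "found": [...], "missing": [...]}, with
    duplicates in dataset_genes counted once and input order preserved.
    """
    vocab_set = set(vocab)
    genes = list(dict.fromkeys(dataset_genes))  # de-duplicate, keep order
    found = [g for g in genes if g in vocab_set]
    missing = [g for g in genes if g not in vocab_set]
    fraction = len(found) / len(genes) if genes else 0.0
    return {"fraction": fraction, "found": found, "missing": missing}


# e.g. vocab_coverage(adata.var_names, embedding_gene_vocab())  # assumed return type
```

A low fraction often points to an identifier mismatch (for example, symbols against Ensembl IDs) rather than genuinely unknown genes.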

AIDO Datalake APIs (genbio.data)

Category  Function      Description
Gene      gene_histone  Raw bigWig histone-mark signal over a genomic interval

Gene Mapping (genbio.toolkit.gene_mapping_api)

Function              Description
biomart_gene_mapping  Map or convert gene identifiers via Ensembl BioMart
get_orthology_table   Orthology table for human, mouse, marmoset, and macaque

Semantic Scholar (genbio.toolkit.semantic_scholar_apis)

Function              Description
search_papers         Search papers by keyword query
get_paper_details     Get details for a specific paper
get_paper_citations   Get papers that cite a given paper
get_paper_references  Get references cited by a given paper

Project Structure

genbio-sdk/
├── src/genbio/
│   ├── toolkit/             # AIDO Inference Engine + MDP wrappers (genbio.toolkit)
│   ├── data/                # AIDO Datalake wrappers (genbio.data)
│   └── utils.py             # Shared HTTP utilities
├── tests/                   # Test suite
├── scripts/
│   ├── api-examples/        # curl examples for inference/MDP endpoints
│   ├── data-examples/       # curl examples for datalake endpoints
│   └── stress-tests/        # Inference engine stress tests
├── pyproject.toml
└── .env.example

Testing

pytest tests/ -v