GenBio Toolkit
Python wrappers for the GenBio AIDO Inference Engine, the Molecular Design Platform (MDP), and third-party APIs.
Quickstart
Prerequisites
- GCP access to the virtual-lab-01 project
- gcloud CLI and kubectl installed
- Python 3.10+
1. Install
git clone git@gitlab.genbio.ai:virtual-cell-system/genbio-sdk.git
cd genbio-sdk
uv venv
source .venv/bin/activate
uv sync --extra dev
Or with pip:
git clone git@gitlab.genbio.ai:virtual-cell-system/genbio-sdk.git
cd genbio-sdk
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
2. Connect to backend services
Set your kubectl context to the dev/staging cluster:
gcloud container clusters get-credentials virtual-lab-test --region us-central1 --project virtual-lab-01
Open four terminal tabs and start one port-forward in each:
# Tab 1 — Inference engine (infra namespace)
kubectl port-forward svc/inference-engine-staging-head-ilb -n infra 8000:8000
# Tab 2 — Structure utils (mdp-dev namespace)
kubectl port-forward svc/structure-utils-api-serve-svc -n mdp-dev 8001:8000
# Tab 3 — Structure generation (mdp-dev namespace)
kubectl port-forward svc/structure-generation-api-serve-svc -n mdp-dev 8002:8000
# Tab 4 — Datalake (datalake-dev namespace)
kubectl port-forward svc/aido-datalake-service -n datalake-dev 8003:80
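Before moving on, it can help to confirm that each forward is actually listening. The following is a minimal sketch, not part of the SDK: it only attempts a TCP connection to the local ports used in the commands above (8000 to 8003). The helper names `port_is_open` and `check_forwards` are hypothetical.

```python
import socket

# Local ports taken from the port-forward commands above.
FORWARDED_PORTS = {
    "inference-engine": 8000,
    "structure-utils": 8001,
    "structure-generation": 8002,
    "datalake": 8003,
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_forwards(ports=FORWARDED_PORTS):
    """Map each service name to whether its local port accepts connections."""
    return {name: port_is_open(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_forwards().items():
        print(f"{name:22s} {'reachable' if ok else 'NOT reachable'}")
```

A successful connection only shows that kubectl is listening locally; it does not prove the backing service is healthy.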
3. Set environment variables
Copy the example environment file and fill in your keys:
cp .env.example .env
| Variable | Required | Used by |
|---|---|---|
| AIDO_INFERENCE_ENGINE_URL | Yes | All AIDO model APIs except structure generation |
| AIDO_INFERENCE_ENGINE_API_KEY | Yes | All AIDO model APIs except structure generation |
| STRUCTURE_GENERATION_URL | Yes | Structure generation API |
| STRUCTURE_UTILS_URL | Yes | Structure generation API |
| AIDO_DATALAKE_URL | Yes | All AIDO Datalake APIs (genbio.data) |
| OPENAI_API_KEY | Yes | dna2_track_search |
| S2_API_KEY | No | Semantic Scholar APIs (increases rate limits) |
With the port-forwards from step 2 running, the local values are:
export AIDO_INFERENCE_ENGINE_URL=http://localhost:8000
export AIDO_INFERENCE_ENGINE_API_KEY=not-needed-for-port-forward
export STRUCTURE_UTILS_URL=http://localhost:8001
export STRUCTURE_GENERATION_URL=http://localhost:8002
export AIDO_DATALAKE_URL=http://localhost:8003
Note:
AIDO_INFERENCE_ENGINE_API_KEY is only required when accessing the inference engine through the public gateway. Port-forwarded connections bypass the gateway, but the SDK still reads the environment variable, so set it to any non-empty value.
Load the variables from .env into your shell:
set -a; source .env; set +a
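To catch a missing variable before the first API call, a small check like the one below may be useful. It is a sketch under the assumption that the "Required" column of the table above is accurate; the names `REQUIRED_VARS` and `missing_vars` are hypothetical and not part of the SDK.

```python
import os

# Variables marked "Yes" in the Required column of the table above.
REQUIRED_VARS = [
    "AIDO_INFERENCE_ENGINE_URL",
    "AIDO_INFERENCE_ENGINE_API_KEY",
    "STRUCTURE_GENERATION_URL",
    "STRUCTURE_UTILS_URL",
    "AIDO_DATALAKE_URL",
    "OPENAI_API_KEY",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Blank values count as missing here, which matches the note that AIDO_INFERENCE_ENGINE_API_KEY must be non-empty even for port-forwarded connections.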
Usage Examples
Protein embedding
from pprint import pprint
from genbio.toolkit import protein_embedding
result = protein_embedding("AYTNSFTRGVYYPDKVFRSSVLHS")
pprint(result)
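Sequences copied from FASTA files or spreadsheets often carry whitespace or lowercase letters. The sketch below normalizes a sequence before it is passed to protein_embedding. It assumes the service expects the 20 standard one-letter amino-acid codes; that is an assumption, and the actual endpoint may accept other symbols. `clean_protein_sequence` is a hypothetical helper, not an SDK function.

```python
# Assumption: only the 20 standard amino-acid letters are accepted.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(seq: str) -> str:
    """Strip whitespace, uppercase, and reject non-standard residue letters."""
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    unexpected = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"unexpected residue letters: {''.join(unexpected)}")
    return cleaned


if __name__ == "__main__":
    print(clean_protein_sequence("aytnsftrgv\nYYPDKVFRSSVLHS"))
```

The cleaned string can then be passed as the argument to protein_embedding in place of the raw input.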
Cell embedding
from pprint import pprint
from genbio.toolkit import cell_embedding_small
result = cell_embedding_small("tests/test_assets/temp_adata.h5ad")
pprint(result)
Structure prediction
import py3Dmol
from genbio.toolkit import structure_prediction
result = structure_prediction(
    query_1=(
        "MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG"
        "QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDL"
        "PSRTVDTKQAQDLARSYGIPFIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGC"
        "VKIKKCIIM"
    ),
    query_1_type="proteinChain",
    query_2="C[C@H]1CN(CCN1C2=NC(=O)N(C3=NC(=C(C=C32)F)C4=C(C=CC=C4F)O)C5=C(C=CN=C5C(C)C)C)C(=O)C=C",
    query_2_type="ligand",
)
print(result["model_name"]) # AIDO.StructurePrediction
# Visualize the predicted structure
view = py3Dmol.view(width=600, height=400)
view.addModel(result["output"]["cif_data"], "cif")
view.setStyle({"cartoon": {"color": "spectrum"}})
view.addStyle({"hetflag": True}, {"stick": {"colorscheme": "greenCarbon"}})
view.zoomTo()
view.write_html("structure_prediction.html")
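To keep the predicted structure for other tools, the CIF text can also be written to disk. This sketch assumes, as the visualization code above suggests, that result["output"]["cif_data"] is a plain string of mmCIF text; `save_cif` is a hypothetical helper, not an SDK function.

```python
from pathlib import Path


def save_cif(result: dict, path: str) -> Path:
    """Write the CIF text from a structure_prediction result to a file."""
    # Assumption: cif_data is a str, matching its use in view.addModel above.
    cif_text = result["output"]["cif_data"]
    out = Path(path)
    out.write_text(cif_text, encoding="utf-8")
    return out
```

For example, `save_cif(result, "structure_prediction.cif")` would store the file next to the HTML view.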
Structure generation
import py3Dmol
from genbio.toolkit import structure_generation
result = structure_generation(
    target_seq=(
        "MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG"
        "QEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDL"
        "PSRTVDTKQAQDLARSYGIPFIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGC"
        "VKIKKCIIM"
    ),
    gcp_output_dir="gs://mdp-dev-bucket/structure_diffusion/outputs/test",
)
print(result["successful_samples"]) # 1
# Visualize the generated structure
view = py3Dmol.view(width=600, height=400)
view.addModel(result["cif_string"][0], "cif")
view.setStyle({"cartoon": {"color": "spectrum"}})
view.zoomTo()
view.write_html("structure_generation.html")
Histone signal over a genomic region
from pprint import pprint
from genbio.data import gene_histone
intervals = gene_histone(
    cell_type="K-562",
    mark="H3K27ac",
    chrom="chr7",
    start=5_530_000,
    end=5_560_000,
)
pprint(intervals[:3])
# [{'end': 5530150, 'start': 5530000, 'value': 0.12},
# {'end': 5530300, 'start': 5530150, 'value': 0.45},
# ...]
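A common next step is to reduce the intervals to a single number for the region. The sketch below computes a length-weighted mean, assuming each interval is a dict with start, end, and value keys as in the sample output above. `mean_signal` is a hypothetical helper, and the sample values are the illustrative ones from that output, not real measurements.

```python
def mean_signal(intervals):
    """Length-weighted mean of 'value' across intervals with start/end keys."""
    total_length = sum(iv["end"] - iv["start"] for iv in intervals)
    if total_length <= 0:
        raise ValueError("intervals cover no bases")
    weighted_sum = sum((iv["end"] - iv["start"]) * iv["value"] for iv in intervals)
    return weighted_sum / total_length


if __name__ == "__main__":
    # Illustrative intervals shaped like the sample output above.
    sample = [
        {"start": 5530000, "end": 5530150, "value": 0.12},
        {"start": 5530150, "end": 5530300, "value": 0.45},
    ]
    print(round(mean_signal(sample), 3))
```

Weighting by interval length matters when bins are not all the same width; with equal-width bins it reduces to a plain average.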
Gene mapping
from pprint import pprint
from genbio.toolkit.gene_mapping_api import biomart_gene_mapping
result = biomart_gene_mapping(gene_ids=["TP53", "BRCA1"])
pprint(result)
Literature search
from pprint import pprint
from genbio.toolkit.semantic_scholar_apis import search_papers
result = search_papers(query="protein language models", limit=5)
pprint(result)
Available Functions
AIDO Model APIs (genbio.toolkit.aido_models_apis)
| Category | Function | Description |
|---|---|---|
| Protein | protein_embedding | Protein sequence embeddings (AIDO.Protein-16B) |
| Protein | protein_stability | Stability prediction for protein sequences |
| Protein | protein_protein_interaction | Predict if proteins interact, identify binding sites, and compute residue-residue interaction scores |
| Protein | reactome_pathway_query | Reactome pathway prediction from protein sequences |
| Cell | cell_embedding_small | Cell embeddings (AIDO.Cell-3M) |
| Cell | cell_embedding_medium | Cell embeddings (AIDO.Cell-10M) |
| Cell | cell_embedding_large | Cell embeddings (AIDO.Cell-100M) |
| Cell | cell_type_annotation | Cell type annotation by tissue |
| Cell | cell_type_annotation_supported_tissues | List supported tissues for annotation |
| Cell | cell_age_predictor | Predict cell age from expression |
| Cell | embedding_gene_vocab | Gene vocabulary for cell embeddings |
| Cell | age_predictor_gene_vocab | Gene vocabulary for age predictor |
| DNA | dna_embedding_small | DNA sequence embeddings (small model) |
| DNA | dna_embedding_large | DNA sequence embeddings (large model) |
| DNA | dna2_flashzoi_rep1 | DNA2 FlashZoi track predictions |
| DNA | dna2_track_search | Semantic search over DNA2 track descriptions |
| DNA | predict_tracks_v3 | DNA3-AG-524K genomic track predictions (human or mouse) |
| DNA | dna_v3_embeddings | DNA3-AG-524K sequence embeddings |
| Tissue | tissue_embedding_small | Spatial tissue embeddings (AIDO.Tissue-3M) |
| Tissue | tissue_embedding_large | Spatial tissue embeddings (AIDO.Tissue-60M) |
| RNA | ncrna_embedding | Non-coding RNA embeddings |
| RNA | mrna_embedding | mRNA/CDS embeddings |
| RNA | rna_translation_efficiency_muscle | Translation efficiency prediction (muscle) |
| RNA | rna_secondary_structure | RNA secondary structure prediction |
| RNA | rna_protein_abundance_hsapiens | Protein abundance prediction (human) |
| RNA | rna_splice_site_query | Splice site prediction (600 bp sequences) |
| RNA | gsrna_activity_query | gsRNA activity prediction (21 bp sequences) |
| Structure | structure_prediction | 3D structure prediction (protein, protein-ligand) |
| Structure | structure_generation | Generate binders for protein target sequences |
| Interactome | interactome_query | Gene interactome neighbor search |
| Interactome | interactome_query_gene_vocab | Gene vocabulary for interactome |
| Perturbation | perturbation_effect_query | Perturbation effect prediction |
| Perturbation | perturbation_effect_query_cell_lines | List supported cell lines |
| Perturbation | knockout_effect_query | Gene knockout effect prediction |
| Perturbation | knockout_effect_query_gene_vocab | Gene vocabulary for knockout |
| Perturbation | knockout_effect_query_readout_genes | Readout genes for knockout |
| Perturbation | gene_knockdown_query | In-silico gene knockdown expression prediction |
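The table states fixed input lengths for two RNA functions: 600 bp for rna_splice_site_query and 21 bp for gsrna_activity_query. A local length check, sketched below, can flag a wrong-sized input before a request is sent. Whether the service rejects, pads, or truncates other lengths is not stated here, so treat this as a convenience check under that assumption; `EXPECTED_LENGTHS` and `check_length` are hypothetical names.

```python
# Input lengths as listed in the function table above.
EXPECTED_LENGTHS = {
    "rna_splice_site_query": 600,
    "gsrna_activity_query": 21,
}


def check_length(function_name: str, sequence: str) -> str:
    """Return the sequence unchanged if its length matches the listed size."""
    expected = EXPECTED_LENGTHS[function_name]
    if len(sequence) != expected:
        raise ValueError(
            f"{function_name} expects {expected} bases, got {len(sequence)}"
        )
    return sequence
```

Functions without a listed length are deliberately absent from the mapping, so looking them up raises KeyError rather than silently passing.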
AIDO Datalake APIs (genbio.data)
| Category | Function | Description |
|---|---|---|
| Gene | gene_histone | Raw bigWig histone-mark signal over a genomic interval |
Gene Mapping (genbio.toolkit.gene_mapping_api)
| Function | Description |
|---|---|
| biomart_gene_mapping | Map/convert gene identifiers via Ensembl BioMart |
| get_orthology_table | Orthology table for human, mouse, marmoset, macaque |
Semantic Scholar (genbio.toolkit.semantic_scholar_apis)
| Function | Description |
|---|---|
| search_papers | Search papers by keyword query |
| get_paper_details | Get details for a specific paper |
| get_paper_citations | Get papers that cite a given paper |
| get_paper_references | Get references cited by a given paper |
Project Structure
genbio-sdk/
├── src/genbio/
│ ├── toolkit/ # AIDO Inference Engine + MDP wrappers (genbio.toolkit)
│ ├── data/ # AIDO Datalake wrappers (genbio.data)
│ └── utils.py # Shared HTTP utilities
├── tests/ # Test suite
├── scripts/
│ ├── api-examples/ # curl examples for inference/MDP endpoints
│ ├── data-examples/ # curl examples for datalake endpoints
│ └── stress-tests/ # Inference engine stress tests
├── pyproject.toml
└── .env.example
Testing
pytest tests/ -v