3rd Party Tools
Gene Mapping
genbio.toolkit.gene_mapping_api.biomart_gene_mapping
biomart_gene_mapping(gene_ids: list[str], dataset: str = 'hsapiens_gene_ensembl', filter_type: str = 'external_gene_name', host: str = 'www.ensembl.org', include_go: bool = False) -> pd.DataFrame
Map and convert gene identifiers using Ensembl BioMart.
Notes
This function uses the Ensembl BioMart web service to convert between different gene identifier types and retrieve gene annotations. BioMart provides access to comprehensive gene information including genomic coordinates, gene descriptions, and Gene Ontology (GO) annotations.
Common use cases:
- Convert between identifier types (e.g., gene symbols to Ensembl IDs)
- Retrieve gene annotations and descriptions
- Get genomic coordinates (chromosome, start/end positions)
- Map genes to Gene Ontology terms
- Cross-reference between databases (Ensembl, Entrez, HGNC)
The function queries Ensembl's public BioMart service and returns extended gene information. When include_go=True, GO annotations are included, so each gene appears in one row per associated GO term rather than in a single row.
For more information about BioMart and available datasets, visit: https://www.ensembl.org/biomart
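To make the service call more concrete, the sketch below builds the kind of XML query that the BioMart martservice endpoint accepts and encodes it into a request URL, without sending anything. The helper names build_biomart_query and build_biomart_url and the chosen attribute list are assumptions for illustration; the toolkit's actual internals are not documented here and may differ.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_biomart_query(gene_ids, dataset="hsapiens_gene_ensembl",
                        filter_type="external_gene_name",
                        attributes=("ensembl_gene_id", "external_gene_name")):
    """Return a BioMart XML query string (hypothetical helper, not toolkit code)."""
    # One <Attribute> element per column requested from BioMart.
    attribute_xml = "".join(f"<Attribute name={quoteattr(a)} />" for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        # BioMart filters take a comma-separated list of values.
        f'<Filter name={quoteattr(filter_type)} value={quoteattr(",".join(gene_ids))} />'
        f'{attribute_xml}'
        '</Dataset></Query>'
    )


def build_biomart_url(query_xml, host="www.ensembl.org"):
    """URL-encode the XML query for a GET request (hypothetical helper)."""
    return f"https://{host}/biomart/martservice?" + urlencode({"query": query_xml})


query = build_biomart_query(["TP53", "BRCA1"])
url = build_biomart_url(query)
print(url)
```

Changing filter_type to 'ensembl_gene_id' and passing Ensembl IDs would produce the reverse lookup with the same query structure.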
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gene_ids | list[str] | List of gene identifiers to map or convert (e.g., ['TP53', 'BRCA1', 'EGFR']). The identifier type should match the filter_type parameter. | required |
| dataset | str | Ensembl BioMart dataset name. Common datasets: 'hsapiens_gene_ensembl' (human), 'mmusculus_gene_ensembl' (mouse), 'drerio_gene_ensembl' (zebrafish), 'rnorvegicus_gene_ensembl' (rat), 'dmelanogaster_gene_ensembl' (fruit fly), 'celegans_gene_ensembl' (C. elegans). | 'hsapiens_gene_ensembl' |
| filter_type | str | Type of the input gene identifiers. Common filter types: 'external_gene_name' (gene symbols, e.g., 'TP53', 'BRCA1'), 'ensembl_gene_id' (Ensembl gene IDs, e.g., 'ENSG00000141510'), 'entrezgene_id' (NCBI Entrez gene IDs, numeric), 'hgnc_id' (HGNC IDs, for human genes), 'uniprot_gn_id' (UniProt gene names). | 'external_gene_name' |
| host | str | Ensembl BioMart host server. Alternative mirror servers: 'useast.ensembl.org' (US East), 'asia.ensembl.org' (Asia). The function automatically falls back to the mirrors if the primary host fails. | 'www.ensembl.org' |
| include_go | bool | Whether to include Gene Ontology term IDs in the results. When True, adds a 'go_id' column and produces multiple rows per gene (one row for each GO term associated with the gene). When False, returns one row per gene without GO annotations. | False |
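The mirror fallback described for the host parameter can be pictured as a loop over candidate hosts. The following is a minimal sketch under stated assumptions: fetch_with_fallback and fake_fetch are hypothetical names, the host order is a guess, and the toolkit's real retry logic is not shown in this reference.

```python
def fetch_with_fallback(fetch, hosts=("www.ensembl.org",
                                      "useast.ensembl.org",
                                      "asia.ensembl.org")):
    """Call fetch(host) for each host in order; return (host, result) on first success."""
    errors = []
    for host in hosts:
        try:
            return host, fetch(host)
        except OSError as exc:
            # Network failures such as timeouts and urllib's URLError are OSError subclasses.
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all BioMart hosts failed: " + "; ".join(errors))


def fake_fetch(host):
    """Stand-in for a real HTTP request: the primary host is simulated as down."""
    if host == "www.ensembl.org":
        raise TimeoutError("simulated timeout")
    return "ok"


host_used, result = fetch_with_fallback(fake_fetch)
print(host_used, result)
```

Passing the fetch function as an argument keeps the loop testable without network access.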
Returns:
| Type | Description |
|---|---|
| DataFrame | pandas DataFrame with gene mapping and annotation results. When include_go=False (default), returns one row per gene. When include_go=True, genes appear in multiple rows if associated with multiple GO terms. Returns an empty DataFrame if no matching genes are found in BioMart. |
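Because include_go=True yields one row per gene and GO term, callers often want to regroup the result by gene. The sketch below shows the idea on plain dictionaries so that it runs without pandas. The rows are hand-written illustrative data, and the gene column name external_gene_name is an assumption; only the go_id column is named in this reference.

```python
from collections import defaultdict

# Illustrative rows only, shaped like a result with one row per (gene, GO term).
rows = [
    {"external_gene_name": "TP53", "go_id": "GO:0006915"},
    {"external_gene_name": "TP53", "go_id": "GO:0003677"},
    {"external_gene_name": "BRCA1", "go_id": "GO:0006281"},
]


def collapse_go_terms(rows, gene_key="external_gene_name", go_key="go_id"):
    """Group GO IDs by gene, preserving first-seen order and dropping duplicates."""
    grouped = defaultdict(list)
    for row in rows:
        terms = grouped[row[gene_key]]  # creates the entry even if the gene has no GO term
        go_id = row.get(go_key)
        if go_id and go_id not in terms:
            terms.append(go_id)
    return dict(grouped)


go_by_gene = collapse_go_terms(rows)
print(go_by_gene)
```

With a real DataFrame, the same regrouping is a groupby on the gene column followed by aggregating go_id into a list, again assuming that column name.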
genbio.toolkit.gene_mapping_api.get_orthology_table
get_orthology_table() -> pd.DataFrame
Load orthology mapping table for human, mouse, marmoset, and rhesus macaque genes.
Notes
This function loads a pre-compiled orthology table containing gene mappings between four species: human, mouse, common marmoset, and rhesus macaque. The table includes Ensembl gene IDs, NCBI gene IDs, and gene symbols for each species where orthologous genes have been identified.
The table is downloaded directly from https://raw.githubusercontent.com/AllenInstitute/GeneOrthology/refs/heads/main/csv/mouse_human_marmoset_macaque_orthologs_20231113.csv
Common use cases:
- Convert gene identifiers between model organisms
- Find mouse orthologs for human disease genes
- Identify conserved genes across primate species
- Cross-reference experimental results between species
- Filter for genes with established orthologs in specific organisms
Returns:
| Type | Description |
|---|---|
| DataFrame | pandas DataFrame with orthology mappings containing 14 columns. The table contains approximately 18,000 human genes with their orthologs. Missing ortholog information is represented as NaN in the DataFrame. |
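A typical use of the orthology table is a symbol-to-symbol lookup between two species. The sketch below demonstrates that on a small in-memory CSV using only the standard library. The column names human_symbol and mouse_symbol and the GENE_X placeholder row are assumptions for illustration; the real table's 14 column names are not listed in this reference.

```python
import csv
import io

# Tiny illustrative sample; GENE_X is a made-up placeholder with no mouse ortholog.
SAMPLE_CSV = """human_symbol,mouse_symbol
TP53,Trp53
BRCA1,Brca1
GENE_X,
"""


def build_ortholog_map(csv_text, source_col, target_col):
    """Map each source-species symbol to its ortholog, or None when it is missing."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = (row.get(source_col) or "").strip()
        target = (row.get(target_col) or "").strip()
        if source:
            mapping[source] = target or None  # an empty cell plays the role of NaN here
    return mapping


human_to_mouse = build_ortholog_map(SAMPLE_CSV, "human_symbol", "mouse_symbol")
print(human_to_mouse["TP53"])
```

On the DataFrame returned by get_orthology_table(), the equivalent step would be dropping rows with NaN in the target column before building the mapping.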
Semantic Scholar
genbio.toolkit.semantic_scholar_apis.search_papers
search_papers(query: str, fields: str | None = None, limit: int = 10) -> dict[str, Any]
Search for papers by keyword query.
Notes
This function accesses the Semantic Scholar Academic Graph API paper search endpoint. It returns papers matching the query string, ranked by relevance. The search supports natural language queries and returns paginated results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search query string (e.g., "COVID-19 vaccines", "neural networks"). Supports natural language queries. | required |
| fields | str \| None | Comma-separated list of fields to return (e.g., "paperId,title,authors"). If None, defaults to "paperId,title,authors,year,abstract,url". Available fields include: paperId, externalIds, url, title, abstract, venue, year, referenceCount, citationCount, influentialCitationCount, isOpenAccess, fieldsOfStudy, s2FieldsOfStudy, publicationTypes, publicationDate, journal, authors, citations, references, embedding. | None |
| limit | int | Maximum number of results to return (max: 100). | 10 |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing the paginated search results for the query. |
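The parameters above map naturally onto query-string arguments of the Academic Graph paper search endpoint. The sketch below builds such a URL without sending a request. build_search_url is a hypothetical helper, and rejecting an out-of-range limit with an error (rather than clamping it) is an assumption, since this reference does not say how the toolkit handles that case.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
DEFAULT_FIELDS = "paperId,title,authors,year,abstract,url"


def build_search_url(query, fields=None, limit=10):
    """Return the GET URL for a paper search (hypothetical helper, no network call)."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"query": query, "fields": fields or DEFAULT_FIELDS, "limit": limit}
    return f"{S2_SEARCH_URL}?{urlencode(params)}"


search_url = build_search_url("COVID-19 vaccines", limit=5)
# Decode the query string again to check what would be sent.
decoded = parse_qs(urlsplit(search_url).query)
print(decoded)
```

Requesting only the fields that are needed keeps responses small, which matters most for heavy fields such as embedding.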
genbio.toolkit.semantic_scholar_apis.get_paper_details
get_paper_details(paper_id: str, fields: str | None = None) -> dict[str, Any]
Get detailed information about a specific paper.
Notes
This function accesses the Semantic Scholar Academic Graph API paper details endpoint. It returns comprehensive information about a single paper identified by its ID. Supports multiple ID formats including S2 paper ID, DOI, ArXiv ID, MAG ID, ACL ID, PubMed ID, and Corpus ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| paper_id | str | Paper identifier in one of the following formats: S2 paper ID (e.g., "649def34f8be52c8b66281af98ae884c09aef38b"), DOI (e.g., "DOI:10.1038/s41586-020-2012-7"), ArXiv ID (e.g., "ARXIV:2106.15928"), PubMed ID (e.g., "PMID:33268865"), Corpus ID (e.g., "CorpusID:3658586"), MAG ID (e.g., "MAG:112218234"), ACL ID (e.g., "ACL:W12-3903"). | required |
| fields | str \| None | Comma-separated list of fields to return. If None, defaults to "paperId,title,authors,year,abstract,url,citationCount,referenceCount,publicationDate". See search_papers() for available fields. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing the requested fields for the paper. Returns None if the paper is not found. |
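Since paper_id accepts several formats, it can help to check which one a string uses before making a request. The sketch below classifies an identifier by its prefix. classify_paper_id is a hypothetical helper, and both the case-insensitive prefix matching and the 40-hex-character test for S2 paper IDs are assumptions drawn from the examples listed above, not documented validation rules.

```python
KNOWN_PREFIXES = ("DOI", "ARXIV", "PMID", "CorpusID", "MAG", "ACL")
HEX_DIGITS = set("0123456789abcdef")


def classify_paper_id(paper_id):
    """Return the identifier type, or raise ValueError if it is not recognised."""
    prefix, sep, rest = paper_id.partition(":")
    if sep and rest:
        for known in KNOWN_PREFIXES:
            if prefix.lower() == known.lower():
                return known
        raise ValueError(f"unrecognised prefix: {prefix!r}")
    # No prefix: assume an S2 paper ID, which in the example above is 40 hex characters.
    if len(paper_id) == 40 and set(paper_id.lower()) <= HEX_DIGITS:
        return "S2"
    raise ValueError(f"unrecognised paper ID: {paper_id!r}")


kinds = [classify_paper_id(pid) for pid in (
    "649def34f8be52c8b66281af98ae884c09aef38b",
    "DOI:10.1038/s41586-020-2012-7",
    "ARXIV:2106.15928",
)]
print(kinds)
```

Note that a DOI contains a slash after the prefix, so only the first colon is used to split the string.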
genbio.toolkit.semantic_scholar_apis.get_paper_citations
get_paper_citations(paper_id: str, fields: str | None = None, limit: int = 100) -> dict[str, Any]
Get citations for a specific paper.
Notes
This function accesses the Semantic Scholar Academic Graph API citations endpoint. It returns papers that cite the specified paper, along with citation contexts (the text snippets where the citation appears).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| paper_id | str | Paper identifier (see get_paper_details for supported formats). | required |
| fields | str \| None | Comma-separated list of fields to return for each citing paper. If None, defaults to "paperId,title,authors,year". See search_papers() for available fields. | None |
| limit | int | Maximum number of citations to return (max: 1000). | 100 |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing the citing papers along with their citation contexts. |
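To show how citing papers and their contexts might be consumed, the sketch below summarises a hand-written payload. It assumes the returned dictionary mirrors the raw Academic Graph API response, with a data list whose entries carry citingPaper and contexts keys; this reference does not confirm that shape, and the sample entries are invented for illustration.

```python
# Invented sample payload, assuming the raw API layout is passed through unchanged.
sample_response = {
    "data": [
        {
            "contexts": ["... as shown in [12] ..."],
            "citingPaper": {"paperId": "p1", "title": "Paper One", "year": 2021},
        },
        {
            "contexts": [],
            "citingPaper": {"paperId": "p2", "title": "Paper Two", "year": 2022},
        },
    ]
}


def summarize_citations(response):
    """Return (title, year, number_of_contexts) for each citing paper."""
    summary = []
    for entry in response.get("data", []):
        paper = entry.get("citingPaper", {})
        contexts = entry.get("contexts") or []  # tolerate a missing or null list
        summary.append((paper.get("title"), paper.get("year"), len(contexts)))
    return summary


citation_summary = summarize_citations(sample_response)
print(citation_summary)
```

The same loop would apply to get_paper_references if its entries use a citedPaper key instead, which is likewise an assumption about the payload.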
genbio.toolkit.semantic_scholar_apis.get_paper_references
get_paper_references(paper_id: str, fields: str | None = None, limit: int = 100) -> dict[str, Any]
Get references cited by a specific paper.
Notes
This function accesses the Semantic Scholar Academic Graph API references endpoint. It returns papers that are cited by the specified paper, along with citation contexts (the text snippets where the reference appears).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| paper_id | str | Paper identifier (see get_paper_details for supported formats). | required |
| fields | str \| None | Comma-separated list of fields to return for each cited paper. If None, defaults to "paperId,title,authors,year". See search_papers() for available fields. | None |
| limit | int | Maximum number of references to return (max: 1000). | 100 |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dictionary containing the cited papers along with their citation contexts. |
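The limit cap of 1000 means that very long citation or reference lists need more than one request when calling the API directly. The sketch below computes the (offset, limit) windows for such paging. page_windows is a hypothetical helper, and the documented toolkit functions expose only limit, so using an offset is an assumption that applies to direct calls against the Academic Graph API.

```python
def page_windows(total, page_size=1000):
    """Split `total` items into (offset, limit) pairs of at most `page_size` each."""
    if total < 0 or page_size < 1:
        raise ValueError("total must be >= 0 and page_size must be >= 1")
    return [
        (offset, min(page_size, total - offset))
        for offset in range(0, total, page_size)
    ]


windows = page_windows(2500)
print(windows)
```

The total could come from the referenceCount or citationCount field requested through get_paper_details.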