AI-Powered Biomedical Knowledge Extraction

Horizon Science Lab reads thousands of Alzheimer's disease research papers, extracts entities and relations using LLMs, and generates novel testable hypotheses by finding connections no single paper contains.

5 Data Sources: PubMed, ClinicalTrials.gov, OpenFDA, bioRxiv, ChEMBL

6 Pipeline Stages: Normalization through Hypothesis Generation

4 AI Components: GPT-4o, GPT-4o-mini, embeddings, k-means clustering

2 Services: Science Lab + Gen5 Intelligence

From Raw Papers to Testable Hypotheses

A fully automated pipeline that reads, understands, and connects biomedical research at scale.

Stage 0

Terminology Normalization

Automatically discovers synonym clusters across 40 years of literature by combining GPT-4o-mini, embeddings, and k-means clustering, then maps evolving terminology to canonical names.
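The final mapping step of Stage 0 might look roughly like the sketch below. It assumes k-means has already grouped the term embeddings into clusters. It also assumes the canonical name is simply the most frequent term in each cluster, although the real pipeline may use GPT-4o-mini to choose or validate it. The function name `build_canonical_map` and the term counts are hypothetical.

```python
def build_canonical_map(clusters, term_counts):
    """Map each lower-cased surface term to its cluster's canonical name.

    Assumption: the canonical name is the most frequent term in the cluster,
    with ties broken alphabetically so the output is deterministic.
    """
    mapping = {}
    for terms in clusters.values():
        canonical = min(terms, key=lambda t: (-term_counts.get(t, 0), t))
        for term in terms:
            mapping[term.lower()] = canonical
    return mapping


# Toy input: cluster ids as k-means might assign them, plus illustrative counts.
clusters = {
    0: ["amyloid beta", "Abeta", "A-beta", "beta-amyloid"],
    1: ["tau", "MAPT protein"],
}
term_counts = {"amyloid beta": 12, "Abeta": 8, "A-beta": 1, "beta-amyloid": 4,
               "tau": 30, "MAPT protein": 2}

synonym_map = build_canonical_map(clusters, term_counts)
print(synonym_map["abeta"])  # amyloid beta
```

The map is keyed on lower-cased terms, so later stages can normalize case before they look up a name.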

Stage 1

Entity Extraction

GPT-4o-mini identifies compounds, proteins, diseases, biomarkers, genetic variants, adverse events, and trial metadata in paper abstracts.
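Whatever prompt is used, the model's answer has to be validated before it enters the pipeline. The sketch below makes one assumption: the model returns a JSON array of `{"text", "type"}` objects. It keeps only entities whose type is in the Stage 1 list. The snake_case type labels and the `parse_entities` name are hypothetical.

```python
import json
from dataclasses import dataclass

# Entity categories named for Stage 1; the snake_case labels are an assumption.
ENTITY_TYPES = {
    "compound", "protein", "disease", "biomarker",
    "genetic_variant", "adverse_event", "trial_metadata",
}


@dataclass(frozen=True)
class Entity:
    text: str
    type: str


def parse_entities(llm_output):
    """Parse a JSON array of {"text", "type"} objects, keeping only valid entities."""
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # malformed model output yields nothing instead of failing the batch
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text, etype = item.get("text"), item.get("type")
        if isinstance(text, str) and text.strip() and isinstance(etype, str) \
                and etype in ENTITY_TYPES:
            entities.append(Entity(text.strip(), etype))
    return entities


# Hypothetical model response for one abstract; "symptom" is not an allowed type.
response = json.dumps([
    {"text": "donepezil", "type": "compound"},
    {"text": "amyloid beta", "type": "protein"},
    {"text": "Alzheimer's disease", "type": "disease"},
    {"text": "memory loss", "type": "symptom"},
])
entities = parse_entities(response)
print([e.type for e in entities])
```

This sketch silently drops unknown types. A production system might log them instead, so that the type list can be extended.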

Stage 2

Entity Resolution

Maps free-text names to canonical IDs (ChEMBL, UniProt, MeSH) using dictionaries, aliases, and the Stage 0 synonym mappings. This stage makes no LLM calls, so it adds no model cost.
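Here is a minimal sketch of a resolver that needs no LLM. The name is normalized first, and then the resolver tries an exact dictionary hit, an alias, and a Stage 0 synonym, in that order. The lookup order and the `resolve` helper are assumptions. The IDs are placeholders, not real ChEMBL, UniProt, or MeSH accessions.

```python
from typing import Optional

# Hypothetical lookup tables. The IDs are placeholders, not real accessions.
DICTIONARY = {
    "donepezil": "CHEMBL:PLACEHOLDER_1",
    "amyloid beta": "UNIPROT:PLACEHOLDER_2",
    "alzheimer disease": "MESH:PLACEHOLDER_3",
}
ALIASES = {"aricept": "donepezil"}            # e.g. brand name -> dictionary key
STAGE0_SYNONYMS = {"abeta": "amyloid beta"}   # output of terminology normalization


def normalize(name):
    """Lower-case, drop a possessive 's, and collapse whitespace."""
    return " ".join(name.lower().replace("'s", "").split())


def resolve(name) -> Optional[str]:
    """Try an exact match, then aliases, then Stage 0 synonyms. No LLM calls."""
    key = normalize(name)
    for candidate in (key, ALIASES.get(key), STAGE0_SYNONYMS.get(key)):
        if candidate is not None and candidate in DICTIONARY:
            return DICTIONARY[candidate]
    return None  # unresolved names can be queued for review


print(resolve("Aricept"))
```

Every step is a dictionary lookup, so resolution runs in constant time per name and costs nothing per call.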

Stage 3

Relation Extraction

GPT-4o interprets context, negation, and causal language to extract typed relations such as inhibits, treats, fails_to_treat, biomarker_of, and risk_factor_for.
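Negation handling matters because "X did not improve AD outcomes" must not be stored as "X treats AD". The sketch below assumes the model flags negated claims. It folds the negation into an explicit type such as `fails_to_treat` and drops negated claims that have no typed negative form. That policy, the `NEGATED_FORM` table, and `to_typed_relation` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Relation types named in Stage 3. The source says "and more", so this set is partial.
RELATION_TYPES = {"inhibits", "treats", "fails_to_treat", "biomarker_of", "risk_factor_for"}

# Assumption: a negated claim is kept only if it has an explicit negative type.
NEGATED_FORM = {"treats": "fails_to_treat"}


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str
    negated: bool = False


def to_typed_relation(rel: Relation) -> Optional[Relation]:
    """Fold negation into the relation type and reject unknown predicates."""
    predicate = rel.predicate
    if rel.negated:
        predicate = NEGATED_FORM.get(predicate)
        if predicate is None:
            return None  # e.g. "did not inhibit": drop rather than store a false positive
    if predicate not in RELATION_TYPES:
        return None
    return Relation(rel.subject, predicate, rel.obj)


# Hypothetical model output for "compound_a did not improve outcomes in Alzheimer's disease".
raw = Relation("compound_a", "treats", "alzheimer disease", negated=True)
print(to_typed_relation(raw))
```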

Stage 4

Event Emission

Relations become structured events consumed by the Gen5 intelligence layer for anomaly detection, pattern discovery, and automated rule execution.
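A relation might be wrapped in an event envelope along the lines below. The field names (`event_id`, `event_type`, `occurred_at`, `payload`) are a hypothetical schema, not the actual Gen5 contract. An in-memory list also stands in for whatever queue or bus Gen5 really consumes from.

```python
import json
import uuid
from datetime import datetime, timezone


def relation_to_event(subject, predicate, obj, source_id):
    """Wrap one extracted relation in an event envelope (hypothetical schema)."""
    return {
        "event_id": str(uuid.uuid4()),
        "event_type": f"relation.{predicate}",
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": {"subject": subject, "predicate": predicate,
                    "object": obj, "source": source_id},
    }


# Stand-in for the message queue or event bus that Gen5 would consume from.
event_log = []


def emit(event):
    event_log.append(json.dumps(event))


emit(relation_to_event("compound_a", "inhibits", "protein_b", "paper-placeholder-1"))
print(event_log[0])
```

Putting the relation type in `event_type` lets downstream consumers filter or aggregate by relation without parsing the payload.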

Stage 3b

Hypothesis Generation

Finds drug-repurposing signals, failed-mechanism reuse opportunities, and indirect connections between entities. GPT-4o then generates mechanistic pathways with testable predictions.
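One standard way to find indirect connections is Swanson-style "ABC" literature-based discovery. If papers link A to B and B to C, but no paper links A to C, then A–C becomes a candidate hypothesis. The source does not say that this exact method is used, so treat the sketch below as an illustration. It also treats the relation graph as undirected, and the entity names are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def indirect_pairs(relations):
    """Return (a, c, bridges) for entity pairs that share neighbors but no direct edge."""
    neighbors = defaultdict(set)
    for subject, _predicate, obj in relations:
        neighbors[subject].add(obj)   # treat the relation graph as undirected
        neighbors[obj].add(subject)
    candidates = []
    for a, c in combinations(sorted(neighbors), 2):   # O(n^2): fine for a sketch only
        if c in neighbors[a]:
            continue  # already linked directly in some paper
        bridges = neighbors[a] & neighbors[c]
        if bridges:
            candidates.append((a, c, sorted(bridges)))
    return candidates


# Toy relations; entity names are placeholders.
relations = [
    ("compound_a", "inhibits", "protein_b"),
    ("protein_b", "risk_factor_for", "disease_c"),
    ("compound_a", "treats", "disease_d"),
]
for a, c, via in indirect_pairs(relations):
    print(f"{a} -- {c} via {via}")
```

Each candidate pair, together with its bridge entities, is the kind of input an LLM could turn into a mechanistic pathway and testable predictions.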

Gen5 Intelligence Layer

A second service sits downstream of Science Lab. It applies real-time analytics to the research event stream and feeds pattern signals back into hypothesis generation.

Anomaly Detection

4-metric z-score analysis with sigmoid scoring detects unusual activity in the research event stream.
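The sketch below shows one way to combine z-scores from four metrics into a single sigmoid score. The metric names are hypothetical, since the source only says there are four. Taking the largest |z| and centering the sigmoid at |z| = 3 are assumptions too. A weighted sum of the z-scores would be an equally plausible way to combine them.

```python
import math
from statistics import mean, pstdev

# Hypothetical names for the four metrics; the source does not list them.
METRICS = ("events", "distinct_entities", "new_relations", "negated_share")


def z_score(value, history):
    sd = pstdev(history)
    return 0.0 if sd == 0 else (value - mean(history)) / sd


def anomaly_score(current, history, center=3.0):
    """Score in (0, 1): sigmoid of the largest |z| across metrics, centered at |z| = 3."""
    zs = {m: z_score(current[m], [window[m] for window in history]) for m in METRICS}
    peak = max(abs(z) for z in zs.values())
    return 1.0 / (1.0 + math.exp(-(peak - center))), zs


# Toy history: five previous windows of the event stream.
history = [dict(zip(METRICS, row)) for row in [
    (100, 40, 12, 0.10), (110, 42, 10, 0.12), (90, 38, 14, 0.08),
    (105, 41, 11, 0.11), (95, 39, 13, 0.09),
]]
spike = {"events": 200, "distinct_entities": 41, "new_relations": 12, "negated_share": 0.10}
score, zs = anomaly_score(spike, history)
print(round(score, 3), {m: round(z, 1) for m, z in zs.items()})
```

The sigmoid turns unbounded z-scores into a 0–1 score, so one alert threshold can be applied across streams.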

Pattern Discovery

AI-powered k-means clustering on event embeddings finds recurring signals across research domains.
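Below is a minimal sketch of pattern discovery: plain Lloyd's k-means over event embeddings, followed by a check for clusters that span more than one research domain. The 2-D vectors, the domain labels, and the "two or more domains" rule for a cross-domain signal are hypothetical. Real embeddings would come from an embedding model and have far more dimensions.

```python
import math
import random
from collections import defaultdict


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy events: (event id, research domain, 2-D embedding).
events = [
    ("e1", "genetics", (0.90, 0.10)),
    ("e2", "biomarkers", (0.85, 0.15)),
    ("e3", "clinical_trials", (0.95, 0.05)),
    ("e4", "pharmacovigilance", (0.10, 0.90)),
    ("e5", "pharmacovigilance", (0.15, 0.85)),
]
labels = kmeans([emb for _, _, emb in events], k=2)

domains_by_cluster = defaultdict(set)
for (_, domain, _), label in zip(events, labels):
    domains_by_cluster[label].add(domain)
cross_domain = {c: d for c, d in domains_by_cluster.items() if len(d) >= 2}
print(cross_domain)
```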

Prediction Engine

Ensemble forecasting (linear regression, moving average, exponential smoothing) on event trends.
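The three forecasters named above can be combined in a few lines. This sketch averages one-step-ahead forecasts with equal weights. The equal weights, the 3-point moving-average window, and the smoothing factor `alpha=0.5` are assumptions, and a real ensemble might weight each model by its recent error.

```python
def linear_forecast(series, horizon=1):
    """Least-squares line through (t, y), extrapolated `horizon` steps ahead."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return y_mean + slope * (n - 1 + horizon - t_mean)


def moving_average_forecast(series, window=3):
    tail = series[-window:]
    return sum(tail) / len(tail)


def exp_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: the forecast is the final smoothed level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def ensemble_forecast(series):
    """Equal-weight average of the three forecasts (weights are an assumption)."""
    parts = [linear_forecast(series), moving_average_forecast(series),
             exp_smoothing_forecast(series)]
    return sum(parts) / len(parts), parts


# Toy daily event counts for one relation type.
counts = [10, 12, 14, 16, 18]
combined, parts = ensemble_forecast(counts)
print(round(combined, 3), parts)
```

On a steadily rising series, the regression extrapolates the trend while the two smoothers lag behind it. Averaging them dampens the regression's tendency to overshoot.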

Rule Engine

AI-generated rules move through a three-tier promotion system (experimental, stable, production), backed by a self-healing health monitor.
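A three-tier promotion system can be modeled as a small state machine. In the sketch below, every threshold is hypothetical, and so is the demotion path, which stands in for the "self-healing" behavior. The source does not say how rules are actually promoted or repaired.

```python
from dataclasses import dataclass

TIERS = ("experimental", "stable", "production")

# Hypothetical thresholds: (minimum runs, minimum success rate) to leave a tier.
PROMOTE_AFTER = {"experimental": (20, 0.80), "stable": (100, 0.95)}
# Hypothetical self-healing policy: demote a rule whose success rate collapses.
DEMOTE_BELOW, DEMOTE_MIN_RUNS = 0.50, 10


@dataclass
class Rule:
    name: str
    tier: str = "experimental"
    runs: int = 0
    successes: int = 0

    @property
    def success_rate(self):
        return self.successes / self.runs if self.runs else 0.0

    def record(self, ok):
        """Log one execution outcome, then promote or demote if thresholds are met."""
        self.runs += 1
        self.successes += int(ok)
        idx = TIERS.index(self.tier)
        threshold = PROMOTE_AFTER.get(self.tier)
        if threshold and self.runs >= threshold[0] and self.success_rate >= threshold[1]:
            self._move_to(TIERS[idx + 1])
        elif idx > 0 and self.runs >= DEMOTE_MIN_RUNS and self.success_rate < DEMOTE_BELOW:
            self._move_to(TIERS[idx - 1])

    def _move_to(self, tier):
        self.tier = tier
        self.runs = self.successes = 0  # each tier must earn its own track record


rule = Rule("hypothetical_rule")
for _ in range(20):
    rule.record(True)
print(rule.tier)  # stable
```

Resetting the counters on each move means a newly promoted rule cannot coast on its record from the previous tier.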

Gen5 patterns feed back into Science Lab as a fourth hypothesis trigger. When entities cluster together statistically in the event stream but have never been directly linked in any paper, the system generates a hypothesis explaining why.
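The trigger condition amounts to "frequently co-clustered, never directly linked". The sketch below counts how many Gen5 clusters each entity pair shares and subtracts pairs that already have a direct link. A raw support threshold (`min_support`) is a simplification standing in for a real statistical test. The function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unlinked_cooccurrences(pattern_clusters, known_links, min_support=2):
    """Entity pairs that share at least `min_support` Gen5 clusters but have no direct link.

    A raw support count stands in for a proper statistical test here.
    """
    support = Counter(
        pair
        for entities in pattern_clusters
        for pair in combinations(sorted(set(entities)), 2)
    )
    linked = {tuple(sorted(link)) for link in known_links}
    return sorted((pair, n) for pair, n in support.items()
                  if n >= min_support and pair not in linked)


# Toy Gen5 clusters (entity sets) and one relation already reported in a paper.
clusters = [
    {"entity_a", "entity_b", "entity_c"},
    {"entity_a", "entity_b"},
    {"entity_b", "entity_c"},
    {"entity_a", "entity_b", "entity_d"},
]
known_links = [("entity_c", "entity_b")]
print(unlinked_cooccurrences(clusters, known_links))
```

Here `entity_b` and `entity_c` co-cluster twice but are excluded because a paper already links them. Only the `entity_a`/`entity_b` pair would go on to hypothesis generation.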

The Feedback Loop

Science Lab extracts relations and emits events. Gen5 consumes those events and discovers patterns. Science Lab reads Gen5's patterns as hypothesis triggers. New hypotheses lead to new relations and new events, and the cycle continues. Together, the two services find connections that neither could find alone.

Science Lab → Events → Gen5 Patterns → Hypotheses