Senior/Staff Applied Scientist, Multimodal Representation Learning (Oncology)
Pathos
Drug development shouldn’t be guesswork, not when patients are waiting.
Pathos is building a next-generation biotech with AI at the core. Not as a feature, but as the operating system for how medicines get developed. We believe most drugs don’t fail because the science was wrong. They fail because they were tested in the wrong patients, with the wrong assumptions, in trials that couldn’t answer the real question: who benefits, and why?
Pathos exists to change that. We’re building the largest foundation model in oncology and pairing it with proprietary AI systems, deep oncology expertise, and 200+ petabytes of multimodal data linked to patient outcomes, so we can make development decisions with more precision, much earlier.
This is not theoretical. We’re well-capitalized and have the leadership to build a generational company. We invest in and advance our own clinical-stage programs, using our AI platform to sharpen trial design, patient selection, and biomarker strategy so that therapies reach the patients most likely to benefit, sooner.
If you’re driven by purpose, energized by complexity, and want to apply AI, biology, or both to redefine the future of drug development, come build Pathos with us.
About the role:
Where Frontier AI Meets Frontier Biology to Deliver Frontier Medicine
We are hiring specialized scientists to accelerate development of our Oncology Foundation Model (OFM) stack. This is not a generic “model tinkering” role. The person in this seat will help define and build the modeling strategy that turns multimodal oncology data (clinical text/EHR, genomics, transcriptomics, pathology imaging, and derived features) into useful representations and predictive capabilities that directly support drug discovery and development.
You’ll operate at the intersection of:
- Frontier AI (representation learning, multimodal learning, alignment, evaluation)
- Messy biomedical reality (clinical endpoints, censoring, confounding, missingness, batch effects)
- Mechanism + translation (models that can be interrogated, stress-tested, and connected to biology and outcomes)
This role complements (not duplicates) the computational biology roles that focus on our program-facing biomarker analyses and trial decisions.
What You Will Do
Foundation model development
- Design and implement multimodal pretraining and fine-tuning strategies for oncology data (e.g., contrastive objectives, masked modeling, multitask learning, retrieval-augmented training, late/early fusion variants).
- Build model components that improve cross-modality grounding (e.g., aligning clinical narratives with molecular state and pathology signals).
- Develop robust approaches for missing-modality settings (train-time and inference-time), ensuring the OFM remains useful when only subsets of modalities exist.
Clinical + molecular fluency
- Work with domain partners to define prediction targets and representation tests that matter: response, durability, toxicity, survival, progression, resistance, subtype stability, etc.
- Incorporate oncology-specific realities into modeling and evaluation (censoring, treatment lines, temporal leakage, cohort shift, annotation noise).
Evaluation, benchmarking, and scientific rigor
- Create evaluation harnesses that go beyond leaderboard metrics: ablations, cohort-shift tests, missingness stress tests, temporal generalization, calibration, and failure-mode analysis.
- Define and maintain benchmark suites that reflect Pathos priorities and are reproducible across model iterations.
- Partner with engineering to support scalable training/inference (multi-node GPU training, data pipelines, throughput optimization), while keeping scientific intent front-and-center.
Translation enablement
- Package model outputs so they can be consumed by internal science teams: embeddings, uncertainty estimates, interpretable signals, retrieval tools, and model cards that clearly state what’s reliable vs. not.
- Collaborate with computational biologists, translational scientists, and clinicians to ensure the OFM supports mechanism discovery and patient stratification workflows.
Who You Are
Minimum Qualifications
- Advanced degree (PhD strongly preferred) in ML/AI, CS, Statistics, Computational Biology, Bioinformatics, or a related field, or equivalent industry experience with a strong publication/impact record.
- Deep hands-on experience with modern deep learning (PyTorch), including training large models and debugging optimization issues.
- Demonstrated ability to design representation learning / foundation model approaches and evaluate them rigorously (not just “train and report AUCs”).
- Comfort operating in ambiguous problem spaces with a bias toward execution and iteration.
Strongly Preferred
- Multimodal foundation model experience (any of: clinical + omics, imaging + text, multimodal retrieval, alignment, late fusion/mixture-of-experts).
- Real experience with at least one of the following domains (enough to reason about the data-generating process and pitfalls):
  - Clinical text / EHR (notes, longitudinal events, coding systems, leakage traps)
  - Molecular/omics modeling (RNA/DNA/variant features, batch effects, multi-cohort generalization)
  - Pathology imaging (WSI feature learning, weak supervision, MIL, slide-level endpoints)
Nice to Have
- Distributed training and systems experience (FSDP/DeepSpeed, multi-node performance profiling).
- Experience with alignment methods (preference learning, instruction tuning, evaluation frameworks for reliability/robustness).
- Publications in relevant venues (NeurIPS/ICML/ICLR/ACL/MLHC) and/or impactful open-source work.
Location
This is a hybrid role, requiring up to 3 days per week onsite at our NYC headquarters.