
The announcement marks a landmark public-private effort to expand U.S. research capacity and speed scientific discovery. A combined investment of roughly $152 million supports the Open Multimodal AI Infrastructure to Accelerate Science, known as OMAI.
The project, led by the Allen Institute for AI (Ai2), will deliver openly available models, training data, code, and documentation. NSF contributes $75 million through Mid-Scale Research Infrastructure, while NVIDIA provides $77 million in advanced systems and software, including HGX B300 with Blackwell Ultra GPUs and the NVIDIA AI Enterprise platform.
This partnership lowers barriers to state-of-the-art infrastructure for universities, labs, startups, and independent researchers. The goal is clear: let scientists validate results, reproduce experiments, and combine text, images, graphs, and tables into actionable insight.
The article will next examine leadership roles, the technology backbone, and early application areas that show how this effort translates into real-world progress in science and research.
Inside the NSF-NVIDIA announcement: funding, leadership, and the OMAI project
A combined public grant and private technology pledge sets a new baseline for shared research infrastructure. This effort brings roughly $152 million in total support to launch the Open Multimodal AI Infrastructure (OMAI) project.
Investment breakdown
The National Science Foundation (NSF) committed $75 million through its Mid-Scale Research Infrastructure program. NVIDIA added about $77 million in systems, software, and expertise to seed a durable, scalable foundation.
Leadership and roles
The Allen Institute leads OMAI, with Noah A. Smith as principal investigator and Hanna Hajishirzi as co-principal investigator. Academic partners include the University of Washington, University of Hawai‘i at Hilo, University of New Hampshire, and University of New Mexico.
Program context and vision
The Mid-Scale Research Infrastructure program bridges individual grants and national facilities, advancing U.S. leadership in science. The project covers training, deployment, and governance practices that help researchers use shared systems at scale.
Technology and quotes
NVIDIA-provided HGX B300 systems with Blackwell Ultra GPUs and the NVIDIA AI Enterprise software will accelerate training and inference, lowering time-to-insight for scientific teams.
“These investments will secure U.S. leadership in science and technology,” said Brian Stone.
“AI is the engine of modern science that will empower U.S. scientists with ‘limitless intelligence,’” said NVIDIA founder and CEO Jensen Huang.
“Fully open AI is a necessity to enable collaboration and sustain global leadership science,” said Ai2 CEO Ali Farhadi.
Building an open multimodal infrastructure to accelerate science
This initiative builds a shared stack that makes powerful multimodal tools available to research teams nationwide.
What “open” includes
The project will release a full suite of models, model weights, training data, code, documentation, and developer tools. Open releases enable independent verification, reuse, and community audits. That transparency helps researchers trust results and reproduce experiments.
Technical backbone
The compute layer relies on HGX B300 systems with Blackwell Ultra GPUs and the NVIDIA AI Enterprise software stack to handle large training runs. These B300 systems deliver the throughput and reliability needed for heavy scientific workloads.
Multimodal research capability
Multimodal large language models will unify text, images, graphs, and tables. Teams can cross-check experiments, generate visualizations, and link findings to prior literature. Models trained on domain sources will be tuned for scientific tasks.
Access and scaling
Universities, national labs, and startups will get low- or no-cost access, removing barriers to adoption. Open models and documentation foster trust and community-led improvement, while the architecture shrinks training cycles so the infrastructure can accelerate science.
“Open releases, robust systems, and shared tools form the bedrock of reproducible, data-driven research.”
Scientific impact: applications, training, and collaboration across U.S. institutions
Early OMAI work targets practical science use cases where multimodal systems can shorten the path from idea to experiment. This section outlines where the project will show near-term returns and how teams will work together.
Early focus areas: materials discovery and protein function prediction
Materials discovery benefits when models fuse experimental data, publications, and structure–property links. That integration helps prioritize promising candidates and cut lab cycles.
Protein function prediction improves when models trained on literature, sequences, structures, and assay results generate testable hypotheses. This accelerates hypothesis design and validation.
From challenges to capabilities: addressing weaknesses in current large language models
The project tackles key challenges by emphasizing domain alignment, citation grounding, and evidence tracking within large language systems used in scientific research.
Models trained on vetted, open-access literature show fewer hallucinations and greater interpretability. Scientists can then trace provenance and assess reliability.
“Open releases and rigorous training create tools scientists trust.”
Collaboration across the University of Washington, University of Hawai‘i at Hilo, University of New Hampshire, and University of New Mexico will validate results, share benchmarks, and scale best practices.
Researchers and scientists will gain access to open models, datasets, and tools to embed intelligence into lab workflows. The project aims to measure impact by improved research efficiency, transparent data use, and rigorous cross-institution validation.
Conclusion
OMAI’s launch creates a durable bridge between public investment and cutting-edge compute to speed scientific discovery.
The National Science Foundation provided $75 million, while NVIDIA added $77 million in systems and software to seed this open multimodal infrastructure. Led by the Allen Institute for AI and its academic partners, the project delivers open models, training data, code, and documentation to help researchers work faster and more transparently.
Purpose-built systems aim to reduce training and inference time for large language workloads. Early wins in materials discovery and protein prediction show how the approach can turn technology and data into real scientific outcomes.
With sustained investment, sound governance, and community contributions, this project positions U.S. science for continued leadership, innovation, and accelerated discovery.