At the forefront of responsible AI research, the Calmon Lab at the Harvard John A. Paulson School of Engineering and Applied Sciences was tackling one of the most pressing challenges in AI: aligning large language models (LLMs) with human values and safety standards. Their work focused on improving chain-of-thought (CoT) reasoning in widely used models, such as DeepSeek-R1 and Llama, by applying inference-time alignment methods.

However, infrastructure limitations hindered their progress. Demand for the Harvard cluster far exceeded its capacity, yet running state-of-the-art models required access to several NVIDIA H100 GPUs. The resulting delays made it difficult to experiment efficiently on large models and slowed the overall pace of their research.