Date: 2024-11-13

The field of AI is making impressive technological breakthroughs. For example, DeepMind’s AlphaFold 3 can predict molecular structure and interaction with extraordinary accuracy. And OpenAI’s GPT-4o can reason in real time.

Despite these advancements, AI is still not human. AI does not intrinsically care about reason, loyalty or safety. It has one goal: to complete the task for which it was programmed.

Therefore, it is up to AI developers to build in human values and goals. Otherwise, misalignment occurs and AI systems can produce harmful outputs that lead to bias, discrimination and misinformation.

Present-day alignment efforts work to keep weak (narrow) AI systems in line with human values and goals. But artificial general intelligence (AGI) and artificial superintelligence (ASI) systems could be far riskier, harder to understand and more difficult to control. Current AI alignment techniques, which rely on human intelligence, are likely inadequate for aligning AI systems that are smarter than humans.

For example, reinforcement learning from human feedback (RLHF) is a machine learning technique in which a “reward model” is trained with direct human feedback and then used to optimize the AI model’s behavior through reinforcement learning. OpenAI used RLHF as its main method to align its GPT-3 and GPT-4 series of models behind ChatGPT, all considered weak AI models. Significantly more advanced alignment techniques will be necessary to help ensure that superintelligent AI systems are comparably robust, interpretable, controllable and ethical.
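
To make the reward-model idea concrete, here is a minimal sketch of how such a model might be fit from human preference pairs. It is an illustration under stated assumptions, not OpenAI's implementation: the linear model, the two hand-picked features (helpfulness and toxicity), the toy data and the function names are all hypothetical. It uses a common pairwise (Bradley-Terry style) loss, in which the response a human preferred should score higher than the one they rejected.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(weights, features):
    # Hypothetical linear reward model: a weighted sum of response features.
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(preference_pairs, n_features, lr=0.5, epochs=200):
    """Fit weights so that human-preferred responses score higher.

    Minimizes -log(sigmoid(r(chosen) - r(rejected))) with plain
    stochastic gradient descent over the preference pairs.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in preference_pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # The update shrinks toward zero as the margin grows.
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Toy, made-up data: each response is [helpfulness, toxicity], and each
# pair is (response the human preferred, response the human rejected).
pairs = [
    ([0.9, 0.1], [0.4, 0.8]),
    ([0.7, 0.0], [0.8, 0.9]),
    ([0.6, 0.2], [0.2, 0.3]),
]

weights = train_reward_model(pairs, n_features=2)

for chosen, rejected in pairs:
    print(reward(weights, chosen) > reward(weights, rejected))
```

In a full RLHF pipeline, the learned reward would then serve as the training signal for a reinforcement learning step on the language model itself. The sketch also shows why the approach depends on human judgment: the model can only learn preferences that people are able to express reliably.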