AI safety refers to practices and principles that help ensure AI technologies are designed and used in a way that benefits humanity and minimizes any potential harm or negative outcomes.
Building safe artificial intelligence (AI) systems is a critical consideration for businesses and society due to the increasing prevalence and impact of AI. AI safety helps ensure that AI systems are used as responsibly as possible and that the future of AI is developed with human values in mind.
Developing and maintaining safe AI involves identifying potential AI risks (such as bias, data security and vulnerability to external threats) and creating processes for avoiding and mitigating these risks. For example, AI safety measures such as bias mitigation, robustness testing and ethical AI frameworks can all help businesses develop and use AI tools responsibly within their organizations.
As AI systems grow more sophisticated, they become more deeply integrated into people’s lives and into critical real-world areas such as infrastructure, finance and national security. These technologies can have both positive and negative impacts on the organizations that use them and on society as a whole.
Concerns about the negative impacts of AI are growing. A 2023 survey found that 52% of Americans were more concerned than excited about the increased use of artificial intelligence.[1] Another found that 83% worry that AI might accidentally lead to a catastrophic event.[2]
Other research shows that the concerns are not unfounded: A 2024 report found that 44% of survey respondents said that their organizations had experienced negative consequences (such as issues of inaccuracy or cybersecurity) from using AI.[3] Safety efforts are often treated as an afterthought: According to the Center for AI Safety’s 2023 Impact Report, only 3% of technical research focuses on making AI safer.[4]
For society as a whole, AI safety measures are necessary to protect public safety, privacy and fundamental rights. AI systems that are biased, opaque or not in line with human values can perpetuate or amplify societal inequalities.
Experts also worry that some advanced AI systems might become as intelligent as, or more intelligent than, humans. Artificial general intelligence (AGI) refers to potential AI systems that understand, learn and perform thinking tasks in the same way human beings do. Artificial superintelligence (ASI) refers to hypothetical AI systems with an intellectual scope and cognitive functions more advanced than any human. The development of AGI and ASI raises concerns that such systems could be dangerous if they are not aligned with human values or subject to human oversight. Critics say that, given too much autonomy, these systems could pose an existential threat to humanity.
From a business perspective, safe AI helps build consumer trust, guard against legal liabilities and avoid poor decision-making. Organizations that take measures to ensure that AI use is in alignment with their values can avoid negative consequences for themselves and their customers.
AI risks can be categorized into several types, each requiring different AI safety measures and risk management efforts.
AI systems can perpetuate or amplify societal biases. Algorithmic bias arises when AI is trained on incomplete or misleading data and inputs, and it can lead to unfair decision-making. For example, an AI tool trained on discriminatory data might be less likely to approve mortgages for applicants of certain backgrounds or might be more likely to recommend hiring a male job applicant over a female one.
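To illustrate, the short Python sketch below checks whether a model's approval rates differ sharply across demographic groups. The column names, sample data and 80% rule-of-thumb threshold are illustrative assumptions, not part of any specific tool or regulation.

```python
# Minimal sketch: compare approval rates across demographic groups to flag
# potential disparate impact. Column names, sample data and the ~0.8 threshold
# are illustrative assumptions.
import pandas as pd

def approval_rate_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Return the share of positive outcomes (e.g., approved loans) per group."""
    return df.groupby(group_col)[outcome_col].mean()

def disparate_impact_ratio(rates: pd.Series) -> float:
    """Ratio of the lowest group rate to the highest; values well below ~0.8 are often flagged."""
    return rates.min() / rates.max()

# Hypothetical decisions produced by a lending model
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   0,   1,   0,   0,   0],
})

rates = approval_rate_by_group(decisions, "group", "approved")
print(rates)
print(f"Disparate impact ratio: {disparate_impact_ratio(rates):.2f}")
```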
AI systems have the potential to inappropriately access, expose or misuse personal data, leading to privacy concerns. If sensitive data is breached, an AI system’s creators or users might be held responsible.
The outcomes of advanced AI systems, especially those built to operate as autonomous agents, can be unpredictable, and their actions can be harmful. Systems that make decisions independently can be difficult to stop; without an element of human control, it might be impossible to intervene when an AI system acts inappropriately or to shut it down.
AGI, ASI and other highly advanced AI systems might act in ways that endanger humanity or disrupt global systems if mismanaged. An AI race, akin to an arms race, also puts geopolitical stability at risk.
AI might also be misused for large-scale societal manipulation or cyberwarfare. In 2023, the nonprofit Center for AI Safety (CAIS) released a single-sentence statement backed by various AI researchers and leaders. It read: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”[5]
While unintended consequences and errors are a source of AI risk, bad actors can also use the technology intentionally to cause harm. AI can be weaponized for cyberattacks, misinformation campaigns, illegal surveillance or even physical harm. These threats exist at the individual level and the societal level.
AI systems can be vulnerable to security issues. They face the possibility of adversarial attacks, in which malicious actors manipulate data inputs to deceive models, leading to incorrect outputs.
For example, AI jailbreaks occur when hackers use prompt injections and other techniques to exploit vulnerabilities in AI systems and perform restricted actions. Data poisoning happens when compromised training data skews AI behavior. Unauthorized access and other vulnerabilities or security risks might lead to misuse of AI systems and their data.
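As a simplified illustration of how adversarial inputs are crafted, the sketch below applies the fast gradient sign method (FGSM) to a placeholder PyTorch model. The tiny untrained classifier and random input are stand-ins for a real production model, used here only to show the mechanics of the technique.

```python
# Minimal sketch of the fast gradient sign method (FGSM), one common way to
# craft adversarial inputs. The tiny linear "classifier" and random data are
# stand-ins; a real attack would target a trained production model.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder model
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28, requires_grad=True)  # stand-in for a real input
y = torch.tensor([3])                             # stand-in for the true label

# Compute the gradient of the loss with respect to the input ...
loss = loss_fn(model(x), y)
loss.backward()

# ... then nudge every pixel slightly in the direction that increases the loss.
epsilon = 0.05
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("original prediction:   ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```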
AI safety and AI security are related but distinct aspects of artificial intelligence. AI safety aims to address inherent issues and unintended consequences, while AI security focuses on protecting AI systems from external threats.
AI safety tries to connect AI with human values and reduce the chance that AI systems have a negative impact on businesses and society. It emphasizes AI alignment, which is the process of encoding human values and goals into AI models.
AI security is about protecting AI systems from external threats such as cyberattacks and data breaches. It involves safeguarding the confidentiality and integrity of AI models. AI security might also refer to using artificial intelligence to enhance an organization's security posture. Under this definition, it includes using AI and machine learning (ML) to anticipate and address potential threats.
AI leaders and businesses are implementing various practices to support the responsible development and use of AI technologies. AI safety measures include:
Algorithms can perpetuate or amplify prejudices present in the data they are trained on. To combat this problem, businesses are investing in efforts to address algorithmic bias. Techniques like diverse dataset collection, algorithmic fairness assessments and debiasing methods help identify potential issues.
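One widely cited debiasing method is reweighing, which assigns each training example a weight so that group membership and the outcome label appear statistically independent. The sketch below assumes a small pandas DataFrame with hypothetical "group" and "label" columns.

```python
# Minimal sketch of the "reweighing" debiasing technique: give each training
# example a weight w(g, y) = P(g) * P(y) / P(g, y) so that group membership
# and the label look statistically independent. Column names are illustrative.
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    """Underrepresented (group, label) pairs receive larger weights."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)
    p_label = df[label_col].value_counts(normalize=True)
    p_joint = df.groupby([group_col, label_col]).size() / n
    expected = df.apply(lambda r: p_group[r[group_col]] * p_label[r[label_col]], axis=1)
    observed = df.apply(lambda r: p_joint[(r[group_col], r[label_col])], axis=1)
    return expected / observed

train = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "label": [1,   1,   0,   1,   0,   0],
})
train["weight"] = reweighing_weights(train, "group", "label")
print(train)  # weights can be passed to most classifiers via sample_weight
```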
Rigorous testing and validation processes help AI systems withstand hazards and can identify technical risks. Techniques such as adversarial testing, stress testing and formal verification help ensure that AI tools and models perform as intended and do not exhibit undesirable behaviors.
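A simple form of stress testing is to measure how quickly a model's accuracy degrades as noise is added to its inputs. The sketch below uses a scikit-learn classifier and a public sample dataset purely as stand-ins; real robustness suites cover far more failure modes.

```python
# Minimal sketch of a robustness "stress test": measure how much a model's
# accuracy degrades as increasing random noise is added to its inputs.
# The model, dataset and noise levels are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

rng = np.random.default_rng(0)
for noise_scale in [0.0, 0.1, 0.5, 1.0]:
    X_noisy = X + rng.normal(scale=noise_scale, size=X.shape)
    accuracy = model.score(X_noisy, y)
    print(f"noise std {noise_scale:.1f} -> accuracy {accuracy:.2f}")
```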
Many AI models, especially large language models (LLMs), are "black boxes" that make decisions that are difficult for humans to interpret. Without transparency in the decision-making process, users are less likely to trust the results and recommendations. Explainable AI (XAI) aims to clarify the opaque processes behind complex AI systems, focusing on interpretability to show how they arrive at their results.
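One model-agnostic explainability technique is permutation feature importance: shuffling one feature at a time and observing how much the model's score drops hints at which inputs the model relies on most. The sketch below uses scikit-learn and a public sample dataset as illustrative stand-ins for a production model.

```python
# Minimal sketch of permutation feature importance, a model-agnostic
# explainability technique. The dataset and model are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Print the five features whose shuffling hurts the score the most
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[idx]}: {result.importances_mean[idx]:.3f}")
```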
Many organizations have ethical AI frameworks to guide the development and use of AI systems. These frameworks and their related benchmarks typically include principles such as transparency, fairness, accountability and privacy. They provide guardrails for using and developing AI tools.
While automation is part of AI’s appeal for many businesses, maintaining human control is important for safety reasons. This means having human operators monitor the AI system's performance, intervene when necessary and make final decisions in critical situations. Human-in-the-loop approaches help ensure that an actual person is accountable for the actions of an AI system.
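A minimal human-in-the-loop pattern is a confidence gate: predictions below an assumed confidence threshold are routed to a human reviewer rather than acted on automatically. The model, dataset and threshold in the sketch below are illustrative assumptions.

```python
# Minimal sketch of a human-in-the-loop gate: predictions below a confidence
# threshold are routed to a human reviewer instead of being auto-applied.
# The threshold, model and dataset are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

CONFIDENCE_THRESHOLD = 0.9  # assumed policy value

def decide(sample: np.ndarray) -> str:
    probs = model.predict_proba(sample.reshape(1, -1))[0]
    if probs.max() >= CONFIDENCE_THRESHOLD:
        return f"auto-approved: class {probs.argmax()}"
    return "sent to human review"  # a person makes the final call

for sample in X[:5]:
    print(decide(sample))
```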
Implementing strong security measures such as encryption, access control and anomaly detection helps protect AI systems from misuse or unauthorized access. Businesses might also invest in cybersecurity measures to protect against cyberattacks and cyberthreats that might compromise the integrity of their AI systems.
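As an illustration, one common pattern is to screen incoming inputs with an anomaly detector before they reach the model. The sketch below trains a scikit-learn IsolationForest on assumed "normal" traffic and quarantines outliers for review; the feature values are synthetic placeholders.

```python
# Minimal sketch of anomaly detection on inputs reaching an AI system:
# an IsolationForest trained on normal traffic flags outliers for review
# before they are passed to the model. The feature values are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_traffic = rng.normal(loc=0.0, scale=1.0, size=(500, 4))  # assumed baseline
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)

incoming = np.vstack([rng.normal(size=(3, 4)), [[8.0, -9.0, 7.5, 10.0]]])  # last row is unusual
flags = detector.predict(incoming)  # 1 = looks normal, -1 = anomaly

for row, flag in zip(incoming, flags):
    status = "pass to model" if flag == 1 else "quarantine for review"
    print(status)
```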
AI safety is a complex and evolving field that requires collaboration among researchers, industry leaders and policymakers. Many businesses participate in industry consortia, research initiatives and standardization efforts to share knowledge, best practices and lessons learned. By working together, the AI community can develop more robust and reliable safety measures.
AI safety research is a shared effort across many stakeholders.
AI safety starts with the developers and engineers who are responsible for designing, building and testing AI systems. They might focus on foundational questions, such as how to align AI’s goals with human values and how to create models that are transparent and explainable. They are also responsible for testing and validating models and tools to help ensure they operate as intended.
Companies leading AI development, including IBM, OpenAI, Google DeepMind, Microsoft, Anthropic and others, are at the forefront of AI safety efforts. They invest in dedicated AI safety teams, establish ethical guidelines and adhere to responsible AI principles to prevent harmful outcomes.
Some companies have also created frameworks and protocols to address risks in both the research and deployment phases, such as bias detection tools and systems that allow human oversight. Many also collaborate in industry coalitions, sharing knowledge to set industry-wide standards for AI safety.
Broader AI governance efforts are a key part of global AI safety measures. International organizations, including the United Nations, the World Economic Forum and the Organization for Economic Co-operation and Development (OECD), lead initiatives that are focused on AI ethics and safety. Individual governments around the world are also creating AI safety rules and regulations:
In the United States, the Artificial Intelligence Safety Institute (AISI), a part of the National Institute of Standards and Technology (NIST), works to address safety issues. Its efforts focus on priorities such as advancing safety research and developing risk mitigations.
In the European Union, the EU AI Act includes various safety standards and guidelines and penalties for noncompliance. Separately, the United Kingdom created the AI Safety Institute to promote safe AI development. Several other countries, including Singapore, Japan and Canada, are also creating AI safety bodies to conduct research and inform development and regulation with a focus on public safety.
Policymakers and researchers at nongovernmental organizations (NGOs), think tanks and other groups work to address safety concerns. They consider issues of national security, human rights and legislative policy and recommend ways to help AI development align with social values and interests. They raise awareness of risks, set ethical guidelines, foster transparency and encourage responsible research.
Key nonprofit and advocacy groups in this space include the Center for AI Safety (CAIS), which published the 2023 statement on AI risk, among others.
[1] Growing public concern about the role of artificial intelligence in daily life, Pew Research Center, August 2023.
[2] Poll Shows Overwhelming Concern About Risks From AI, AI Policy Institute (AIPI), July 2023.
[3] The state of AI in early 2024, McKinsey, May 2024.
[4] 2023 Impact Report, Center for AI Safety, November 2023.
[5] Statement on AI Risk, Center for AI Safety, March 2023.