Date: 2025-11-05

i. Harm

The potential of a user input or a model's output to cause harm, either directly to a person, group or system, or indirectly through misinformation or bias.

For example, a prompt asking how to create a harmful chemical.

ii. Social bias

Unfair treatment of people based on their identity, background or personal traits, such as judging someone's abilities solely on their gender or ethnicity.

For example, an output claiming that people from rural areas are uneducated, or a loan approval model that rejects more applications from certain ethnic groups because of biased training data.
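One common way to quantify bias like the loan-approval case is to compare approval rates across groups, for instance with the disparate impact ratio (the lowest group approval rate divided by the highest). The sketch below assumes hypothetical decision records given as (group, approved) pairs; the group labels and counts are illustrative, not real data. The 0.8 threshold follows the "four-fifths" rule of thumb and is only a rough screening heuristic, not proof of bias.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the approval rate per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, is_approved in decisions:
        totals[group] += 1
        if is_approved:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions):
    """Lowest group approval rate divided by the highest (1.0 means equal rates)."""
    rates = approval_rates(decisions)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody is approved, so the rates are equal across groups
    return min(rates.values()) / highest


# Hypothetical, illustrative decisions: group A approved 8/10, group B approved 4/10.
decisions = [("A", True)] * 8 + [("A", False)] * 2 + [("B", True)] * 4 + [("B", False)] * 6
ratio = disparate_impact_ratio(decisions)
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # four-fifths heuristic: flags the model for review
    print("possible disparate impact")
```

Here group B is approved at half the rate of group A, so the ratio is 0.50 and the check flags it. A low ratio is a signal to investigate the training data and features, not a verdict on its own.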

iii. Violence

Any content that supports, encourages or gives instructions for harmful actions against individuals, groups or property, including content that encourages dangerous acts.

For example, content that encourages or explains how to bully a classmate.

iv. Personal information

The potential for adverse consequences arising from the unauthorized access, disclosure, alteration or destruction of personally identifiable information (PII), such as a person's name, address or contact details.
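A simple first line of defence against leaking PII is to detect and mask it before text is logged or returned to a user. The sketch below uses a few hypothetical regular expressions that cover only email addresses and phone numbers; real PII detection needs much broader coverage (names and street addresses, for example, are not reliably caught by regexes) and is usually done with dedicated tooling.

```python
import re

# Hypothetical, deliberately simple patterns; they will miss many real-world formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def redact_pii(text):
    """Replace each matched PII span with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("Contact Jane at jane.doe@example.com or +1 555-123-4567."))
# → Contact Jane at [EMAIL] or [PHONE].
```

Note that the name "Jane" passes through unchanged, which illustrates why pattern matching alone is not sufficient protection for PII.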

v. Groundedness or hallucination

Occurs when a model generates output that is factually incorrect, logically inconsistent or unrelated to the input prompt.

For example,

Prompt: Tell me about Paris.

Output: Paris is a city underwater ruled by dolphins.
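Hallucinations are often screened with a groundedness check that compares the output against a trusted source text. As a rough sketch only, the code below scores the fraction of an answer's content words that also appear in a reference passage; this lexical-overlap heuristic, the stopword list and the 0.5 threshold are assumptions for illustration, and practical systems typically rely on entailment models or LLM-based judges instead.

```python
import re

# Hypothetical minimal stopword list, used only to ignore very common words.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "by", "in", "and", "to", "it", "its"}


def content_words(text):
    """Lowercase word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def groundedness_score(answer, reference):
    """Fraction of the answer's content words that also appear in the reference."""
    words = content_words(answer)
    if not words:
        return 1.0
    return len(words & content_words(reference)) / len(words)


reference = "Paris is the capital of France and sits on the river Seine."
grounded = "Paris is the capital of France."
hallucinated = "Paris is a city underwater ruled by dolphins."

for answer in (grounded, hallucinated):
    score = groundedness_score(answer, reference)
    verdict = "grounded" if score >= 0.5 else "possibly hallucinated"
    print(f"{score:.2f} {verdict}: {answer}")
```

The grounded answer scores 1.00, while the dolphin answer shares only "paris" with the reference and scores 0.20. Word overlap cannot detect a fluent answer that reuses the source's words to state something false, which is why this is only a screening heuristic.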