Domain 1 · Discrimination & Toxicity

1.2 Exposure to toxic content

AI exposing users to harmful, abusive, unsafe or inappropriate content. May involve AI creating, describing, providing advice, or encouraging action. Examples of toxic content include hate-speech, violence, extremism, illegal acts, child sexual abuse material, as well as content that violates community norms such as profanity, inflammatory political speech, or pornography.

Applicable legal frameworks

Québec

Article 10.1 (harassment), Article 5 (privacy)

Quebec quasi-constitutional law prohibiting discrimination based on protected grounds. Relevant for AI system biases in hiring, credit granting, housing, and services.

International

NIST AI RMF 1.0 · Recommendation

Manage 4.1 (post-deployment monitoring)

Voluntary AI risk management framework structured around four functions: Govern, Map, Measure, Manage. A common reference in AI governance.

EU

AI Act (European Union) · If exposed to the EU

Article 50 (transparency of generated content)

European regulation establishing a harmonized framework for AI, based on a risk-based approach (unacceptable, high, limited, minimal risk). Relevant for Quebec organizations doing business in the EU.

Quebec sector examples

Public services

City or MRC (regional county municipality)

A municipal conversational agent generates responses containing stereotypes or language inappropriate for certain groups because of insufficient filtering.

Education

Cégep, school board

An AI teaching assistant deployed at a cégep occasionally produces content inappropriate for minors when hijacked through adversarial prompts.

Recommended mitigations

  • 2.4 Content Safety Controls

    Technical systems and processes that detect, filter, and label AI-generated content to identify misuse and enable content provenance tracking.

  • 3.1 Testing and Audits

    Systematic internal and external evaluations that examine AI systems, infrastructure, and compliance processes to identify risks, verify safety, and ensure performance meets standards.

  • 3.3 Access Management

    Operational policies and verification systems that govern who can use AI systems and for what purposes, to prevent safety circumvention, deliberate misuse, and deployment in high-risk contexts.

  • 3.5 Post-Deployment Monitoring

    Processes for continuous monitoring of AI behavior, user interactions, and societal impacts after deployment to detect misuse, emerging dangerous capabilities, and harmful effects.

  • 4.2 Risk Disclosure

    Formal reporting protocols and notification systems that communicate information on risks, mitigation plans, safety assessments, and significant AI-related activities to enable external oversight and inform stakeholders.
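
As an illustration of the Content Safety Controls mitigation above (detect, filter, and label generated content), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `PATTERNS` table, the category names, the placeholder words, and the `moderate` function are hypothetical. A real deployment would more likely rely on a trained classifier or a moderation service than on keyword patterns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical category patterns with harmless placeholder words.
# Keyword matching is only a stand-in for a real toxicity classifier.
PATTERNS = {
    "profanity": re.compile(r"\b(darn|heck)\b", re.IGNORECASE),
    "violence": re.compile(r"\b(kill|attack)\b", re.IGNORECASE),
}


@dataclass
class ModerationResult:
    text: str                                   # text that may be shown to the user
    labels: list = field(default_factory=list)  # categories detected in the output
    blocked: bool = False                       # True when the output was withheld


def moderate(text, block_on=frozenset({"violence"})):
    """Detect categories, label the output, and withhold it if a blocking category matched."""
    labels = sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
    blocked = any(label in block_on for label in labels)
    shown = "[content withheld]" if blocked else text
    return ModerationResult(text=shown, labels=labels, blocked=blocked)
```

In this sketch, labels are kept even when the output is withheld, so the same result object could feed a post-deployment monitoring log. Which categories block and which only label is a policy choice left to the `block_on` parameter.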

Documented risks (116)

Entries from the AI Risk Repository (MIT) classified under this subdomain. Original content in English.

Each entry is tagged by Entity, Intent, and Timing.

Risk Category · Cui2024

02.01.00 Harmful Content

"The LLM-generated content sometimes contains biased, toxic, and private information"

Entity: AI · Intent: Intentional · Timing: Post-deployment

Risk Sub-Category · Cui2024

02.01.02 Toxicity

"Toxicity means the generated content contains rude, disrespectful, and even illegal information"

Entity: AI · Intent: Intentional · Timing: Post-deployment

Risk Sub-Category · Cui2024

02.08.01 Toxic Training Data

"Following previous studies [96], [97], toxic data in LLMs is defined as rude, disrespectful, or unreasonable language that is opposite to a polite, positive, and healthy language environment, including hate speech, offensive utterance, profanities, and threats [91]."

Entity: AI · Intent: Intentional · Timing: Pre-deployment

Risk Category · Cui2024

02.11.00 Not-Suitable-for-Work (NSFW) Prompts

"Inputting a prompt contain an unsafe topic (e.g., not-suitable-for-work (NSFW) content) by a benign user."

Entity: Human · Intent: Intentional · Timing: Post-deployment

Risk Category · Deng2023

04.01.00 Toxicity and Abusive Content

This typically refers to rude, harmful, or inappropriate expressions.

Entity: Other · Intent: Other · Timing: Post-deployment

Risk Category · Deng2023

04.04.00 Controversial Opinions

The controversial views expressed by large models are also a widely discussed concern. Bang et al. (2021) evaluated several large models and found that they occasionally express inappropriate or extremist views when discussing political topics. Furthermore, models like ChatGPT (OpenAI, 2022) that claim political neutrality and aim to provide objective information for users have been shown to exhibit notable left-leaning political biases in areas like economics, social policy, foreign affairs, and civil liberties.

Entity: AI · Intent: Other · Timing: Post-deployment

Risk Category · Hagendorff2024

05.03.00 Harmful Content - Toxicity

Generating unethical, fraudulent, toxic, violent, pornographic, or other harmful content is a further predominant concern, again focusing notably on LLMs and text-to-image models. Numerous studies highlight the risks associated with the intentional creation of disinformation, fake news, propaganda, or deepfakes, underscoring their significant threat to the integrity of public discourse and the trust in credible media. Additionally, papers explore the potential for generative models to aid in criminal activities, incidents of self-harm, identity theft, or impersonation. Furthermore, the literature investigates risks posed by LLMs when generating advice in high-stakes domains such as health, safety-related issues, as well as legal or financial matters.

Entity: Human · Intent: Intentional · Timing: Post-deployment

Risk Sub-Category · Solaiman2023

13.01.02 Cultural Values and Sensitive Content

"Cultural values are specific to groups and sensitive content is normative. Sensitive topics also vary by culture and can include hate speech, which itself is contingent on cultural norms of acceptability."

Entity: AI · Intent: Intentional · Timing: Post-deployment

Risk Category · Weidinger2022

16.01.00 Risk area 1: Discrimination, Hate speech and Exclusion

"Speech can create a range of harms, such as promoting social stereotypes that perpetuate the derogatory representation or unfair treatment of marginalised groups [22], inciting hate or violence [57], causing profound offence [199], or reinforcing social norms that exclude or marginalise identities [15,58]. LMs that faithfully mirror harmful language present in the training data can reproduce these harms. Unfair treatment can also emerge from LMs that perform better for some social groups than others [18]. These risks have been widely known, observed and documented in LMs. Mitigation approaches include more inclusive and representative training data and model fine-tuning to datasets that counteract common stereotypes [171]. We now explore these risks in turn."

Entity: AI · Intent: Intentional · Timing: Other

Risk Sub-Category · Weidinger2022

16.01.02 Hate speech and offensive language

"LMs may generate language that includes profanities, identity attacks, insults, threats, language that incites violence, or language that causes justified offence as such language is prominent online [57, 64, 143,191]. This language risks causing offence, psychological harm, and inciting hate or violence."

Entity: AI · Intent: Intentional · Timing: Post-deployment

Risk Sub-Category · Weidinger2021

17.01.03 Toxic language

"LM’s may predict hate speech or other language that is “toxic”. While there is no single agreed definition of what constitutes hate speech or toxic speech (Fortuna and Nunes, 2018; Persily and Tucker, 2020; Schmidt and Wiegand, 2017), proposed definitions often include profanities, identity attacks, sleights, insults, threats, sexually explicit content, demeaning language, language that incites violence, or ‘hostile and malicious language targeted at a person or group because of their actual or perceived innate characteristics’ (Fortuna and Nunes, 2018; Gorwa et al., 2020; PerspectiveAPI)"

Entity: AI · Intent: Intentional · Timing: Post-deployment

Risk Sub-Category · Weidinger2023

18.01.03 Toxic content

"Generating content that violates community standards, including harming or inciting hatred or violence against individuals and groups (e.g. gore, child sexual abuse material, profanities, identity attacks)"

Entity: AI · Intent: Intentional · Timing: Post-deployment

Risk Category · Vidgen2024

23.01.00 Violent crimes

"This category addresses responses that enable, encourage, or endorse the commission of violent crimes."

Entity: AI · Intent: Other · Timing: Post-deployment

Risk Sub-Category · Vidgen2024

23.01.01 Mass violence

Risk Sub-Category · Vidgen2024

23.01.02 Murder

Risk Sub-Category · Vidgen2024

23.01.03 Physical assault against a person

Risk Sub-Category · Vidgen2024

23.01.04 Violent domestic abuse

Risk Sub-Category · Vidgen2024

23.01.05 Terror (Terror groups, Terror actors, Terrorist actions)

Risk Category · Vidgen2024

23.02.00 Non-violent crimes

"This category addresses responses that enable, encourage, or endorse the commission of non-violent crimes."

Entity: AI · Intent: Other · Timing: Post-deployment

Risk Sub-Category · Vidgen2024

23.02.01 Theft
