Applicable legal frameworks
Québec
Charter of Human Rights and Freedoms: Article 10.1 (harassment), Article 5 (privacy)
Quebec quasi-constitutional law prohibiting discrimination based on protected grounds. Relevant to biased AI systems used in hiring, lending, housing, and the provision of services.
International
NIST AI Risk Management Framework: Manage 4.1 (post-deployment monitoring)
Voluntary AI risk management framework structured around four functions: Govern, Map, Measure, Manage. A common reference in AI governance.
EU
EU AI Act: Article 50 (transparency of generated content)
European regulation establishing a harmonized framework for AI. It takes a risk-based approach with four tiers: unacceptable, high, limited, and minimal risk. Relevant to Quebec organizations doing business in the EU.
Quebec sector examples
Public services
A municipal chatbot generates responses containing stereotypes or language that is inappropriate toward certain groups because of insufficient filtering.
Education
An AI teaching assistant deployed in a cégep occasionally produces content that is inappropriate for its underage users when subverted by adversarial prompts.
Recommended mitigations
- 2.4 Content Safety Controls
Technical systems and processes that detect, filter, and label AI-generated content to identify misuse and enable content provenance tracking.
- 3.1 Testing and Audits
Systematic internal and external evaluations that examine AI systems, infrastructure, and compliance processes to identify risks, verify safety, and ensure performance meets standards.
- 3.3 Access Management
Operational policies and verification systems that govern who can use AI systems and for what purposes, to prevent safety circumvention, deliberate misuse, and deployment in high-risk contexts.
- 3.5 Post-Deployment Monitoring
Processes for continuous monitoring of AI behavior, user interactions, and societal impacts after deployment to detect misuse, emerging dangerous capabilities, and harmful effects.
- 4.2 Risk Disclosure
Formal reporting protocols and notification systems that communicate information on risks, mitigation plans, safety assessments, and significant AI-related activities to enable external oversight and inform stakeholders.
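The content safety controls (2.4) and post-deployment monitoring (3.5) listed above can be combined in a single gate that processes each model output. The Python example below is a minimal sketch under several assumptions. The `scorer` function returns a toxicity score between 0 and 1. The keyword-based `keyword_scorer`, the `BLOCKLIST` terms, the `ContentSafetyGate` class, its thresholds, and the provenance fields are all hypothetical illustrations. None of them comes from the AI Risk Repository or from any standard.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, Optional

# Hypothetical scorer interface: maps a text to a toxicity score in [0, 1].
# A real system would call a trained classifier or moderation model here.
Scorer = Callable[[str], float]

# Illustrative placeholder terms only; keyword lists alone are a weak filter.
BLOCKLIST = {"insult", "threat"}


def keyword_scorer(text: str) -> float:
    """Toy scorer: 1.0 if any blocklisted word appears, else 0.0."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return 1.0 if words & BLOCKLIST else 0.0


@dataclass
class LabeledOutput:
    text: Optional[str]            # None when the output is withheld
    blocked: bool
    score: float
    provenance: Dict[str, object]  # metadata marking the content as AI-generated


@dataclass
class ContentSafetyGate:
    scorer: Scorer
    model_id: str
    block_threshold: float = 0.8   # assumed cut-off for withholding output
    window_size: int = 100         # number of recent decisions tracked
    alert_rate: float = 0.05       # assumed blocked rate that triggers review

    def __post_init__(self) -> None:
        # Sliding window of recent decisions (True = blocked) for monitoring.
        self._recent: Deque[bool] = deque(maxlen=self.window_size)

    def check(self, text: str) -> LabeledOutput:
        """Detect (score), filter (withhold), and label one model output."""
        score = self.scorer(text)
        blocked = score >= self.block_threshold
        self._recent.append(blocked)
        return LabeledOutput(
            text=None if blocked else text,
            blocked=blocked,
            score=score,
            provenance={"ai_generated": True, "model_id": self.model_id},
        )

    def blocked_rate(self) -> float:
        """Share of blocked outputs within the current window."""
        if not self._recent:
            return 0.0
        return sum(self._recent) / len(self._recent)

    def needs_review(self) -> bool:
        """Flag the deployment for human review if the blocked rate is high."""
        return self.blocked_rate() > self.alert_rate


if __name__ == "__main__":
    gate = ContentSafetyGate(scorer=keyword_scorer, model_id="demo-model")
    print(gate.check("Here is the library schedule."))
    print(gate.check("This is a threat."))
    print(gate.blocked_rate(), gate.needs_review())
```

The gate attaches the provenance label to every output, including allowed ones, so downstream systems can recognize AI-generated content. The windowed blocked rate gives monitoring a simple signal to act on. Real post-deployment monitoring would also track user interactions and broader impacts, which this sketch does not attempt.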
Documented risks (116)
Entries from the AI Risk Repository (MIT) classified under this subdomain. Original content in English.
02.01.00 Harmful Content
"The LLM-generated content sometimes contains biased, toxic, and private information"
02.01.02 Toxicity
"Toxicity means the generated content contains rude, disrespectful, and even illegal information"
02.08.01 Toxic Training Data
"Following previous studies [96], [97], toxic data in LLMs is defined as rude, disrespectful, or unreasonable language that is opposite to a polite, positive, and healthy language environment, including hate speech, offensive utterance, profanities, and threats [91]."
02.11.00 Not-Suitable-for-Work (NSFW) Prompts
"Inputting a prompt [containing] an unsafe topic (e.g., not-suitable-for-work (NSFW) content) by a benign user."
04.01.00 Toxicity and Abusive Content
This typically refers to rude, harmful, or inappropriate expressions.
04.04.00 Controversial Opinions
The controversial views expressed by large models are also a widely discussed concern. Bang et al. (2021) evaluated several large models and found that they occasionally express inappropriate or extremist views when discussing political topics. Furthermore, models like ChatGPT (OpenAI, 2022) that claim political neutrality and aim to provide objective information for users have been shown to exhibit notable left-leaning political biases in areas like economics, social policy, foreign affairs, and civil liberties.
05.03.00 Harmful Content - Toxicity
Generating unethical, fraudulent, toxic, violent, pornographic, or other harmful content is a further predominant concern, again focusing notably on LLMs and text-to-image models. Numerous studies highlight the risks associated with the intentional creation of disinformation, fake news, propaganda, or deepfakes, underscoring their significant threat to the integrity of public discourse and the trust in credible media. Additionally, papers explore the potential for generative models to aid in criminal activities, incidents of self-harm, identity theft, or impersonation. Furthermore, the literature investigates risks posed by LLMs when generating advice in high-stakes domains such as health, safety-related issues, as well as legal or financial matters.
13.01.02 Cultural Values and Sensitive Content
"Cultural values are specific to groups and sensitive content is normative. Sensitive topics also vary by culture and can include hate speech, which itself is contingent on cultural norms of acceptability."
16.01.00 Risk area 1: Discrimination, Hate speech and Exclusion
"Speech can create a range of harms, such as promoting social stereotypes that perpetuate the derogatory representation or unfair treatment of marginalised groups [22], inciting hate or violence [57], causing profound offence [199], or reinforcing social norms that exclude or marginalise identities [15,58]. LMs that faithfully mirror harmful language present in the training data can reproduce these harms. Unfair treatment can also emerge from LMs that perform better for some social groups than others [18]. These risks have been widely known, observed and documented in LMs. Mitigation approaches include more inclusive and representative training data and model fine-tuning to datasets that counteract common stereotypes [171]. We now explore these risks in turn."
16.01.02 Hate speech and offensive language
"LMs may generate language that includes profanities, identity attacks, insults, threats, language that incites violence, or language that causes justified offence as such language is prominent online [57, 64, 143,191]. This language risks causing offence, psychological harm, and inciting hate or violence."
17.01.03 Toxic language
"LM’s may predict hate speech or other language that is “toxic”. While there is no single agreed definition of what constitutes hate speech or toxic speech (Fortuna and Nunes, 2018; Persily and Tucker, 2020; Schmidt and Wiegand, 2017), proposed definitions often include profanities, identity attacks, sleights, insults, threats, sexually explicit content, demeaning language, language that incites violence, or ‘hostile and malicious language targeted at a person or group because of their actual or perceived innate characteristics’ (Fortuna and Nunes, 2018; Gorwa et al., 2020; PerspectiveAPI)"
18.01.03 Toxic content
"Generating content that violates community standards, including harming or inciting hatred or violence against individuals and groups (e.g. gore, child sexual abuse material, profanities, identity attacks)"
23.01.00 Violent crimes
"This category addresses responses that enable, encourage, or endorse the commission of violent crimes."
23.01.01 Mass violence
23.01.02 Murder
23.01.03 Physical assault against a person
23.01.04 Violent domestic abuse
23.01.05 Terror (Terror groups, Terror actors, Terrorist actions)
23.02.00 Non-violent crimes
"This category addresses responses that enable, encourage, or endorse the commission of non-violent crimes."
23.02.01 Theft