Applicable legal frameworks
Québec
Articles 4, 5, 7-12, 14, 22 (consent, purpose, minimization, retention, privacy impact assessment under Article 3.3)
Quebec law on the protection of personal information in force since September 22, 2023, regulating the collection, use, disclosure, and retention of personal information by businesses and public bodies. Includes obligations regarding automated decision-making (Article 12.1).
Articles on the disclosure of health information
Regulates the use, disclosure, and retention of health information in Quebec, including for secondary uses (research, AI in health).
Canada
Schedule 1 - 10 fair information principles
Canadian federal law applicable to private sector organizations for personal information collected in the course of commercial activities. Applies notably outside Quebec.
EU
Articles 5, 6, 9, 25, 32, 35 (DPIA)
European regulation on data protection. Relevant for Quebec organizations processing data of European residents.
Articles 10, 26 (data quality)
European regulation establishing a harmonized framework for AI, based on a risk-based approach (unacceptable, high, limited, minimal risk). Relevant for Quebec organizations doing business in the EU.
Quebec sector examples
Banking and insurance
During internal testing, a bank transaction analysis model regurgitates account numbers or first names that appear in its training corpus, in violation of Article 10 of Law 25.
Health and social services
An AI medical assistant infers a patient's serological status from indirect signals, exposing health information that the person concerned never explicitly disclosed.
Public services
A municipal proof of concept trains a model on citizens' emails without carrying out the privacy impact assessment (PIA) required by Article 3.3 of Law 25.
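The banking example above describes a model that regurgitates account numbers during internal testing. A minimal sketch of how such a test might flag that behaviour is shown here. It assumes the team holds a list of identifiers known to be present in the training corpus; the function name `find_regurgitated` and the 7-to-12-digit account pattern are hypothetical and chosen only for illustration.

```python
import re

# Assumption: candidate account numbers are runs of 7 to 12 digits.
ACCOUNT_RE = re.compile(r"\b\d{7,12}\b")


def find_regurgitated(outputs, training_identifiers):
    """Return (output_index, identifier) pairs for every model output
    that contains an identifier known to be in the training corpus."""
    known = set(training_identifiers)
    hits = []
    for index, text in enumerate(outputs):
        for candidate in ACCOUNT_RE.findall(text):
            if candidate in known:
                hits.append((index, candidate))
    return hits


# Toy data standing in for model outputs collected during a test run.
sample_outputs = [
    "Transfer from account 123456789 flagged.",
    "No issues found.",
    "Reference 5551234 processed.",
]
sample_hits = find_regurgitated(sample_outputs, {"123456789"})
```

Only exact matches against known identifiers are reported, so a digit run that merely looks like an account number is not counted as a leak.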
Recommended mitigations
- 1.1 Board Structure and Oversight
Governance structures and leadership roles that establish senior management accountability for AI safety and risk management.
- 2.1 Model and Infrastructure Security
Technical and physical safeguards that secure AI models, their weights, and infrastructure to prevent unauthorized access, theft, alteration, and espionage.
- 3.2 Data Governance
Policies and procedures that frame the responsible acquisition, curation, and use of data to ensure compliance, quality, user privacy, and removal of harmful content.
- 3.3 Access Management
Operational policies and verification systems that govern who can use AI systems and for what purposes, to prevent safety circumvention, deliberate misuse, and deployment in high-risk contexts.
- 4.6 User Rights and Redress
Frameworks and procedures that enable users to identify and understand interactions with AI systems, report issues, request explanations, and seek redress or remedy when affected by AI systems.
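As one possible illustration of the data governance mitigation (3.2), the sketch below redacts email addresses and phone numbers from text before it enters a training corpus. The regular expressions and the `redact_pii` name are assumptions made for this example; they cover only simple formats, and a production pipeline would likely need broader detection (names, addresses, account numbers) and human review.

```python
import re

# Assumption: simple patterns for emails and North American phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def redact_pii(text):
    """Replace matched emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


cleaned = redact_pii("Contact alice@email.com or 514-555-0199.")
```

Placeholder tokens are used instead of outright deletion so that sentence structure is preserved for training.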
Documented risks (80)
Entries from the AI Risk Repository (MIT) classified under this subdomain. Original content in English.
02.01.03 Privacy Leakage
"Privacy Leakage means the generated content includes sensitive personal information"
02.07.00 Privacy Leakage
"The model is trained with personal data in the corpus and unintentionally exposing them during the conversation."
02.07.01 Private Training Data
"As recent LLMs continue to incorporate licensed, created, and publicly available data sources in their corpora, the potential to mix private data in the training corpora is significantly increased. The misused private data, also named as personally identifiable information (PII) [84], [86], could contain various types of sensitive data subjects, including an individual person’s name, email, phone number, address, education, and career. Generally, injecting PII into LLMs mainly occurs in two settings — the exploitation of web-collection data and the alignment with personal human-machine conversations [87]. Specifically, the web-collection data can be crawled from online sources with sensitive PII, and the personal human-machine conversations could be collected for SFT and RLHF"
02.07.02 Memorization in LLMs
"Memorization in LLMs refers to the capability to recover the training data with contextual prefixes. According to [88]–[90], given a PII entity x, which is memorized by a model F. Using a prompt p could force the model F to produce the entity x, where p and x exist in the training data. For instance, if the string “Have a good day!\n alice@email.com” is present in the training data, then the LLM could accurately predict Alice’s email when given the prompt “Have a good day!\n”."
02.07.03 Association in LLMs
"Association in LLMs refers to the capability to associate various pieces of information related to a person. According to [68], [86], given a pair of PII entities (xi, xj), which is associated by a model F. Using a prompt p could force the model F to produce the entity xj, where p is the prompt related to the entity xi. For instance, an LLM could accurately output the answer when given the prompt “The email address of Alice is”, if the LLM associates Alice with her email “alice@email.com”."
03.04.00 Privacy and regulation violations
"Some of the broken systems discussed above are also very invasive of people’s privacy, controlling, for instance, the length of someone’s last romantic relationship [51]. More recently, ChatGPT was banned in Italy over privacy concerns and potential violation of the European Union’s (EU) General Data Protection Regulation (GDPR) [52]. The Italian data-protection authority said, “the app had experienced a data breach involving user conversations and payment information.” It also claimed that there was no legal basis to justify “the mass collection and storage of personal data for the purpose of ‘training’ the algorithms underlying the operation of the platform,” among other concerns related to the age of the users [52]. Privacy regulators in France, Ireland, and Germany could follow in Italy’s footsteps [53]. Coincidentally, it has recently become public that Samsung employees have inadvertently leaked trade secrets by using ChatGPT to assist in preparing notes for a presentation and checking and optimizing source code [54, 55]. Another example of testing the ethics and regulatory limits can be found in actions of the facial recognition company Clearview AI, which “scraped the public web—social media, employment sites, YouTube, Venmo—to create a database with three billion images of people, along with links to the webpages from which the photos had come” [56]. Trials of this unregulated database have been offered to individual law enforcement officers who often use it without their department’s approval [57]. In Sweden, such illegal use by the police force led to a fine of €250,000 by the country’s data watchdog [57]."
04.06.00 Privacy and Data Leakage
Large pre-trained models trained on internet texts might contain private information like phone numbers, email addresses, and residential addresses.
05.05.00 Privacy
Generative AI systems, similar to traditional machine learning methods, are considered a threat to privacy and data protection norms. A major concern is the intended extraction or inadvertent leakage of sensitive or private information from LLMs. To mitigate this risk, strategies such as sanitizing training data to remove sensitive information or employing synthetic data for training are proposed.
06.02.00 Loss of privacy
"AI offers the temptation to abuse someone's personal data, for instance to build a profile of them to target advertisements more effectively."
09.02.01 Privacy
"Face recognition technologies and their ilk pose significant privacy risks [47]. For example, we must consider certain ethical questions like: what data is stored, for how long, who owns the data that is stored, and can it be subpoenaed in legal cases [42]? We must also consider whether a human will be in the loop when decisions are made which rely on private data, such as in the case of loan decisions [37]."
11.04.04 Privacy violations
Privacy violation occurs when algorithmic systems diminish privacy, such as enabling the undesirable flow of private information [180], instilling the feeling of being watched or surveilled [181], and the collection of data without explicit and informed consent... privacy violations may arise from algorithmic systems making predictive inference beyond what users openly disclose [222] or when data collected and algorithmic inferences made about people in one context is applied to another without the person’s knowledge or consent through big data flows
12.08.00 Privacy
"The potential for the AI system to infringe upon individuals' rights to privacy, through the data it collects, how it processes that data, or the conclusions it draws."
13.01.04 Privacy and Data Protection
"Examining the ways in which generative AI systems providers leverage user data is critical to evaluating its impact. Protecting personal information and personal and group privacy depends largely on training data, training methods, and security measures."
15.02.04 Privacy
The risk of loss or harm from leakage of personal information via the ML system.
16.02.00 Risk area 2: Information Hazards
"LM predictions that convey true information may give rise to information hazards, whereby the dissemination of private or sensitive information can cause harm [27]. Information hazards can cause harm at the point of use, even with no mistake of the technology user. For example, revealing trade secrets can damage a business, revealing a health diagnosis can cause emotional distress, and revealing private data can violate a person’s rights. Information hazards arise from the LM providing private data or sensitive information that is present in, or can be inferred from, training data. Observed risks include privacy violations [34]. Mitigation strategies include algorithmic solutions and responsible model release strategies."
16.02.01 Compromising privacy by leaking sensitive information
"A LM can “remember” and leak private data, if such information is present in training data, causing privacy violations [34]."
16.02.02 Compromising privacy or security by correctly inferring sensitive information
Anticipated risk: "Privacy violations may occur at inference time even without an individual’s data being present in the training corpus. Insofar as LMs can be used to improve the accuracy of inferences on protected traits such as the sexual orientation, gender, or religiousness of the person providing the input prompt, they may facilitate the creation of detailed profiles of individuals comprising true and sensitive information without the knowledge or consent of the individual."
17.02.00 Information Hazards
"Harms that arise from the language model leaking or inferring true sensitive information"
17.02.01 Compromising privacy by leaking private information
"By providing true information about individuals’ personal characteristics, privacy violations may occur. This may stem from the model “remembering” private information present in training data (Carlini et al., 2021)."
17.02.02 Compromising privacy by correctly inferring private information
"Privacy violations may occur at the time of inference even without the individual’s private data being present in the training dataset. Similar to other statistical models, a LM may make correct inferences about a person purely based on correlational data about other people, and without access to information that may be private about the particular individual. Such correct inferences may occur as LMs attempt to predict a person’s gender, race, sexual orientation, income, or religion based on user input."
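Entry 02.07.02 above defines memorization as a model producing an entity x when prompted with a prefix p that precedes x in the training data. The sketch below shows one way that definition could be turned into a simple leak-rate measurement. The `generate` callable is a hypothetical stand-in for whatever interface the model under test exposes, and `toy_model` is a fake model used only to make the example runnable.

```python
from typing import Callable, Iterable, Tuple


def memorization_rate(
    generate: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (prefix, secret) pairs for which the model's
    completion of the prefix contains the secret string."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    leaked = sum(1 for prefix, secret in pairs if secret in generate(prefix))
    return leaked / len(pairs)


# Toy model that has "memorized" exactly one training string.
_memorized = {"Have a good day!\n": "alice@email.com"}


def toy_model(prompt: str) -> str:
    return _memorized.get(prompt, "")


probe_pairs = [
    ("Have a good day!\n", "alice@email.com"),
    ("Best regards,\n", "bob@email.com"),
]
rate = memorization_rate(toy_model, probe_pairs)
```

A substring check is the simplest matching rule; it would miss paraphrased or partially reproduced secrets, so the resulting rate should be read as a lower bound under these assumptions.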