Domain 2 · Privacy & Security

2.1 Compromise of privacy by obtaining, leaking or correctly inferring sensitive information

AI systems that memorize and leak sensitive personal data or infer private information about individuals without their consent. Unexpected or unauthorized sharing of data and information can compromise users' expectations of privacy, facilitate identity theft, or cause the loss of confidential intellectual property.

Applicable legal frameworks

Quebec

Articles 4, 5, 7-12, 14, 22 (consent, purpose, minimization, retention, privacy impact assessment (PIA) under Article 3.3)

Quebec law on the protection of personal information in force since September 22, 2023, regulating the collection, use, disclosure, and retention of personal information by businesses and public bodies. Includes obligations regarding automated decision-making (Article 12.1).

Articles on the disclosure of health information

Regulates the use, disclosure, and retention of health information in Quebec, including for secondary uses (research, AI in health).

Canada

PIPEDA (Canada) · Direct (outside Quebec)

Schedule 1 - 10 fair information principles

Canadian federal law applicable to private sector organizations for personal information collected in the course of commercial activities. Applies notably outside Quebec.

EU

GDPR · If EU exposure

Articles 5, 6, 9, 25, 32, 35 (DPIA)

European regulation on data protection. Relevant for Quebec organizations processing data of European residents.

AI Act (European Union) · If EU exposure

Articles 10, 26 (data quality)

European regulation establishing a harmonized framework for AI, based on a risk-based approach (unacceptable, high, limited, minimal risk). Relevant for Quebec organizations doing business in the EU.

Quebec sector examples

Banking and insurance

Banking and insurance · Financial institution

During internal testing, a bank transaction analysis model regurgitates account numbers or first names that appear in its training corpus, in violation of Article 10 of Law 25.

Health and social services

Health and social services · Health institution

An AI medical assistant infers a patient's serological status from indirect signals, exposing health information that the person concerned never explicitly disclosed.

Public services

Public services · City, regional county municipality (MRC), ministry

A municipal proof of concept trains a model on citizens' emails without conducting the privacy impact assessment (PIA) required by Article 3.3 of Law 25.

Recommended mitigations

  • 1.1 Board Structure and Oversight

    Governance structures and leadership roles that establish senior management accountability for AI safety and risk management.

  • 2.1 Model and Infrastructure Security

    Technical and physical safeguards that secure AI models, their weights, and infrastructure to prevent unauthorized access, theft, alteration, and espionage.

  • 3.2 Data Governance

    Policies and procedures that frame the responsible acquisition, curation, and use of data to ensure compliance, quality, user privacy, and removal of harmful content.

  • 3.3 Access Management

    Operational policies and verification systems that govern who can use AI systems and for what purposes, to prevent safety circumvention, deliberate misuse, and deployment in high-risk contexts.

  • 4.6 User Rights and Redress

    Frameworks and procedures that enable users to identify and understand interactions with AI systems, report issues, request explanations, and seek redress or remedy when affected by AI systems.

Documented risks (80)

Entries from the AI Risk Repository (MIT) classified under this subdomain. Original content in English.

Entity
Intent
Timing


Risk Sub-Category · Cui2024

02.01.03 Privacy Leakage

"Privacy Leakage means the generated content includes sensitive personal information"

AI · Intentional · Post-deployment

Risk Category · Cui2024

02.07.00 Privacy Leakage

"The model is trained with personal data in the corpus and unintentionally exposing them during the conversation."

AI · Intentional · Other

Risk Sub-Category · Cui2024

02.07.01 Private Training Data

"As recent LLMs continue to incorporate licensed, created, and publicly available data sources in their corpora, the potential to mix private data in the training corpora is significantly increased. The misused private data, also named as personally identifiable information (PII) [84], [86], could contain various types of sensitive data subjects, including an individual person’s name, email, phone number, address, education, and career. Generally, injecting PII into LLMs mainly occurs in two settings — the exploitation of web-collection data and the alignment with personal human-machine conversations [87]. Specifically, the web-collection data can be crawled from online sources with sensitive PII, and the personal human-machine conversations could be collected for SFT and RLHF"

Human · Intentional · Pre-deployment

Risk Sub-Category · Cui2024

02.07.02 Memorization in LLMs

"Memorization in LLMs refers to the capability to recover the training data with contextual prefixes. According to [88]–[90], given a PII entity x, which is memorized by a model F. Using a prompt p could force the model F to produce the entity x, where p and x exist in the training data. For instance, if the string “Have a good day!\n alice@email.com” is present in the training data, then the LLM could accurately predict Alice’s email when given the prompt “Have a good day!\n”."

AI · Intentional · Pre-deployment
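
The memorization definition in entry 02.07.02 (a prefix p that makes a model F emit a PII entity x, where both occur in the training data) can be illustrated with a toy sketch. This is not how an LLM works internally: the word-level n-gram lookup table below is a hypothetical stand-in for F, and `train_ngram`, `complete`, and `is_memorized` are illustrative names, not part of any library. Line breaks in the training string are treated as ordinary whitespace.

```python
from collections import defaultdict


def train_ngram(corpus, n=3):
    """Toy stand-in for a model F: count which token follows each n-token context."""
    table = defaultdict(dict)
    for text in corpus:
        tokens = text.split()
        for i in range(len(tokens) - n):
            context = tuple(tokens[i:i + n])
            nxt = tokens[i + n]
            table[context][nxt] = table[context].get(nxt, 0) + 1
    return table


def complete(table, prompt, n=3, max_tokens=5):
    """Greedily extend the prompt with the most frequent continuation seen in training."""
    tokens = prompt.split()
    generated = []
    for _ in range(max_tokens):
        options = table.get(tuple(tokens[-n:]))
        if not options:
            break  # context never seen during training
        nxt = max(options, key=options.get)
        tokens.append(nxt)
        generated.append(nxt)
    return " ".join(generated)


def is_memorized(table, prefix, entity, n=3):
    """Memorization probe: does the training prefix p make the model emit entity x?"""
    return entity in complete(table, prefix, n=n)


# Hypothetical training data containing one PII string, as in the quoted example.
corpus = [
    "Have a good day!\n alice@email.com",
    "Thanks and have a good weekend",
]
model = train_ngram(corpus)
leaked = is_memorized(model, "Have a good day!\n", "alice@email.com")
not_leaked = is_memorized(model, "See you next week", "alice@email.com")
```

With this toy corpus, `leaked` is `True` and `not_leaked` is `False`: only the prefix that co-occurred with the address in training recovers it. The same probe shape (known prefix in, check for known secret out) is what one would run against a real model, with `complete` replaced by the model's own generation call.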

Risk Sub-Category · Cui2024

02.07.03 Association in LLMs

"Association in LLMs refers to the capability to associate various pieces of information related to a person. According to [68], [86], given a pair of PII entities (x_i, x_j), which is associated by a model F. Using a prompt p could force the model F to produce the entity x_j, where p is the prompt related to the entity x_i. For instance, an LLM could accurately output the answer when given the prompt “The email address of Alice is”, if the LLM associates Alice with her email “alice@email.com”."

AI · Intentional · Pre-deployment
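
The association definition in entry 02.07.03 suggests a simple measurement: prompt the model about one entity (x_i) and check whether the paired entity (x_j) appears in the output. The sketch below assumes the model is exposed as a plain `generate(prompt) -> str` callable; `association_rate` and `stub_generate` are hypothetical names, and the stub stands in for a real model only so the example runs on its own.

```python
from typing import Callable, Iterable, Tuple


def association_rate(
    generate: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
    template: str = "The email address of {name} is",
) -> float:
    """Fraction of (x_i, x_j) pairs for which a prompt about x_i yields x_j."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = 0
    for name, secret in pairs:
        output = generate(template.format(name=name))
        if secret.lower() in output.lower():
            hits += 1
    return hits / len(pairs)


def stub_generate(prompt: str) -> str:
    """Hypothetical model that has associated exactly one person with an email."""
    known = {"Alice": "alice@email.com"}
    for name, email in known.items():
        if name in prompt:
            return f" {email}"
    return " unknown"


# Test pairs an evaluator already knows to be sensitive (illustrative values).
probe_pairs = [("Alice", "alice@email.com"), ("Bob", "bob@email.com")]
rate = association_rate(stub_generate, probe_pairs)
```

Here `rate` is `0.5`, since the stub associates Alice with her address but not Bob. Unlike the memorization probe, the prompt does not need to appear verbatim in the training data; it only needs to refer to the first entity.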

Risk Category · Cunha2023

03.04.00 Privacy and regulation violations

"Some of the broken systems discussed above are also very invasive of people’s privacy, controlling, for instance, the length of someone’s last romantic relationship [51]. More recently, ChatGPT was banned in Italy over privacy concerns and potential violation of the European Union’s (EU) General Data Protection Regulation (GDPR) [52]. The Italian data-protection authority said, “the app had experienced a data breach involving user conversations and payment information.” It also claimed that there was no legal basis to justify “the mass collection and storage of personal data for the purpose of ‘training’ the algorithms underlying the operation of the platform,” among other concerns related to the age of the users [52]. Privacy regulators in France, Ireland, and Germany could follow in Italy’s footsteps [53]. Coincidentally, it has recently become public that Samsung employees have inadvertently leaked trade secrets by using ChatGPT to assist in preparing notes for a presentation and checking and optimizing source code [54, 55]. Another example of testing the ethics and regulatory limits can be found in actions of the facial recognition company Clearview AI, which “scraped the public web—social media, employment sites, YouTube, Venmo—to create a database with three billion images of people, along with links to the webpages from which the photos had come” [56]. Trials of this unregulated database have been offered to individual law enforcement officers who often use it without their department’s approval [57]. In Sweden, such illegal use by the police force led to a fine of €250,000 by the country’s data watchdog [57]."

Human · Intentional · Post-deployment

Risk Category · Deng2023

04.06.00 Privacy and Data Leakage

Large pre-trained models trained on internet texts might contain private information like phone numbers, email addresses, and residential addresses.

AI · Intentional · Pre-deployment

Risk Category · Hagendorff2024

05.05.00 Privacy

Generative AI systems, similar to traditional machine learning methods, are considered a threat to privacy and data protection norms. A major concern is the intended extraction or inadvertent leakage of sensitive or private information from LLMs. To mitigate this risk, strategies such as sanitizing training data to remove sensitive information or employing synthetic data for training are proposed.

Other · Other · Other
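
Entry 05.05.00 mentions sanitizing training data to remove sensitive information. A minimal sketch of that idea follows, assuming only two PII types (email addresses and North American style phone numbers) and simple regular expressions. The patterns and the `sanitize` helper are illustrative assumptions; they will miss names, street addresses, and many other identifiers, so this is a starting point and not a complete anonymization method.

```python
import re

# Simplified email pattern (assumption: not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Simplified 10-digit phone pattern with optional +1 prefix and separators.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def sanitize(text: str) -> str:
    """Replace detected emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


def sanitize_corpus(records):
    """Apply sanitize() to every training record before it reaches the model."""
    return [sanitize(record) for record in records]


# Hypothetical training records; the address and number are made up.
raw = [
    "Contact alice@email.com or 514-555-0199.",
    "Call (514) 555-0199 after 5 pm.",
    "No personal data here.",
]
clean = sanitize_corpus(raw)
```

After this runs, `clean[0]` is `"Contact [EMAIL] or [PHONE]."` and the third record is unchanged. Replacing matches with typed placeholders, instead of deleting them, keeps sentence structure intact for training while removing the value a model could memorize.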

Risk Category · Hogenhout2021

06.02.00 Loss of privacy

"AI offers the temptation to abuse someone's personal data, for instance to build a profile of them to target advertisements more effectively."

Human · Intentional · Post-deployment

Risk Sub-Category · Meek2016

09.02.01 Privacy

"Face recognition technologies and their ilk pose significant privacy risks [47]. For example, we must consider certain ethical questions like: what data is stored, for how long, who owns the data that is stored, and can it be subpoenaed in legal cases [42]? We must also consider whether a human will be in the loop when decisions are made which rely on private data, such as in the case of loan decisions [37]."

Human · Intentional · Post-deployment

Risk Sub-Category · Shelby2023

11.04.04 Privacy violations

Privacy violation occurs when algorithmic systems diminish privacy, such as enabling the undesirable flow of private information [180], instilling the feeling of being watched or surveilled [181], and the collection of data without explicit and informed consent... privacy violations may arise from algorithmic systems making predictive inference beyond what users openly disclose [222] or when data collected and algorithmic inferences made about people in one context is applied to another without the person’s knowledge or consent through big data flows

AI · Other · Post-deployment

Risk Category · Sherman2023

12.08.00 Privacy

"The potential for the AI system to infringe upon individuals' rights to privacy, through the data it collects, how it processes that data, or the conclusions it draws."

AI · Other · Other

Risk Sub-Category · Solaiman2023

13.01.04 Privacy and Data Protection

"Examining the ways in which generative AI systems providers leverage user data is critical to evaluating its impact. Protecting personal information and personal and group privacy depends largely on training data, training methods, and security measures."

Human · Other · Other

Risk Sub-Category · Tan2022

15.02.04 Privacy

The risk of loss or harm from leakage of personal information via the ML system.

AI · Intentional · Post-deployment

Risk Category · Weidinger2022

16.02.00 Risk area 2: Information Hazards

"LM predictions that convey true information may give rise to information hazards, whereby the dissemination of private or sensitive information can cause harm [27]. Information hazards can cause harm at the point of use, even with no mistake of the technology user. For example, revealing trade secrets can damage a business, revealing a health diagnosis can cause emotional distress, and revealing private data can violate a person’s rights. Information hazards arise from the LM providing private data or sensitive information that is present in, or can be inferred from, training data. Observed risks include privacy violations [34]. Mitigation strategies include algorithmic solutions and responsible model release strategies."

AI · Intentional · Post-deployment

Risk Sub-Category · Weidinger2022

16.02.01 Compromising privacy by leaking sensitive information

"A LM can “remember” and leak private data, if such information is present in training data, causing privacy violations [34]."

AI · Intentional · Post-deployment

Risk Sub-Category · Weidinger2022

16.02.02 Compromising privacy or security by correctly inferring sensitive information

Anticipated risk: "Privacy violations may occur at inference time even without an individual’s data being present in the training corpus. Insofar as LMs can be used to improve the accuracy of inferences on protected traits such as the sexual orientation, gender, or religiousness of the person providing the input prompt, they may facilitate the creation of detailed profiles of individuals comprising true and sensitive information without the knowledge or consent of the individual."

AI · Intentional · Post-deployment

Risk Category · Weidinger2021

17.02.00 Information Hazards

"Harms that arise from the language model leaking or inferring true sensitive information"

AI · Intentional · Post-deployment

Risk Sub-Category · Weidinger2021

17.02.01 Compromising privacy by leaking private information

"By providing true information about individuals’ personal characteristics, privacy violations may occur. This may stem from the model “remembering” private information present in training data (Carlini et al., 2021)."

AI · Intentional · Post-deployment

Risk Sub-Category · Weidinger2021

17.02.02 Compromising privacy by correctly inferring private information

"Privacy violations may occur at the time of inference even without the individual’s private data being present in the training dataset. Similar to other statistical models, a LM may make correct inferences about a person purely based on correlational data about other people, and without access to information that may be private about the particular individual. Such correct inferences may occur as LMs attempt to predict a person’s gender, race, sexual orientation, income, or religion based on user input."

AI · Intentional · Post-deployment
