Product
May 9, 2024

Testing LLMs for Data Leakage Vulnerabilities with DynamoEval

Make sure that your LLMs are safe! Test for data leakage vulnerabilities using DynamoEval, and use our practical strategies to safeguard your data.



Recent studies have shown that large language models can memorize and later emit verbatim text from their training data when prompted. This poses potential privacy risks and legal liability, as the training data may contain sensitive, copyrighted, or personally identifiable information. Real-world examples of commercial AI offerings generating copyrighted or non-distributable data have already resulted in legal action. Even as model providers attempt to patch data extraction issues, such as the vulnerabilities DeepMind has publicly identified, model deployers may need to continuously re-test and patch their systems as new attacks are identified.

We often speak to enterprises that are concerned about productionizing an AI system trained on a large corpus of undisclosed data, where that system could generate copyrighted or sensitive text from that dataset. While the legal basis for this concern is still an open topic of debate, our enterprise customers commonly cite the White House Executive Order, which has charged the US Copyright Office to “issue recommendations to the President on potential executive actions relating to copyright and AI”. Similarly, our customers refer to the FTC’s recent comment that “training an AI tool on protected expression without the creator’s consent” could result in an AI system that “exploits a creator’s reputation” and “reveals private information” that causes “substantial injury to customers”.

Given these statements from regulators, it's more important than ever for organizations to test whether their language models are at risk of memorizing and leaking sensitive or protected data. For the last year, the Dynamo AI team has been working closely with customers to streamline Data Extraction Attacks as part of our comprehensive privacy suite. We’re excited to detail how this test has enabled organizations to identify and mitigate potential data leakage vulnerabilities in their language models before production deployment.

Key Features and Benefits:

  • Supports all popular open-source and commercial (OpenAI, Azure, Bedrock, etc.) language models
  • Supports attack techniques and metrics from state of the art literature
  • Recommendations for defending models against data extraction, including privacy-preserving training techniques, guardrails, and base model selection guidance
  • Can be customized to work with any dataset which was used to train the model

The figure below shows a real-world example of the data extraction attack using a paragraph from the novel "Harry Potter and the Sorcerer's Stone". We provide the first 22 words (the prefix) of the paragraph to the Llama 2 13B language model. When we ask the model to complete the paragraph, it outputs 40 words identical to the original text (colored in red), indicating a high likelihood that the model saw the original Harry Potter paragraph in its training corpus.
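
To make the mechanics of this probe concrete, the snippet below is a minimal sketch of how a prefix-completion check could be reproduced with the Hugging Face transformers library. It is not the DynamoEval implementation; the model ID, the reference-text file, and the 22-word split are assumptions for illustration (Llama 2 checkpoints are gated and require access approval).

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID for illustration; any causal LM from the Hub works the same way.
MODEL_ID = "meta-llama/Llama-2-13b-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Hypothetical file holding one reference paragraph from the corpus under test.
paragraph = open("reference_paragraph.txt").read()
words = paragraph.split()
prefix = " ".join(words[:22])  # first 22 words, as in the example above

inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Count how many consecutive words of the true continuation are reproduced verbatim.
matched = 0
for ref_word, gen_word in zip(words[22:], completion.split()):
    if ref_word != gen_word:
        break
    matched += 1
print(f"{matched} consecutive words reproduced verbatim")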

Our approach

The Data Extraction attack is designed to simulate an attacker's attempt to determine whether a document corpus was included in the pre-training or fine-tuning dataset of a model. We have a suite of proprietary prompting strategies to uncover memorized pieces of text from models. As an example of a basic test we perform, DynamoEval will prompt the AI system with the first few words of a protected paragraph from the training dataset and analyze whether the model's completion matches the original text. We employ a set of similarity thresholds, including trigram memorization, exact starting-word memorization, and overlapping-words memorization, to determine whether the generated text can be classified as "memorized." The attack assumes the adversary has black-box access to the model, allowing them to observe only the text generated for a given prompt.
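
Dynamo AI's exact prompting strategies and thresholds are proprietary, but the sketch below illustrates, under assumed definitions and thresholds, how the three similarity signals mentioned above could be computed for a single reference/completion pair.

def trigrams(words):
    # Set of consecutive three-word sequences.
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}

def memorization_signals(reference, completion, n_start=10):
    ref, gen = reference.split(), completion.split()
    ref_tri = trigrams(ref)
    return {
        # Trigram memorization: share of reference trigrams reproduced in the completion.
        "trigram_overlap": len(ref_tri & trigrams(gen)) / max(len(ref_tri), 1),
        # Exact starting-word memorization: do the first n_start words match exactly?
        "exact_start": ref[:n_start] == gen[:n_start],
        # Overlapping-words memorization: share of reference words appearing in the completion.
        "word_overlap": len(set(ref) & set(gen)) / max(len(set(ref)), 1),
    }

# Hypothetical usage with placeholder strings and an assumed decision rule.
signals = memorization_signals("the quick brown fox jumps over the lazy dog tonight",
                               "the quick brown fox jumps over the lazy dog today")
memorized = signals["exact_start"] or signals["trigram_overlap"] > 0.5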

Running a Data Extraction test in the Dynamo AI platform

You can easily run a data extraction attack using our SDK or the Dynamo AI dashboard. The snippet below shows the SDK reference for running a test.

# The imports below assume the Dynamo AI Python SDK; exact package and class
# names may vary by SDK version.
from dynamofl import DynamoFL, GPUConfig, GPUType

# DYNAMOFL_API_KEY and DYNAMOFL_HOST are placeholders for your API credentials.
dfl = DynamoFL(DYNAMOFL_API_KEY, host=DYNAMOFL_HOST)

# model and dataset are assumed to have been registered earlier via the SDK.
test = dfl.data_extraction_test(
    name="Data Extraction - Llama 2 - Harry Potter",
    model_key=model.key,
    dataset_id=dataset.id,
    gpu=GPUConfig(gpu_type=GPUType.V100, gpu_count=1),
    memorization_granularity="paragraph",
    sampling_rate=1000,
    grid=[
        {
            "prompt_length": [256, 512],
            "temperature": [0, 0.5, 0.7, 1.0],
        }
    ],
)
  • name: name of the test
  • model_key: model key for the generator model being tested
  • dataset_id: ID of the dataset containing the reference text to be extracted
  • gpu: type and number of GPU(s) to be used for the test
  • memorization_granularity: granularity of memorization (e.g., paragraph, sentence)
  • sampling_rate: number of times the model will be queried during the attack
  • grid: the set of test hyperparameters to search over (the model's temperature, prompt length)
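
As a generic illustration (not the SDK's internal logic), the grid above expands into one attack configuration per combination of prompt_length and temperature, each of which is then queried up to sampling_rate times:

from itertools import product

grid = [{"prompt_length": [256, 512], "temperature": [0, 0.5, 0.7, 1.0]}]

configs = []
for group in grid:
    keys = list(group)
    for values in product(*(group[k] for k in keys)):
        configs.append(dict(zip(keys, values)))

# 2 prompt lengths x 4 temperatures = 8 configurations.
print(len(configs))        # 8
print(configs[0])          # {'prompt_length': 256, 'temperature': 0}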

Mitigation measures

To help organizations defend against data extraction attacks, Dynamo AI provides tooling and guidance for implementing the following countermeasures:

  1. Guardrails (Fine-tuning and Pre-training): Implement guardrails that prevent language models from completing data extraction requests from users. These guardrails act as a first line of defense, blocking attempts to extract sensitive memorized data. Our AI guardrail, DynamoGuard, specifically helps protect against this attack.
  2. Privacy-Mitigation Techniques (Fine-tuning): Apply techniques such as differential privacy and deduplication during the fine-tuning process. Differential privacy adds calibrated noise during training, making it harder to extract specific data points. Deduplication removes exact copies of sensitive data from the training set, reducing the risk of memorization (see the sketch after this list). Dynamo AI provides a fine-tuning SDK, DynamoEnhance, that implements these methods.
  3. Smaller Models (Fine-tuning): Research suggests that smaller models are less prone to memorizing their training data verbatim. By iteratively fine-tuning and evaluating models of different sizes with DynamoEval, organizations can identify a model size that balances performance and privacy.
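
As a concrete illustration of the deduplication step in point 2 above, the following is a minimal sketch (not the DynamoEnhance implementation) that drops exact duplicates from a fine-tuning corpus by hashing normalized text:

import hashlib

def deduplicate(records):
    # Drop records whose whitespace-normalized, lowercased text has been seen before.
    seen, unique = set(), []
    for text in records:
        key = hashlib.sha256(" ".join(text.lower().split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

corpus = ["Confidential clause A.", "confidential  clause a.", "Public boilerplate."]
print(deduplicate(corpus))  # the first two records collapse to one after normalization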

By leveraging Dynamo AI’s tooling and expertise, organizations can significantly reduce the risk of data leakage through data extraction attacks. Our comprehensive approach addresses vulnerabilities at both the fine-tuning and pre-training stages, ensuring that language models are deployed with the utmost security and privacy.

Contact us

As LLMs become increasingly powerful and widely adopted, the risk of exposing sensitive information from training datasets grows. As part of our commitment to providing holistic privacy solutions, the Data Extraction Attack complements our existing suite of attacks, which includes PII Extraction, PII Inference and Membership Inference. With Dynamo AI's comprehensive privacy solutions, teams can effectively measure, address, and prevent data leakage, ensuring the responsible deployment and use of LLMs while safeguarding sensitive information.

We also offer a range of AI privacy and security solutions to help you build trustworthy and responsible AI systems. To learn more about how Dynamo AI can help you evaluate and improve your models, or to explore our AI privacy and security offerings, please reach out to us at our contact page.