Test AI app against LLM failure modes

Waldek Mastykarz

Summary

Sample chat application that demonstrates how to test AI applications against common large language model (LLM) failure modes using Dev Proxy’s LanguageModelFailurePlugin. The app shows how to build resilient AI applications that gracefully handle unexpected or problematic LLM responses.

Screenshot of the AI Failure Resilience demo app showing a chat interface with Dev Proxy simulating LLM failure modes

The sample showcases:

  • Chat Interface: Interactive chat UI to test LLM interactions
  • LLM Failure Simulation: Dev Proxy injects various failure types into API responses
  • Graceful Error Handling: Demonstrates how to handle problematic LLM responses
  • VS Code Integration: Use Dev Proxy Toolkit for local development and testing
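The "graceful error handling" idea can be sketched as a validation step between receiving a completion and rendering it. The function name and the response shape below (OpenAI-style chat completions) are assumptions for illustration, not this sample's actual code:

```javascript
// Validate an OpenAI-style chat completion before rendering it in the UI.
// Returns { ok: true, text } for a usable reply, or { ok: false, error }
// when the response is empty or malformed (e.g. an injected failure).
function extractReply(completion) {
  const text = completion?.choices?.[0]?.message?.content;
  if (typeof text !== "string" || text.trim() === "") {
    return { ok: false, error: "empty or malformed completion" };
  }
  return { ok: true, text: text.trim() };
}
```

A chat UI built this way can fall back to a friendly message instead of rendering a broken or empty reply, which is exactly the behavior the injected failure modes let you exercise.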

Compatibility

Dev Proxy v2.1.0

Contributors

Version history

Version | Date             | Comments
------- | ---------------- | ---------------------------
1.2     | February 4, 2026 | Updated to Dev Proxy v2.1.0
1.1     | January 18, 2026 | Fixed sample metadata
1.0     | January 6, 2026  | Initial release

Minimal path to awesome

  1. Get the sample:
    • Download just this sample:

      npx gitload-cli https://github.com/pnp/proxy-samples/tree/main/samples/ai-failure-resilience

      or

    • Download as a .ZIP file and unzip it, or

    • Clone this repository

  2. Open the sample folder in Visual Studio Code
  3. Install the Dev Proxy Toolkit extension
  4. Generate a fine-grained personal access token with the models:read permission
  5. Update the apiKey variable value in js/env.js with your token
  6. Start the debug session by pressing F5
  7. The browser opens with the chat interface
  8. Type questions and observe how Dev Proxy injects failure responses
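Step 5 expects js/env.js to define an apiKey variable. A minimal sketch of that file, assuming this shape (the actual file in the sample may export additional values):

```javascript
// js/env.js — holds the token the chat app sends with model requests.
// Replace the placeholder with your fine-grained personal access token.
const apiKey = "<your-token>"; // token with the models:read permission
```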

Running manually

  1. In a terminal, start Dev Proxy: devproxy
  2. In a separate terminal, start the web server: npx http-server -c-1 -p 3000
  3. Open http://localhost:3000 in your browser

Simulated failure types

The sample includes configuration for seven common LLM failure types:

Failure Type              | Description
------------------------- | ------------------------------------------------
Hallucination             | Generates false or made-up information
BiasStereotyping          | Introduces bias or stereotyping in responses
CircularReasoning         | Uses circular reasoning in explanations
ContradictoryInformation  | Provides contradictory information
PlausibleIncorrect        | Provides plausible but incorrect information
Misinterpretation         | Misinterprets the user’s request
OverconfidenceUncertainty | Shows overconfidence about uncertain information

Customizing failure types

To test specific failure scenarios, edit the .devproxy/devproxyrc.json file and modify the failures array. For example, to focus on testing content accuracy issues:

{
  "languageModelFailurePlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v2.1.0/languagemodelfailureplugin.schema.json",
    "failures": [
      "Hallucination",
      "PlausibleIncorrect",
      "OutdatedInformation",
      "ContradictoryInformation"
    ]
  }
}

Additional failure types

The plugin supports the following additional failure types that you can add to your configuration:

  • AmbiguityVagueness - Provides ambiguous or vague responses
  • FailureDisclaimHedge - Uses excessive disclaimers or hedging
  • FailureFollowInstructions - Fails to follow specific instructions
  • IncorrectFormatStyle - Provides responses in incorrect format or style
  • OutdatedInformation - Provides outdated or obsolete information
  • OverSpecification - Provides unnecessarily detailed responses
  • Overgeneralization - Makes overly broad generalizations
  • OverreliancePriorConversation - Over-relies on previous conversation context

Creating custom failure types

You can create custom failure types by adding .prompty files to the ~appFolder/prompts directory. Name the file lmfailure_<failure>.prompty, where <failure> is the failure name in kebab-case, and reference it in the failures array using the PascalCase form of the same name.
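The naming rule pairs a kebab-case file name with a PascalCase configuration value. A tiny converter illustrates the mapping (for illustration only; the plugin does this mapping internally, not via this code):

```javascript
// Convert a kebab-case failure name (as used in the .prompty file name)
// to the PascalCase form used in the failures array.
function toPascalCase(kebab) {
  return kebab
    .split("-")
    .map((word) => word[0].toUpperCase() + word.slice(1))
    .join("");
}

// toPascalCase("technical-jargon-overuse") === "TechnicalJargonOveruse"
```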

Example lmfailure_technical-jargon-overuse.prompty:

---
name: Technical Jargon Overuse
model:
  api: chat
sample:
  scenario: Simulate a response that overuses technical jargon.
---

system:
You are a language model under evaluation. Your task is to simulate incorrect responses. {{scenario}} Do not try to correct the error.

user:
How do I create a simple web page?

Then add TechnicalJargonOveruse to your failures array.
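Following the configuration shape shown earlier, the failures array would then include the new name (the $schema line is omitted here for brevity):

```json
{
  "languageModelFailurePlugin": {
    "failures": [
      "Hallucination",
      "TechnicalJargonOveruse"
    ]
  }
}
```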

Features

With this sample, you can use Dev Proxy to:

  • Interact with a real chat interface while Dev Proxy simulates LLM failures
  • See how failure modes like hallucinations and bias appear in practice
  • Test your application’s error handling and user experience
  • Build more robust and reliable AI-powered applications

Help

We do not support samples, but this community is always willing to help, and we want to improve these samples. We use GitHub to track issues, which makes it easy for community members to volunteer their time and help resolve issues.

You can try looking at issues related to this sample to see if anybody else is having the same issues.

If you encounter any issues using this sample, create a new issue.

Finally, if you have an idea for improvement, make a suggestion.

Disclaimer

THIS CODE IS PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.