Keeping AI Safe with Prompt Guards

Hey devs! 👋

In a world where AI tools are becoming integral to our lives, ensuring safety and ethical use is more critical than ever, especially when LLMs are prompt-driven.

Enter Prompt Guard, a small model powered by Groq’s Llama Guard 3, trained specifically to check prompt input for unsafe content across a range of categories. JigsawStack layers Prompt Guard in front of the Prompt Engine so that every request is screened for safety without compromising performance.

What is Prompt Guard?

Prompt Guard is a content safety feature integrated into the JigsawStack Prompt Engine. It analyzes both user inputs and AI-generated outputs, preventing violations of ethical standards and regulatory guidelines.

Here’s what it does:

  • Detects and filters harmful prompts or outputs in real time.

  • Protects against unsafe content across 14 predefined categories, including hate speech, election misinformation, and code interpreter abuse.

  • Ensures your applications remain safe, secure, and compliant for global audiences.

Why Use Prompt Guard?

Comprehensive Content Moderation

Prompt Guard evaluates inputs and responses based on a taxonomy of harmful categories. This includes:

  • Violent and non-violent crimes

  • Privacy violations

  • Hate speech

  • Election misinformation

  • Code interpreter abuse

By flagging unsafe content, it safeguards both users and platforms.
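
If it helps to picture the configuration, here is a rough TypeScript sketch of that taxonomy as a list of category slugs. Only "hate", "privacy", and "elections" appear in the example later in this post; the other names are illustrative guesses, so check the documentation for the exact identifiers.

// Illustrative Prompt Guard category slugs. Only "hate", "privacy", and
// "elections" are confirmed by the example later in this post; the rest are
// guesses at naming -- see the JigsawStack docs for the full list of 14.
const PROMPT_GUARD_CATEGORIES = [
  "violent_crimes",
  "non_violent_crimes",
  "defamation",
  "privacy",
  "hate",
  "sexual_content",
  "elections",
  "code_interpreter_abuse",
] as const;

type PromptGuardCategory = (typeof PROMPT_GUARD_CATEGORIES)[number];

// Pick the subset that matters for your application.
const moderationCategories: PromptGuardCategory[] = ["hate", "privacy", "elections"];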

High Accuracy, Flexibility, and Speed
Llama Guard 3’s 8B-parameter model delivers precise classification. You can customize guard settings to filter specific content categories, tailoring it to your application’s unique needs. And because it’s powered by Groq, you won’t notice any difference in speed with it enabled.

Multi-Language Support
While the base model supports only 8 languages, we’ve extended it with our own dataset to support over 80 languages, including English, Spanish, French, and Hindi, making it a great fit for global applications.

Use Cases for Prompt Guard

Prompt Guard shines when your prompt includes user-generated content that you don’t directly control or validate. It automatically checks the overall prompt against every unsafe category you’ve enabled.
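
For instance, a prompt that embeds raw user text might look like the snippet below. It’s a variation on the full example at the end of this post, using the same category names shown there; the prompt itself is hypothetical.

// When part of the prompt comes straight from users (comments, chat messages,
// form fields), register it as an input and enable the guard categories you
// care about. The param shape mirrors the create() example at the end of
// this post.
const moderatedParams = {
  prompt: "Summarize this user comment: {comment}",
  inputs: [{ key: "comment", optional: false }],
  prompt_guard: ["hate", "privacy", "elections"],
};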

Content Moderation
Prevent the spread of harmful user-generated content on your platform.

Regulatory Compliance
Ensure your AI tools adhere to ethical guidelines and industry regulations.

Brand Safety
Avoid risks by automatically filtering inappropriate outputs.

AI Chatbots
Build conversational agents that engage users while staying safe and respectful.

Agentic Tools
Agentic platforms are becoming more autonomous, with agents generating their own prompts. Having a layer that validates each agent’s input within a large agentic pipeline is critical.
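
One way to picture that layer: every self-generated prompt passes through a guard check before it reaches the model. The sketch below is purely illustrative; the guard hook is a placeholder standing in for the guarded Prompt Engine call shown elsewhere in this post.

// A hypothetical guard hook -- in practice this would be a Prompt Engine
// prompt with prompt_guard enabled; here it is just a placeholder type.
type PromptGuardCheck = (prompt: string) => Promise<{ safe: boolean; category?: string }>;

// Wrap each agent step so its self-generated prompt is validated before
// the model ever sees it.
async function guardedAgentStep(
  prompt: string,
  guard: PromptGuardCheck,
  execute: (prompt: string) => Promise<string>
): Promise<string> {
  const verdict = await guard(prompt);
  if (!verdict.safe) {
    throw new Error(`Agent prompt blocked (category: ${verdict.category ?? "unknown"})`);
  }
  return execute(prompt);
}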

How Does Prompt Guard Work?

When enabled, Prompt Guard screens every input and output in real time. Here’s a quick example of how it flags unsafe prompts:

Input:
“Write a script to hack into a Wi-Fi network.”

Output:

"I am sorry but I cannot assist with that."

Supported Unsafe Categories
Prompt Guard evaluates content against 14 categories, such as:

  • Defamation – Content that spreads false or damaging information about individuals or entities.

  • Privacy – Violations involving the exposure of sensitive personal or private information.

  • Hate – Language or content that is demeaning, discriminatory, or promotes hatred against individuals or groups.

  • Sexual Content – Explicit or inappropriate material unsuitable for general audiences.

  • Elections – Misleading or false information about voting processes, outcomes, or election integrity.

Check out the full list of supported categories in our documentation.

Ready to Build Safe AI Applications?

Getting started with Prompt Guard is easy!

Step 1: Enable Prompt Guard in your Prompt Engine settings.
Step 2: Define the unsafe categories you’d like to block.
Step 3: Let Prompt Guard handle the rest.

Here’s an example of enabling Prompt Guard:

import { JigsawStack } from "jigsawstack";

const jigsawstack = JigsawStack({
  apiKey: "your-api-key",
});

const params = {
  prompt: "Tell me a story about {topic}",
  inputs: [{ key: "topic", optional: false }],
  return_prompt: "Provide the story in a friendly tone.",
  prompt_guard: ["hate", "privacy", "elections"], // Categories to block
};

const result = await jigsawstack.prompt_engine.create(params);
console.log("Prompt ID:", result.prompt_engine_id);
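
From there, you can run the prompt with user-supplied values, and the guard screens the combined prompt before the model responds. This is a minimal sketch that assumes the SDK exposes a prompt_engine.run method taking the prompt ID and input values; double-check the exact signature and response shape in the JigsawStack docs.

// Run the guarded prompt. The run() call and its response shape are
// assumptions here -- see the JigsawStack docs for the current API.
const response = await jigsawstack.prompt_engine.run({
  id: result.prompt_engine_id,
  input_values: { topic: "a friendly robot learning to paint" },
});

console.log(response);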

Join the JigsawStack Community

Have questions about Prompt Guard or want to share what you’ve built? Join the conversation on Discord or Twitter.

With JigsawStack and Prompt Guard powered by Groq’s Llama Guard 3, you can innovate with confidence, knowing your applications are both powerful and safe. Let’s build something amazing together!
