rehype-harden

A rehype plugin that ensures that untrusted markdown does not contain images from and links to unexpected origins.

This is particularly important for markdown returned from LLMs in AI agents which might have been subject to prompt injection.

Secure prefixes

This package validates URL prefixes and URL origins. Prefix allow-lists can be circumvented with open redirects, so make sure the prefixes are specific enough to avoid such attacks.

For example, it is more secure to allow https://example.com/images/ than to allow all of https://example.com/, which may contain open redirects.

Additionally, URLs may contain path traversal sequences such as /../. This package does not resolve these. It is your responsibility to ensure that your web server does not allow such traversal.
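
To illustrate why both the prefix and the origin matter, here is a minimal sketch built on the standard URL API. It is not this package's implementation: `isAllowedByPrefix` is a hypothetical helper, and the `evil.test` hostnames are placeholders.

```typescript
// Hypothetical helper for illustration only; not the package's actual code.
function isAllowedByPrefix(candidate: string, prefix: string): boolean {
  let candidateUrl: URL;
  let prefixUrl: URL;
  try {
    candidateUrl = new URL(candidate);
    prefixUrl = new URL(prefix);
  } catch {
    return false; // Unparseable input is rejected.
  }
  // Compare origins first so a look-alike host cannot pass a plain string check.
  if (candidateUrl.origin !== prefixUrl.origin) return false;
  return candidateUrl.href.startsWith(prefixUrl.href);
}

// A bare string comparison is fooled by a look-alike host (this is true):
const naiveMatch = "https://example.com.evil.test/x.png".startsWith("https://example.com");

// The origin-aware check rejects the look-alike host (false):
const spoofedHostAllowed = isAllowedByPrefix(
  "https://example.com.evil.test/x.png",
  "https://example.com"
);

// A narrow prefix admits the image path (true) but not a redirect endpoint (false):
const imagePathAllowed = isAllowedByPrefix(
  "https://example.com/images/a.png",
  "https://example.com/images/"
);
const openRedirectAllowed = isAllowedByPrefix(
  "https://example.com/redirect?to=https://evil.test",
  "https://example.com/images/"
);
```

The narrower the prefix, the fewer endpoints on the allowed origin an attacker can reach through it.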

Features

🔒 URL Filtering: Blocks links and images that don't match allowed URL prefixes
🔧 Drop-in: Works with any rehype-compatible pipeline

Installation

npm install rehype-harden
# or
yarn add rehype-harden
# or
pnpm add rehype-harden

Quick Start

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://github.com/", "https://docs."],
    allowedImagePrefixes: ["https://via.placeholder.com/", "/"],
  })
  .use(/* whatever compiler you want, eg hast-to-jsx-runtime or hast-to-svelte */);

API

Args

`defaultOrigin?: string`

The origin to resolve relative URLs against
Required when allowedLinkPrefixes or allowedImagePrefixes are provided (except when using wildcard ["*"])
When using wildcard ["*"] without defaultOrigin, relative URLs (e.g., /path, ./page) are allowed and preserved as-is
Example: "https://mysite.com"
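
As a rough illustration of what resolving against an origin means, the sketch below uses the standard `URL` constructor. `resolveAgainstOrigin` is a hypothetical helper, and the package's internal resolution may differ in detail.

```typescript
// Illustrative only: shows standard URL resolution, not the package's internals.
function resolveAgainstOrigin(rawUrl: string, defaultOrigin: string): string | null {
  try {
    return new URL(rawUrl, defaultOrigin).href;
  } catch {
    return null; // Either the URL or the base origin could not be parsed.
  }
}

// Root-relative path: resolves to "https://mysite.com/internal-page"
const rootRelative = resolveAgainstOrigin("/internal-page", "https://mysite.com");

// Dot-relative path: resolves to "https://mysite.com/page"
const dotRelative = resolveAgainstOrigin("./page", "https://mysite.com");

// Absolute URLs ignore the base: stays "https://github.com/vercel"
const alreadyAbsolute = resolveAgainstOrigin("https://github.com/vercel", "https://mysite.com");
```

Under this model, a relative link such as /internal-page is compared against the allow-list in its resolved form, which is why a prefix like "https://mysite.com/" can match it.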

`allowedLinkPrefixes?: string[]`

Array of URL prefixes that are allowed for links
Links not matching these prefixes will be blocked and shown as [blocked]
Use "*" to allow all URLs. This disables prefix filtering, but javascript: and data: URLs are always disallowed.
Default: [] (blocks all links)
Example: ['https://github.com/', 'https://docs.example.com/'] or ['*']

`allowedImagePrefixes?: string[]`

Array of URL prefixes that are allowed for images
Images not matching these prefixes will be blocked and shown as placeholders
Use "*" to allow all URLs. This disables prefix filtering, but javascript: URLs are always disallowed, and data: URLs are disallowed unless allowDataImages is enabled (which permits data:image/* only).
Default: [] (blocks all images)
Example: ['https://via.placeholder.com/', '/'] or ['*']

`allowDataImages?: boolean`

When set to true, allows data:image/* URLs (base64-encoded images) in image sources
This is useful for scenarios where images are embedded directly in markdown (e.g., documents converted from PDF or .docx)
Only data:image/* URLs are allowed; other data: URLs (like data:text/html) remain blocked for security
data: URLs are never allowed in links, regardless of this setting
Default: false (blocks all data: URLs)
Example: true
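
The rule described above can be sketched as a simple check on the URL's media type. `allowsDataImage` is a hypothetical helper written for this illustration; the package's real matching logic is not shown here and may be stricter.

```typescript
// Hypothetical check for illustration; the package's real logic may differ.
function allowsDataImage(src: string, allowDataImages: boolean): boolean {
  if (!allowDataImages) return false; // Default: every data: URL is blocked.
  // Only the data:image/* family qualifies; other media types stay blocked.
  return /^data:image\//i.test(src.trim());
}

const pngWhenEnabled = allowsDataImage("data:image/png;base64,iVBORw0KGgo=", true);    // true
const pngWhenDisabled = allowsDataImage("data:image/png;base64,iVBORw0KGgo=", false);  // false
const htmlWhenEnabled = allowsDataImage("data:text/html,<b>hi</b>", true);             // false
```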

`allowedProtocols?: string[]`

Array of custom URL protocols that are allowed in links
Useful for deep links to applications (e.g., tel:, mailto:, postman:, vscode:, slack:)
Use "*" to allow all protocols that can be parsed as valid URLs
Dangerous protocols (javascript:, data:, file:, vbscript:) are always blocked regardless of this setting
Default: [] (only allows built-in safe protocols: https:, http:, mailto:, irc:, ircs:, xmpp:, blob:)
Example: ['tel:', 'postman:', 'vscode:'] or ['*']
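
The precedence between the built-in list, the always-blocked list, and allowedProtocols can be pictured with the sketch below. The two protocol lists are copied from the description above; `isProtocolAllowed` and the order of the checks are assumptions made for illustration, not the package's source.

```typescript
// Lists taken from the option description above; the function itself is hypothetical.
const BUILT_IN_PROTOCOLS = ["https:", "http:", "mailto:", "irc:", "ircs:", "xmpp:", "blob:"];
const ALWAYS_BLOCKED_PROTOCOLS = ["javascript:", "data:", "file:", "vbscript:"];

function isProtocolAllowed(rawUrl: string, allowedProtocols: string[] = []): boolean {
  let protocol: string;
  try {
    // The URL parser lowercases the scheme and keeps the trailing ":".
    protocol = new URL(rawUrl).protocol;
  } catch {
    return false; // Not parseable as an absolute URL.
  }
  if (ALWAYS_BLOCKED_PROTOCOLS.includes(protocol)) return false; // Wins over "*".
  if (BUILT_IN_PROTOCOLS.includes(protocol)) return true;
  return allowedProtocols.includes("*") || allowedProtocols.includes(protocol);
}

const telByDefault = isProtocolAllowed("tel:+1234567890");                // false
const telOptedIn = isProtocolAllowed("tel:+1234567890", ["tel:"]);        // true
const jsWithWildcard = isProtocolAllowed("JavaScript:alert(1)", ["*"]);   // false
```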

`linkBlockPolicy?: BlockPolicyType`

Controls how blocked links are handled
"indicator" (default): Renders as plain text with [blocked] suffix and the blocked URL in a title attribute
"text-only": Renders just the link text without any indicator or URL
"remove": Removes the blocked link entirely from the output

`imageBlockPolicy?: BlockPolicyType`

Controls how blocked images are handled
"indicator" (default): Renders as a placeholder span with [Image blocked: {alt text}]
"text-only": Renders just the alt text (images with no alt text are removed)
"remove": Removes the blocked image entirely from the output

`blockedImageClass?: string`

When an image is blocked with the "indicator" policy, the replacement span includes this class for styling.

`blockedLinkClass?: string`

Same as above, but for blocked links using the "indicator" policy.

Examples

Basic Usage with Default Blocking

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

// Blocks all external links and images by default
const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden) // No options = blocks everything
  .use(/* your compiler */);

const result = processor.processSync(markdownContent);

Allow Specific Domains

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: [
      "https://github.com/",
      "https://docs.github.com/",
      "https://www.npmjs.com/",
    ],
    allowedImagePrefixes: [
      "https://via.placeholder.com/",
      "https://images.unsplash.com/",
      "/", // Allow relative images
    ],
  })
  .use(/* your compiler */);

const result = processor.processSync(markdownContent);

Relative URL Handling

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://mysite.com/"],
    allowedImagePrefixes: ["https://mysite.com/"],
  })
  .use(/* your compiler */);

const markdownWithRelativeUrls = `
[Relative Link](/internal-page)
![Relative Image](/images/logo.png)
`;

const result = processor.processSync(markdownWithRelativeUrls);

Allow All URLs (Wildcard)

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    allowedLinkPrefixes: ["*"],
    allowedImagePrefixes: ["*"],
  })
  .use(/* your compiler */);

const markdownWithExternalUrls = `
[Any Link](https://anywhere.com/link)
![Any Image](https://untrusted-site.com/image.jpg)
[Relative Link](/internal-page)
`;

const result = processor.processSync(markdownWithExternalUrls);
// All URLs are allowed, including relative URLs like /internal-page

Note: Using "*" disables prefix filtering; dangerous protocols such as javascript: and data: remain blocked. Only use this when you trust the markdown source. When using the wildcard without defaultOrigin, relative URLs are preserved as-is in the output.

Allow Base64 Images

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedImagePrefixes: ["https://mysite.com/"],
    allowDataImages: true, // Enable base64 images
  })
  .use(/* your compiler */);

const markdownWithBase64Images = `
![Base64 Image](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==)
![Regular Image](https://mysite.com/image.png)
`;

const result = processor.processSync(markdownWithBase64Images);

Note: This is particularly useful when converting documents from formats like PDF or .docx where images are embedded as base64. Only data:image/* URLs are allowed; other data: URLs remain blocked for security.

Blob URLs

Blob URLs (blob:) are automatically allowed by default for both links and images. These are browser-generated URLs that reference in-memory objects and are commonly used for:

Previewing user-uploaded files before upload
Client-side image manipulation
Displaying generated content

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedImagePrefixes: ["https://mysite.com/"],
  })
  .use(/* your compiler */);

const markdownWithBlobUrl = `
![Preview](blob:https://example.com/40a5fb5a-d56d-4a33-b4e2-0acf6a8e5f64)
`;

const result = processor.processSync(markdownWithBlobUrl);
// The blob: URL will be allowed even without being in allowedImagePrefixes

Note: Blob URLs are safe because they can only reference content already loaded in the browser's memory. They cannot be used to exfiltrate data or load external resources.

Custom Protocol Support

Enable custom protocols for deep linking to applications and services:

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    allowedProtocols: ['tel:', 'mailto:', 'postman:', 'vscode:', 'slack:'],
  })
  .use(/* your compiler */);

const markdownWithCustomProtocols = `
[Call us](tel:+1234567890)
[Email support](mailto:support@example.com)
[Open in Postman](postman://open/collection)
[View in VS Code](vscode://file/path/to/file.ts)
[Join Slack](slack://channel?id=C123456)
`;

const result = processor.processSync(markdownWithCustomProtocols);
// All these custom protocol links will be allowed

Common use cases:

tel: - Phone number links that open the dialer on mobile devices
mailto: - Email links (allowed by default, but shown here for completeness)
sms: - SMS/text message links
postman:, vscode:, slack: - Deep links to desktop applications
Custom app protocols - Links to your own Electron or native applications

You can also use the wildcard to allow any custom protocol:

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    allowedProtocols: ['*'], // Allow all protocols
  })
  .use(/* your compiler */);

Security Note: Even with allowedProtocols: ['*'], dangerous protocols like javascript:, data:, file:, and vbscript: are always blocked for security. Custom protocols are safe because they trigger OS-level protocol handlers and don't execute in the browser context.

Block Policies

Control how blocked content is handled instead of the default [blocked] indicator:

import { harden, BlockPolicy } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://trusted.com/"],
    allowedImagePrefixes: ["https://trusted.com/"],
    linkBlockPolicy: "text-only", // Show link text only, no [blocked] indicator
    imageBlockPolicy: "remove", // Remove blocked images entirely
  })
  .use(/* your compiler */);

Available policies: "indicator" (default), "text-only", "remove".
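
As a rough model of what each policy produces, the sketch below maps a blocked link or image to its replacement text. The function names, the `PolicySketch` type, and the exact spacing around [blocked] are assumptions for illustration; the real plugin emits hast nodes, and the title attribute used by the "indicator" policy for links is omitted here.

```typescript
// Hypothetical model of the three policies; names and output shapes are assumptions.
type PolicySketch = "indicator" | "text-only" | "remove";

// Returns the text that replaces a blocked link, or null when nothing is rendered.
function blockedLinkText(linkText: string, policy: PolicySketch = "indicator"): string | null {
  if (policy === "remove") return null;
  return policy === "indicator" ? `${linkText} [blocked]` : linkText;
}

// Returns the text that replaces a blocked image, or null when nothing is rendered.
function blockedImageText(alt: string, policy: PolicySketch = "indicator"): string | null {
  if (policy === "remove") return null;
  if (policy === "indicator") return `[Image blocked: ${alt}]`;
  return alt === "" ? null : alt; // "text-only": images without alt text are removed.
}

const linkIndicator = blockedLinkText("Docs");                 // "Docs [blocked]"
const linkTextOnly = blockedLinkText("Docs", "text-only");     // "Docs"
const imageIndicator = blockedImageText("Logo");               // "[Image blocked: Logo]"
const imageNoAlt = blockedImageText("", "text-only");          // null
```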

Custom Styling for Blocked Content

import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://trusted.com/"],
    allowedImagePrefixes: ["https://trusted.com/"],
    blockedLinkClass: "blocked-link",
    blockedImageClass: "blocked-image",
  })
  .use(/* your compiler */);

const result = processor.processSync(markdownContent);

Security Features

URL Filtering

Links: Filters href attributes in <a> elements
Images: Filters src attributes in <img> elements
Relative URLs: Properly resolves and validates relative URLs against defaultOrigin
Path Traversal Protection: Normalizes URLs to prevent ../ attacks
Wildcard Support: Use "*" prefix to disable filtering (only when markdown is trusted)
Prefix Matching: Validates that URLs start with allowed prefixes and have matching origins

Blocked Content Handling

Behavior is configurable per element type via linkBlockPolicy and imageBlockPolicy:

"indicator" (default): Blocked links show a [blocked] suffix; blocked images show [Image blocked: {alt}]
"text-only": Outputs just the link text or image alt text with no indicator
"remove": Removes blocked elements entirely from the output

Attack Prevention

XSS Prevention: Blocks javascript:, data:, vbscript:, file: and other dangerous protocols (always, regardless of configuration)
Redirect Protection: Prevents unauthorized redirects to malicious sites
Tracking Prevention: Blocks unauthorized image tracking pixels
Domain Spoofing: Validates full URLs, not just domains
Safe Protocols: Allows safe protocols including https:, http:, mailto:, blob:, and others while blocking dangerous ones
Custom Protocols: Optional support for custom protocols (e.g., tel:, postman:, vscode:) with explicit opt-in via allowedProtocols

Testing

The package includes comprehensive tests covering:

Basic markdown rendering
URL filtering for links and images
Relative URL handling
Security bypass prevention
Edge cases and malformed URLs
TypeScript type safety

Run tests:

pnpm test

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License - see the LICENSE file for details.

Security

If you discover a security vulnerability, please send an e-mail to security@vercel.com.
