

Markdown for Agents

AI agents are increasingly the primary "reader" of web content — and they prefer Markdown to HTML. This article is for web developers and DevOps engineers who want to optimise their websites for AI agents without depending on a specific host.

By Sven von Känel · 16 min read
  • AI
  • Markdown
  • Agents

Introduction

This article looks at two implementation approaches for serving Markdown to AI agents, application-specific and proxy-based, with a concrete .NET code example and a sample Nginx configuration.

Motivation

In a blog post titled "Introducing Markdown for Agents", Cloudflare, one of the major hosting providers, introduces a feature that accounts for the fact that more and more web content is fetched not by humans but by AI agents (a.k.a. bots). The starting point: parsing HTML is relatively expensive for AI agents in token terms, since a web page naturally contains many elements that serve layout rather than content. Markdown, by contrast, can be parsed far more token-efficiently, and therefore more cheaply, because it focuses squarely on the content. The core idea of the article is to (automatically) deliver Markdown instead of HTML whenever the HTTP Accept request header contains the value "text/markdown", for example:

curl https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
  -H "Accept: text/markdown"

According to the documentation, this feature is available automatically for sites hosted on Cloudflare. Since the idea is genuinely compelling (in the future, content will likely be fetched "manually" via Google ever less often, and researched ever more often by an agent through an AI chat interface), the question becomes how sites not hosted on Cloudflare can gain the same capability. That matters because in future it may be not the Google ranking but the "AI friendliness" of a site that determines the reach of your content. The "content-signal" response header (see contentsignals.org) also plays a role here; it is used to influence the calling agent's behaviour:

content-signal: ai-train=yes, search=yes, ai-input=yes

Implementation approaches

The following is a look at the options available in principle for implementing a comparable feature without relying on the host.

Application-specific implementation

This solution is especially interesting when much of the content already exists in Markdown and can be delivered directly, with no conversion step, when the "Accept: text/markdown" header is detected. That's the case, for example, when content-heavy pages of a website are generated from Markdown content in a headless CMS like Directus. Dynamic content can be made available "on the fly" or pre-generated in Markdown.

Advantages

  • High content fidelity is achievable, since you have full site-specific control over the conversion process.

Disadvantages

  • Site-specific implementation is needed (although a uniform tech stack can absorb this through shared libraries).

Example implementation

Here's an example from a recently built website that draws its content from a Directus headless CMS, in the form of a .NET middleware. Content available in the CMS is loaded directly from there (TryResolveCmsContent); other pages are converted "on the fly" with the ReverseMarkdown library after the page has been rendered. Note that this is a sample implementation, not production-ready code.

///-------------------------------------------------------------------------------------------------
/// <summary> Middleware that serves page content as Markdown when the request includes
///           Accept: text/markdown. Follows the Cloudflare "Markdown for Agents" proposal. </summary>
///-------------------------------------------------------------------------------------------------
internal sealed class MarkdownForAgentsMiddleware(
  RequestDelegate next,
  MarkdownForAgentsOptions options,
  ILogger<MarkdownForAgentsMiddleware> logger)
{
  ///-------------------------------------------------------------------------------------------------
  /// <summary> The next middleware in the pipeline. </summary>
  ///-------------------------------------------------------------------------------------------------
  private readonly RequestDelegate mNext = next;

  ///-------------------------------------------------------------------------------------------------
  /// <summary> The middleware configuration options. </summary>
  ///-------------------------------------------------------------------------------------------------
  private readonly MarkdownForAgentsOptions mOptions = options;

  ///-------------------------------------------------------------------------------------------------
  /// <summary> The logger instance. </summary>
  ///-------------------------------------------------------------------------------------------------
  private readonly ILogger<MarkdownForAgentsMiddleware> mLogger = logger;

  ///-------------------------------------------------------------------------------------------------
  /// <summary> The ReverseMarkdown converter instance (thread-safe singleton). </summary>
  ///-------------------------------------------------------------------------------------------------
  private readonly Converter mConverter = new(new Config
  {
    UnknownTags = Config.UnknownTagsOption.Bypass,
    RemoveComments = true,
    GithubFlavored = true,
    SmartHrefHandling = true
  });

  ///-------------------------------------------------------------------------------------------------
  /// <summary> Known CMS languages mapped from URL path segments. </summary>
  ///-------------------------------------------------------------------------------------------------
  private static readonly FrozenDictionary<String, String> sCmsLanguages =
    new Dictionary<String, String>(StringComparer.OrdinalIgnoreCase)
    {
      { "de", "Deutsch" },
      { "en", "English" }
    }.ToFrozenDictionary(StringComparer.OrdinalIgnoreCase);

  // ... (full implementation in the original article — request handling,
  //      Accept-header parsing with q-values, ETag/304 negotiation,
  //      Vary header merging, and structured logging)
}

The middleware registration:

internal static class MarkdownForAgentsExtensions
{
  public static WebApplication UseMarkdownForAgents(
    this WebApplication app,
    Action<MarkdownForAgentsOptions>? configure = null)
  {
    ArgumentNullException.ThrowIfNull(app);

    var options = new MarkdownForAgentsOptions();
    configure?.Invoke(options);
    app.UseMiddleware<MarkdownForAgentsMiddleware>(options);

    return app;
  }
}

The configuration class (adjust to your own needs):

internal sealed class MarkdownForAgentsOptions
{
  public String ContentSignalHeaderValue { get; set; } = "ai-train=yes, search=yes, ai-input=yes";
  public Boolean AcceptLegacyTextXMarkdown { get; set; } = true;
  public Boolean SetNoTransform { get; set; } = true;
  public Boolean SetWeakETag { get; set; } = true;
  public String ContentStartMarker { get; set; } = "<!-- CONTENT_START -->";
  public String ContentEndMarker { get; set; } = "<!-- CONTENT_END -->";
  public List<String> SkipPathPrefixes { get; set; } =
  [
    "/api/", "/auth/", "/cart/", "/status",
    "/error", "/chathub", "/assets/"
  ];
}

A few short notes on three specifics in the code above.

Quality value in the Accept header

The Accept HTTP header can optionally carry several preferred formats, each with a "quality" value:

Accept: text/html, application/json;q=0.9, text/plain;q=0.5, application/xml;q=0

This indicates "how strongly" a particular format is preferred. The code above serves Markdown when the q value for "text/markdown" is greater than zero.
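The negotiation step can be sketched as follows; this is illustrative JavaScript, not the article's .NET code, and the function names are made up:

```javascript
// Sketch: return the q-value of a media type in an Accept header.
// Assumes simple, well-formed headers (no wildcard matching, no
// parameter quoting); a real implementation needs more care.
function qValueFor(acceptHeader, mediaType) {
  if (!acceptHeader) return 0;
  for (const part of acceptHeader.split(',')) {
    const [type, ...params] = part.trim().split(';').map(s => s.trim());
    if (type.toLowerCase() !== mediaType) continue;
    const qParam = params.find(p => p.toLowerCase().startsWith('q='));
    if (!qParam) return 1; // q defaults to 1 when omitted
    const q = parseFloat(qParam.slice(2));
    return Number.isNaN(q) ? 0 : q;
  }
  return 0;
}

// Serve Markdown only when text/markdown is acceptable with q > 0.
function wantsMarkdown(acceptHeader) {
  return qValueFor(acceptHeader, 'text/markdown') > 0;
}
```

Note that "text/markdown;q=0" explicitly rules the format out, which is why a plain substring check on the header is not enough.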

ETag header

The ETag (entity tag) response header is an ID for a specific version of a resource — for example an HTML page or, in our case, a Markdown result. It enables resource caching, among other things. A client that has cached the resource sends back the original ETag. If nothing has changed, the server responds with HTTP status 304 (Not Modified) — otherwise with the changed resource and a new ETag value. That avoids retransmitting unchanged content. Another use case is handling colliding edit operations; more on that in the documentation. The code above serves an ETag to enable caching, for example.

Deployment note: depending on configuration, reverse proxies / CDNs may use their own cache key, which means you have to ensure that the Accept header (HTML vs. Markdown) flows into the cache key and that conditional requests (If-None-Match) are forwarded correctly. Otherwise you risk "cross-content" cache hits (Markdown delivered to a browser, or vice versa).

Vary header

The Vary response header tells the browser (or the agent) which request headers influence the format of the response. This signals to the caller that different values of those request headers will produce different results.

Vary: Accept

So this tells the browser that the result delivered depends on the value of the Accept request header.
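The middleware's "Vary header merging" step mentioned earlier boils down to appending "Accept" to whatever Vary value the response already carries, without duplicating it. A minimal sketch (illustrative, not the actual .NET code):

```javascript
// Merge a header name into an existing Vary value without duplicates.
// existingVary is the current Vary response header (may be undefined).
function mergeVary(existingVary, headerName = 'Accept') {
  const entries = (existingVary || '')
    .split(',')
    .map(v => v.trim())
    .filter(v => v.length > 0);
  if (entries.includes('*')) return '*'; // "Vary: *" already covers everything
  const exists = entries.some(v => v.toLowerCase() === headerName.toLowerCase());
  return exists ? entries.join(', ') : [...entries, headerName].join(', ');
}
```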

Application-independent implementation

This is mainly interesting for hosting scenarios where a reverse proxy (Nginx, HAProxy, Traefik, YARP, …) or edge proxy (Cloudflare, AWS CloudFront, Fastly, Akamai, Vercel's Edge Network) sits in front of the actual website. The idea here is to detect the "Accept: text/markdown" header and either

  • route to a service that fetches the HTML markup of the page in question and converts it to Markdown "on the fly", or
  • use a cache like Redis "pre-filled" with the corresponding Markdown content.

Both approaches can of course be combined by storing generated Markdown in the cache.
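The combined approach is the classic cache-aside pattern. A sketch in JavaScript, where an in-memory Map stands in for Redis and convertFn is whatever service turns HTML into Markdown (both names are illustrative, not from the article):

```javascript
// Cache-aside wrapper around an HTML-to-Markdown converter.
// ttlMs bounds staleness; real deployments also need explicit
// invalidation when the underlying content changes.
function createMarkdownCache(convertFn, ttlMs = 60_000) {
  const cache = new Map(); // url -> { markdown, expires }
  return async function getMarkdown(url, html) {
    const hit = cache.get(url);
    if (hit && hit.expires > Date.now()) return hit.markdown; // cache hit
    const markdown = await convertFn(html); // miss: convert and store
    cache.set(url, { markdown, expires: Date.now() + ttlMs });
    return markdown;
  };
}
```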

Advantages

  • Uniform implementation across many heterogeneous applications
  • One solution for several websites and hosting environments
  • Application code doesn't need changes

Disadvantages

  • The quality of the HTML-to-Markdown conversion depends heavily on the complexity of the source HTML (header, footer, sidebars, ads, and so on)
  • An extra network "hop", since the conversion application sits between the proxy and the main application
  • If content is pre-generated, you need a generation pipeline or background job. You also need to think about change frequency and invalidation of the generated content.

Implementation

How to deliver this depends mainly on:

  • Required throughput: ideas include a Node.js Express solution with the Turndown library, or a fast Go implementation
  • Caching support
  • Boilerplate-removal quality: a library like @mozilla/readability plus jsdom may help

Sample Nginx configuration

An Nginx configuration that wires in a Markdown conversion service might look like this.

Note: the md_renderer upstream referenced here is a placeholder — the actual conversion service (e.g. based on Turndown or a comparable library) has to be implemented and deployed separately.

# /etc/nginx/conf.d/site.conf

# Decide upstream based on Accept header (safe to use 'map').
# Note: the regex also matches "text/markdown;q=0"; precise q-value
# negotiation has to happen in the renderer itself.
map $http_accept $wants_markdown {
    default 0;
    ~*text/markdown 1;
    ~*text/x-markdown 1;
}

# Pick the upstream name based on $wants_markdown.
map $wants_markdown $backend_upstream {
    0 "html_origin";
    1 "md_renderer";
}

upstream html_origin {
    server 10.0.10.25:8080;
    keepalive 64;
}

upstream md_renderer {
    server 10.0.20.15:3000;
    keepalive 32;
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_pass http://$backend_upstream;

        # Required so that the upstream keepalive pools are actually used.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;

        proxy_set_header X-Original-URI $request_uri;
    }
}

With SSL added:

server {
    listen 443 ssl http2;
    server_name www.example.com;

    ssl_certificate     /etc/letsencrypt/live/www.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/www.example.com/privkey.pem;

    location / {
        proxy_pass http://$backend_upstream;

        # Required so that the upstream keepalive pools are actually used.
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Original-URI $request_uri;
    }
}

server {
    listen 80;
    server_name www.example.com;
    return 301 https://$host$request_uri;
}

Checklist

The following points matter for both approaches above:

  • Correct cache handling (where used): the cache should be consistent and current (see also the ETag header above).
  • Content parity: the Markdown should reflect the same main content as the HTML, not a "thinned" version.
  • Boilerplate control: when converting from HTML, nav, footer, and sidebars should be removed (Readability-style extraction helps).
  • Security: the Markdown service must not fetch arbitrary URLs (SSRF). Only your own origins should be allowed.
  • Observability: track metrics like Markdown hits, conversion time, cache hit rate, and origin fetch errors.
  • Graceful fallback: if conversion fails, several options exist: return HTML (not ideal), return 406 Not Acceptable, or return minimally extracted Markdown (plain text).

Alternative approaches

Google recently introduced WebMCP (Web Model Context Protocol), a new JavaScript interface intended to let AI agents interact with websites in a standardised way. Unlike Markdown for Agents, the focus here is not on scraping page content but on targeted interaction with the site — for example, filling forms or placing orders. The two approaches complement each other: Markdown for Agents optimises passive reading of content; WebMCP enables active interaction. Implementing WebMCP requires more development effort, since it means working with new JavaScript APIs.

Conclusion

Delivering Markdown content for AI agents isn't a Cloudflare privilege — with manageable effort, the feature can be integrated into any existing web infrastructure. Whether application-specific via middleware or proxy-based via Nginx and friends: what matters is that your content is optimised for the next generation of "readers". Anyone who makes their site AI-friendly today secures the reach of tomorrow — because the Accept header of the future will more and more often be text/markdown.
