Automatic Splitting

How SitemapHost handles large sitemaps by automatically splitting them

The Problem

Search engines impose strict limits on sitemap files:

  • 50,000 URLs maximum per sitemap file
  • 50MB maximum uncompressed file size

A site with more than 50,000 pages must split its sitemap into multiple files and publish a sitemap index that references them all. Maintaining this by hand is error-prone and time-consuming.

The Solution

SitemapHost automatically handles splitting for you. When you upload more than 50,000 URLs, we:

  1. Split your URLs into chunks of up to 50,000
  2. Generate individual sitemap files for each chunk
  3. Create a sitemap index that references all child sitemaps
  4. Serve the index at your root sitemap URL

You don't need to do anything special to enable splitting. Just upload all your URLs and we handle the rest automatically.
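
The splitting step can be sketched in a few lines of Python. This is only an illustration of the idea, not SitemapHost's actual implementation; the function name chunk_urls and the constant name are hypothetical.

```python
from typing import Iterator, Sequence

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit described above


def chunk_urls(urls: Sequence[str],
               size: int = MAX_URLS_PER_SITEMAP) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` URLs, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Example: 150,000 URLs fall into three full chunks.
urls = [f"https://yoursite.com/page-{i}" for i in range(1, 150_001)]
chunks = list(chunk_urls(urls))
print(len(chunks), [len(c) for c in chunks])  # 3 [50000, 50000, 50000]
```

The last chunk is simply shorter when the total is not a multiple of 50,000.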

How It Works

Before: Your Upload

You upload 150,000 URLs via the API or dashboard:

{
  "domain": "sitemap.yoursite.com",
  "urls": [
    { "loc": "https://yoursite.com/page-1" },
    { "loc": "https://yoursite.com/page-2" },
    // ... 149,998 more URLs
  ]
}

After: Generated Files

SitemapHost generates the following structure:

sitemap.yoursite.com/
├── sitemap.xml       (sitemap index)
├── sitemap-1.xml     (URLs 1-50,000)
├── sitemap-2.xml     (URLs 50,001-100,000)
└── sitemap-3.xml     (URLs 100,001-150,000)

Sitemap Index

The main sitemap.xml becomes a sitemap index:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://sitemap.yoursite.com/sitemap-1.xml</loc>
    <lastmod>2024-01-15T10:30:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://sitemap.yoursite.com/sitemap-2.xml</loc>
    <lastmod>2024-01-15T10:30:00Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://sitemap.yoursite.com/sitemap-3.xml</loc>
    <lastmod>2024-01-15T10:30:00Z</lastmod>
  </sitemap>
</sitemapindex>
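
For readers who want to see how an index like this could be produced, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function build_sitemap_index and its parameters are hypothetical, and the sitemap-N.xml naming simply follows the example above; it is not a description of SitemapHost's internals.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Serialize the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap_index(base_url: str, chunk_count: int, lastmod: str) -> bytes:
    """Return a sitemap index listing sitemap-1.xml .. sitemap-N.xml."""
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for n in range(1, chunk_count + 1):
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = f"{base_url}/sitemap-{n}.xml"
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # The output is not pretty-printed; crawlers do not need indentation.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index(
    "https://sitemap.yoursite.com", 3, "2024-01-15T10:30:00Z"
)
```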

Child Sitemaps

Each child sitemap contains up to 50,000 URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/page-1</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://yoursite.com/page-2</loc>
    <lastmod>2024-01-14</lastmod>
  </url>
  <!-- ... up to 50,000 URLs -->
</urlset>
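
A child sitemap can be generated the same way. The sketch below also checks the serialized size against the 50MB uncompressed limit. The names build_urlset and MAX_BYTES are assumptions made for this example, and raising an error on an oversized file is just one possible policy.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed limit

ET.register_namespace("", SITEMAP_NS)


def build_urlset(entries: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Build one child sitemap from (loc, lastmod) pairs; lastmod may be None."""
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the text for us.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
    if len(data) > MAX_BYTES:
        raise ValueError(f"sitemap is {len(data)} bytes, over the 50MB limit")
    return data


child_xml = build_urlset([
    ("https://yoursite.com/page-1", "2024-01-15"),
    ("https://yoursite.com/page-2", "2024-01-14"),
])
```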

Submitting to Search Engines

When using automatic splitting, you only need to submit the main sitemap index URL to search engines. They will automatically discover and process all child sitemaps.

robots.txt: Add only the main sitemap URL to your robots.txt:

User-agent: *
Allow: /

Sitemap: https://sitemap.yoursite.com/sitemap.xml
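
To illustrate how a crawler discovers child sitemaps from the index, or to verify your own index, the following sketch parses an index document and lists the child URLs. The function name child_sitemap_urls and the inline sample are hypothetical; in practice you would fetch the index from your sitemap URL first.

```python
import xml.etree.ElementTree as ET
from typing import List

# Prefix-to-namespace mapping used in the XPath-style query below.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemap_urls(index_xml: bytes) -> List[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", NS)
        if loc.text
    ]


SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://sitemap.yoursite.com/sitemap-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://sitemap.yoursite.com/sitemap-2.xml</loc>
  </sitemap>
</sitemapindex>"""

for url in child_sitemap_urls(SAMPLE_INDEX):
    print(url)
```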

Incremental Updates

When you update your sitemap, we apply the changes as follows:

  • Adding URLs - New URLs are added to existing chunks or new chunks are created
  • Removing URLs - Chunks are rebalanced to maintain optimal distribution
  • Updating URLs - Only affected chunks are regenerated

The lastmod timestamp in the sitemap index is updated whenever any child sitemap changes, signaling to search engines that they should re-crawl.
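
One way to decide which chunks need regenerating is to compare a digest of each chunk's contents with the digest stored from the previous run. The sketch below shows that idea only. It is an assumption made for illustration, not a description of how SitemapHost detects changes, and the names chunk_digest and changed_chunks are hypothetical.

```python
import hashlib
from typing import List, Sequence


def chunk_digest(urls: Sequence[str]) -> str:
    """Return a SHA-256 digest of a chunk's URLs, in order."""
    h = hashlib.sha256()
    for url in urls:
        h.update(url.encode("utf-8"))
        h.update(b"\n")  # separator so ["ab"] and ["a", "b"] differ
    return h.hexdigest()


def changed_chunks(old_digests: Sequence[str],
                   new_chunks: Sequence[Sequence[str]]) -> List[int]:
    """Return 1-based numbers of chunks that are new or whose content changed."""
    changed = []
    for i, chunk in enumerate(new_chunks):
        if i >= len(old_digests) or old_digests[i] != chunk_digest(chunk):
            changed.append(i + 1)
    return changed


# Tiny example with a chunk size of 2 for readability.
old = [["a", "b"], ["c", "d"]]
new = [["a", "b"], ["c", "d"], ["e"]]
stored = [chunk_digest(c) for c in old]
print(changed_chunks(stored, new))  # [3]
```

Chunks that disappear entirely (fewer chunks than before) would need separate handling, which this sketch leaves out.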

Limits

Limit                 Value     Notes
URLs per chunk        50,000    Google/Bing limit
Max chunks            500       25 million URLs total
File size per chunk   < 50MB    Automatically managed
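
The arithmetic behind these limits is straightforward: 500 chunks of 50,000 URLs gives 25,000,000 URLs. A small pre-flight check, sketched below with hypothetical names, can tell you how many chunks an upload needs by URL count and whether it fits.

```python
MAX_URLS_PER_CHUNK = 50_000
MAX_CHUNKS = 500
MAX_TOTAL_URLS = MAX_URLS_PER_CHUNK * MAX_CHUNKS  # 25,000,000


def chunks_needed(total_urls: int) -> int:
    """Return the chunk count for `total_urls`, or raise if over the limit."""
    if total_urls < 0:
        raise ValueError("total_urls must not be negative")
    if total_urls > MAX_TOTAL_URLS:
        raise ValueError(
            f"{total_urls} URLs exceeds the {MAX_TOTAL_URLS} URL limit"
        )
    # Ceiling division without floating point.
    return -(-total_urls // MAX_URLS_PER_CHUNK)


print(chunks_needed(150_000))  # 3
print(chunks_needed(150_001))  # 4
```

This counts URLs only. It does not account for the file size limit, which is checked separately on the serialized output.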
