Sitemap XML Best Practices

Table of Contents

  1. What is a sitemap?
  2. Types of sitemaps
  3. Structure and syntax
  4. Technical limits and constraints
  5. Which URLs to include?
  6. XML attributes
  7. Sitemap index
  8. Declaration and submission
  9. Specialized sitemaps
  10. Common errors to avoid
  11. Automation and maintenance
  12. Validation

What is a sitemap?

A sitemap is an XML file that lists the URLs of a website to help search engines discover and crawl them more efficiently. It does not guarantee that a page will be indexed, but it improves communication between the site and crawlers.

Analogy: A sitemap is like a table of contents you hand to a search engine, saying: "Here are all the pages on my site, and here is their relative importance."

When is a sitemap particularly useful?

  • Large sites (hundreds or thousands of pages)
  • Sites with weak or poorly structured internal linking
  • New sites with few external backlinks
  • Sites with media-rich content (images, videos)
  • Sites with pages that are updated very frequently

Types of sitemaps

Type             | Format           | Primary use
XML Sitemap      | .xml             | Standard web pages
Image Sitemap    | .xml (extension) | Indexable images
Video Sitemap    | .xml (extension) | Video content
News Sitemap     | .xml (extension) | Press articles (Google News)
Text Sitemap     | .txt             | Simple URL list (limited use)
RSS/Atom Sitemap | .xml             | Content feeds (blogs, podcasts)

Structure and syntax

Minimal valid example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>

Full example with all attributes

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2024-11-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
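
The same structure can also be produced programmatically. The following is a minimal Python sketch using only the standard library; the function name build_urlset and the shape of the pages list are illustrative assumptions, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(pages):
    """Return sitemap XML (bytes) for a list of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and are written only when present.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(page[tag])
    # ElementTree escapes special characters in element text automatically.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Hypothetical input data mirroring the two examples above.
pages = [
    {"loc": "https://example.com/"},
    {"loc": "https://example.com/page", "lastmod": "2024-11-15",
     "changefreq": "monthly", "priority": 0.8},
]
sitemap_xml = build_urlset(pages)
print(sitemap_xml.decode("utf-8"))
```

Building the tree with an XML library rather than string concatenation keeps the output well-formed even when a URL contains characters such as &.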

Formatting rules

  • Encoding must be UTF-8
  • All URLs must use the full protocol (https:// or http://)
  • Special characters must be escaped as XML entities:
Character | XML entity
&         | &amp;
'         | &apos;
"         | &quot;
>         | &gt;
<         | &lt;
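
As an illustration of the escaping rule, here is a short Python sketch based on the standard library's xml.sax.saxutils.escape. The helper name escape_loc and the sample URL are assumptions made for the example.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; this mapping adds both quote types.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_loc(url):
    """Escape a URL so it can be used as the text of a <loc> element."""
    return escape(url, QUOTE_ENTITIES)


print(escape_loc("https://example.com/search?q=tea&lang=en"))
# https://example.com/search?q=tea&amp;lang=en
```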

Technical limits and constraints

Constraint                 | Maximum value
Number of URLs per file    | 50,000
File size                  | 50 MB (52,428,800 bytes), uncompressed
Gzip compression (.xml.gz) | Allowed; the 50 MB limit still applies to the uncompressed file

If either threshold is exceeded, you must split the sitemap into several files and reference them from a sitemap index (see the dedicated section).

Recommended compression

It is advisable to compress large sitemaps with gzip to reduce bandwidth usage and transfer time:

sitemap.xml.gz
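
A possible way to produce the compressed file is sketched below in Python with the standard gzip module. The function name write_gzipped_sitemap is hypothetical, and the size check assumes the 50 MB limit is counted on the uncompressed bytes, as stated above.

```python
import gzip
import tempfile
from pathlib import Path

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # limit applies before compression


def write_gzipped_sitemap(xml_bytes, destination):
    """Write xml_bytes to destination (e.g. sitemap.xml.gz) using gzip."""
    if len(xml_bytes) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("sitemap exceeds 50 MB uncompressed; split it first")
    with gzip.open(destination, "wb") as handle:
        handle.write(xml_bytes)
    return Path(destination)


# Round-trip check in a temporary directory, with a tiny sample document.
sample = b'<?xml version="1.0" encoding="UTF-8"?><urlset/>'
with tempfile.TemporaryDirectory() as tmp:
    path = write_gzipped_sitemap(sample, Path(tmp) / "sitemap.xml.gz")
    with gzip.open(path, "rb") as handle:
        restored = handle.read()
print(restored == sample)
```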

Which URLs to include?

✅ Include

  • Canonical pages (the definitive version of a URL)
  • Pages accessible without authentication
  • Pages returning an HTTP 200 status
  • Pages you want indexed
  • Pages with content that is valuable to users

❌ Exclude

  • Redirect pages (301, 302)
  • Pages with a <meta name="robots" content="noindex"> tag
  • Pages blocked in robots.txt
  • Internal search result pages
  • Admin, login, and cart pages
  • Pagination pages (except page 1 in some cases)
  • URLs with session or tracking parameters (?sessionid=, ?utm_source=)
  • Duplicate pages (keep only the canonical version)
  • Pages returning an error (404, 410, 500)

Golden rule: Only submit URLs you want indexed and that correspond to the canonical version.
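
The include and exclude rules above can be expressed as a simple filter. This Python sketch assumes each page is a dict produced by your own crawl or CMS export; the keys (url, status, noindex, canonical, blocked_by_robots) and the list of tracking parameters are hypothetical and should be adapted to your data.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative list; extend it with the parameters your site actually uses.
TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def is_sitemap_candidate(page):
    """Return True if a page record belongs in the sitemap."""
    if page["status"] != 200:
        return False  # redirects (301, 302) and errors (404, 410, 500)
    if page["noindex"] or page["blocked_by_robots"]:
        return False
    if page["canonical"] != page["url"]:
        return False  # only the canonical version is listed
    query = urlsplit(page["url"]).query
    keys = {key.lower() for key, _ in parse_qsl(query, keep_blank_values=True)}
    return not (keys & TRACKING_PARAMS)


crawl = [
    {"url": "https://example.com/", "status": 200, "noindex": False,
     "canonical": "https://example.com/", "blocked_by_robots": False},
    {"url": "https://example.com/old", "status": 301, "noindex": False,
     "canonical": "https://example.com/old", "blocked_by_robots": False},
    {"url": "https://example.com/?utm_source=mail", "status": 200, "noindex": False,
     "canonical": "https://example.com/?utm_source=mail", "blocked_by_robots": False},
]
included = [page["url"] for page in crawl if is_sitemap_candidate(page)]
print(included)  # ['https://example.com/']
```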


XML attributes

<loc> (required)

The full absolute URL of the page.

<loc>https://example.com/my-page</loc>
  • Must exactly match the protocol used (https preferred)
  • Must be the canonical version of the URL (with or without trailing slash, consistently)

<lastmod> (optional)

Date of the last content modification, in W3C Datetime format (YYYY-MM-DD or full YYYY-MM-DDTHH:MM:SS+00:00).

<lastmod>2024-11-15</lastmod>

Best practices:

  • Only use lastmod if the date is real and reliable (dynamically generated from the database)
  • Do not statically set today's date on every page: Google ignores lastmod values that prove consistently inaccurate
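
To show both accepted forms of the W3C Datetime format, here is a small Python sketch. The function name format_lastmod is an assumption, and the timestamp stands in for a value that would normally come from the database.

```python
from datetime import datetime, timezone


def format_lastmod(modified_at, date_only=False):
    """Format a timezone-aware datetime for use in <lastmod>."""
    if modified_at.tzinfo is None:
        raise ValueError("modified_at must be timezone-aware")
    if date_only:
        return modified_at.date().isoformat()
    return modified_at.isoformat(timespec="seconds")


# Hypothetical "last modified" timestamp loaded from storage.
updated_at = datetime(2024, 11, 15, 8, 30, 0, tzinfo=timezone.utc)
print(format_lastmod(updated_at))                  # 2024-11-15T08:30:00+00:00
print(format_lastmod(updated_at, date_only=True))  # 2024-11-15
```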

<changefreq> (optional)

Estimated frequency of content changes. Possible values:

Value   | Typical use
always  | Pages changed on every visit
hourly  | Real-time feeds
daily   | News, active blogs
weekly  | Regularly updated content
monthly | Stable content pages
yearly  | Nearly static pages
never   | Permanent archives

Important note: Google's documentation states that it ignores <changefreq> values. Other search engines may still read the attribute as a hint, but it is never a directive.


<priority> (optional)

Relative priority of the page within the site, between 0.0 and 1.0 (default: 0.5).

<priority>0.8</priority>

Recommendations:

  • Homepage: 1.0
  • Main category pages: 0.8
  • Standard content pages: 0.5 to 0.7
  • Secondary pages: 0.3 to 0.5

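Note: Priority is relative to your own site, not to other sites. Setting 1.0 everywhere has no positive effect and dilutes the signal. Google's documentation states that it ignores <priority> values, so treat them as a hint for other crawlers at most.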


Sitemap index

When a site exceeds 50,000 URLs or when the file exceeds 50 MB, you must split the sitemaps and create an index file.

Sitemap index structure

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2024-11-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-articles.xml</loc>
    <lastmod>2024-11-14</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-images.xml</loc>
    <lastmod>2024-11-10</lastmod>
  </sitemap>
</sitemapindex>

Best practices for the sitemap index

  • Place the index file at the root: https://example.com/sitemap.xml
  • Only reference sitemaps from the same domain
  • Update the <lastmod> of the sitemap index when a child sitemap changes
  • Segment logically: by content type, by language, by section
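
The splitting step can be sketched in Python as follows. The function names chunk_urls and build_sitemap_index, the child file naming scheme, and the fixed lastmod value are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(child_sitemaps):
    """Return sitemap index XML (bytes) from (loc, lastmod) pairs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in child_sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# 120,000 hypothetical URLs need three child sitemaps.
urls = [f"https://example.com/item/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
children = [
    (f"https://example.com/sitemap-items-{i}.xml", "2024-11-15")
    for i, _ in enumerate(chunks, start=1)
]
index_xml = build_sitemap_index(children)
print(len(chunks), [len(chunk) for chunk in chunks])  # 3 [50000, 50000, 20000]
```

Each chunk would then be written to its own child file, for example with a urlset builder, before the index is published. Splitting by URL count alone does not guarantee the 50 MB size limit, so very long URLs may call for smaller chunks.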

Declaration and submission

1. In robots.txt (recommended)

Add at the end of the robots.txt file:

Sitemap: https://example.com/sitemap.xml

Multiple sitemaps can be declared:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml

2. Via Google Search Console

  • Go to Indexing > Sitemaps
  • Submit the full URL of the sitemap
  • Allows you to monitor errors and indexation status

3. Via Bing Webmaster Tools

  • Go to Sitemaps
  • Submit the sitemap URL

4. HTTP ping (deprecated)

Google removed the ping endpoint in 2023. This method no longer works.

Recommended strategy: Declare in robots.txt + submit in Search Console + submit in Bing Webmaster Tools.


Specialized sitemaps

Image sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/my-page</loc>
    <image:image>
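      <image:loc>https://example.com/images/photo.jpg</image:loc>
      <!-- Google deprecated image:title and image:caption in 2022; other consumers may still read them -->
      <image:title>Image description</image:title>
      <image:caption>Image caption</image:caption>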
    </image:image>
  </url>
</urlset>

Video sitemap

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/my-video</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/thumb.jpg</video:thumbnail_loc>
      <video:title>Video title</video:title>
      <video:description>Video description</video:description>
      <video:content_loc>https://example.com/video.mp4</video:content_loc>
      <video:duration>600</video:duration>
      <video:publication_date>2024-11-15T08:00:00+00:00</video:publication_date>
    </video:video>
  </url>
</urlset>

News sitemap (Google News)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/article/my-title</loc>
    <news:news>
      <news:publication>
        <news:name>Publication name</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2024-11-15T09:00:00+00:00</news:publication_date>
      <news:title>Article title</news:title>
    </news:news>
  </url>
</urlset>

Google News sitemaps must only contain articles published within the last 48 hours.
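
A news sitemap generator therefore needs a rolling time filter. The Python sketch below assumes each article is a dict with a timezone-aware published_at value; the function name recent_articles and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(hours=48)


def recent_articles(articles, now=None):
    """Keep only the articles published within the last 48 hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - NEWS_WINDOW
    return [a for a in articles if cutoff <= a["published_at"] <= now]


# A fixed "now" keeps the example reproducible.
now = datetime(2024, 11, 15, 12, 0, tzinfo=timezone.utc)
articles = [
    {"loc": "https://example.com/article/fresh",
     "published_at": datetime(2024, 11, 15, 9, 0, tzinfo=timezone.utc)},
    {"loc": "https://example.com/article/stale",
     "published_at": datetime(2024, 11, 12, 9, 0, tzinfo=timezone.utc)},
]
print([a["loc"] for a in recent_articles(articles, now=now)])
# ['https://example.com/article/fresh']
```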

Multilingual sitemap (hreflang)

<url>
  <loc>https://example.com/en/page</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page"/>
  <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/page"/>
</url>

Required namespace, declared on the <urlset> element: xmlns:xhtml="http://www.w3.org/1999/xhtml"
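
Search engines generally expect hreflang annotations to be reciprocal: every language version gets its own <url> entry listing all alternates, itself included. The Python sketch below generates such entries under that assumption; the function name build_hreflang_urlset and its parameters are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_urlset(alternates, x_default):
    """Build a urlset from a dict mapping hreflang codes to URLs.

    `x_default` is the hreflang code whose URL is used as the x-default.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Every entry repeats the full set of alternates, itself included.
        for lang, href in alternates.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
        ET.SubElement(url, f"{{{XHTML_NS}}}link", rel="alternate",
                      hreflang="x-default", href=alternates[x_default])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


hreflang_xml = build_hreflang_urlset(
    {"en": "https://example.com/en/page", "fr": "https://example.com/fr/page"},
    x_default="en",
)
print(hreflang_xml.decode("utf-8"))
```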


Common errors to avoid

Error                                 | Impact                                  | Solution
Including noindex URLs                | Inconsistency flagged in Search Console | Exclude noindex pages
Including redirects                   | Wasted crawl budget                     | Only include final destination URLs
Non-canonical URLs                    | Diluted SEO signals                     | Use only the canonical version
Unreliable lastmod (always today)     | Google ignores the attribute            | Generate dynamically from the database
Forgetting HTTPS when available       | Protocol confusion                      | Use the site's default protocol
Inconsistency with robots.txt         | Pages submitted but blocked             | Verify URLs are not blocked
Omitting sitemap index on large sites | Exceeding the 50,000 URL limit          | Implement a segmented index
Never updating the sitemap            | New pages not discovered                | Automate sitemap generation
Mixing HTTP and HTTPS                 | Content duplication                     | Use a single protocol consistently

Automation and maintenance

Popular CMS

CMS       | Recommended plugin/module
WordPress | Yoast SEO, Rank Math, Google XML Sitemaps
Joomla    | OSMap, Xmap
Drupal    | Simple XML Sitemap
Magento   | Built-in natively
Shopify   | Built-in natively (/sitemap.xml)

Programmatic generation (PHP)

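<?php
// $pages is assumed to be loaded beforehand (e.g. from the database) as rows like
// ['url' => 'https://example.com/', 'updated_at' => '2024-11-15 10:00:00', 'priority' => 0.8].
header('Content-Type: application/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<?php foreach ($pages as $page): ?>
  <url>
    <!-- ENT_XML1 makes htmlspecialchars() emit XML-safe entities -->
    <loc><?= htmlspecialchars($page['url'], ENT_XML1 | ENT_QUOTES, 'UTF-8') ?></loc>
    <lastmod><?= date('Y-m-d', strtotime($page['updated_at'])) ?></lastmod>
    <changefreq>weekly</changefreq>
    <priority><?= number_format((float) $page['priority'], 1, '.', '') ?></priority>
  </url>
<?php endforeach; ?>
</urlset>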

Recommended update frequency

  • Dynamic content sites (e-commerce, blogs): automatic regeneration on each publication
  • Static sites: regeneration on each deployment
  • Notifying search engines: resubmit the sitemap in Search Console after a major change (the HTTP ping endpoint no longer works)

Validation

Online tools

  • Google Search Console — Official test and submission tool
  • Screaming Frog SEO Spider — Full audit and error detection
  • XML Sitemap Validator (xmlsitemaps.com)
  • Bing Webmaster Tools — Validation and submission for Bing

Essential manual checks

# Check sitemap accessibility
curl -I https://example.com/sitemap.xml

# Check the returned Content-Type (should be application/xml or text/xml)
curl -s -D - https://example.com/sitemap.xml -o /dev/null | grep -i content-type

# Validate XML syntax
xmllint --noout sitemap.xml

Validation checklist

  • [ ] The file is publicly accessible (HTTP 200)
  • [ ] The Content-Type is application/xml or text/xml
  • [ ] The XML is valid (no syntax errors)
  • [ ] All URLs return HTTP 200
  • [ ] No noindex URLs are included
  • [ ] No URLs blocked by robots.txt are included
  • [ ] The sitemap is declared in robots.txt
  • [ ] The sitemap is submitted in Search Console
  • [ ] lastmod dates are reliable and up to date
  • [ ] URLs use the correct protocol (HTTPS)
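
Part of this checklist can be automated offline. The Python sketch below covers only the checks that need no network access (well-formed XML, URL count, HTTPS, absolute URLs, duplicates); the function name check_sitemap and the sample document are assumptions, and HTTP status, noindex and robots.txt checks would need a separate crawl.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000


def check_sitemap(xml_bytes):
    """Return a list of problems found in a urlset sitemap (empty if none)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"invalid XML: {exc}"]
    problems = []
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)]
    if not locs:
        problems.append("no <loc> entries found")
    if len(locs) > MAX_URLS:
        problems.append(f"too many URLs: {len(locs)}")
    seen = set()
    for loc in locs:
        parts = urlsplit(loc)
        if parts.scheme != "https":
            problems.append(f"not HTTPS: {loc}")
        if not parts.netloc:
            problems.append(f"not an absolute URL: {loc}")
        if loc in seen:
            problems.append(f"duplicate URL: {loc}")
        seen.add(loc)
    return problems


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""
for problem in check_sitemap(sample):
    print(problem)
```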

References