XML Sitemap Best Practices
Table of Contents
- What is a sitemap?
- Types of sitemaps
- Structure and syntax
- Technical limits and constraints
- Which URLs to include?
- XML attributes
- Sitemap index
- Declaration and submission
- Specialized sitemaps
- Common errors to avoid
- Automation and maintenance
- Validation
What is a sitemap?
A sitemap is an XML file that lists the URLs of a website to help search engines discover and crawl them more efficiently. It does not guarantee that a page will be indexed, but it gives crawlers an explicit list of the URLs you want them to consider, including pages they might not reach through links alone.
Analogy: A sitemap is like a table of contents you hand to a search engine, saying: "Here are all the pages on my site, and here is their relative importance."
When is a sitemap particularly useful?
- Large sites (hundreds or thousands of pages)
- Sites with weak or poorly structured internal linking
- New sites with few external backlinks
- Sites with media-rich content (images, videos)
- Sites with pages that are updated very frequently
Types of sitemaps
| Type | Format | Primary use |
|---|---|---|
| XML Sitemap | .xml | Standard web pages |
| Image Sitemap | .xml (extension) | Indexable images |
| Video Sitemap | .xml (extension) | Video content |
| News Sitemap | .xml (extension) | Press articles (Google News) |
| Text Sitemap | .txt | Simple URL list (limited use) |
| RSS/Atom Sitemap | .xml | Content feeds (blogs, podcasts) |
Structure and syntax
Minimal valid example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
</url>
<url>
<loc>https://example.com/about</loc>
</url>
</urlset>
Full example with all attributes
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/page</loc>
<lastmod>2024-11-15</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
</urlset>
Formatting rules
- Encoding must be UTF-8
- All URLs must be absolute and include the protocol (`https://` or `http://`)
- Special characters must be escaped as XML entities:

| Character | XML Entity |
|---|---|
| `&` | `&amp;` |
| `'` | `&apos;` |
| `"` | `&quot;` |
| `>` | `&gt;` |
| `<` | `&lt;` |
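Python's standard library already covers most of this table: `xml.sax.saxutils.escape` replaces `&`, `<` and `>`, and accepts an extra mapping for the two quote characters. A minimal sketch (the helper name `escape_sitemap_url` is illustrative, not a standard API):

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the quote entities are passed explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_sitemap_url(url: str) -> str:
    """Escape the five XML special characters so the URL is safe inside <loc>."""
    return escape(url, QUOTE_ENTITIES)


print(escape_sitemap_url("https://example.com/shoes?color=red&size=42"))
# https://example.com/shoes?color=red&amp;size=42
```

Manual escaping like this matters mostly for template-based generators; XML libraries escape `&`, `<` and `>` in text content on their own.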
Technical limits and constraints
| Constraint | Maximum value |
|---|---|
| Number of URLs per file | 50,000 |
| File size | 50 MB uncompressed |
| Compressed files (.xml.gz) | Accepted by major search engines; the 50 MB limit applies to the uncompressed size |

If a sitemap would exceed either limit, split it into several files and list them in a sitemap index (see dedicated section).
Recommended compression
It is advisable to compress large sitemaps with gzip (.gz) to reduce bandwidth usage and download time:
sitemap.xml.gz
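A minimal Python sketch of the compression step, using only the standard library. It assumes the sitemap has already been written to disk, and it refuses to compress a file that breaks the 50 MB uncompressed limit, since compression does not raise that limit. The function name `gzip_sitemap` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path

# sitemaps.org: the 50 MB limit (52,428,800 bytes) applies to the uncompressed file.
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024


def gzip_sitemap(source: Path) -> Path:
    """Write source.gz next to source (sitemap.xml -> sitemap.xml.gz) and return its path."""
    size = source.stat().st_size
    if size > MAX_UNCOMPRESSED_BYTES:
        raise ValueError(f"{source} is {size} bytes; split it before compressing")
    target = source.with_name(source.name + ".gz")
    # Stream the file through gzip instead of loading it into memory.
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```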
Which URLs to include?
✅ Include
- Canonical pages (the definitive version of a URL)
- Pages accessible without authentication
- Pages returning an HTTP `200` status
- Pages you want indexed
- Pages with content that is valuable to users

❌ Exclude
- Redirect pages (`301`, `302`)
- Pages with a `<meta name="robots" content="noindex">` tag
- Pages blocked in `robots.txt`
- Internal search result pages
- Admin, login, and cart pages
- Pagination pages (except page 1 in some cases)
- URLs with session or tracking parameters (`?sessionid=`, `?utm_source=`)
- Duplicate pages (keep only the canonical version)
- Pages returning an error (`404`, `410`, `500`)
Golden rule: Only submit URLs you want indexed and that correspond to the canonical version.
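The core of these rules can be expressed as a filter over crawl or CMS data. In the sketch below, `PageInfo`, its fields and the `TRACKING_PARAMS` list are hypothetical stand-ins for whatever your crawler or database provides; rules that depend on page type (search results, admin, cart, pagination) are left out because they are site-specific.

```python
from dataclasses import dataclass
from urllib.parse import parse_qsl, urlsplit

# Illustrative list of session/tracking parameters; adjust for your site.
TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


@dataclass
class PageInfo:
    """Hypothetical per-URL record from a crawl or the CMS."""
    url: str
    status: int              # HTTP status code, without following redirects
    canonical: str           # URL declared in rel="canonical"
    noindex: bool            # page carries a robots noindex directive
    blocked_by_robots: bool  # URL is disallowed in robots.txt


def should_include(page: PageInfo) -> bool:
    """Return True only for indexable, canonical URLs that answer 200."""
    if page.status != 200:  # drops redirects (301, 302) and errors (404, 410, 500)
        return False
    if page.noindex or page.blocked_by_robots:
        return False
    if page.url != page.canonical:  # keep only the canonical version
        return False
    # keep_blank_values=True so that "?sessionid=" is still detected
    query = urlsplit(page.url).query
    params = {name.lower() for name, _ in parse_qsl(query, keep_blank_values=True)}
    return not (params & TRACKING_PARAMS)
```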
XML attributes
<loc> (required)
The full absolute URL of the page.
<loc>https://example.com/my-page</loc>
- Must exactly match the protocol used (`https` preferred)
- Must be the canonical version of the URL (with or without a trailing slash, applied consistently)
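A quick sanity check for `<loc>` values, assuming the site is served over HTTPS on a single host (`site_host` is a hypothetical parameter). It also applies the sitemaps.org rule that a URL must be shorter than 2,048 characters. Trailing-slash consistency is not checked, since the right convention depends on how the site canonicalizes its URLs.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # sitemaps.org: <loc> must be less than 2,048 characters


def is_valid_loc(url: str, site_host: str) -> bool:
    """Check that a URL is absolute, uses HTTPS, and belongs to the sitemap's host."""
    parts = urlsplit(url)
    return (
        parts.scheme == "https"      # rejects http:// and relative paths
        and parts.netloc == site_host  # rejects other hosts, including www variants
        and len(url) < MAX_LOC_LENGTH
    )
```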
<lastmod> (optional)
Date of the last content modification, in W3C Datetime format (YYYY-MM-DD or full YYYY-MM-DDTHH:MM:SS+00:00).
<lastmod>2024-11-15</lastmod>
Best practices:
- Only use `lastmod` if the date is real and reliable (generated dynamically from the database)
- Do not statically stamp every page with today's date: Google uses `lastmod` only when it is consistently and verifiably accurate, and ignores it otherwise
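When the modification time does come from the database, formatting it is straightforward. A sketch assuming a hypothetical `updated_at` `datetime` value, with naive values treated as UTC:

```python
from datetime import datetime, timezone


def w3c_lastmod(updated_at: datetime, date_only: bool = False) -> str:
    """Format a real modification timestamp as W3C Datetime for <lastmod>."""
    if updated_at.tzinfo is None:
        # Assumption: naive timestamps from the database are stored in UTC.
        updated_at = updated_at.replace(tzinfo=timezone.utc)
    if date_only:
        return updated_at.date().isoformat()  # YYYY-MM-DD
    return updated_at.isoformat(timespec="seconds")  # YYYY-MM-DDTHH:MM:SS+00:00


print(w3c_lastmod(datetime(2024, 11, 15, 8, 30)))  # 2024-11-15T08:30:00+00:00
```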
<changefreq> (optional)
Estimated frequency of content changes. Possible values:
| Value | Typical use |
|---|---|
| `always` | Pages changed on every visit |
| `hourly` | Real-time feeds |
| `daily` | News, active blogs |
| `weekly` | Regularly updated content |
| `monthly` | Stable content pages |
| `yearly` | Nearly static pages |
| `never` | Permanent archives |
Important note: Google's documentation states that it ignores this attribute. Other crawlers may treat it as a hint, but it is never a directive.
<priority> (optional)
Relative priority of the page within the site, between 0.0 and 1.0 (default: 0.5).
<priority>0.8</priority>
Recommendations:
- Homepage: `1.0`
- Main category pages: `0.8`
- Standard content pages: `0.5` to `0.7`
- Secondary pages: `0.3` to `0.5`
Note: Priority is relative to your own site, not to other sites. Setting `1.0` everywhere has no positive effect and dilutes the signal. Google also states that it ignores `<priority>` altogether.
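One way to keep priorities relative and consistent is to derive them from a page type rather than setting them by hand. The type labels and the exact values picked inside each recommended range are hypothetical:

```python
# Hypothetical page-type labels mapped to the recommended values listed above;
# "content" and "secondary" use one value from each recommended range.
PRIORITY_BY_TYPE = {"home": 1.0, "category": 0.8, "content": 0.6, "secondary": 0.4}
DEFAULT_PRIORITY = 0.5  # protocol default when <priority> is omitted


def priority_for(page_type: str) -> str:
    """Return a <priority> value formatted with one decimal place."""
    value = PRIORITY_BY_TYPE.get(page_type, DEFAULT_PRIORITY)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {value}")
    return f"{value:.1f}"
```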
Sitemap index
When a site exceeds 50,000 URLs or when the file exceeds 50 MB, you must split the sitemaps and create an index file.
Sitemap index structure
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap-pages.xml</loc>
<lastmod>2024-11-15</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-articles.xml</loc>
<lastmod>2024-11-14</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-images.xml</loc>
<lastmod>2024-11-10</lastmod>
</sitemap>
</sitemapindex>
Best practices for the sitemap index
- Place the index file at the root: https://example.com/sitemap.xml
- Only reference sitemaps from the same domain
- Update the `<lastmod>` of an index entry whenever the corresponding child sitemap changes
- Segment logically: by content type, by language, or by section
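A sketch of the splitting step using Python's `xml.etree.ElementTree`, which escapes `&`, `<` and `>` in text automatically. It only enforces the 50,000-URL limit; the function names are illustrative, and a real generator would also check the 50 MB size limit:

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Sequence, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: Sequence[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` URLs, one per child sitemap."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls: Iterable[str]) -> bytes:
    """Serialize one child sitemap containing only <loc> entries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_index(entries: Iterable[Tuple[str, str]]) -> bytes:
    """Serialize a sitemap index from (child sitemap URL, lastmod) pairs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)
```

For example, each chunk from `chunk_urls` could be written to a hypothetical `sitemap-1.xml`, `sitemap-2.xml`, and so on, with `build_index` listing those files and their `lastmod` dates.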
Declaration and submission
1. In robots.txt (recommended)
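Add a Sitemap line to robots.txt. The directive is independent of User-agent groups, so it can appear anywhere in the file; placing it at the end is a common convention: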
Sitemap: https://example.com/sitemap.xml
Multiple sitemaps can be declared:
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
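To check what a crawler will read from the file, Python's `urllib.robotparser` collects `Sitemap:` lines (via `site_maps()`, Python 3.8+) and can test whether a URL is blocked. The robots.txt content below is illustrative. This parser applies rules in file order and does not implement Google's longest-match precedence or wildcards, so treat it as a rough check.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

# Sitemap lines are collected regardless of the User-agent group they follow.
print(parser.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-images.xml']

# A URL listed in a sitemap should not be disallowed for crawlers.
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False
```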
2. Via Google Search Console
- Go to Index > Sitemaps
- Submit the full URL of the sitemap
- Allows you to monitor errors and indexation status
3. Via Bing Webmaster Tools
- Go to Sitemaps
- Submit the sitemap URL
4. HTTP ping (deprecated)
Google removed the ping endpoint in 2023. This method no longer works.
Recommended strategy: Declare the sitemap in robots.txt, then submit it in Google Search Console and in Bing Webmaster Tools.
Specialized sitemaps
Image sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://example.com/my-page</loc>
<image:image>
<image:loc>https://example.com/images/photo.jpg</image:loc>
</image:image>
</url>
</urlset>
Google deprecated the image:title and image:caption tags (along with image:geo_location and image:license) in 2022; image:loc is the only child element of image:image it still reads.
Video sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url>
<loc>https://example.com/my-video</loc>
<video:video>
<video:thumbnail_loc>https://example.com/thumb.jpg</video:thumbnail_loc>
<video:title>Video title</video:title>
<video:description>Video description</video:description>
<video:content_loc>https://example.com/video.mp4</video:content_loc>
<video:duration>600</video:duration>
<video:publication_date>2024-11-15T08:00:00+00:00</video:publication_date>
</video:video>
</url>
</urlset>
News sitemap (Google News)
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>https://example.com/article/my-title</loc>
<news:news>
<news:publication>
<news:name>Publication name</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2024-11-15T09:00:00+00:00</news:publication_date>
<news:title>Article title</news:title>
</news:news>
</url>
</urlset>
Google News sitemaps must only contain articles published within the last 48 hours.
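A sketch of the 48-hour filter, assuming articles are available as hypothetical `(url, published_at)` pairs with timezone-aware datetimes. It also keeps at most 1,000 entries, Google's cap for a single news sitemap:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence, Tuple

NEWS_WINDOW = timedelta(hours=48)
MAX_NEWS_ENTRIES = 1000  # Google caps a news sitemap at 1,000 <news:news> entries

Article = Tuple[str, datetime]  # (URL, timezone-aware publication time)


def recent_articles(articles: Sequence[Article], now: Optional[datetime] = None) -> List[Article]:
    """Keep articles published in the last 48 hours (none in the future), newest first."""
    now = now or datetime.now(timezone.utc)
    fresh = [(url, published) for url, published in articles
             if timedelta(0) <= now - published <= NEWS_WINDOW]
    fresh.sort(key=lambda item: item[1], reverse=True)
    return fresh[:MAX_NEWS_ENTRIES]
```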
Multilingual sitemap (hreflang)
<url>
<loc>https://example.com/en/page</loc>
<xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page"/>
<xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page"/>
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/page"/>
</url>
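Required namespace, declared on the urlset element: xmlns:xhtml="http://www.w3.org/1999/xhtml". Each language version also needs its own url entry listing the same set of alternates, including itself.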
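Because every language version needs the full set of alternates, generating these entries is less error-prone than writing them by hand. A sketch with `xml.etree.ElementTree`; the input mapping of hreflang codes to URLs is a hypothetical structure. `register_namespace` makes the output use the `xhtml:` prefix instead of an auto-generated one (note that it changes a module-wide registry):

```python
import xml.etree.ElementTree as ET
from typing import Dict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)      # serialize sitemap elements unprefixed
ET.register_namespace("xhtml", XHTML_NS)   # serialize links as xhtml:link


def build_hreflang_urlset(alternates: Dict[str, str], x_default: str) -> bytes:
    """Emit one <url> per language, each listing every alternate plus x-default."""
    links = {**alternates, "x-default": x_default}
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in alternates.values():
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        for lang, href in links.items():
            ET.SubElement(entry, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


xml_bytes = build_hreflang_urlset(
    {"en": "https://example.com/en/page", "fr": "https://example.com/fr/page"},
    x_default="https://example.com/en/page",
)
```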
Common errors to avoid
| Error | Impact | Solution |
|---|---|---|
| Including noindex URLs | Inconsistency flagged in Search Console | Exclude noindex pages |
| Including redirects | Wasted crawl budget | Only include final destination URLs |
| Non-canonical URLs | Diluted SEO signals | Use only the canonical version |
| Unreliable lastmod (always today) | Google ignores the attribute | Generate it dynamically from the database |
| Forgetting HTTPS when available | Protocol confusion | Use the site's default protocol |
| Inconsistency with robots.txt | Pages submitted but blocked | Verify URLs are not blocked |
| Omitting a sitemap index on large sites | Exceeding the 50,000 URL limit | Implement a segmented index |
| Never updating the sitemap | New pages not discovered | Automate sitemap generation |
| Mixing HTTP and HTTPS | Content duplication | Use one protocol consistently |
Automation and maintenance
Popular CMS
| CMS | Recommended plugin/module |
|---|---|
| WordPress | Yoast SEO, Rank Math, Google XML Sitemaps |
| Joomla | OSMap, Xmap |
| Drupal | Simple XML Sitemap |
| Magento | Built-in natively |
| Shopify | Built-in natively (/sitemap.xml) |
Programmatic generation (PHP)
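<?php
// Assumption: $pages is loaded beforehand (e.g. from the database) and each row
// has 'url', 'updated_at' and 'priority' keys.
// ENT_XML1 | ENT_QUOTES escapes &, <, >, " and ' as XML entities.
header('Content-Type: application/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8"?>', "\n";
?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<?php foreach ($pages as $page): ?>
<url>
<loc><?= htmlspecialchars($page['url'], ENT_XML1 | ENT_QUOTES, 'UTF-8') ?></loc>
<lastmod><?= date('Y-m-d', strtotime($page['updated_at'])) ?></lastmod>
<changefreq>weekly</changefreq>
<priority><?= sprintf('%.1f', (float) $page['priority']) ?></priority>
</url>
<?php endforeach; ?>
</urlset>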
Recommended update frequency
- Dynamic content sites (e-commerce, blogs): automatic regeneration on each publication
- Static sites: regeneration on each deployment
- Notifying search engines: after a major change, resubmit the sitemap in Search Console (the ping endpoint no longer works)
Validation
Online tools
- Google Search Console — Official test and submission tool
- Screaming Frog SEO Spider — Full audit and error detection
- XML Sitemap Validator — xmlsitemaps.com
- Bing Webmaster Tools — Validation and submission for Bing
Essential manual checks
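# Check sitemap accessibility (expect HTTP 200)
curl -I https://example.com/sitemap.xml
# Check the returned Content-Type (should be application/xml or text/xml)
# -i: header names are case-insensitive (HTTP/1.1 servers often send "Content-Type")
curl -s -D - https://example.com/sitemap.xml -o /dev/null | grep -i content-type
# Check that the XML is well-formed
xmllint --noout sitemap.xml
# Validate against the official schema (core protocol only; extension
# namespaces such as image: or video: need their own schemas)
curl -sO https://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
xmllint --noout --schema sitemap.xsd sitemap.xml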
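The structural checks can also be scripted. The sketch below parses an uncompressed urlset sitemap with the standard library and reports problems against this guide's limits: file size, URL count, root element and non-HTTPS URLs. It does not fetch the listed URLs, so HTTP status, noindex and robots.txt checks still need a crawler. `check_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def check_sitemap(data: bytes) -> List[str]:
    """Return a list of problems found in an uncompressed urlset sitemap (empty if none)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes, over the 50 MB limit")
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return problems + [f"invalid XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element {root.tag}")
    # Only sitemap-namespace <loc> elements; image:loc and similar are skipped.
    locs = [el.text or "" for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs, over the 50,000 limit")
    problems.extend(f"not an absolute https URL: {loc}"
                    for loc in locs if not loc.strip().startswith("https://"))
    return problems
```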
Validation checklist
- [ ] The file is publicly accessible (HTTP 200)
- [ ] The Content-Type is `application/xml` or `text/xml`
- [ ] The XML is valid (no syntax errors)
- [ ] All URLs return HTTP 200
- [ ] No `noindex` URLs are included
- [ ] No URLs blocked by `robots.txt` are included
- [ ] The sitemap is declared in `robots.txt`
- [ ] The sitemap is submitted in Search Console
- [ ] `lastmod` dates are reliable and up to date
- [ ] URLs use the correct protocol (HTTPS)