2.19.5.1. Site map (sitemap.xml)
The sitemap.xml file, in its standardized format, provides search engines with a list of pages to be indexed. A description of the protocol is available on the official site, sitemaps.org.
Important points:
- The sitemap.xml file must have exactly this name and must be encoded in UTF-8.
- A single sitemap.xml file must not exceed 50 MB uncompressed. If the file is larger, split it into multiple sitemap files grouped under a sitemap index file. Compression (the sitemaps.org protocol specifies gzip) only reduces transfer size; the 50 MB limit applies to the uncompressed file.
- A single sitemap.xml file should contain no more than 50000 links.
- The sitemap.xml file must be placed in the site root directory and be accessible via a browser at a URL of the form http://www.example.com/sitemap.xml.
- All links in the site map must be absolute (in the format http://www.example.com/).
- The site map must meet the requirements of the relevant search engine crawler, as some of them have specific conditions for using this file.
- The site map used by search engine crawlers is merely a recommendation. Crawlers may ignore it if there are errors in the site map itself or for other reasons of their own.
- Certain special characters must be escaped.
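The size, link-count, and absolute-URL rules above can be checked mechanically before a file is published. The following is a minimal sketch using only the Python standard library; the function name check_sitemap and the sample data are hypothetical, and the check does not cover crawler-specific requirements.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_BYTES = 50 * 1024 * 1024  # 50 MB size limit
MAX_URLS = 50000              # limit on the number of links


def check_sitemap(xml_bytes):
    """Return a list of human-readable problems found in a sitemap document."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file is larger than 50 MB")
    root = ET.fromstring(xml_bytes)
    # Collect the text of every <loc> element in the sitemap namespace.
    locs = [el.text.strip() for el in root.iter("{%s}loc" % SITEMAP_NS) if el.text]
    if len(locs) > MAX_URLS:
        problems.append("more than 50000 links")
    for loc in locs:
        parts = urlparse(loc)
        # An absolute link needs both a scheme and a host name.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("link is not absolute: " + loc)
    return problems


# Hypothetical sample: the second link is relative and should be reported.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>http://example.com/</loc></url>"
    b"<url><loc>/about</loc></url>"
    b"</urlset>"
)
print(check_sitemap(sample))
```

An empty list means none of the checked rules were violated; it does not guarantee that a particular crawler will accept the file.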
Syntax of sitemap.xml
When creating a site map, you must follow a specific syntax. A basic site map with correct syntax looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://example.com/</loc>
</url>
</urlset>
Tags
The following tags are used in the sitemap.xml file:
- <?xml version="1.0" encoding="UTF-8"?>: the prologue of the XML file. This line specifies the XML version and encoding and must always be the first line. Required.
- <urlset>...</urlset>: parent tag that contains all subsequent references to site pages, each given in a <url> tag. Required. The opening tag must specify the protocol namespace, like this: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">...</urlset>
- <url>...</url>: tag that contains the URL itself and information about it. Required.
- <loc></loc>: tag that specifies a particular URL. Required.
- <lastmod></lastmod>: date of last modification. Optional.
- <changefreq></changefreq>: expected frequency of changes to this page. This tag is for informational purposes only. Optional. Valid values:
  - always: check for changes during each indexing.
  - hourly / daily / weekly / monthly / yearly: check for changes at regular intervals (every hour, day, week, month, or year).
  - never: never check for changes.
- <priority></priority>: the priority of a URL relative to other URLs listed in the site map. The value ranges from 0.0 to 1.0; the default value for all URLs is 0.5. Optional.
Warning! The priority tag does not affect how pages appear in search results. Its value only affects the order in which pages on the site are indexed.
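To illustrate how the required and optional tags fit together, here is a minimal sketch that assembles a site map with Python's standard xml.etree.ElementTree module. The function name build_sitemap, the dictionary keys, and the sample values (including the date) are assumptions made for this example, not part of the protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from dicts holding 'loc' plus any optional tag values."""
    # Serialize the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        # Emit only the tags that are present for this page.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                child = ET.SubElement(url, "{%s}%s" % (SITEMAP_NS, tag))
                child.text = str(page[tag])
    # Returns bytes that start with the XML prologue.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Hypothetical page data: one entry with all optional tags, one with <loc> only.
pages = [
    {"loc": "http://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.example.com/about"},
]
print(build_sitemap(pages).decode("utf-8"))
```

The serializer writes everything on one line; whitespace between tags carries no meaning in the format, so indentation can be added purely for readability.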
Character escaping
In XML files, the following characters in all data (including URLs) must be replaced with entity references (which begin with &): & (ampersand) as &amp;, ' (single quote) as &apos;, " (double quote) as &quot;, < (less-than) as &lt;, and > (greater-than) as &gt;.
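When a site map is written out by hand or from string templates, the escaping can be done with the escape function from Python's standard xml.sax.saxutils module. This is a minimal sketch; the helper name escape_for_sitemap and the sample URL are hypothetical.

```python
from xml.sax.saxutils import escape

# escape() replaces &, < and > by default; this extra mapping covers the quotes.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(value):
    """Replace the five special characters with their entity references."""
    return escape(value, QUOTE_ENTITIES)


print(escape_for_sitemap("http://www.example.com/catalog?item=12&desc=vacation"))
# prints http://www.example.com/catalog?item=12&amp;desc=vacation
```

Libraries that build XML from element objects (for example xml.etree.ElementTree) apply this escaping themselves when serializing, so values passed to them should not be escaped a second time.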
Creating a group of multiple sitemap files
If the sitemap.xml file is larger than 50 MB or contains more than 50000 links, you should split it into multiple files and create a sitemap index file, named sitemap.xml, that links to the other sitemap files.
Example of a sitemap index file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml</loc>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml</loc>
</sitemap>
</sitemapindex>
The sitemap index file uses the following tags:
- <?xml version="1.0" encoding="UTF-8"?>: the prologue of the XML file. This line specifies the XML version and encoding and must always be the first line. Required.
- <sitemapindex>...</sitemapindex>: parent tag that contains all subsequent references to site map files. Required.
- <sitemap>...</sitemap>: tag that contains the URL pointing to a sitemap file and information about it. Required.
- <loc></loc>: tag that specifies the URL of the sitemap file. Required.
- <lastmod></lastmod>: date of last modification. Optional.
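The splitting step can be sketched as follows: divide the URL list into groups of at most 50000 and generate an index that points to one file per group. The function names chunk_urls and build_index and the sitemapN.xml naming scheme (taken from the example above) are assumptions for this sketch, and only the link count is considered here, not the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # maximum number of links per sitemap file


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive groups of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, file_count):
    """Build a sitemap index pointing at sitemap1.xml .. sitemapN.xml."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element("{%s}sitemapindex" % SITEMAP_NS)
    for n in range(1, file_count + 1):
        sitemap = ET.SubElement(index, "{%s}sitemap" % SITEMAP_NS)
        loc = ET.SubElement(sitemap, "{%s}loc" % SITEMAP_NS)
        loc.text = "%s/sitemap%d.xml" % (base_url.rstrip("/"), n)
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical site with 120001 pages: three sitemap files are needed.
urls = ["http://www.example.com/page%d" % i for i in range(120001)]
groups = chunk_urls(urls)
index_xml = build_index("http://www.example.com", len(groups))
print(len(groups), [len(g) for g in groups])
```

Each group would then be written to its own sitemap file in the usual <urlset> format, and the index would be published as the file that crawlers fetch first.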
Sitemap generation and validation services
Examples of services for generating and validating sitemap files.