sitemap.xml and robots.txt
sitemap.xml and robots.txt examples and urls configuration.
urls.py:
from django.views.generic.simple import direct_to_template

urlpatterns += patterns('',
    (r'^robots\.txt$', direct_to_template,
        {'template': 'robots.txt', 'mimetype': 'text/plain'}),
    (r'^sitemap\.xml$', direct_to_template,
        {'template': 'sitemap.xml', 'mimetype': 'text/xml'}),
)
Or use TemplateView on Django 1.3 and later (direct_to_template is deprecated and was removed in Django 1.5):
from django.views.generic import TemplateView

urlpatterns += patterns('',
    (r'^robots\.txt$', TemplateView.as_view(
        template_name='robots.txt', content_type='text/plain')),
    (r'^sitemap\.xml$', TemplateView.as_view(
        template_name='sitemap.xml', content_type='text/xml')),
)
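On current Django versions the above no longer works: patterns() was removed in Django 1.10, and path() arrived in 2.0. A sketch of the equivalent modern routing:

```python
# urls.py -- Django 2.0+ style; TemplateView itself is unchanged
from django.urls import path
from django.views.generic import TemplateView

urlpatterns = [
    path('robots.txt', TemplateView.as_view(
        template_name='robots.txt', content_type='text/plain')),
    path('sitemap.xml', TemplateView.as_view(
        template_name='sitemap.xml', content_type='text/xml')),
]
```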
templates/sitemap.xml:
<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://mysite.com/somepage/</loc>
<lastmod>2013-01-01</lastmod>
<changefreq>weekly</changefreq>
<priority>1.00</priority>
</url>
</urlset>
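A static template like this works for a handful of pages; when the URLs live in a database, Django's django.contrib.sitemaps framework can generate the file for you. To show the shape of the output without Django, here is a minimal stdlib sketch (build_sitemap and the sample page list are hypothetical, not part of any framework):

```python
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: iterable of (loc, lastmod, changefreq, priority) tuples."""
    urlset = Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq, priority in pages:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = loc
        SubElement(url, "lastmod").text = lastmod
        SubElement(url, "changefreq").text = changefreq
        SubElement(url, "priority").text = priority
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + tostring(urlset)

xml = build_sitemap([
    ("http://mysite.com/somepage/", "2013-01-01", "weekly", "1.00"),
])
print(xml.decode())
```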
templates/robots.txt:
User-agent: Yandex
Disallow: /admin
Disallow: /static
Disallow: /media
Host: mysite.com
User-agent: Googlebot
Disallow: /admin
Disallow: /static
Disallow: /media
User-agent: *
Crawl-delay: 30
Disallow: /admin
Disallow: /static
Disallow: /media
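You can sanity-check the rules with the standard library's urllib.robotparser before deploying; a short sketch against the `User-agent: *` section above:

```python
from urllib.robotparser import RobotFileParser

# The generic section of the robots.txt shown above
robots = """\
User-agent: *
Crawl-delay: 30
Disallow: /admin
Disallow: /static
Disallow: /media
"""

parser = RobotFileParser()
parser.parse(robots.splitlines())

print(parser.can_fetch("*", "/admin/users/"))   # disallowed path
print(parser.can_fetch("*", "/blog/post-1/"))   # allowed path
print(parser.crawl_delay("*"))                  # delay for generic agents
```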
Note: extend robots.txt with any URLs you don't want search crawlers to index. Conversely, sitemap.xml should contain the URLs of the pages you do want search engines to know about.
Links:
- http://fredericiana.com/2010/06/09/three-ways-to-add-a-robots-txt-to-your-django-project/
- http://www.wordsinarow.com/xml-sitemaps.html
Licensed under CC BY-SA 3.0