Sitemap contains URLs which are blocked by robots.txt


“Indexed, though blocked by robots.txt” – what does it mean and how do I fix it?

Last update: June 20, 2021

“Indexed but blocked by robots.txt” indicates that Google indexed the URLs even though they were blocked by your robots.txt file.

Google has marked these URLs as “Valid with Warning” because it doesn’t know whether you want these URLs indexed. In this article, you will learn how to fix this problem.

Here’s what it looks like in the Google Search Console Index Coverage report, with the number of URL impressions displayed:

Screenshot: “Indexed, though blocked by robots.txt” in the GSC Index Coverage report

Double-check at the URL level

You can verify this by going to Coverage > Indexed, though blocked by robots.txt and inspecting one of the URLs listed. Under Crawl, the Crawl allowed field will say “No: blocked by robots.txt” and the Page fetch field will say “Error: Blocked by robots.txt”.
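If you want to spot-check URLs outside of Search Console, you can reproduce roughly the same test with Python’s built-in robots.txt parser. The following is a minimal sketch, assuming your site lives at https://www.example.com and the URL comes from the report’s export; note that urllib.robotparser does not implement every Google-specific extension, so treat it as a quick sanity check rather than a replacement for the GSC tester.

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (example.com is a placeholder).
parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

# A URL taken from the "Indexed, though blocked by robots.txt" report.
url = "https://www.example.com/some-blocked-page/"

# False means robots.txt blocks Googlebot from crawling this URL.
print(parser.can_fetch("Googlebot", url))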

So what happened?

Normally Google wouldn’t index these URLs, but apparently it found links to them and deemed them important enough to index.

The snippets displayed for these URLs are likely to be sub-optimal, for example:

“A description for this result is not available because of this site’s robots.txt”

Useful resources

  • Doesn’t a robots.txt file tell search engines to de-index pages?
  • How to Get Google to Index Your Website

How to fix “Indexed, though blocked by robots.txt”

  1. Export the list of URLs from Google Search Console and sort them alphabetically.
  2. Review the URLs and check whether they include URLs… (see the sketch after this list)
    1. That you want indexed. If that’s the case, update your robots.txt file to allow Google to access these URLs.
    2. That you don’t want search engines to access. If that’s the case, leave your robots.txt file as it is, but check whether you have any internal links to these URLs that you should remove.
    3. That search engines can access, but that you don’t want indexed. In that case, update your robots.txt file to reflect this and apply robots noindex directives.
    4. That shouldn’t be accessible to anyone, ever. Take for example a staging environment. In this case, follow the steps described in our Protecting Staging Environments article.
  3. If it’s not clear to you which part of your robots.txt is causing these URLs to be blocked, select a URL and hit the TEST ROBOTS.TXT BLOCKING button in the pane that opens on the right-hand side. This opens a new window showing which line in your robots.txt prevents Google from accessing the URL.
  4. When you’re done making changes, hit the VALIDATE FIX button to ask Google to re-evaluate your robots.txt against your URLs.
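Before you deploy an updated robots.txt, it can also help to check it against the exported URLs programmatically. Below is a minimal sketch using Python’s built-in urllib.robotparser; the candidate rules and URLs are placeholders, not recommendations, and Python’s parser doesn’t cover every Google-specific extension, so use Google’s robots.txt tester for the final word.

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt you are about to deploy (placeholder rules).
candidate_rules = [
    "User-agent: *",
    "Disallow: /internal-search/",
    "Allow: /",
]

# Hypothetical URLs copied from the Search Console export.
urls = [
    "https://www.example.com/category/widgets/",
    "https://www.example.com/internal-search/?q=widgets",
]

parser = RobotFileParser()
parser.parse(candidate_rules)

for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(("ALLOWED " if allowed else "BLOCKED ") + url)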

Indexed, though blocked by robots.txt fix for WordPress

The process for fixing this issue for WordPress sites is the same as described in the steps above, but here are some tips for quickly finding your robots.txt file in WordPress:

WordPress + Yoast SEO

If you are using the Yoast SEO plugin, follow the steps below to adjust your robots.txt file:

  1. Log in to your wp-admin section.
  2. From the sidebar, navigate to Yoast SEO plugin > Tools.
  3. Access the file editor.

WordPress + RankMath

If you are using the Rank Math SEO plugin, follow the steps below to adjust your robots.txt file:

  1. Log in to your wp-admin section.
  2. In the sidebar, navigate to Rank Math > General Settings.
  3. Go to Edit robots.txt.

WordPress + All in One SEO

If you are using the All in One SEO plugin, follow the steps below to adjust your robots.txt file:

  1. Log in to your wp-admin section.
  2. In the sidebar, go to All in One SEO > Robots.txt.

Pro tip

If you’re working on a WordPress website that hasn’t launched yet, and can’t wrap your head around why your robots.txt contains the following:

User-agent: *
Disallow: /

then check your settings under: Settings > Reading and look for Search Engine Visibility.

If the box Discourage search engines from indexing this site is checked, WordPress will generate a virtual robots.txt preventing search engines from accessing the site.
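If you want to confirm what WordPress is actually serving, you can fetch the (virtual) robots.txt directly. A small sketch, assuming the site is reachable at https://www.example.com:

from urllib.request import urlopen

# Fetch the robots.txt WordPress generates (example.com is a placeholder).
with urlopen("https://www.example.com/robots.txt") as response:
    body = response.read().decode("utf-8", errors="replace")

print(body)

# A blanket "Disallow: /" means the whole site is blocked for crawlers.
if any(line.strip() == "Disallow: /" for line in body.splitlines()):
    print("Site-wide Disallow found - check Settings > Reading.")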

Indexed, though blocked by robots.txt fix for Shopify

Shopify doesn’t allow you to manage your robots.txt from their system, so you’re working with a default one that’s applied to all sites.

Perhaps you’ve seen the “Indexed, though blocked by robots.txt” message in Google Search Console or received a “New index coverage issue detected” email from Google about it. We recommend always checking which URLs are affected, because you don’t want to leave anything to chance in SEO.

Review the URLs, and see if any important URLs are blocked. If that’s the case, you’ve got two options which require some work, but do allow you to change your robots.txt file on Shopify:

  1. Set up a reverse proxy
  2. Use Cloudflare Workers

Whether or not these options are worth it to you depends on the potential reward. If it’s sizable, look into implementing one of these options.

You can take the same approach on the Squarespace platform.
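To make the reverse-proxy idea more concrete, here is a rough, illustrative sketch in Python using Flask and requests; it is not Shopify- or Cloudflare-specific code (a real setup would usually run at the CDN edge, for example as a Cloudflare Worker), and the origin URL and robots rules are placeholders. The proxy answers /robots.txt itself and passes every other request through to the store.

import requests
from flask import Flask, Response, request

app = Flask(__name__)

ORIGIN = "https://your-store.myshopify.com"  # placeholder origin

CUSTOM_ROBOTS_TXT = (
    "User-agent: *\n"
    "Disallow: /internal-search/\n"
    "Sitemap: https://www.example.com/sitemap.xml\n"
)  # placeholder rules

@app.route("/robots.txt")
def robots():
    # Serve your own robots.txt instead of the platform default.
    return Response(CUSTOM_ROBOTS_TXT, mimetype="text/plain")

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def proxy(path):
    # Pass everything else through to the origin unchanged.
    upstream = requests.get(ORIGIN + "/" + path, params=request.args, timeout=10)
    return Response(
        upstream.content,
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type"),
    )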

Useful resources

  • Shopify SEO Tips and Best Practices: Complete Guide

FAQs

🤖 Why does Google show this error for my pages?

Google found links to pages that it cannot access because of disallow directives in your robots.txt. When Google deems these pages important enough, it indexes them.

🧐 How do you fix this error?

The short answer is to make sure that the pages you want Google to index are accessible to Google’s crawlers, and that pages you don’t want indexed aren’t linked to internally. The long answer is described in the “How to fix ‘Indexed, though blocked by robots.txt’” section of this article.

🧾 Can I edit my robots.txt file in WordPress?

Yes. Popular SEO plugins like Yoast, Rank Math, and All in One SEO allow you to edit your robots.txt file directly from the wp-admin panel.

ContentKing Academy

Read the full Academy article to learn all about the Google Search Console Index Coverage Report.

Introduction to robots.txt

A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.

If you’re using a CMS, such as Wix or Blogger, you might not need to (or be able to) edit your robots.txt file directly. Instead, your CMS might expose a search settings page or some other mechanism to tell search engines whether or not to crawl your page.

If you want to hide or show any of your pages from search engines, look for instructions on how to change your page’s visibility in search engines in your CMS (for example, search for “wix hide page from search engines”).

What is a robots.txt file used for?

A robots.txt file is used primarily to manage crawler traffic to your site, and usually to keep a file off Google, depending on the file type:

Effect of robots.txt on different file types
Web page

You can use a robots.txt file for web pages (HTML, PDF, or other non-media formats that Google can read) to manage crawling traffic if you think your server will be overwhelmed by requests from Google’s crawler, or to avoid crawling unimportant or similar pages on your site.

Warning: Do not use a robots.txt file to hide your web pages from Google search results.

If other pages link to your page with descriptive text, Google can still index the URL without visiting the page. If you want to block your page from search results, use another method, such as password protection or noindex.

If your web page is blocked with a robots.txt file, its URL can still appear in search results, but the search result will not have a description. Image files, video files, PDFs, and other non-HTML files embedded in the blocked page will be excluded from crawling as well. If you see this search result for your page and want to fix it, remove the robots.txt entry that is blocking the page. If you want to hide the page entirely from search, use another method.
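If the goal is to keep a page out of search results rather than just uncrawled, the noindex mechanism looks like this in practice. A minimal sketch, using Flask only for illustration (the URL and content are placeholders); the important part is the X-Robots-Tag response header, and the equivalent alternative is a <meta name="robots" content="noindex"> tag in the page’s <head>. Remember that the URL must not be disallowed in robots.txt, or Google will never see the header.

from flask import Flask, Response

app = Flask(__name__)

@app.route("/private-report/")
def private_report():
    resp = Response("<h1>Quarterly report</h1>")
    # Tells Google not to index this URL. Works only if the URL is
    # crawlable, i.e. not blocked by robots.txt.
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp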

Media file

Use a robots.txt file to manage crawl traffic and also to block image, video, and audio files from appearing in Google search results. This will not prevent other pages or users from linking to your image, video or audio file.

  • Learn more about how to block images from appearing on Google.
  • Learn more about how to remove or restrict your video files from appearing on Google.
Resource file

You can use a robots.txt file to block resource files such as unimportant image, script, or style files if you think that pages loaded without these resources will not be significantly affected by the loss. However, if the absence of these resources makes it harder for Google’s crawler to understand the page, don’t block them, or Google won’t do a good job of analyzing pages that depend on those resources.

Understand the limitations of a robots.txt file

Before creating or modifying a robots.txt file, you should understand the limitations of this URL blocking method. Depending on your goals and situation, you might consider other mechanisms to ensure that your URLs cannot be found on the web.

  • robots.txt directives may not be supported by all search engines. The rules in robots.txt files cannot enforce crawler behavior on your site; it is up to the crawler to obey them. While Googlebot and other reputable web crawlers obey the instructions in a robots.txt file, other crawlers may not. Therefore, if you want to keep information secure from web crawlers, it’s better to use other blocking methods, such as password-protecting private files on your server.
  • Different crawlers interpret syntax differently. Although reputable web crawlers follow the directives in a robots.txt file, each crawler may interpret the directives differently. You should know the proper syntax for addressing different web crawlers, as some might not understand certain instructions.
  • A page that is disallowed in robots.txt can still be indexed if it is linked to from other sites. Although Google does not crawl or index content blocked by a robots.txt file, we may find and index a disallowed URL if it is linked from another website. Therefore, the URL and possibly other publicly available information, such as anchor text in links to the page, may still appear in Google search results. To properly prevent your URL from appearing in Google search results, password-protect the files on your server, use the noindex meta tag or response header, or remove the page altogether.

Caution: Combining multiple crawling and indexing rules might cause some rules to counteract other rules. Learn how to combine crawling with indexing and serving rules.

Create or update a robots.txt file

If you’ve decided you need one, learn how to create a robots.txt file. Or if you already have one, find out how to update it.

