How to fix “Indexed, though blocked by robots.txt” in Google Search Console

This post explains how blocking content with your website’s robots.txt file can hurt your SEO, and how to fix the resulting warning in Google Search Console.

Accidentally blocking website content with the robots.txt file prevents search engines from crawling it. Google can still index a blocked URL, for example when other pages link to it, but because it cannot read the page’s content, the result usually appears in SERPs with little or no description.

It’s essential to understand the implications of blocking crawlers, because search engines rely on them to find information on your site.

What is a robots.txt file?

A robots.txt file is a text file that tells robots (search engine crawlers) which pages on your site they should crawl and which they should not. In the example below, we can see that Walmart is telling crawlers not to visit the URL “/cart” by using the disallow rule. It would seem to follow that, since the robots are unable to crawl the page, they shouldn’t be able to index it. However, this is not always the case.

Walmart robots.txt file

How to fix “Indexed, though blocked by robots.txt”

Check for Disallow text in robots.txt

The easiest way to find out whether you are blocking pages is to use the robots.txt tester in GSC, which identifies any issues.

If you know what you’re looking for, you can navigate to yourdomain.com/robots.txt to read the file. Look for rules like the one below, which blocks crawlers from the entire site, and remove any that block pages you want indexed:

Disallow: /
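
If you would rather check a rule than read the file by eye, the idea can be sketched in Python with the standard library’s urllib.robotparser. This is a minimal sketch, not a replacement for the GSC tester: the is_blocked helper, the sample rules, and the example.com URLs are assumptions made up for illustration, and the standard-library parser does not handle wildcards the way Google does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, modelled on the "/cart" example above.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cart
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/cart", "/blog/my-post"):
        url = "https://example.com" + path
        print(path, "blocked" if is_blocked(SAMPLE_ROBOTS, url) else "allowed")
```

To test your own site, paste the contents of yourdomain.com/robots.txt into the first argument and pass the URL that GSC flagged.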

Using WordPress + Yoast SEO

If you’re using the Yoast SEO plugin, follow the steps below to edit your robots.txt file.

  1. Go to the Yoast plugin in WordPress; it’s named “SEO” in the sidebar
  2. Click on Tools
  3. Select File Editor
  4. Edit your robots.txt file
  5. Press Save
Tools: Yoast Plugin robots.txt editor

WordPress is set to No-Index

Quite often, after a website has been redesigned, the developer forgets to uncheck the no-index setting in WordPress. Developers set the website to no-index in staging because they don’t want the staging server to be visible to the public, nor do they want the site to be seen as duplicate content.

How to fix WordPress websites set to no-index 

  1. Log in to WordPress
  2. Go to Settings
  3. Select the Reading tab
  4. Uncheck: Discourage search engines from indexing this site
WordPress: Search engine visibility

How to fix intermittent blocks

If you are seeing intermittent blocks, check whether a staging server has been pushed into production and back again, since the staging robots.txt or no-index setting may have gone live with it. It’s best to clear your cache and check with your developers.

How to fix user-agent blocks

User-agent blocks can be challenging to pinpoint. The first thing you should check is your .htaccess file.

You can access the .htaccess file using an FTP manager with “View Invisible files” turned on or using the Yoast file editor tool (mentioned above).

If you have a WordPress website, the file should look like this:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

If you run into trouble, back up the .htaccess file, delete it, and then re-save your permalink settings in WordPress (Settings > Permalinks); this forces WP to create a new .htaccess file.
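
Before deleting anything, it can help to see whether the file mentions user agents at all. The sketch below is one rough way to do that in Python: it lists the lines of an .htaccess file that reference the user agent. The find_user_agent_rules helper and the sample blocking rule are hypothetical, written only for illustration, and a match means “look at this line”, not “this line is the problem”.

```python
import re

# Directives that inspect the visitor's user agent usually mention one of these.
UA_PATTERN = re.compile(r"HTTP_USER_AGENT|User-Agent", re.IGNORECASE)

# Hypothetical .htaccess fragment with a rule that would block Googlebot.
SAMPLE_HTACCESS = """\
# Hypothetical rule that returns 403 to Googlebot
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule .* - [F,L]
"""


def find_user_agent_rules(htaccess_text: str) -> list:
    """Return (line_number, line) pairs for non-comment lines that reference the user agent."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comments; only active directives matter.
        if not stripped or stripped.startswith("#"):
            continue
        if UA_PATTERN.search(stripped):
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    for number, line in find_user_agent_rules(SAMPLE_HTACCESS):
        print(f"line {number}: {line}")
```

The default WordPress block shown above contains no user-agent conditions, so running the helper on it returns an empty list.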

If you DO NOT want the URL indexed

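When you block a page from being crawled, Google may still index it because crawling and indexing are two different things. If you want the page kept out of the index, add a no-index tag to the page’s <head> and make sure the page is not blocked in robots.txt, because Google has to crawl the page to see the tag.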
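
Because crawling and indexing are controlled separately, the right fix depends on two facts about the URL and on what you want to happen. The sketch below spells that logic out in Python. The recommend_fix function and its wording are hypothetical, an illustration of the reasoning rather than an official decision table.

```python
def recommend_fix(blocked_by_robots: bool, has_noindex: bool, want_indexed: bool) -> str:
    """Suggest a next step for one URL (illustrative wording only)."""
    if want_indexed:
        if blocked_by_robots:
            return "Remove the Disallow rule so the page can be crawled."
        if has_noindex:
            return "Remove the noindex tag so the page can be indexed."
        return "No change needed."
    # From here on, the goal is to keep the URL out of the index.
    if blocked_by_robots:
        # A crawler that cannot fetch the page never sees its noindex tag.
        return "Unblock the page in robots.txt and keep or add a noindex tag."
    if has_noindex:
        return "No change needed."
    return "Add a noindex tag."


if __name__ == "__main__":
    # A page that is blocked in robots.txt but should stay out of search results.
    print(recommend_fix(blocked_by_robots=True, has_noindex=False, want_indexed=False))
```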

Add a <meta> tag

To prevent search engine crawlers from indexing a page on your website, add the following meta tag into the <head> section of your page:

<meta name="robots" content="noindex">

To prevent only Google’s crawlers from indexing a page:

<meta name="googlebot" content="noindex">

You should be aware that some search engine crawlers might ignore the no-index tag and your page might still appear in search results from other search engines.
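
To confirm that a page actually carries one of the tags above, you can inspect its HTML. The following is a minimal sketch using Python’s built-in html.parser; the RobotsMetaParser class and has_noindex function are names invented for this example, and the check only looks for the literal noindex value in meta tags, not in HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and one crawler-specific meta tag."""

    def __init__(self, crawler: str = "googlebot"):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        if attributes.get("name", "").strip().lower() not in self.names:
            return
        # The content attribute is a comma-separated list such as "noindex, nofollow".
        for token in attributes.get("content", "").split(","):
            if token.strip():
                self.directives.add(token.strip().lower())


def has_noindex(html: str, crawler: str = "googlebot") -> bool:
    """Return True if the HTML asks all robots, or the named crawler, not to index the page."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
    print(has_noindex(page))
```

Pass the page source you see under “View source” in your browser; the crawler argument lets you check a tag aimed at a specific bot.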

No-index a page using Yoast

How to No-Index a specific page using the Yoast Plugin:

  1. Find the page/post you want to edit
  2. Press edit in WordPress
  3. Scroll to the bottom of the page
  4. Click on the Advanced dropdown
  5. Set “Allow search engines to show this Page in search results?” to NO
Yoast Plugin No-index setting

Conclusion

Hopefully, this guide helped you to resolve the Google Search Console warning. If not, you can always contact me for additional troubleshooting and support.
