Welcome to our troubleshooting guide for the most common indexing problems. It’s high time we wrote one, considering how often these issues are encountered across the web. We are going to pay particular attention to Google, since this is where we notice most of the complaints. After all, it’s only logical that the majority of websites want to see themselves on the world’s top search engine.
But what exactly do we label as problems here?
Let us ask you one concrete question. Are you suddenly experiencing an unexplained drop in your website traffic? If so, this might simply be due to an indexing problem that you aren’t aware of yet. So here’s the plan: we are going to break down the possible scenarios one by one. For each problem, you will get the most easily implementable solutions.
Perhaps one of the most common indexing problems is duplicate content. This occurs when similar (if not fully identical) content is displayed on multiple URLs. Search engines struggle to decide which one they should choose and put in their results. Oftentimes they just ignore the content in question altogether. This may have extremely negative consequences, such as ending up with an entire website not showing up on Google.
This is not really anyone’s fault in particular. It may be a technical bug or something beyond your control. For example, some sites may be using parts of your content without you knowing it (content syndication).
You may also be in the middle of an ongoing migration process from HTTP to HTTPS. Or you may be using two versions of your site (one with prefixes such as www and the other without). Not to mention printer-friendly pages and session IDs that keep track of your visitors. All of these scenarios can cause indexing problems because they look like duplicate content.
Search engines may put URLs that closely resemble one another into the same basket. Likewise, the same query parameters appearing in a different order or the same paths using different cases (upper versus lower) can cause confusion. Compare:
- www.inflatablemugs.com/homepage/?a=1&b=2 and www.inflatablemugs.com/homepage/?b=2&a=1
- www.inflatablemugs.com/products and www.inflatablemugs.com/Products
URL-related issues like these can also arise when your online store offers different versions/colors of the same product, each under its own URL.
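To spot URL pairs like the ones above in your own list of pages, you could reduce each URL to a comparison key. Here is a minimal sketch, assuming that parameter order, letter case, and the HTTP/HTTPS scheme do not change what your server returns. The helper name normalize_url is ours, not part of any SEO tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a comparison key for spotting likely duplicate URLs.

    Hypothetical helper. It assumes the server ignores parameter order,
    letter case, trailing slashes, and the scheme (http vs https).
    """
    # Add "//" so that scheme-less URLs still get a host and a path.
    parts = urlsplit(url if "//" in url else "//" + url)
    # Sort query parameters so ?a=1&b=2 and ?b=2&a=1 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Lowercase the path and drop a trailing slash.
    path = parts.path.lower().rstrip("/") or "/"
    # The scheme is dropped on purpose so http and https variants match.
    return urlunsplit(("", parts.netloc.lower(), path, query, ""))


first = normalize_url("www.inflatablemugs.com/homepage/?a=1&b=2")
second = normalize_url("www.inflatablemugs.com/homepage/?b=2&a=1")
print(first == second)
```

Two URLs that share the same key are candidates for a canonical tag or a redirect. They are not proof of duplication, so check what each one really serves.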
What To Do
One of the best troubleshooting practices here is to use canonical URLs (rel="canonical"). By doing so, you tell the search engines which URL is the original and instruct them to treat the so-called duplications as copies of that page. The ranking signals of the copies are then credited to the canonical URL.
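If you want to verify that a page really declares its canonical URL, a small check along these lines could help. It is only a sketch: the sample page and the https address are made up for illustration, and find_canonical is a hypothetical helper.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Made-up duplicate page that points to the original product page.
page = """<html><head>
<link rel="canonical" href="https://www.inflatablemugs.com/products">
</head><body>Blue inflatable mug</body></html>"""
print(find_canonical(page))
```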
Redirection also works fine. Use a 301 (permanent) redirect to lead crawlers and visitors from the duplicate page to the original page. This strategy can even consolidate ranking power, since the signals of the redirected pages are passed on to the original (pages reinforcing each other).
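How you set up a 301 depends on your server or CMS. As a rough illustration only, here is what the logic looks like in a tiny Python WSGI app. The REDIRECTS table and the paths in it are assumptions we invented for the example.

```python
# Hypothetical mapping from duplicate paths to their original pages.
REDIRECTS = {
    "/Products": "/products",
    "/homepage/print": "/homepage",
}


def redirect_app(environ, start_response):
    """Minimal WSGI app: 301 for known duplicates, 200 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # A permanent redirect tells crawlers to index the target instead.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"original page"]
```

For a quick local test, the app can be served with make_server from Python’s built-in wsgiref.simple_server module.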
You can also add a meta robots tag with a noindex value to the HTML head of the pages you don’t want to see indexed. This won’t prevent Google from crawling them, but they won’t be indexed.
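To double-check that such a tag is really present on a page, something like the following sketch could be used. The is_noindex helper and the sample markup are hypothetical. The value "none" is treated as noindex too, since it is shorthand for "noindex, nofollow".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def is_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "none"})


# Made-up printer-friendly page that should stay out of the index.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindex(sample))
```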
Have you already gotten a ‘Crawled – Currently Not Indexed’ message after submitting a URL to Google Search Console? The explanation is less mysterious than it seems. One thing most web experts are unanimous about is definitely the importance of quality over quantity. No one should create a website just for the sake of it and then fill it with mediocre material.
For example, website owners who care about their rankings simply cannot afford to practice ‘cheap’ SEO. Just think about keywords. One can’t use them randomly. They should rather be parts of a coherent ensemble that follows basic rules of grammar, semantics, argumentation, originality, etc. Unfinished/fragmented content and plagiarism are also very likely to create indexing problems.
What To Do
Pay attention to content quality. We are aware that there’s no such thing as 100% novelty. Every ‘new’ idea and creation is inspired by, and an extension of, older ones. Nevertheless, one should always try to bring a personal touch.
Your headlines, descriptions, and body texts should be unique and match the objectives you have set for your site. Not confident enough about your skills? Don’t hesitate to hire or collaborate with domain specialists such as copywriters.
Web designers and webmasters are also here to keep your site coherent and fix any related breakdowns. And, of course, refrain from falling into plagiarism. Beyond legal and ethical issues, this can also result in duplicated content.
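If you really can’t avoid a few lower-quality elements on your website, that’s ok. Just instruct Google not to crawl them. To do so, use a robots.txt file. Alternatively, noindex tags can prevent search engines from indexing certain pages. Pick one method per page, though: a crawler that is blocked by robots.txt never gets to read the noindex tag on that page.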
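Before uploading a robots.txt file, it can be worth testing which paths it really blocks. Below is a minimal sketch using Python’s built-in urllib.robotparser. The /drafts/ folder and the file contents are assumptions for illustration, and the standard-library parser may not match Googlebot’s own matching rules in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of a drafts folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_allowed(path: str, agent: str = "Googlebot") -> bool:
    """True if the rules above let the given crawler fetch the path."""
    return parser.can_fetch(agent, path)


for path in ("/products", "/drafts/old-article.html"):
    print(path, crawl_allowed(path))
```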
Another one of these common indexing problems is crawl budget. Googlebot is not Led Zeppelin. It doesn’t say ‘I’m gonna crawl’ at all costs. Jokes aside, the crawling mechanism is indeed an extremely resource-consuming one. When creating website indexes, all search engines and their web crawlers have to sort out numerous sophisticated parameters. Therefore, they have to proceed in an economical way by setting some quotas.
They simply can’t spend all their time on one single website. So they remain within a determined budget. For each website, they crawl and index only a certain number of pages. This limitation can lead to various indexing problems. Most typically, Google can ignore some of your URLs.
No need to panic though. Search engines are generous enough despite their aforementioned quota. They are usually restrictive only with websites having too many pages and redirects. But since it’s better to be safe than sorry, let’s check the potential precautions and solutions.
What To Do
Be ‘techniclean.’ Make sure that your website is compliant with basic contemporary technical requirements. Check your site speed and keep the site up to date. Use a sitemap along with a flat, interlinked site architecture.
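Most CMS platforms and SEO plugins generate a sitemap for you. If yours doesn’t, a basic one can be produced with a few lines of Python. This is a sketch, not a full implementation: it writes only the required loc entries of the sitemaps.org format, and the URLs are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as text) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # The xml_declaration argument needs Python 3.8 or newer.
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


# Hypothetical pages of the example shop.
print(build_sitemap([
    "https://www.inflatablemugs.com/",
    "https://www.inflatablemugs.com/products",
]))
```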
Speaking of interlinked architecture, be careful with orphan pages. If you have any, integrate them into the rest of the site through internal and external links.
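Finding orphan pages boils down to comparing the pages you know about (for instance, from your sitemap) with the pages that other pages link to. Here is a small sketch of that comparison. The page list and link map are invented sample data, and in practice you would fill them from a crawl of your own site.

```python
def find_orphans(pages, links):
    """Return the pages that no other page links to.

    pages: iterable of known paths (e.g. taken from the sitemap).
    links: dict mapping each page to the set of internal paths it links to.
    """
    linked = set()
    for source, targets in links.items():
        # A page linking to itself does not count as being integrated.
        linked.update(target for target in targets if target != source)
    return sorted(set(pages) - linked)


# Hypothetical sample data.
pages = ["/", "/products", "/about", "/old-promo"]
links = {
    "/": {"/products", "/about"},
    "/products": {"/"},
    "/about": {"/"},
}
print(find_orphans(pages, links))
```

In this sample, /old-promo is the only page nobody links to, so it is the one to integrate.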
Internal links are, by the way, very budget-friendly elements. So incorporate them into your site as much as you can. This will allow Googlebot to navigate across your pages more easily. Here again, avoid duplicate content in order not to waste the crawl budget assigned to your site.
Soft 404 errors are another one of these common indexing problems. You probably already know the classic HTTP 404. It’s the standard response code indicating that a certain page hasn’t been found. It usually appears when the page in question has been removed or in case of a broken link. This information is crucial for web crawlers.
That’s how they know they shouldn’t care about ‘dead’ pages anymore. So what about soft 404? Sometimes, there’s a communication breakdown with the server. People trying to access non-existent pages see a simple ‘page not found’ message, but the server sends it with a success code (200 OK) instead of the standard HTTP 404. Or worse, they are redirected to a totally different page. These soft or ‘fake’ 404 situations create a mess because they mislead the crawlers.
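The difference between a real 404 and a soft one can be expressed as a simple rule of thumb. The sketch below is our own rough heuristic, not how Google detects soft 404s. The phrases it looks for are assumptions you would adapt to your own error pages.

```python
# Hypothetical phrases that typically appear on "not found" pages.
NOT_FOUND_PHRASES = ("page not found", "no longer available")


def classify_response(status: int, body: str) -> str:
    """Classify a response as 'hard 404', 'possible soft 404' or 'ok'."""
    if status in (404, 410):
        # The server clearly reports that the page is gone.
        return "hard 404"
    text = body.lower()
    if status == 200 and any(phrase in text for phrase in NOT_FOUND_PHRASES):
        # Success code, but the content says the page does not exist.
        return "possible soft 404"
    return "ok"


print(classify_response(404, "Page not found"))
print(classify_response(200, "<h1>Page not found</h1>"))
print(classify_response(200, "<h1>Inflatable mugs</h1>"))
```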
What To Do
First of all, you have to be sure that there’s no false alarm. Google Search Console may sometimes flag certain pages as soft 404 for no apparent reason. Use the following verification procedure:
- Access your list of soft 404 errors in your Search Console dashboard.
Log in to your account. Open your Coverage report through the ‘Coverage’ section in the left menu. Click on the ‘Submitted URL seems to be a Soft 404’ notification.
- Browse the list of soft 404 errors.
While checking the list of soft 404 pages returned by the system, also open the related URLs in new tabs. Compare them with each other.
- Fix the possible errors.
If the pages you are inspecting are valid, select the ‘Validate Fix’ option. This informs Google that you want them to be crawled and to appear in the search results.
You can check whether the operation has been successful by testing the URLs in your browser.
At other times, there may be too little content or a problem regarding the overall quality. Revise and improve your pages, then resubmit them to Google. Also make sure to delete any invalid pages and reconfigure the server so that it returns the appropriate HTTP response codes (404 or 410).
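What ‘the appropriate response code’ means in practice is sketched below as a tiny Python WSGI app. It is an illustration under assumptions: the PAGES and REMOVED tables are invented, and on a real site this logic usually lives in your web server or CMS configuration.

```python
# Hypothetical content tables for the example.
PAGES = {"/products": b"Our inflatable mugs"}  # pages that still exist
REMOVED = {"/summer-sale"}                     # pages deleted on purpose


def status_app(environ, start_response):
    """Minimal WSGI app that answers with honest status codes."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, body = "200 OK", PAGES[path]
    elif path in REMOVED:
        # 410 says the page is gone for good.
        status, body = "410 Gone", b"This page has been removed."
    else:
        # 404 with an error message, never 200 with an error message.
        status, body = "404 Not Found", b"Page not found."
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```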
If you really need to keep a ‘problematic’ page, add a noindex directive to its HTML head or to its HTTP response headers. That’s how search engines will know they should not index that specific page. You may also redirect a defective page to a valid page by adding a 301 redirect rule to your .htaccess file (on Apache servers).
General Crawling and Scanning Issues
Common indexing problems may appear in a multitude of other forms requiring a case-by-case investigation. A very common example is configuration mistakes in robots.txt files. Even the slightest error in them may prevent Googlebot from scanning your pages properly. So make sure to check everything once again from A to Z. Be it your user-agent directives or simply the placement of slashes, every single detail counts.
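To see how much a single slash matters, compare two robots.txt files that differ by one character. The sketch below uses Python’s built-in urllib.robotparser. The allowed helper is hypothetical, and the parser only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """True if the given robots.txt text lets the crawler fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# One slash of difference: the first file blocks the whole site,
# the second one blocks nothing at all.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
BLOCK_NOTHING = "User-agent: *\nDisallow:\n"

print(allowed(BLOCK_EVERYTHING, "/products"))
print(allowed(BLOCK_NOTHING, "/products"))
```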
Another general and frequent issue is related to website size. A recent study has shown that smaller websites are more affected by reduced crawl budgets. In other words, the granted budget may be proportional to the site size. But some bigger sites also suffer from a similar problem because, well, they have many more components. As for duplicate content, smaller sites tend to be more repetitive. It could be partly because they have less diversified content.
Note that scanning and crawling issues happen more often on larger websites. There’s simply too much information to be processed.
What To Do
There’s no specific formula to report here except handling your website according to its size. Those of you dealing with larger sites ought to spend more time on inspection activities. Otherwise, the huge number of pages and elements can quickly become overwhelming.
Additional Note on Background Check Issues
If you ventured into some shady virtual activities in the past and were then penalized, you should clean this up. Otherwise, there is a high chance that Google will keep ignoring your website.
Terminate any ongoing legal proceedings. Then make a fresh start. Build a brand new domain and site from scratch with totally updated content. Most importantly, be sure to make peace with the official rules this time.
Getting Rid of Indexing Problems
In this article, we have reviewed some of the most common indexing problems, especially with Google. It was of the utmost importance to investigate their respective solutions for obvious reasons. Like what? Doing justice to your SEO efforts, securing a higher ranking and more authority for your website, improving the user experience of your visitors. Yes, all of that and even more. A smoothly running indexing process is like a safety valve for your website. As you can tell, we haven’t touched much upon API indexing, but if you wish, you can get detailed information by clicking on the link.