Fix soft 404 errors for site-wide SEO improvements

by Wayne Smith

A soft 404 is a technical SEO problem: a page that should return a 404 status code but returns 200 instead. It consumes bandwidth, uses up the site's crawl budget, provides a poor user experience, and fills the index with unhelpful content. Soft 404s, and some tactics for avoiding 404 errors, can also be exploited to harm a website's standing in search engines.


Where do soft 404s come from?

Web servers normally send a 404 status code when a page does not exist. However, this behavior can be changed: content management systems deliver content dynamically and hook into the server to serve pages that are not static files. The configuration for a CMS is fairly straightforward, but it then becomes the CMS's responsibility to send a 404 when a document is not found.
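The responsibility shift can be sketched in a few lines. This is a hypothetical minimal WSGI app, not any particular CMS; the document store and URLs are illustrative. The crucial detail is that the "not found" page is sent with a real 404 status, not a 200:

```python
# Illustrative document store standing in for a CMS database.
DOCUMENTS = {
    "/": "<h1>Home</h1>",
    "/about": "<h1>About</h1>",
}

def app(environ, start_response):
    """Minimal WSGI app: the dynamic layer, not the web server,
    decides the status code for each request."""
    path = environ.get("PATH_INFO", "/")
    body = DOCUMENTS.get(path)
    if body is None:
        # The crucial part: send the 404 status along with the error page.
        # Sending "200 OK" here is exactly what creates a soft 404.
        start_response("404 Not Found", [("Content-Type", "text/html")])
        return [b"<h1>404 Not Found</h1>"]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [body.encode("utf-8")]
```

A search engine crawling an unknown URL against this app sees the 404 status and drops the URL, rather than indexing an error page served as a normal document.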

However, websites can become fairly complex, and an error may go unseen. Solution Smith uses "keep it simple" as a strategy for website configuration.

Site Search

While site search engines may use a query string instead of friendly URLs, Google crawls URLs that contain query strings unless expressly told not to. It is typical for a site search to deliver a results page, not a 404 error, when the search finds no documents. Dynamic tag search based on friendly URLs may be built on, or adapted from, site search; these systems should deliver a 404 error when no tag results exist, but that may never have been a design consideration.

Google's own custom search engine does not deliver 404 status codes when there are no results.
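The fix for a site's own search or tag pages can be sketched as follows. This is an illustrative handler, not any specific search engine's API; the page store and matching logic are assumptions. The point is that an empty result set maps to a 404 status instead of a 200 "no results" page that crawlers would treat as real content:

```python
# Illustrative page store standing in for a search index.
PAGES = {
    "/widgets": "blue widgets for sale",
    "/gadgets": "green gadgets for sale",
}

def search(query):
    """Return (status_code, matching_urls) for a site-search query.

    An empty result set returns a 404 status, so crawlers do not
    index an endless supply of 'no results for <query>' pages.
    """
    results = [url for url, text in PAGES.items() if query.lower() in text]
    if not results:
        # No matches: a real 404 keeps the empty page out of the index.
        return 404, []
    return 200, results
```

A handler like this also blunts the link-bomb tactic described below: a crawled search URL for an arbitrary term returns 404 rather than a 200 page repeating the term.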

Link Bomb Risk

If a site search delivers a page and repeats the query on the page, there is no risk that visitors' systems will be harmed or that the site will lose any data, so many people involved in site security would not consider this behavior a hacking risk. Somebody using it to search for pejoratives would be seen as engaging in juvenile behavior.

However, links to search-engine queries containing the pejorative, if not prevented, will be crawled by search engines. And if enough links point to these pejorative pages, the site can become relevant for pejoratives and may also lose ranking for the pages and terms it should rank for.

How to use robots.txt to prevent query string indexing

Google resolves robots.txt rules by specificity: when multiple rules match a URL, the most specific (longest) matching rule wins, so how the rules overlap matters. The * symbol is a wildcard, so a robots.txt file that blocks crawling of query-string URLs could look something like this:
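A minimal sketch, assuming the site's search uses a standard ? query string (adjust the pattern if your search lives under a specific path such as /search):

```
User-agent: *
# Block any URL containing a query string, e.g. /search?q=term
Disallow: /*?
# Everything else remains crawlable
Allow: /
```

Because /*? is the longer, more specific match for query-string URLs, it takes precedence over Allow: / for those URLs. Note that robots.txt blocks crawling, not indexing: a blocked URL can still appear in results if enough external links point to it, so this is a mitigation rather than a guarantee.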

... Solution Smith tests SEO tactics so you don't have to ...

Full-stack SEO has a lot of moving parts, and it takes time and effort to understand the nuances involved. Solution Smith takes care of the overhead of digital marketing, resulting in real savings in time and effort.