by Wayne Smith
Most webmasters do not consider search engines customers for their content. If they did, they might see many opportunities to promote their sites by strategically addressing search engine pain points. Some of these pain points are caused by the search engine's algorithms, and others by the web pages themselves.
This article is written from an SEO perspective, but it also draws on experience with document retrieval systems and with operating a vertical web search engine.
Value of Content for the Search Engine
The pain point for search engines is pages that do not add value to the search results but still consume resources and degrade the performance of the search engine.
Thin Content vs Gain of Information
De-indexing of doorway pages and orphaned content
Duplicate and Syndicated Content, Keyword Cannibalization
Anything that causes a particular search results page to show only pages from one site, or the same content across multiple sites, is a pain point. People doing SEO like to sub-categorize the reasons, but to the search engine they are the same pain point.
The search engine must address the lack of site diversity in the search results algorithmically, by removing pages from the results.
Some pages are removed in real time during the creation of the results page: the search engine displays the first result and filters out any additional pages from the same site. A real-time filter is the least efficient refinement method. The filter can, however, record the pages it had to remove and provide that list to Navboost, RankBrain, and other search refinement algorithms. The refinement algorithms can then make adjustments so that multiple pages from one site don't naturally appear for the search term.
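The real-time filtering step described above can be sketched as a simple deduplication pass over a ranked result list. This is a minimal illustration, not Google's actual implementation; the function name, the one-result-per-host limit, and the scored-tuple format are all assumptions for the example.

```python
from urllib.parse import urlparse

def diversify_results(results, max_per_site=1):
    """Keep the first max_per_site results from each host and
    record what was filtered, so downstream refinement algorithms
    could learn which pages repeatedly had to be removed."""
    kept, filtered = [], []
    seen = {}
    for url, score in results:
        host = urlparse(url).netloc
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= max_per_site:
            kept.append((url, score))
        else:
            filtered.append((url, score))
    return kept, filtered

results = [
    ("https://example.com/a", 0.9),
    ("https://example.com/b", 0.8),  # same host, filtered out
    ("https://other.org/x", 0.7),
]
kept, filtered = diversify_results(results)
print(kept)      # example.com/a and other.org/x survive
print(filtered)  # example.com/b was removed in "real time"
```

The `filtered` list is the interesting part for the article's argument: it is the record a real-time filter could hand to refinement algorithms so the expensive filtering step becomes unnecessary over time.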
User engagement does not become a factor until the page is visible in search. Navboost is a refinement of the search results that takes place outside the natural order of the ranking system.
Some pages are permanently removed, a solution for pages that also hit the thin-content pain point.
Crawled - currently not indexed
"Crawled - currently not indexed" can have several causes. In 2024, the major causes appear to be keyword cannibalization and thin content.
Keyword Cannibalization
All websites have some level of keyword cannibalization; the name of the website normally appears on every page. Algorithmically, when enough people search for the site name and other sites link to the brand, a Knowledge Panel can be created to improve search efficiency.
Topics or categories and products or articles
Category pages and the detail pages they lead to share many of the same keywords. Content needs to be optimized so the correct page is the most relevant one for each keyword.
Syndicated Content
Content that appears on multiple sites will only remain indexed for the site with the most authority on that content. Often the original author's site becomes the authoritative one, but if the content appears on multiple sites at nearly the same time, the larger or more established site may win instead.
Duplicate content and soft 404s
Duplicate content is a technical SEO error. Sites may present the same content on different URLs depending on their setup. It is generally handled with a canonical tag.
The Canonical link, A broken promise
The canonical link is understood by many to be a solution to the problem of duplicate content. In practice, search engines treat it as a hint rather than a directive, and may choose a different canonical URL than the one declared.
Fix soft 404 errors for site-wide SEO improvements
A soft 404 is a technical SEO problem: the page consumes bandwidth, uses up the site's crawl budget, provides a poor user experience, and offers unhelpful content.
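A soft 404 is a page that returns HTTP 200 while its content is effectively an error page or is nearly empty. A simple heuristic classifier can flag candidates during a site crawl; the phrase list and word-count threshold below are illustrative assumptions, not a definitive test.

```python
def looks_like_soft_404(status_code, body_text, min_words=50):
    """Heuristic: a 200 response whose body reads like an error
    page, or is nearly empty, is a soft-404 candidate."""
    if status_code != 200:
        return False  # a real 404/410 is the correct behavior
    error_phrases = ("page not found", "no longer available",
                     "nothing found", "0 results")
    text = body_text.lower()
    if any(phrase in text for phrase in error_phrases):
        return True
    # Very short bodies are often empty templates or placeholders.
    return len(text.split()) < min_words

print(looks_like_soft_404(200, "Sorry, page not found."))  # True
print(looks_like_soft_404(404, "Not found"))               # False
```

Pages flagged this way should either return a real 404/410 status or be given substantive content, which addresses the crawl-budget and user-experience costs described above.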
... Solution Smith tests SEO tactics so you don't have to ...
Full-stack SEO has a lot of moving parts, and understanding the nuances takes time and effort. Solution Smith takes care of the overhead associated with digital marketing, resulting in real savings of both.