Search Engines Can Not Read

Published: 11-01-2024
Updated: 12-01-2024
by Wayne Smith

Search engines, and Google in particular, have many algorithms. Some people may think that these algorithms can actually read and understand the content and then, based on an emotional judgment, decide which content is best. In reality, search engines just scan the content and sort pages using mathematical algorithms.

Say, for example, you see a document in a language you cannot read, and you know that 検索エンジン is Japanese for "search engines."

You see a document that says:

言葉以外の何ものでもない検索エンジン

You know, or can deduce, that the document has something to do with search engines.

You see another document:

検索エンジン

日本語でいくつかの単語

Noting the predominance of the word/entity "search engines" in Japanese as the headline, you can sort the documents and rank the one that uses the term as its headline as perhaps the better, or more relevant, of the two.

You are not reading and understanding Japanese; you are scanning the content and sorting it based on its characteristics, or on the way words are placed in the content.
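The same scan-and-sort approach can be sketched in a few lines of Python. The code never interprets Japanese: it only measures how much of each document's headline (assumed here to be its first line) is taken up by the one known term. The scoring rule is a hypothetical illustration, not any search engine's actual formula.

```python
# The only "word" we know: 検索エンジン ("search engines").
TERM = "検索エンジン"

# The two example documents; the first line of each is treated as its headline.
documents = {
    "doc1": "言葉以外の何ものでもない検索エンジン",
    "doc2": "検索エンジン\n日本語でいくつかの単語",
}


def headline_predominance(text: str, term: str) -> float:
    """Fraction of the headline's characters taken up by the term."""
    lines = text.splitlines()
    headline = lines[0] if lines else ""
    if not headline:
        return 0.0
    return headline.count(term) * len(term) / len(headline)


# Sort so the document whose headline is dominated by the term comes first.
ranked = sorted(
    documents,
    key=lambda name: headline_predominance(documents[name], TERM),
    reverse=True,
)
print(ranked)  # ['doc2', 'doc1']
```

The second document scores 1.0 because its headline is nothing but the term; the first scores about 0.33 because the term is only 6 of its 18 characters.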

Search Engines are Businesses

Operating a search engine is a business that uses marketing to frame features for the people who will use it. Saying, "Our search engine now uses headlines for ranking documents," does not speak to its clients and does not help marketing; the clients don't care how a search engine operates. To frame the feature as a benefit, one might say, "Our search engine provides more relevant documents."

Helpful Content Update is Marketing, not Algorithm Terminology

When Inktomi bettered AltaVista as a search engine, one of the ways it did so was by looking at zones on a page, specifically the predominance of words in the headlines, and giving more ranking weight based on how words were used there.
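Zone weighting of this kind can be sketched with Python's standard-library `HTMLParser`. The zone weights below are assumptions chosen for illustration, not Inktomi's actual values; the sketch only shows the idea that the same term counts for more in a title or headline than in body text.

```python
from html.parser import HTMLParser

# Hypothetical zone weights: a term in a title or headline counts more than in body text.
ZONE_WEIGHTS = {"title": 3.0, "h1": 3.0, "h2": 2.0}
BODY_WEIGHT = 1.0


class ZoneScorer(HTMLParser):
    """Adds up term occurrences, weighting each by the zone it appears in."""

    def __init__(self, term: str) -> None:
        super().__init__()
        self.term = term.lower()
        self.open_zones = []  # stack of weighted zones we are currently inside
        self.score = 0.0

    def handle_starttag(self, tag, attrs):
        if tag in ZONE_WEIGHTS:
            self.open_zones.append(tag)

    def handle_endtag(self, tag):
        if self.open_zones and self.open_zones[-1] == tag:
            self.open_zones.pop()

    def handle_data(self, data):
        hits = data.lower().count(self.term)
        if hits:
            zone = self.open_zones[-1] if self.open_zones else None
            weight = ZONE_WEIGHTS[zone] if zone else BODY_WEIGHT
            self.score += hits * weight


def zone_score(html: str, term: str) -> float:
    scorer = ZoneScorer(term)
    scorer.feed(html)
    scorer.close()
    return scorer.score


headline_page = (
    "<html><head><title>Search engines</title></head><body>"
    "<h1>Search engines</h1><p>How search engines rank pages.</p></body></html>"
)
body_page = (
    "<html><head><title>My blog</title></head><body>"
    "<h1>Notes</h1><p>Search engines and more search engines.</p></body></html>"
)
print(zone_score(headline_page, "search engines"))  # 7.0
print(zone_score(body_page, "search engines"))      # 2.0
```

The second page mentions the term just as often in total text, but only in body copy, so it scores lower than the page that puts the term in its title and headline.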

Tenured SEO

Search engine algorithms have evolved to scan documents in better and better ways. Entity-based SEO is a disruptive technology that could change search; it is not a tenured, evergreen technology that has proven itself worth the cost of implementation. Like local SEO results, entity-based results merge into the SERPs: entity-based pages coexist with pages based solely on keyword optimization.

OpenAI search would be a disruptive search engine; some speculate it will replace how people find sites on the internet. AI-based search has yet to gain mainstream traction: it is still in the proof-of-concept stage and changing rapidly.

ELIZA, an Early NLP AI

ELIZA was an early natural language processing computer program developed from 1964 to 1967 at MIT by Joseph Weizenbaum.

It was a chatbot simulating the dialog of a stereotypical psychotherapist. One would enter a sentence, and it would respond, stereotypically, with a question.

The source code is readily available. In simplified form, its logic works like this:

1. Look for the first word after "the" in the sentence. Consider that word the [subject].
   - If the sentence lacks an article for the subject, look for a proper name (a capitalized noun).
   - If a pronoun is used, the subject remains the same as for the last input, but the pronoun can be used in the response.
2. Look for a word that matches a feeling in its feelings word list.
   - If a feeling exists, go to step 4.
   - If a feeling does not exist, go to step 3.
3. Ask how the [subject] makes the client feel.
4. Ask why the [subject] makes the client [feel] that way.
5. Continue to ask the client how or why with each additional input.
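The simplified steps above can be sketched in Python. This is a hypothetical toy that follows only those steps; it is not Weizenbaum's original program, which worked from a script of ranked keywords with decomposition and reassembly rules. The feelings list and pronoun list are assumptions.

```python
import re
from typing import List, Optional

# Hypothetical word lists; a real ELIZA script used its own keywords and rules.
FEELINGS = {"sad", "happy", "angry", "afraid", "lonely", "anxious", "upset"}
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


class ElizaSketch:
    """Toy responder following the simplified subject/feeling steps."""

    def __init__(self) -> None:
        self.subject: Optional[str] = None  # remembered so pronouns can refer back

    def _find_subject(self, tokens: List[str]) -> Optional[str]:
        lowered = [t.lower() for t in tokens]
        # The first word after "the" becomes the subject (kept with its article).
        if "the" in lowered:
            i = lowered.index("the")
            if i + 1 < len(tokens):
                return "the " + lowered[i + 1]
        # No article: look for a capitalized word (a proper name), skipping the
        # first word of the sentence and "I", which are capitalized anyway.
        for t in tokens[1:]:
            if t[0].isupper() and t != "I" and not t.startswith("I'"):
                return t
        # A pronoun keeps the subject from the previous input.
        if any(t in PRONOUNS for t in lowered):
            return self.subject
        return None

    def respond(self, sentence: str) -> str:
        tokens = re.findall(r"[A-Za-z']+", sentence)
        found = self._find_subject(tokens)
        if found is not None:
            self.subject = found
        subject = self.subject or "that"
        # Look for a word from the feelings list.
        feeling = next((t.lower() for t in tokens if t.lower() in FEELINGS), None)
        if feeling is None:
            # No feeling found: ask how the subject makes the client feel.
            return f"How does {subject} make you feel?"
        # A feeling was found: ask why the subject makes the client feel that way.
        return f"Why does {subject} make you feel {feeling}?"


bot = ElizaSketch()
print(bot.respond("I argued with the boss today."))  # How does the boss make you feel?
print(bot.respond("He makes me angry."))            # Why does the boss make you feel angry?
```

Nothing here models meaning: "He" is resolved only because the previous input left a subject behind, and "angry" matters only because it is on a list.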

Does ELIZA understand how the client is feeling? No, of course not! It is artificial but appears intelligent. It takes what someone entered and paraphrases it, stereotypically in the form of a question about feelings. Its response creates the illusion that it is listening to what it is being told.

Eliza ignores all connecting words and only looks for the subject and a feeling. Modern AI also ignores what it does not know how to interpret.

Modern NLP AI

The breadth of knowledge, or the data set of related entities, in today's NLP AI systems is impressive. Yet, although they are self-learning, they still fundamentally run on the same principle as ELIZA: they look for entities in what is provided and build a data set of related entities.
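The principle described here, finding entities in text and recording which ones appear together, can be illustrated with a simple co-occurrence count. This is a deliberately simplified sketch of that idea, not how any production NLP system is built; the entity list and sample sentences are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical list of known entities.
KNOWN_ENTITIES = ["Chardonnay", "oak", "stainless steel", "butterscotch", "citrus"]


def find_entities(text, entities):
    """Return the known entities that appear as whole words in the text."""
    lowered = text.lower()
    return sorted(
        e for e in entities
        if re.search(rf"\b{re.escape(e.lower())}\b", lowered)
    )


def build_related(docs, entities):
    """Count how often each pair of entities appears in the same document."""
    pairs = Counter()
    for doc in docs:
        for a, b in combinations(find_entities(doc, entities), 2):
            pairs[(a, b)] += 1
    return pairs


docs = [
    "Oaked Chardonnay is aged in oak and has butterscotch notes.",
    "Unoaked Chardonnay is aged in stainless steel and has citrus notes.",
    "This Chardonnay spent a year in oak.",
]
related = build_related(docs, KNOWN_ENTITIES)
print(related.most_common(1))  # [(('Chardonnay', 'oak'), 2)]
```

The whole-word match means "oak" is not found inside "oaked" or "unoaked"; a naive substring scan would count both.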

Entity-based AI search optimization

Say you ask a question, "Which Chardonnay is more sweet?"

Within the entity data for Chardonnay, there are two major types of Chardonnay, and the type affects its flavor profile.

Entity: Chardonnay
Description: Chardonnay grapes can be made into a range of wines, from bone-dry to sweet dessert wine. Even if a Chardonnay is made in a dry style, several factors can make it seem sweet.
- TypeOf: Oaked Chardonnay
  Description: Aged in oak barrels, this wine has a rich texture, full body, and sweet aroma with notes of butterscotch and vanilla. The palate offers a buttery flavor with notes of honey, hazelnut, and caramel.
- TypeOf: Unoaked Chardonnay
  Description: Aged in stainless steel tanks, this wine has a light body, bright color, and crisp minerality. Its nose has citrus aroma notes with hints of lime, apple, and peach.

AI can answer, "Oaked Chardonnay is generally sweeter than unoaked Chardonnay," and provide references.
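An answer like this can come from scanning rather than understanding. The sketch below transcribes the entity data above into a Python dictionary (the structure is our own choice) and picks the type whose description mentions "sweet" most often. It never considers dryness or residual sugar; it only counts a word.

```python
import re

# The Chardonnay entity data, transcribed into a dictionary (layout is an assumption).
CHARDONNAY = {
    "entity": "Chardonnay",
    "types": {
        "Oaked Chardonnay": (
            "Aged in oak barrels, this wine has a rich texture, full body, and sweet "
            "aroma with notes of butterscotch and vanilla. The palate offers a buttery "
            "flavor with notes of honey, hazelnut, and caramel."
        ),
        "Unoaked Chardonnay": (
            "Aged in stainless steel tanks, this wine has a light body, bright color, "
            "and crisp minerality. Its nose has citrus aroma notes with hints of lime, "
            "apple, and peach."
        ),
    },
}


def pick_type(entity, keyword):
    """Return the type whose description contains the keyword most often."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = {name: len(pattern.findall(desc)) for name, desc in entity["types"].items()}
    best = max(counts, key=counts.get)
    return best, counts


best, counts = pick_type(CHARDONNAY, "sweet")
print(best)    # Oaked Chardonnay
print(counts)  # {'Oaked Chardonnay': 1, 'Unoaked Chardonnay': 0}
```

The "sweet" being counted is a sweet aroma, not sugar content, which is exactly the kind of gap a word count cannot see.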

AI may lack the data that, in the context of wine, "dry" means less sugar, or less sweet. "Un-dry" wine is not an entity; it is an understanding of the winemaking process. Google's AI has recently discovered (October 2024) that wine can have residual sugar.

A better answer would be, "Oaked Chardonnay, which is not labeled as dry, is sweeter." Still, AI is impressive technology. Late-harvest grapes can be fully fermented into a "dry" wine with a higher alcohol content. Sugar is converted to alcohol during fermentation; late-harvest grapes start with more sugar, but the fermentation process may be stopped early, based on the desired flavor, leaving residual sugar. Brewer's yeast can ferment a 12% sugar mash to at most about 7 to 8% alcohol by volume (15 to 16 proof).
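The sugar-to-alcohol arithmetic can be checked with the Gay-Lussac equation, in which one molecule of glucose (C6H12O6) yields two of ethanol and two of carbon dioxide. The sketch computes the theoretical maximum, assuming "12% sugar" means 120 g of fermentable sugar per litre and ignoring the sugar yeast spends on growth, so real yields come out somewhat lower. US proof is twice the percent alcohol by volume.

```python
# Gay-Lussac equation: C6H12O6 -> 2 C2H5OH + 2 CO2
GLUCOSE_G_PER_MOL = 180.156
ETHANOL_G_PER_MOL = 46.069
ETHANOL_G_PER_ML = 0.789  # density of ethanol at about 20 °C


def max_abv(sugar_g_per_litre: float) -> float:
    """Theoretical maximum alcohol by volume (%) if all sugar were fermented."""
    ethanol_g = sugar_g_per_litre * 2 * ETHANOL_G_PER_MOL / GLUCOSE_G_PER_MOL
    ethanol_ml = ethanol_g / ETHANOL_G_PER_ML
    return ethanol_ml / 1000 * 100  # mL of ethanol per 1000 mL, as a percentage


def us_proof(abv_percent: float) -> float:
    """US proof is twice the alcohol-by-volume percentage."""
    return 2 * abv_percent


abv = max_abv(120)  # a 12% sugar mash, read as 120 g/L
print(f"{abv:.1f}% ABV, {us_proof(abv):.0f} proof")  # 7.8% ABV, 16 proof
```

Each gram of sugar can become at most about 0.51 g of ethanol, and ethanol is lighter than water, so the percentage of alcohol always ends up well below the starting percentage of sugar.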

Entity-based search benefit

The entity data can then be used like semantically related terms to provide users with the pages they are interested in: the entities can be scanned for on the page. These results should have a higher degree of relevance, and search engines don't need to run expensive GPUs over 400-plus billion pages every 30 days; they can scan the pages for the entities found by AI, which are published in part in the Knowledge Panel. Much of the entity data comes from Wikipedia.
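Scanning a page for a known entity and its related entities is cheap compared with running a large model over it. A minimal sketch, assuming a hypothetical related-entity list for Chardonnay and a made-up scoring rule: one point for the main entity, plus one point for each related entity found, and zero if the main entity is absent.

```python
import re

# Hypothetical related-entity data, e.g. as it might be derived from a knowledge panel.
RELATED = {"Chardonnay": ["oak", "stainless steel", "residual sugar", "butterscotch", "citrus"]}


def entity_coverage(page_text, entity, related):
    """Score a page by whole-word matches of an entity and its related entities."""
    text = page_text.lower()

    def present(term):
        return re.search(rf"\b{re.escape(term.lower())}\b", text) is not None

    if not present(entity):
        return 0
    return 1 + sum(present(t) for t in related.get(entity, []))


# Hypothetical pages.
pages = {
    "page-a": "Chardonnay aged in oak has butterscotch notes; residual sugar varies.",
    "page-b": "Chardonnay Chardonnay Chardonnay, best Chardonnay prices.",
    "page-c": "Oak barrels and citrus notes.",
}
ranked = sorted(pages, key=lambda p: entity_coverage(pages[p], "Chardonnay", RELATED), reverse=True)
print(ranked)  # ['page-a', 'page-b', 'page-c']
```

The keyword-stuffed page-b repeats the main entity but mentions nothing related to it, so it ranks below page-a.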

The logical places to put entities on the page are the tenured (evergreen) keyword zones, such as headlines, and the schema markup.
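Placing entities in the schema usually means schema.org structured data, commonly embedded as JSON-LD in a `<script type="application/ld+json">` tag. The sketch builds such a block using schema.org's `about` and `mentions` properties; the headline, entity names, and choice of properties are illustrative assumptions, not a required format.

```python
import json


def article_jsonld(headline, about_name, about_same_as, mentions):
    """Build a schema.org Article in JSON-LD that names its main and related entities."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,  # the headline doubles as a classic keyword zone
        "about": {"@type": "Thing", "name": about_name, "sameAs": about_same_as},
        "mentions": [{"@type": "Thing", "name": name} for name in mentions],
    }
    return json.dumps(data, indent=2)


snippet = article_jsonld(
    "Oaked vs. Unoaked Chardonnay",
    "Chardonnay",
    "https://en.wikipedia.org/wiki/Chardonnay",
    ["oak barrel", "residual sugar"],
)
print(snippet)
```

Pointing `sameAs` at the Wikipedia article ties the page's main entity to a widely used source of entity data.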

Entity (relationship model)

A search algorithm is based on mathematics, so ranking factors must be quantifiable or countable. Factors that are only true or false can instead be used as quality indicators to determine whether a site will be listed or buried.
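The distinction between countable factors and true/false indicators can be sketched as follows. The factor names, weights, and flags are hypothetical; the point is only that counts can be weighted and summed into an order, while booleans act as a gate that decides whether a page is listed at all.

```python
from dataclasses import dataclass

# Hypothetical countable ranking factors and their weights.
WEIGHTS = {"headline_hits": 3.0, "entity_hits": 2.0, "inbound_links": 0.5}


@dataclass
class Page:
    url: str
    headline_hits: int   # countable: term occurrences in the headline
    entity_hits: int     # countable: related entities found on the page
    inbound_links: int   # countable: links pointing at the page
    is_spam: bool        # true/false quality indicator
    is_duplicate: bool   # true/false quality indicator


def score(page: Page) -> float:
    """Weighted sum of the countable factors."""
    return sum(weight * getattr(page, name) for name, weight in WEIGHTS.items())


def rank(pages):
    """True/false indicators decide listed vs. buried; counts decide the order."""
    listed = [p for p in pages if not (p.is_spam or p.is_duplicate)]
    buried = [p for p in pages if p.is_spam or p.is_duplicate]
    listed.sort(key=score, reverse=True)
    return listed, buried


pages = [
    Page("b.example", 2, 0, 2, False, False),
    Page("a.example", 1, 4, 10, False, False),
    Page("c.example", 5, 5, 100, True, False),
]
listed, buried = rank(pages)
print([p.url for p in listed], [p.url for p in buried])  # ['a.example', 'b.example'] ['c.example']
```

The spam page has the highest counts of all, but its true/false flag buries it before any counting happens.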
