LSI (latent semantic indexing) and content relevancy

by Wayne Smith

Using keywords alone to determine relevancy comes down to keyword density, which does not always produce quality search results. One solution is LSI (latent semantic indexing): by qualifying the results, checking for words related to the keyword, the results become more relevant.
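A minimal sketch of the idea above, with made-up documents and an assumed hand-built list of related words: rank by keyword density alone, then qualify the score when related words also appear. The weights here are arbitrary illustration, not a real ranking formula.

```python
def keyword_density(doc, keyword):
    """Fraction of the document's words that equal the keyword."""
    words = doc.lower().split()
    return words.count(keyword) / len(words)

def qualified_score(doc, keyword, related):
    """Boost density-based relevancy when related words are present."""
    present = set(doc.lower().split())
    matches = sum(1 for w in related if w in present)
    return keyword_density(doc, keyword) * (1 + 2 * matches)

stuffed = "buy mouse cheap mouse deal mouse sale mouse now mouse"
natural = "a wireless mouse with usb receiver and click buttons"
related = {"wireless", "usb", "click", "cursor"}

# The keyword-stuffed page scores lower once related words qualify the result.
print(qualified_score(stuffed, "mouse", related))  # 0.5
print(qualified_score(natural, "mouse", related))  # ~0.78
```

Density alone would rank the stuffed page first (5 of 10 words match); qualifying by related words reverses that ordering.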

LSI is one model for determining relevancy by looking for related words, but LSI itself may not scale. The patented LSI algorithm was developed at Bell Labs in 1988, before the internet, and it was built around document collections that did not change. Documents on the internet are dynamic and constantly being updated.
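To make the mechanism concrete, here is a toy LSI sketch, assuming numpy is available. A term-document matrix is factored with SVD and truncated to k latent dimensions; terms that co-occur across documents end up close together in the latent space, even when they never appear in the same document. The corpus and counts are invented for illustration.

```python
import numpy as np

terms = ["mouse", "cursor", "click", "cheese", "trap"]
# Rows are term counts; columns are three toy documents
# (two about computer mice, one about rodents).
A = np.array([
    [2, 1, 1],   # mouse: appears in both topics (ambiguous term)
    [1, 2, 0],   # cursor
    [1, 1, 0],   # click
    [0, 0, 2],   # cheese
    [0, 0, 1],   # trap
], dtype=float)

U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                 # keep 2 latent dimensions
term_vecs = U[:, :k] * S[:k]          # term coordinates in latent space

def similarity(t1, t2):
    """Cosine similarity between two terms in the latent space."""
    a, b = term_vecs[terms.index(t1)], term_vecs[terms.index(t2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(similarity("mouse", "cursor"))   # same latent topic: higher
print(similarity("mouse", "cheese"))   # different topic: lower
```

Note that the SVD must be recomputed when documents change, which is one intuition for why classic LSI struggles with a dynamic, web-scale corpus.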

A Hybrid Solution

A search engine requires a relevancy system. When searching for a brand, the brand's website is expected even when it is not the most popular page. Options include, but are not limited to:

  • Looking at the anchor text for keywords and adding relevancy to the keyword based on that text.
    This can be done for both on-site links and off-site links when building the database.
  • Entity or Open Knowledge Graph data
    Open Knowledge Graph data, natural language processing, and so on. Google has spent a king's ransom on the technologies behind its entity–relationship model.
  • Topical Authority - Based on site, or sections of a site
  • And others
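A hedged sketch of combining the signals listed above into one relevancy score. The signal names, weights, and page data are illustrative assumptions, not a real ranking formula; the point is that a brand's own page can outrank a more popular page once anchor text, entity match, and topical authority are blended.

```python
def hybrid_relevancy(page, keyword):
    """Combine anchor-text, entity, and topical-authority signals (toy weights)."""
    score = 0.0
    # Anchor text: links pointing at the page whose text contains the keyword.
    score += 2.0 * sum(1 for a in page["anchor_texts"] if keyword in a.lower())
    # Entity match: the keyword resolves to an entity the page represents.
    if keyword in page.get("entities", ()):
        score += 5.0
    # Topical authority: a site- or section-level boost for the page's topic.
    score += page.get("topical_authority", 0.0)
    return score

brand_home = {
    "anchor_texts": ["Acme official site", "Acme homepage"],
    "entities": {"acme"},
    "topical_authority": 3.0,
}
popular_review = {
    "anchor_texts": ["great review", "acme review roundup"],
    "entities": set(),
    "topical_authority": 1.0,
}

print(hybrid_relevancy(brand_home, "acme"))      # 12.0
print(hybrid_relevancy(popular_review, "acme"))  # 3.0
```
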

Entity based search

When using the entity–relationship model, the following are possible:

  • The entity may convey a user intent
  • The entity may be a brand or have a Knowledge Graph.
  • If the search is for a site, as determined by the entity, a sitelinks panel can be shown.
  • Topical Authority can be baked into the keyword database, i.e. a brand would be the authority on its own brand.
  • Some semantics can be baked into the keyword database, i.e. "mice" is the same as "mouse", or "recipe" is the same as "DIY".
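A minimal sketch of "baking" semantics into the keyword database, using the mice/mouse and recipe/DIY examples above: queries are normalized to a canonical keyword before lookup, so equivalent terms hit the same entry. The mappings and URLs are hypothetical placeholders.

```python
# Assumed hand-curated synonym map: query term -> canonical keyword.
SYNONYMS = {
    "mice": "mouse",
    "recipe": "diy",
}

# Toy keyword database keyed only by canonical keywords.
KEYWORD_DB = {
    "mouse": ["site-a.example/mice-guide", "site-b.example/mouse-review"],
    "diy": ["site-c.example/birdhouse-recipe"],
}

def lookup(query):
    """Normalize the query to its canonical keyword, then fetch results."""
    canonical = SYNONYMS.get(query.lower(), query.lower())
    return KEYWORD_DB.get(canonical, [])

print(lookup("mice"))    # same results as lookup("mouse")
print(lookup("Recipe"))  # same results as lookup("diy")
```

Because the equivalence is resolved at database-build or query-normalization time, the ranking code itself never needs to know that two terms are synonyms.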

User Query Intent + SEO insights into entities