by Wayne Smith
Keyword research identifies the search terms that connect your content with the right audience—focusing on relevance and intent rather than sheer search volume. In practice, it bridges marketing objectives with search visibility by aligning what customers seek with what a brand offers.
Early SEO observations suggested that Google was adopting Latent Semantic Indexing (LSI) to improve result relevance. Google later clarified that it instead relies on vector-based semantic models—such as Word2Vec, BERT, and MUM—to interpret relationships between words and concepts.
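To make this concrete, here is a minimal sketch of vector-based similarity, assuming the open-source sentence-transformers library and its all-MiniLM-L6-v2 model as rough stand-ins for the proprietary systems search engines actually use; the query and candidate terms are hypothetical.

```python
# Minimal sketch: score how closely candidate terms relate to a query using
# dense vector embeddings (the same family of techniques as Word2Vec/BERT).
# The library, model name, and example terms are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "best footwear for long-distance running"
terms = ["running shoes", "trail sneakers", "marathon training", "stock portfolio"]

# Encode the query and candidate terms into dense vectors.
query_vec = model.encode(query, convert_to_tensor=True)
term_vecs = model.encode(terms, convert_to_tensor=True)

# Cosine similarity approximates semantic relatedness; unrelated terms
# (e.g., "stock portfolio") score noticeably lower.
scores = util.cos_sim(query_vec, term_vecs)[0]
for term, score in zip(terms, scores):
    print(f"{term:20s} {float(score):.3f}")
```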
This shift from strings to things reflects how search engines now identify pages that best represent topics or entities rather than simply matching keywords. Modern entity-based SEO applies this understanding by mapping relationships between entities—the “things” that define topical relevance—and using related entities as long-tail keywords or supporting topics within a hub-and-spoke structure, where the main “hub” page anchors the core entity and connected “spokes” explore its subtopics in depth.
AI Query Modifications
Observing AI-based search query transformation reveals how search engines increasingly reinterpret and correct user queries toward canonical entities. As a result, exact keyword matching continues to lose value. These AI-driven behaviors mean that effective keyword research now depends less on exact phrasing and more on identifying the entities, context, and intent behind searches.
In practice, this means optimizing for the full semantic breadth of a topic—creating interconnected content ecosystems that align with how AI systems recognize and relate meaning, rather than relying solely on isolated keyword targets.
How marketing research overlaps with search engine optimization
The core areas where marketing research and entity SEO overlap are:
- Understanding Customer Needs and Behavior
  - What customers want, how they think, what motivates purchases.
  - How they perceive brands and products.
- Identifying Market Opportunities and Gaps
  - Where unmet needs or underserved segments exist.
  - What problems current solutions fail to solve.
- Evaluating Market Potential and Demand
  - How large a market is, how fast it’s growing, and who the key players are.
- Assessing Marketing Performance
  - How effective campaigns, channels, or messages are.
  - Measuring brand awareness, satisfaction, and loyalty.
- Informing Product Development and Positioning
  - Testing concepts, pricing, and features.
  - Determining which positioning or messaging resonates most.
Search data provides the most direct, unfiltered view of customer intent, demand, and language for marketing research. Keyword or entity research mirrors these focus areas while seeking low-hanging opportunities that correlate with unmet needs or underserved segments. It fills in topics or subjects by addressing problems that current solutions fail to solve, identifies and provides content aligned with user intent—reflecting what customers want, how they think, and what motivates their purchases—and supports brand marketing by clarifying how users perceive brands and products.
The brand as a canonical entity – a critical keyword
When people search for a brand name or seek information about its products—a core market research subject—they generate signals around the brand’s keywords. This engagement helps search engines associate those keywords with the brand and enhances their visibility, while algorithms like Navboost interpret the underlying user behavior.
The primary entity data for a brand consists of the products, services, or solutions it offers—these are the main topical keywords for the site. Additionally, NAP (name, address, and phone number) is foundational for establishing the brand as an entity and supporting visibility. Customer perceptions and reviews are an important factor in search visibility, but are not formally part of the brand’s entity graph or the site’s keyword graph.
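As an illustration of the NAP point, here is a minimal sketch that emits schema.org LocalBusiness markup as JSON-LD from Python, one common way to help search engines resolve a brand as an entity; every value shown is a hypothetical placeholder.

```python
# Minimal sketch: NAP (name, address, phone) expressed as schema.org JSON-LD.
# All values are hypothetical placeholders for a fictional brand.
import json

nap_markup = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Widgets Co.",
    "url": "https://www.example.com",
    "telephone": "+1-555-010-0000",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example St.",
        "addressLocality": "Los Angeles",
        "addressRegion": "CA",
        "postalCode": "90001",
    },
    # sameAs links tie the brand entity to its other authoritative profiles.
    "sameAs": [
        "https://www.linkedin.com/company/example-widgets",
        "https://www.facebook.com/examplewidgets",
    ],
}

# Print the body of a <script type="application/ld+json"> tag for the page.
print(json.dumps(nap_markup, indent=2))
```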
Terms like “best” are not entities. When search engines process a query, they identify the entity within it and rank pages based on how well they are optimized for that entity. For example, searches for “good attorney in Los Angeles” or “best attorney in Los Angeles” typically return similar results. However, if visitors are likely to use modifiers like “best,” those words should be incorporated in titles and content to align with user intent and improve clarity.
In short, the products, services, or solutions your brand provides define the entities and keywords that should be clearly presented through site navigation and content. Supplemental words like “best” are useful only when they enhance clarity or meet searcher expectations.
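To see why modifiers like “best” carry intent rather than entity information, here is a rough sketch using spaCy’s small English model as a stand-in for entity detection; spaCy is an assumption chosen for illustration, not a description of how Google resolves entities.

```python
# Rough sketch: off-the-shelf named-entity recognition over the example
# queries. spaCy stands in for a search engine's entity resolution here.
# Setup (assumption): pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

for query in ["good attorney in Los Angeles", "best attorney in Los Angeles"]:
    doc = nlp(query)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    print(f"{query!r} -> {entities}")

# Both queries typically resolve to the same entity ("Los Angeles" as a GPE);
# "good" and "best" contribute intent and expectation, not entity data.
```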
Search intent keywords
In the broadest sense, search intent can be classified as informational, navigational, commercial, and transactional—a high-level framework for understanding user goals. However, matching content or keywords to search intent is often deeper and more nuanced, requiring consideration of context, phrasing, and the entity or topic the user is seeking.
For example, if the goal is to create a bottom-of-the-funnel transactional canonical entity for a product, all of the product’s features become critical keywords. By contrast, if the search query is a question about the product, then keywords related to point-of-view, FAQs, and user reviews become critical for capturing informational or consideration intent.
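As a starting point only, the sketch below buckets keywords into the four high-level intent classes using simple modifier rules; the modifier lists and brand terms are hypothetical, and real intent matching is far more nuanced, as noted above.

```python
# Simplistic, rule-based intent bucketing. Modifier lists are illustrative
# assumptions; production intent classification is far more nuanced.
INTENT_MODIFIERS = {
    "transactional": ["buy", "price", "pricing", "order", "coupon", "discount"],
    "commercial": ["best", "top", "review", "vs", "compare", "alternative"],
    "informational": ["what", "how", "why", "guide", "tutorial", "faq"],
}

def classify_intent(keyword: str, brand_terms: set) -> str:
    words = keyword.lower().split()
    # Queries containing the brand name usually reflect navigational intent.
    if any(term in words for term in brand_terms):
        return "navigational"
    for intent, modifiers in INTENT_MODIFIERS.items():
        if any(modifier in words for modifier in modifiers):
            return intent
    return "informational"  # default bucket when no modifier matches

print(classify_intent("best blue widget", {"acme"}))          # commercial
print(classify_intent("acme widget login", {"acme"}))         # navigational
print(classify_intent("how do blue widgets work", {"acme"}))  # informational
```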
Individual user intent:
While often overlooked, user-group or audience-level intent is important for keyword research. Different audiences—such as investors, medical professionals, or developers—use their own domain-specific lexicons or keywords (for example, “black swan” or “grey rhino” among investors). These specialized vocabularies can be detected by LLM systems and may influence visibility in future algorithm updates.
Currently, individual or audience intent is not recognized as a direct ranking factor. What can be observed, however, is that AI Overviews often guide users toward more specific or unambiguous searches that align with different audiences. These systems tend to favor pages that use precise, experience-based language, effectively treating linguistic specificity as a proxy for expertise or firsthand understanding. The lexicons experts use become critical terms for gaining visibility.
This doesn’t mean a bricklayer must personally write the content—but the writer should incorporate the bricklayer’s knowledge, perhaps by interviewing them or using their terminology or technical wording directly. The goal is to reflect genuine subject-matter insight in the language itself, signaling depth and credibility to both users and AI systems.
AI Overviews as a keyword research tool
AI Overviews guide users toward more specific and unambiguous search terms. This refinement tends to favor sites and pages that demonstrate strong topical expertise and linguistic precision. The process operates through pattern- and rule-based matching—content that mirrors the vocabulary and phrasing used by recognized experts in a field is more likely to surface within LLM-driven results.
These linguistic refinements depend on the clarity and quality of content—not on superficial signals like author photos, résumés, or backlinks. While backlinks act as proxies for trust and authority, they only support topical relevance; they do not create it.
Here’s the exciting part:
AI Overviews reveal what large language models have already inferred about how topics, terms, and expertise relate. They act as a window into how AI systems interpret semantic precision and topical authority. In practice, using AI Overviews as a research tool allows you to observe which phrasing, terminology, and contextual relationships Google considers most aligned with user understanding of a topic.
Even when users ask simple or factual questions and never click through, those impressions still strengthen brand awareness. Appearing in AI Overviews functions as zero-click exposure—similar to non-converting visits—that reinforces a brand’s presence and credibility.
Limitations:
AI Overviews reflect the current state of an LLM’s knowledge (today’s, not tomorrow’s) and the relationships between entities and related terms. By examining how these entities connect, it is possible to identify gaps where additional content or context can extend the AI’s understanding—highlighting opportunities that go beyond what the model currently captures.
Knowledge, however, is dynamic—like rankings and search patterns, it evolves continually. The objective isn’t just to mirror what AI already knows, but to expand upon it with original insight and real-world experience—in short, to create a gain in knowledge.
Keyword/entity cannibalization
Keyword or entity cannibalization occurs when multiple pages on a site target—or appear for—the same search query. When this overlap happens, search engines must determine which page offers the most relevant or authoritative response, often causing ranking fluctuations or reduced visibility for both.
For example, for a query like “best blue widget,” Google may test different result types by showing both informational and transactional pages. Because search behavior and context evolve, rankings for such blended queries can shift seasonally or as algorithms refine their understanding of the topic.
To prevent this, map keywords and entities to distinct pages with clear topical boundaries. When multiple pages compete for closely related queries, search engines try to identify a single, most representative version. If signals are divided between pages, both may lose visibility and authority—especially when each performs better for a slightly different variation of the same topic.
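One way to surface this overlap is to check whether more than one URL appears for the same query. The sketch below assumes a Search Console performance report exported as a CSV with "query" and "page" columns; the file name and column names are assumptions about how that export was saved.

```python
# Minimal sketch: flag queries for which multiple URLs appear, a common
# symptom of cannibalization. File name and column names are assumptions.
import csv
from collections import defaultdict

pages_by_query = defaultdict(set)
with open("search_console_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        pages_by_query[row["query"]].add(row["page"])

# Queries answered by more than one URL are candidates for consolidation
# or clearer hub-and-spoke separation.
for query, pages in sorted(pages_by_query.items()):
    if len(pages) > 1:
        print(f"{query}: {len(pages)} competing pages")
        for page in sorted(pages):
            print(f"  - {page}")
```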
Practical Solutions to Cannibalization
One approach is to consolidate competing pages into a single, comprehensive resource—especially if the content serves overlapping purposes. However, merging pages can sometimes blur focus or weaken the clarity of the query match. In those cases, a different strategy is better.
A balanced alternative is a hub-and-spoke structure, where a primary “hub” page provides the main overview or transactional focus, while supporting pages (the “spokes”) explore related entities or features in greater depth and link back to the hub. This structure clarifies topical hierarchy and reduces internal competition.
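A minimal sketch of such a content map, using hypothetical URLs and entities, shows the linking pattern: the hub links out to every spoke and each spoke links back to the hub.

```python
# Hypothetical hub-and-spoke content map. The hub anchors the core entity;
# each spoke covers one related entity in depth and links back to the hub.
HUB = {"url": "/blue-widgets/", "entity": "blue widgets"}

SPOKES = [
    {"url": "/blue-widgets/sizing-guide/", "entity": "blue widget sizing"},
    {"url": "/blue-widgets/materials/", "entity": "blue widget materials"},
    {"url": "/blue-widgets/vs-red-widgets/", "entity": "blue vs. red widgets"},
]

def internal_links(hub, spokes):
    """Return (source, target) pairs: hub -> each spoke, each spoke -> hub."""
    links = [(hub["url"], spoke["url"]) for spoke in spokes]
    links += [(spoke["url"], hub["url"]) for spoke in spokes]
    return links

for source, target in internal_links(HUB, SPOKES):
    print(f"{source} -> {target}")
```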
Consider a multi-channel strategy of publishing informational or exploratory content on a secondary platform, such as YouTube or community sites. This can help distribute overlapping topics without diluting the main site’s visibility.
The canonical loophole
Cannibalization may not be an issue when the query targets a site or brand name. In these cases, multiple pages on the site may include the same keyword, and this can also occur within a category or hub where several pages share a common keyword. Relying solely on search engines for keyword research can be misleading, as canonical pages often appear to have a “free pass.” The key distinction is that these queries reflect navigational intent rather than informational or transactional intent.
It’s also incorrect to judge cannibalization by comparing content on other sites. Canonical entities are dynamic: as new information emerges, the page considered most authoritative for a topic can change. In short, the loophole works until it doesn’t. Large sites like Amazon or major news outlets can appear canonical within search and seem to get a “free pass.” It is not that these sites are treated differently; they often simply represent the canonical source for the information.
By carefully identifying long-tail keywords and related entities, it is possible to structure content around specific entities, often eliminating the risk of perceived cannibalization entirely.
Long-tail keywords or related entities
Incorporating long-tail keywords or related entities on a page demonstrates subject depth and contextual understanding—often providing unique or original insights that set the content apart from competing sources.
AI-driven search systems analyze these relationships through query “fan-out,” expanding a single question into semantically related concepts. Pages that effectively address these related entities are considered more comprehensive and therefore more relevant.
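A rough sketch of the fan-out idea follows; the expansions are hand-written stand-ins for what an AI system would actually generate, and the matching is a crude substring heuristic, but it shows how coverage of related concepts can be checked.

```python
# Rough sketch: a seed query fans out into related concepts, and a page is
# judged by how many of them it covers. Expansions are hand-written
# assumptions; matching is a crude substring heuristic.
FAN_OUT = {
    "blue widget maintenance": [
        "cleaning blue widgets",
        "blue widget lubrication schedule",
        "replacing worn widget gaskets",
        "storing widgets in humid climates",
    ],
}

page_text = """
Our guide covers cleaning blue widgets, a recommended lubrication
schedule, and tips for storing widgets in humid climates.
""".lower()

related = FAN_OUT["blue widget maintenance"]
covered = [
    concept for concept in related
    # Treat a concept as covered if its first two words appear on the page.
    if all(word in page_text for word in concept.lower().split()[:2])
]
print(f"covers {len(covered)} of {len(related)} related concepts")
for concept in covered:
    print(f"  - {concept}")
```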
This broader coverage also reinforces perceived experience and expertise. While E-E-A-T remains a subjective framework, search systems use measurable proxies—such as topical completeness, entity relationships, and consistency across sources—to estimate whether content likely reflects genuine expertise.
What is a canonical entity
Beyond the technical meaning of a canonical URL, a “canonical entity” is the primary or most authoritative version of an entity within semantic search—the one that other related entities derive from or connect back to.
The term “canonical” appears across disciplines: in religion, it denotes the official or accepted texts; in biology, it describes the most complete or representative form of a protein from which variants are derived. Similarly, in SEO, a canonical entity represents the definitive version of a topic or object within the knowledge graph.
For AI-aware SEO, the objective is to create or establish the canonical entity—ensuring your content is recognized as the authoritative representation of that topic or entity. Long-tail keywords and related entities are essential in this process, helping search systems understand the breadth, context, and depth of the canonical entity.