by Wayne Smith
Keyword research identifies the search terms that connect your content with the right audience—focusing on relevance and intent rather than sheer search volume. In practice, it bridges marketing objectives with search visibility by aligning what customers seek with what a brand offers.
Early SEO observations suggested that Google was adopting Latent Semantic Indexing (LSI) to improve result relevance. Google later clarified that it instead relies on vector-based semantic models—such as Word2Vec, BERT, and MUM—to interpret relationships between words and concepts.
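To make this concrete, here is a minimal sketch of vector-based similarity, assuming the open-source sentence-transformers library and its all-MiniLM-L6-v2 model as rough stand-ins for the proprietary systems search engines actually use; the query and candidate terms are hypothetical.

```python
# Minimal sketch: score how closely candidate terms relate to a query using
# dense vector embeddings (the same family of techniques as Word2Vec/BERT).
# The library, model name, and example terms are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "best footwear for long-distance running"
terms = ["running shoes", "trail sneakers", "marathon training", "stock portfolio"]

# Encode the query and candidate terms into dense vectors.
query_vec = model.encode(query, convert_to_tensor=True)
term_vecs = model.encode(terms, convert_to_tensor=True)

# Cosine similarity approximates semantic relatedness; unrelated terms
# (e.g., "stock portfolio") score noticeably lower.
scores = util.cos_sim(query_vec, term_vecs)[0]
for term, score in zip(terms, scores):
    print(f"{term:20s} {float(score):.3f}")
```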
This shift from strings to things reflects how search engines now identify pages that best represent topics or entities rather than simply matching keywords. Modern entity-based SEO applies this understanding by mapping relationships between entities—the “things” that define topical relevance—and using related entities as long-tail keywords or supporting topics within a hub-and-spoke structure, where the main “hub” page anchors the core entity and connected “spokes” explore its subtopics in depth.
AI Query Modifications
Observing AI-based search query transformation reveals how search engines increasingly reinterpret and correct user queries toward canonical entities. As a result, exact keyword matching continues to lose value. These AI-driven behaviors mean that effective keyword research now depends less on exact phrasing and more on identifying the entities, context, and intent behind searches.
In practice, this means optimizing for the full semantic breadth of a topic—creating interconnected content ecosystems that align with how AI systems recognize and relate meaning, rather than relying solely on isolated keyword targets.
How marketing research overlaps with search engine optimization
The core areas where marketing research and entity SEO overlap are:
- Understanding Customer Needs and Behavior
  - What customers want, how they think, what motivates purchases.
  - How they perceive brands and products.
- Identifying Market Opportunities and Gaps
  - Where unmet needs or underserved segments exist.
  - What problems current solutions fail to solve.
- Evaluating Market Potential and Demand
  - How large a market is, how fast it’s growing, and who the key players are.
- Assessing Marketing Performance
  - How effective campaigns, channels, or messages are.
  - Measuring brand awareness, satisfaction, and loyalty.
- Informing Product Development and Positioning
  - Testing concepts, pricing, and features.
  - Determining which positioning or messaging resonates most.
Search data provides the most direct, unfiltered view of customer intent, demand, and language for marketing research. Keyword or entity research mirrors these focus areas while seeking low-hanging opportunities that correlate with unmet needs or underserved segments. It fills in topics or subjects by addressing problems that current solutions fail to solve, identifies and provides content aligned with user intent—reflecting what customers want, how they think, and what motivates their purchases—and supports brand marketing by clarifying how users perceive brands and products.
The brand as a canonical entity – a critical keyword
When people search for a brand name or seek information about its products—a core market research subject—they generate signals around the brand’s keywords. This engagement helps search engines associate those keywords with the brand and enhances their visibility, while algorithms like Navboost interpret the underlying user behavior.
The primary entity data for a brand consists of the products, services, or solutions it offers—these are the main topical keywords for the site. Additionally, NAP (name, address, and phone number) is foundational for establishing the brand as an entity and supporting visibility. Customer perceptions and reviews are an important factor in search visibility, but are not formally part of the brand’s entity graph or the site’s keyword graph.
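As an illustration of the NAP point, here is a minimal sketch that emits schema.org LocalBusiness markup as JSON-LD from Python, one common way to help search engines resolve a brand as an entity; every value shown is a hypothetical placeholder.

```python
# Minimal sketch: NAP (name, address, phone) expressed as schema.org JSON-LD.
# All values are hypothetical placeholders for a fictional brand.
import json

nap_markup = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Widgets Co.",
    "url": "https://www.example.com",
    "telephone": "+1-555-010-0000",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "123 Example St.",
        "addressLocality": "Los Angeles",
        "addressRegion": "CA",
        "postalCode": "90001",
    },
    # sameAs links tie the brand entity to its other authoritative profiles.
    "sameAs": [
        "https://www.linkedin.com/company/example-widgets",
        "https://www.facebook.com/examplewidgets",
    ],
}

# Print the body of a <script type="application/ld+json"> tag for the page.
print(json.dumps(nap_markup, indent=2))
```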
Terms like “best” are not entities. When search engines process a query, they identify the entity within it and rank pages based on how well they are optimized for that entity. For example, searches for “good attorney in Los Angeles” or “best attorney in Los Angeles” typically return similar results. However, if visitors are likely to use modifiers like “best,” those words should be incorporated in titles and content to align with user intent and improve clarity.
In short, the products, services, or solutions your brand provides define the entities and keywords that should be clearly presented through site navigation and content. Supplemental words like “best” are useful only when they enhance clarity or meet searcher expectations.
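To see why modifiers like “best” carry intent rather than entity information, here is a rough sketch using spaCy’s small English model as a stand-in for entity detection; spaCy is an assumption chosen for illustration, not a description of how Google resolves entities.

```python
# Rough sketch: off-the-shelf named-entity recognition over the example
# queries. spaCy stands in for a search engine's entity resolution here.
# Setup (assumption): pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

for query in ["good attorney in Los Angeles", "best attorney in Los Angeles"]:
    doc = nlp(query)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    print(f"{query!r} -> {entities}")

# Both queries typically resolve to the same entity ("Los Angeles" as a GPE);
# "good" and "best" contribute intent and expectation, not entity data.
```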
Search intent keywords
In the broadest sense, search intent can be classified as informational, navigational, commercial, and transactional—a high-level framework for understanding user goals. However, matching content or keywords to search intent is often deeper and more nuanced, requiring consideration of context, phrasing, and the entity or topic the user is seeking.
For example, if the goal is to create a bottom-of-the-funnel transactional canonical entity for a product, all of the product’s features become critical keywords. By contrast, if the search query is a question about the product, then keywords related to point-of-view, FAQs, and user reviews become critical for capturing informational or consideration intent.
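As a starting point only, the sketch below buckets keywords into the four high-level intent classes using simple modifier rules; the modifier lists and brand terms are hypothetical, and real intent matching is far more nuanced, as noted above.

```python
# Simplistic, rule-based intent bucketing. Modifier lists are illustrative
# assumptions; production intent classification is far more nuanced.
INTENT_MODIFIERS = {
    "transactional": ["buy", "price", "pricing", "order", "coupon", "discount"],
    "commercial": ["best", "top", "review", "vs", "compare", "alternative"],
    "informational": ["what", "how", "why", "guide", "tutorial", "faq"],
}

def classify_intent(keyword: str, brand_terms: set) -> str:
    words = keyword.lower().split()
    # Queries containing the brand name usually reflect navigational intent.
    if any(term in words for term in brand_terms):
        return "navigational"
    for intent, modifiers in INTENT_MODIFIERS.items():
        if any(modifier in words for modifier in modifiers):
            return intent
    return "informational"  # default bucket when no modifier matches

print(classify_intent("best blue widget", {"acme"}))          # commercial
print(classify_intent("acme widget login", {"acme"}))         # navigational
print(classify_intent("how do blue widgets work", {"acme"}))  # informational
```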
Individual user intent:
While often overlooked, user-group or audience-level intent is important for keyword research. Different audiences—such as investors, medical professionals, or developers—use their own domain-specific lexicons or keywords (for example, “black swan” or “grey rhino” among investors). These specialized vocabularies can be detected by LLM systems and may influence visibility in future algorithm updates.
Currently, individual or audience intent is not recognized as a direct ranking factor. What can be observed, however, is that AI Overviews often guide users toward more specific or unambiguous searches that align with different audiences. These systems tend to favor pages that use precise, experience-based language, effectively treating linguistic specificity as a proxy for expertise or firsthand understanding. The lexicons experts use become critical terms for gaining visibility.
This doesn’t mean a bricklayer must personally write the content—but the writer should incorporate the bricklayer’s knowledge, perhaps by interviewing them or using their terminology or technical wording directly. The goal is to reflect genuine subject-matter insight in the language itself, signaling depth and credibility to both users and AI systems.
AI Overviews as a keyword research tool
AI Overviews guide users toward more specific and unambiguous search terms. This refinement tends to favor sites and pages that demonstrate strong topical expertise and linguistic precision. The process operates through pattern- and rule-based matching—content that mirrors the vocabulary and phrasing used by recognized experts in a field is more likely to surface within LLM-driven results.
These linguistic refinements depend on the clarity and quality of content—not on superficial signals like author photos, résumés, or backlinks. While backlinks act as proxies for trust and authority, they only support topical relevance; they do not create it.
Here’s the exciting part:
AI Overviews reveal what large language models have already inferred about how topics, terms, and expertise relate. They act as a window into how AI systems interpret semantic precision and topical authority. In practice, using AI Overviews as a research tool allows you to observe which phrasing, terminology, and contextual relationships Google considers most aligned with user understanding of a topic.
Even when users ask simple or factual questions and never click through, those impressions still strengthen brand awareness. Appearing in AI Overviews functions as zero-click exposure—similar to non-converting visits—that reinforces a brand’s presence and credibility.
Limitations:
AI Overviews reflect the current state of an LLM’s knowledge (today’s, not tomorrow’s) and the relationships between entities and related terms. By examining how these entities connect, it is possible to identify gaps where additional content or context can extend the AI’s understanding—highlighting opportunities that go beyond what the model currently captures.
Knowledge, however, is dynamic—like rankings and search patterns, it evolves continually. The objective isn’t just to mirror what AI already knows, but to expand upon it with original insight and real-world experience—in short, to create a gain in knowledge.
Keyword/entity cannibalization
Keyword or entity cannibalization occurs when multiple pages on a site target—or appear for—the same search query. When this overlap happens, search engines must determine which page offers the most relevant or authoritative response, often causing ranking fluctuations or reduced visibility for both.
For example, for a query like “best blue widget,” Google may test different result types by showing both informational and transactional pages. Because search behavior and context evolve, rankings for such blended queries can shift seasonally or as algorithms refine their understanding of the topic.
To prevent this, map keywords and entities to distinct pages with clear topical boundaries. When multiple pages compete for closely related queries, search engines try to identify a single, most representative version. If signals are divided between pages, both may lose visibility and authority—especially when each performs better for a slightly different variation of the same topic.
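One way to surface this overlap is to check whether more than one URL appears for the same query. The sketch below assumes a Search Console performance report exported as a CSV with "query" and "page" columns; the file name and column names are assumptions about how that export was saved.

```python
# Minimal sketch: flag queries for which multiple URLs appear, a common
# symptom of cannibalization. File name and column names are assumptions.
import csv
from collections import defaultdict

pages_by_query = defaultdict(set)
with open("search_console_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        pages_by_query[row["query"]].add(row["page"])

# Queries answered by more than one URL are candidates for consolidation
# or clearer hub-and-spoke separation.
for query, pages in sorted(pages_by_query.items()):
    if len(pages) > 1:
        print(f"{query}: {len(pages)} competing pages")
        for page in sorted(pages):
            print(f"  - {page}")
```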
Practical Solutions to Cannibalization
One approach is to consolidate competing pages into a single, comprehensive resource—especially if the content serves overlapping purposes. However, merging pages can sometimes blur focus or weaken the clarity of the query match. In those cases, a different strategy is better.
A balanced alternative is a hub-and-spoke structure, where a primary “hub” page provides the main overview or transactional focus, while supporting pages (the “spokes”) explore related entities or features in greater depth and link back to the hub. This structure clarifies topical hierarchy and reduces internal competition.
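A minimal sketch of such a content map, using hypothetical URLs and entities, shows the linking pattern: the hub links out to every spoke and each spoke links back to the hub.

```python
# Hypothetical hub-and-spoke content map. The hub anchors the core entity;
# each spoke covers one related entity in depth and links back to the hub.
HUB = {"url": "/blue-widgets/", "entity": "blue widgets"}

SPOKES = [
    {"url": "/blue-widgets/sizing-guide/", "entity": "blue widget sizing"},
    {"url": "/blue-widgets/materials/", "entity": "blue widget materials"},
    {"url": "/blue-widgets/vs-red-widgets/", "entity": "blue vs. red widgets"},
]

def internal_links(hub, spokes):
    """Return (source, target) pairs: hub -> each spoke, each spoke -> hub."""
    links = [(hub["url"], spoke["url"]) for spoke in spokes]
    links += [(spoke["url"], hub["url"]) for spoke in spokes]
    return links

for source, target in internal_links(HUB, SPOKES):
    print(f"{source} -> {target}")
```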
Consider a multi-channel strategy of publishing informational or exploratory content on a secondary platform, such as YouTube or community sites. This can help distribute overlapping topics without diluting the main site’s visibility.
The canonical loophole
Cannibalization may not be an issue when the query targets a site or brand name. In these cases, multiple pages on the site may include the same keyword, and this can also occur within a category or hub where several pages share a common keyword. Relying solely on search engines for keyword research can be misleading, as canonical pages often appear to have a “free pass.” The key distinction is that these queries reflect navigational intent rather than informational or transactional intent.
It’s also incorrect to judge cannibalization by comparing content on other sites. Canonical entities are dynamic: as new information emerges, the page considered most authoritative for a topic can change. In short, the loophole works until it doesn’t. Large sites like Amazon or major news outlets can appear canonical within search and seem to get a “free pass.” It is not that these sites are treated differently; they often simply represent the canonical source for the information.
By carefully identifying long-tail keywords and related entities, it is possible to structure content around specific entities, often eliminating the risk of perceived cannibalization entirely.
Long-tail keywords or related entities
Incorporating long-tail keywords or related entities on a page demonstrates subject depth and contextual understanding—often providing unique or original insights that set the content apart from competing sources.
AI-driven search systems analyze these relationships through query “fan-out,” expanding a single question into semantically related concepts. Pages that effectively address these related entities are considered more comprehensive and therefore more relevant.
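A rough sketch of the fan-out idea follows; the expansions are hand-written stand-ins for what an AI system would actually generate, and the matching is a crude substring heuristic, but it shows how coverage of related concepts can be checked.

```python
# Rough sketch: a seed query fans out into related concepts, and a page is
# judged by how many of them it covers. Expansions are hand-written
# assumptions; matching is a crude substring heuristic.
FAN_OUT = {
    "blue widget maintenance": [
        "cleaning blue widgets",
        "blue widget lubrication schedule",
        "replacing worn widget gaskets",
        "storing widgets in humid climates",
    ],
}

page_text = """
Our guide covers cleaning blue widgets, a recommended lubrication
schedule, and tips for storing widgets in humid climates.
""".lower()

related = FAN_OUT["blue widget maintenance"]
covered = [
    concept for concept in related
    # Treat a concept as covered if its first two words appear on the page.
    if all(word in page_text for word in concept.lower().split()[:2])
]
print(f"covers {len(covered)} of {len(related)} related concepts")
for concept in covered:
    print(f"  - {concept}")
```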
This broader coverage also reinforces perceived experience and expertise. While E-E-A-T remains a subjective framework, search systems use measurable proxies—such as topical completeness, entity relationships, and consistency across sources—to estimate whether content likely reflects genuine expertise.
What is a canonical entity
Beyond the technical meaning of a canonical URL, a “canonical entity” is the primary or most authoritative version of an entity within semantic search—the one that other related entities derive from or connect back to.
The term “canonical” appears across disciplines: in religion, it denotes the official or accepted texts; in biology, it describes the most complete or representative form of a protein from which variants are derived. Similarly, in SEO, a canonical entity represents the definitive version of a topic or object within the knowledge graph.
For AI-aware SEO, the objective is to create or establish the canonical entity—ensuring your content is recognized as the authoritative representation of that topic or entity. Long-tail keywords and related entities are essential in this process, helping search systems understand the breadth, context, and depth of the canonical entity.