Blog / Category: App Store Optimization / Author: Alex Vereshchagin

Why Your Semantic Cores Are Not Truly Semantic

10 min read
19 February 2026


What does “semantic core” actually mean? Many ASO specialists habitually call a simple list of keywords a semantic core. They collect such lists based on query frequency, search suggestions and competitor analysis, then add those words directly to an app’s metadata. The assumption is that the more popular words you include verbatim, the better your app will rank. But is that really the case?

Summary of the article

In this article I show why the familiar “semantic cores” in ASO are often just lists of popular keys and do not reflect the real semantic structure of queries. This approach grew out of lexical search logic, where we look for literal word matches. Practice and changes in search architecture show that this is no longer enough.

Analysis of hundreds of iterations from 2020–2025 shows that text edits increasingly expand the set of indexed queries in positions 21–50 and 51–100 rather than giving stable growth at the top. Even more advanced cores remain “lexical”: they do not show meaning duplicates, do not separate different usage scenarios and do not give an objective picture. Decisions on priorities often remain subjective.

I then explain the industry’s move to meaning‑based search and hybrid models, where a lexical layer selects candidates and semantic and behavioural signals refine ranking. In this context, simply stuffing metadata with synonyms and loosely relevant keys can reduce positions because the description vector is blurred across topics. It is better to keep meaning groups clean and develop them coherently.

Finally, I give a practical method for constructing a truly semantic core in vector space: collect a wide pool of queries, obtain embeddings, build a proximity graph, cluster, prioritise clusters by potential, select formulations for intents and visualise coverage. Such a core becomes not only an ASO tool but also a source of product insights.

What the analysis of hundreds of iterations in 2020–2025 showed

The traditional ASO approach focused on exact word matching. In essence, a semantic core for us became a set of separate search queries that we distribute by frequency and add to metadata. After 2–4 weeks we analyse the effect of this metadata on positions and do the next iteration. And so on ad infinitum.

Recently I started to build predictive analytics based on machine learning to estimate in advance what increase in visibility and conversion particular text iterations would give. I used hundreds of iteration reports from our own projects, clients and industry colleagues over 2020–2025: what was in metadata, what was changed, what result was obtained, whether motivated traffic, advertising and other factors were used. This is almost a perfect data set for analysis.

By 2025 it became clear that textual iterations no longer work as well as in 2020 and primarily increase the number of indexed queries in positions 21–50 and 51–100. Shifts in the top buckets (top 1 and top 2–5) by 1–3 positions happen much less often and depend not only on text. The main effect of such iterations is expanding reach rather than rapid growth in installs. Meanwhile, 70–90 % of organic traffic usually comes from 5–15 keys. Losing even one strong query noticeably drops installs.

Most importantly, all this endless juggling of keys does not deliver what is expected of it. The problem lies directly in the semantic cores.

What is wrong with semantic cores

Here is a typical semantic core of 2021: it is essentially a three‑column table listing keys, approximate traffic and a rough promotion‑difficulty score. You cannot judge relevance and priority for each key based only on traffic and difficulty. I do not know how it was used, but this is a real document someone sent me.

A more recent 2024 core distributes keys by group and adds extra metrics. There is minimal segmentation of queries by topic and the ability to assess not only raw traffic but also potential install capacity of a cluster. But we still do not see which queries actually duplicate each other, which describe different scenarios, which pull branded traffic and which do not. Any decision about “which key to keep, which to drop, which to strengthen in metadata” remains a matter of subjective expertise rather than an analysis of the semantic structure of the query space.

Early 2025 examples show a specialist breaking phrases into tokens and combining them manually to fill all 100 characters of the keywords field. Fundamentally the approach remains the same: all decisions are made at the level of individual words and their combinations. Tokens are selected by frequency and common sense, not based on real semantic proximity in the overall space. We still work with lexicon, not a vector model.

Refreshing how lexical and semantic search work

Comments on my previous article suggested that not everyone understood what I wrote, so this time I will lay it out simply, with visual examples.

Lexical search

Lexical search is a mechanism where the system looks for exact word matches in metadata. In other words, if a query word is not mentioned, your app will not appear in results. During indexing the algorithm selects candidates only by presence of words. Historically this practice led to collecting a semantic core of popular queries and inserting them into text to cover as many combinations as possible.
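To make this mechanism concrete, here is a minimal Python sketch of lexical candidate selection via an inverted index. The app names and metadata strings are invented purely for illustration:

```python
from collections import defaultdict

# Toy metadata corpus; app names and texts are made up for illustration.
apps = {
    "SleepWell": "sleep tracker with relaxing sounds",
    "RunFast": "running tracker and step counter",
    "CalmMind": "meditation and breathing exercises",
}

# Build an inverted index: word -> set of apps whose metadata contains it.
index = defaultdict(set)
for app, text in apps.items():
    for word in text.split():
        index[word].add(app)

def lexical_candidates(query):
    """Return apps containing at least one exact query word."""
    result = set()
    for word in query.lower().split():
        result |= index.get(word, set())
    return result

print(lexical_candidates("sleep sounds"))  # finds SleepWell
print(lexical_candidates("fall asleep"))   # empty: no exact word match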

Move to search by meaning

Then came 2025, when both Apple and Google openly stated they were moving to search by meaning (natural language search). In September Google Play introduced Guided Search — search by goal or idea rather than just by app name or keywords. Earlier, in May, Apple announced that the App Store had learned to understand everyday language.

This is called semantic search. Instead of looking for exact words, the algorithm transforms the query and app texts into semantic representations (vectors) and compares them. With semantic search the system can show a result even if there is no exact word match, as long as the description is close in meaning to the user’s request.
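A toy illustration of that comparison: cosine similarity between hand‑made three‑dimensional vectors. Real embeddings have hundreds of dimensions and come from a trained model; these numbers are invented solely to show the mechanics:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (real ones come from an embedding model).
query      = [0.9, 0.1, 0.0]   # "apps to help me relax before bed"
sleep_app  = [0.8, 0.3, 0.1]   # description about sleep and relaxation
racing_app = [0.0, 0.2, 0.9]   # description about car racing

# The sleep app wins even though no words were compared at all.
print(cosine(query, sleep_app) > cosine(query, racing_app))  # True
```

The sleep app is closer to the query in vector space even if its description never contains the literal words of the query, which is exactly the behaviour that lexical search cannot produce.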

In practice, modern stores use hybrid models: candidates are first selected lexically (by words), and their ranking is then refined using semantic factors and additional signals. Another scheme runs two searches in parallel—one on keywords, the other on vectors—and then merges and mixes the result lists.

I implemented similar logic to select the most relevant keys. First comes the lexical layer: I remove overt rubbish, fragments and non‑relevant queries. This leaves a candidate set of valid keys worth working with. Then the semantic layer kicks in. For the product and competitor descriptions I compute embeddings, measure proximity, add demand signals—download indices, positions and whether competitors rank. These are combined into a single mask: semantically strong keys pass immediately; keys of medium relevance are kept if there is real demand and visibility data; keys without meaning and without signals are discarded. As a result the final core is the intersection of queries that are close in meaning to the product and queries confirmed by user behaviour.
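The mask logic above can be sketched roughly as follows. The thresholds, field names and candidate keys are illustrative assumptions, not my production values:

```python
def keep_key(semantic_sim, demand_index, has_visibility,
             sim_high=0.6, sim_mid=0.4):
    """Combined mask: strong semantics pass immediately; medium semantics
    need demand or visibility signals; the rest are discarded.
    Thresholds are illustrative, not actual production values."""
    if semantic_sim >= sim_high:
        return True
    if semantic_sim >= sim_mid:
        return demand_index > 0 or has_visibility
    return False

# (key, similarity to product description, demand index, competitor visibility)
candidates = [
    ("sleep tracker",   0.82, 35, True),   # strong meaning: passes
    ("white noise",     0.45, 20, False),  # medium meaning + demand: passes
    ("loan calculator", 0.10, 50, True),   # demand without meaning: dropped
]
core = [k for k, sim, demand, vis in candidates if keep_key(sim, demand, vis)]
print(core)  # ['sleep tracker', 'white noise']
```

The point of the mask is the intersection: demand alone cannot rescue a key with no semantic connection to the product, and meaning alone is not enough without evidence that users actually search this way.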


Why I claim Apple and Google use semantic methods

A sceptical reader might object: “We don’t know Apple’s and Google’s algorithms—aren’t these just guesses?” The objection is convenient but rests on an understanding of search from a decade ago and ignores publicly announced changes.

First, Apple has publicly stated that natural language search is used in the App Store: in iOS 18.1 the company explicitly promotes the “search the way you talk” format, where users enter phrases like “apps to help me relax before bed”. In news releases, natural language search and App Store review summaries are described as part of the platform’s intelligent features.

Second, back in 2019 Google publicly confirmed that it uses BERT in web search to better understand queries and improve ranking. Neural components have been applied in key search products for some time. It would be strange to expect that the store uses only a simple lexical scheme without additional NLP signals. We don’t know the exact details, but the practical logic is that modern search almost always combines lexical coincidence and semantic correspondence, and quality depends on which signals dominate.

Third, it is important to understand that today’s NLP models are not unique to each store. Processing natural language in search systems follows a standard architectural logic. The difference between platforms is not whether they use meaning, but how thresholds, weights and priorities are tuned between lexicon, semantics and behaviour.

Let’s structurally examine how this logic works in ideal architecture. The store indexes all app texts, splitting metadata into tokens. Each word corresponds to a list of apps where it appears. Preweighting occurs—where the word appears (title vs description), how often, saturation thresholds, etc. This forms a base of candidates and base lexical weights. When a user enters a query, the system determines its language and performs NLP processing: tokenisation and lemmatisation. Then the query words are passed through a machine learning model to produce numeric vectors encoding meaning. Each app description can be represented by a vector; the system compares the query vector with app vectors to find the closest.

I previously showed that I obtained a semantic core cleaned of rubbish. The next step is to determine the relevance of each query to our app. Here we use SentenceTransformer. By computing cosine similarity between the app’s description and each query or competitor we can determine semantic relevance.

In a hybrid model lexical and semantic candidate lists are combined. The simplest way is to combine and sort all candidates by a unified score considering textual coincidence and semantic similarity. A more complex way is to take an intersection: require that a candidate contain at least one word from the query and then let semantics rerank. Apple likely uses partial merging: unique identifiers (brand names, app names) should not be lost, so exact matches of important words are selected and results from semantic search are added for variety. They are then ranked together. Thus, keywords remain important for indexing, but final order increasingly depends on meaning and behavioural factors.
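One standard way to merge two ranked lists, used here purely as an illustration (the stores’ actual merging logic is not public), is reciprocal rank fusion (RRF):

```python
def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion: merge several ranked lists into one.
    k=60 is the conventional constant from the original RRF paper."""
    scores = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical  = ["AppA", "AppB", "AppC"]   # ranked by exact word match
semantic = ["AppB", "AppD", "AppA"]   # ranked by vector similarity
print(rrf([lexical, semantic]))       # ['AppB', 'AppA', 'AppD', 'AppC']
```

RRF rewards items that appear high in both lists, which matches the intuition above: a result confirmed by both the lexical and the semantic search deserves a boost, while unique hits from either list are preserved rather than lost.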

Behavioural signals and quality come next. Modern algorithms use hundreds of parameters. If an app has high retention or good ratings, that is a quality signal and may get a boost. If the app crashes often, it will be lowered. The system also tracks search behaviour: clicks, installs, quick deletions, the share of users who open the app after searching and remain active. Two apps with equal semantic relevance can rank differently: the one with better metrics will be higher.

After this there may be personalisation, extra boosting and corrections. An interesting case: In 2024 a well‑known brand released a new app. An indie developer came to us with an app almost identical to the brand’s app. The branded app quickly ended up in top buckets without obvious ASO optimisation, whereas the no‑name remained outside the top 20 on the same queries. Why? The brand and developer account act as strong quality signals. The store already has a history of other products from this developer, giving a ready probability that the new release is not garbage. The starting score is higher even before data accumulates. Branded and near‑branded queries quickly form a behavioural profile: familiar users search for the brand, click the new app, install and do not delete immediately. This is a strong signal. Conversely, a no‑name app has no brand anchor or history, so the hybrid model gives it only limited traffic at deeper positions until enough behavioural data accumulates.

Why not ideal models?

Stores introduce NLP and semantics out of necessity. In search, latency and scalability are critical, so optimised solutions are used rather than “ideal” huge models. Apple’s NLP announcement coincided with a renaissance in the App Store: according to Appfigures, 2025 saw 24 % more apps released than the previous year. This surge makes it harder to implement NLP while maintaining ranking quality. But the strategic direction is clear: as models become more efficient and user behaviour data accumulates, NLP will be used more fully and become the standard rather than an experiment.

How to build a semantic core in vector space

Now let’s break down the process so you understand what to do:

  1. Gather initial data (search queries). Collect the widest possible list of queries relevant to your topic. These can be suggestions from the App Store and Google Play, popular queries from various sources, competitor queries, phrases from user reviews and more. Include diverse formulations: short (1–2 words) and longer phrases expressing questions or goals. The list should be large—tens of thousands of queries if possible. The broader the coverage, the better for space analysis.
  2. Obtain embeddings. Run all phrases through an embedding model (for example, SentenceTransformer) to produce vectors—one per phrase. In this space, queries close in meaning will have similar coordinates. Normalise the vectors so that cosine similarity is simply the dot product.
  3. Build a proximity graph. Compute, for example, the 20–50 nearest neighbours (by cosine similarity) for each embedding and connect them as linked nodes. Methods like HNSW can speed up neighbour search even with tens of thousands of points. The result shows which queries gravitate toward each other; thematic groups form dense components or clusters.
  4. Cluster the queries. Use clustering methods—DBSCAN, graph community detection (Louvain or Chinese Whispers), or K‑means—to group semantically similar queries together. A good approach will reveal unexpected synonyms and variants you might not have considered.
  5. Prioritise clusters (assess potential). After clustering you may have 100–200 clusters depending on core size and topic. Not all are equally important. Classic metrics like frequency and competitiveness are useful—estimate the total search traffic behind each cluster. Set priorities: which themes are most valuable and promising. This will help you focus on them during optimisation.
  6. Select key phrases and optimise for clusters. For each cluster decide how your app should be represented to be relevant. Optimise text not for a single word but for the entire group, using a formulation that covers the common intent.
  7. Visualise and check coverage. Visualise your clusters and current app position with a map (the type that accompanies this article). You can use a 2D projection (UMAP or t‑SNE) of all queries, colour clusters and mark which you already cover with text. Such a map shows at a glance whether you have gaps. Visualisation makes the semantic core clear: you see the topic and volume of queries. Without visualisation you will get a coordinate dump that means little.

This new process can be combined with product analytics: you can match clusters with segments of your target audience or different use cases within the app. Thus the semantic core becomes not just a list of words but a product planning tool: you see what users are interested in, how they express problems and adapt the product or its positioning accordingly.

Final remarks

The current architecture of the App Store and Google Play does not yet look completely stable and mature. In repeated monthly checks the quality of search sometimes approaches the expected model, then noticeably rolls back. This looks like an ongoing series of experiments: the search team tests hypotheses, adjusts parameters, implements improvements and rolls back changes that worsen results. During rollbacks we again see familiar ranking patterns.

Meanwhile Apple’s introduction of NLP coincided with a renaissance in the store: AppTractor’s Mobile Development channel writes that a renaissance based on the possibility of success is happening in the App Store. The 2025 surge is not a fleeting fashion but the result of various factors making app development profitable for a new class of entrepreneurs. This further raises the complexity of implementing NLP and the requirements for ranking quality. But as models are optimised and computing power grows, natural language search will become the standard quality bar, not an experimental add‑on.

In conclusion, search systems are not frozen at BM25. Those who rebuild their approach to semantic cores now—begin to think in terms of user intents—will gain an advantage. A semantic core must be truly semantic regardless of whether lexicon or semantics currently dominate. It should reflect the variety of ways users can describe their goals while remaining focused on those goals. In the era of dense retrieval and semantic similarity, it is not the clever hacker of search who wins, but the one who can make the app understandable to machines and valuable to people.

Written by

Alex Vereshchagin, ASO Lead, alexv@angletech.ai

The ASO Lead at Angle Agency specializes in mobile app optimization for international markets, with expertise in East Asia and Austronesia. Consults on app development, lectures at Asodesk Academy, and has authored books with Ves’, IPIO, and Eksmo since 2016. Outside of work, enjoys composing music, hosting a podcast on learning the Indonesian language, reading, subtitling and dubbing Indonesian films, and video production.
