Ecommerce Keyword Research: Large scale research in a small scale world

On Friday 22 April 2016, Alec spoke at BrightonSEO about huge-scale keyword research and overcoming some of the problems that large retailers face. Below are the session notes.

Why and how do we research keywords?

We undertake keyword research to better understand what our customers look for, and how they phrase it, when searching for our products. At its core, this information is the foundation of SEO strategy. Once we know the keywords our customers use, we can:

  • Define our website architecture
  • Set the content strategy so that we can ensure we have the content our customers are looking for
  • Start to develop link acquisition strategies in line with what our audience is interested in

Incomplete or incorrect keyword research trickles down into the whole SEO strategy, which will end up less effective than it would otherwise have been.

Keyword research varies depending on whose process you’re following, but the core elements are:

  • Finding the phrases your competitors are targeting
  • Completing consumer research to discover why your customers buy your product and the problems behind those purchases
  • Using keyword discovery tools such as UberSuggest, Grepwords, and Keyword Planner to find topics and keywords that your customers may be searching

A solid keyword research process will take several hours to complete per topic or category – the goal is to get a complete view of what customers may search, so this is not something that should be rushed.

But how can large retailers (for example Maplin, House of Fraser, or Argos) – businesses with hundreds of top-line categories and thousands of product types – undertake comprehensive keyword research? Using a traditional approach would require a team of people working for months to achieve this.

Our process for large-scale keyword research relies on Google’s Keyword Planner. The Keyword Planner has a lot of great data about what people search for; Google just makes it difficult to access that data comprehensively.

The Keyword Planner will only let you download 800 ideas at a time, so we must use the tool hundreds of times.

Start with one seed keyword, such as ‘dresses’, and download the 800 ideas.


Next take each of those 800 keywords and put them – one at a time – into Keyword Planner to download 800 more ideas per keyword.


At this stage, you should have around 640,000 keywords (800 × 800) and will have used the Keyword Planner 801 times. Depending on your seed query and your industry, it may be worth putting each of those 640,000 keywords into the Keyword Planner (again, one by one) to identify extremely long-tail terms. While the initial 800 terms could be done manually (I wouldn’t advise it), going further is impossible without automation using the official AdWords API, the Bing Ad Insight Service, or our unofficial Keyword Planner scraper (the ‘Idea’ module is currently in closed beta – you will need to contact us for access).
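The expansion described above can be sketched as a level-by-level crawl. This is only an illustration: `expand_keywords` and `fetch_ideas` are hypothetical names, and `fetch_ideas` stands in for whichever ideas source you automate (it does not call any real API here). The stub returns three ideas per keyword instead of 800 so the example runs instantly.

```python
from typing import Callable, Iterable, Set, Tuple


def expand_keywords(
    seed: str,
    fetch_ideas: Callable[[str], Iterable[str]],
    depth: int = 2,
) -> Tuple[Set[str], int]:
    """Expand a seed keyword level by level.

    Returns the set of ideas collected and the number of lookups made.
    """
    seen: Set[str] = set()     # every idea collected so far
    queried: Set[str] = set()  # keywords already sent to the ideas source
    frontier = [seed]
    lookups = 0
    for _ in range(depth):
        next_frontier = []
        for keyword in frontier:
            if keyword in queried:
                continue  # never look up the same keyword twice
            queried.add(keyword)
            lookups += 1
            for idea in fetch_ideas(keyword):
                if idea not in seen:
                    seen.add(idea)
                    next_frontier.append(idea)
        frontier = next_frontier
    return seen, lookups


def fake_ideas(keyword: str) -> list:
    """Stand-in ideas source (an assumption): three made-up ideas per keyword."""
    return [f"{keyword} {i}" for i in range(3)]


keywords, lookups = expand_keywords("dresses", fake_ideas, depth=2)
print(len(keywords), "keywords from", lookups, "lookups")
```

With a source that returns 800 ideas per lookup, two levels correspond to the 801 lookups counted above. Real ideas overlap heavily between lookups, so the de-duplicated total will usually be lower than 800 × 800.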

Once you have a big list of relevant product keywords, you will need to identify the core product attributes. An attribute is something that a customer may care about – a colour, style, size, or other specification. To do this, run the keyword list through a tool which counts the number of times a word or phrase appears. You can use our Keyword Intelligence tool for this, download the source on GitHub, or use another specialty tool such as WriteWords.
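If you want to see what such a counting step involves, here is a minimal sketch. It is not the Keyword Intelligence source; `ngram_counts` is a hypothetical helper, and counting each word or phrase once per keyword (rather than once per occurrence) is a design choice made for this example.

```python
from collections import Counter


def ngram_counts(keywords, max_n=2):
    """Count how many keywords contain each word or phrase of up to max_n words."""
    counts = Counter()
    for keyword in keywords:
        words = keyword.lower().split()
        grams = set()  # a set, so a phrase counts once per keyword
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                grams.add(" ".join(words[i:i + n]))
        counts.update(grams)
    return counts


keyword_list = [
    "long sleeve black lace dress",
    "long sleeve lace black dress",
    "navy midi dress",
]
for phrase, count in ngram_counts(keyword_list).most_common(5):
    print(phrase, count)
```

Counting two-word phrases as well as single words helps multi-word attributes such as "long sleeve" surface as one candidate rather than two.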

From here, you should take the most frequently occurring words and group them into attributes:

Now that you have a fairly comprehensive view of the product attributes that your customers care about, it’s time to merge them into keywords. This is where all of the tools on the market (and the process you probably imagine) fail, because…

Word ordering matters

When looking at keyword phrases, word ordering matters:

The keyword “long sleeve black lace dress” has a different search volume (480 in the UK) to the phrase “long sleeve lace black dress” (50 in the UK).

This means that if you look at only a single word ordering, you may mistake a high-volume topic for a low-volume one:

  • “Long Sleeve Black Lace Dress” – 1,680 searches/mo across 23 phrases, but the highest volume is just 480
  • “Navy Midi Dress” – 1,480 searches/mo across 5 phrases, where the highest volume is 1,300

We generated 1,670,390 unique phrases about dresses to find the gap produced by different orders.

  • When looking at every possible word order, we found 15,157 keyword phrases with a combined 3,047,520 monthly searches.
  • When looking at only the highest-volume order for each discrete phrase, we found 4,397 keyword phrases with a combined total of just 1,106,960 searches per month.

By ignoring different word orders, we would be ignoring roughly 64% of the available volume (1,940,560 of 3,047,520 monthly searches).
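One way to measure this gap on your own data is to group phrases that contain the same words regardless of order, then compare the best single ordering with the total. The sketch below is an assumption about how to do that, not the method used for the figures above; `volume_by_word_set` is a hypothetical helper, and the two volumes are the UK figures quoted earlier.

```python
from collections import defaultdict


def volume_by_word_set(volumes):
    """Group phrases that use the same words in any order and sum their volumes."""
    groups = defaultdict(list)
    for phrase, volume in volumes.items():
        key = tuple(sorted(phrase.lower().split()))  # order-insensitive key
        groups[key].append((phrase, volume))
    summary = {}
    for key, members in groups.items():
        best_phrase, best_volume = max(members, key=lambda m: m[1])
        summary[key] = {
            "best_phrase": best_phrase,
            "best_volume": best_volume,
            "total_volume": sum(v for _, v in members),
            "orderings": len(members),
        }
    return summary


# UK volumes quoted above for two orderings of the same words.
volumes = {
    "long sleeve black lace dress": 480,
    "long sleeve lace black dress": 50,
}
for row in volume_by_word_set(volumes).values():
    print(row)
```

The difference between `total_volume` and `best_volume` is the volume you would miss by tracking only one ordering per topic.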

Keyword combining / merging

You shouldn’t try to come up with every possible order for a set of attributes by hand – this is a job for a machine. Given a list of product attribute groups, we should first arrange the groups into sets of three or four, and then combine the words from those groups into keyword phrases in every order.
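As a rough illustration of that merge step (one reading of it, not the released source), the sketch below picks every set of three or four attribute groups, takes one value from each chosen group, and emits every ordering of those values. The function name `merge_attributes` and the group names and values are made up for the example.

```python
from itertools import combinations, permutations, product


def merge_attributes(groups, sizes=(3, 4)):
    """Yield each phrase built from one value per chosen group, in every order."""
    seen = set()
    for size in sizes:
        # every way of choosing `size` attribute groups
        for chosen in combinations(groups, size):
            # one value from each chosen group
            for values in product(*(groups[name] for name in chosen)):
                # every ordering of those values
                for ordering in permutations(values):
                    phrase = " ".join(ordering)
                    if phrase not in seen:
                        seen.add(phrase)
                        yield phrase


# Hypothetical attribute groups for illustration only.
attribute_groups = {
    "colour": ["black", "navy"],
    "fabric": ["lace"],
    "sleeve": ["long sleeve"],
    "product": ["dress"],
}
phrases = list(merge_attributes(attribute_groups))
print(len(phrases), "phrases, e.g.", phrases[0])
```

The output grows factorially with the number of values per phrase, which is why sets are capped at three or four groups. In practice you would probably also require the product group in every set, since a phrase like "black lace long sleeve" has no product word.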



Merging keywords into every possible order

Because you shouldn’t attempt to do this manually, and there are no existing tools to undertake this process, we have developed a solution to merge product attributes into every possible order.

  • We’ve added Merge into our (beta) tool Keyword Intelligence
  • Or you can integrate this into your own toolset – the source code to achieve this will be released in the coming days (contact us if you’d like early access to the repository)

Getting search volumes for millions of keywords

There are a few options for getting search volumes for millions of keywords:

  • Official AdWords API: This isn’t really an option unless your business has access to an API key already (e.g. for a bid management or reporting tool). Google does not give out API keys easily, and especially not for the main purpose of keyword research.
  • Bing Ad Insight API: Bing has a great API – they’ll give you volumes as well as demographics, categorisation and a few other great pieces of information. The main problem is that (depending on your market) Bing only has 5 to 15% of the search market share, and because its search volume doesn’t necessarily correlate with Google’s, you can’t assume much about the wider market from this data.
  • Tools like GrepWords or SEMrush: These tools work by building and maintaining a (relatively small) list of keywords (for instance, SEMrush has just 12 million keywords in the UK). If these tools have the keywords you’re asking for, they will give you that data. However, because you are taking long-tail keywords and reordering the words within them, it is unlikely that mainstream tools will have this data (and even more unlikely if you’re outside of the US/UK).

Because of these problems, we were forced to build our own solution to get live, accurate search volume data for millions of keywords from the Keyword Planner. We recently opened our tool up to the public, so anyone can upload lists of keywords (up to one million keywords per list) and then download search volumes for any language or country. Use our search volume tool Keyword Intelligence (or its API) to get access to this live Keyword Planner data.
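If your merged list is larger than the one-million-keyword limit per list, it needs splitting before upload. A minimal sketch, assuming you hold the keywords in any Python iterable; `batches` is a hypothetical helper and does not talk to the tool or its API.

```python
from itertools import islice


def batches(keywords, batch_size=1_000_000):
    """Yield consecutive lists of at most batch_size keywords."""
    iterator = iter(keywords)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Small batch size so the example is easy to inspect.
for batch in batches(["navy midi dress", "black lace dress", "long sleeve dress"], batch_size=2):
    print(len(batch), batch)
```

Because it works on an iterator, the same helper can consume a generator of merged phrases without holding the full list in memory.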


As keyword research forms a critical part of your SEO (or PPC) strategy, it’s important that the data is complete and accurate. Mainstream tools give incomplete data and ideas, as does using a traditional keyword merging process.

When doing keyword research for products and large retail websites, it’s important to work with product attributes rather than raw keywords. You can do this by following a simple process:

  • Generate large lists of relevant keywords (or start with your existing keyword lists)
  • Tokenise the list of keywords into attributes and group those attributes together
  • Merge the attributes together to form every possible combination and ordering of words within each keyword phrase
  • Get the search volume for each of the newly generated keywords and keep only those with volume to form your keyword research, as many of the reordered keywords are likely to have none