Information Retrieval and Ranking

The overall aim of the ranking process is to return the best set of results for the user based on their underlying intent. This means that search engines try to solve the problem the user actually has, rather than just returning a set of documents that are relevant to the query.

Implicit Signals – User Intent

Search engines use all the information available to determine the user’s true problem and the reason behind their search (known as their ‘intent’).

Conversational search and history – what has the user searched for before? If they have been looking at DVD player reviews and are now searching for a specific model of DVD player, they are probably looking to make a purchase, so a list of ecommerce stores that have that model in stock is an appropriate result. If they have been searching for information about a person (e.g. Barack Obama) and then make a generic query such as ‘What’s his age?’, the engine will attempt to return Barack Obama’s age.

Location and network – where the user is, what device they’re using and what network they are on are key signals as to their true intent. A user on a mobile phone who searches for ‘pizza’ may be looking for the closest pizza restaurant to them, whereas a user making the same query from a PC might be looking for a list of pizza delivery companies.

Explicit Signals – Query

The main signal that the user gives the search engine is their query. Search engines treat this as ‘what the user thinks they want’ rather than ‘what the user wants’, and often use additional data points to determine the best results to return. They also extract far more meaning from a query than the literal words used, such as synonyms and semantically related concepts.

Information Retrieval Process

Once the search engine has predicted the user’s underlying intent, its first task is to retrieve a pool of documents that are relevant to that intent. This process is very complicated and typically spans many different servers and indices. A basic way to do this is to retrieve every indexed document that contains the words of the query or matches its core concept.
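
The basic retrieval step described above can be sketched with a small inverted index. This is a minimal illustration and not how any particular search engine works: the function names build_index and retrieve, the sample documents and the whitespace tokenisation are all assumptions made for the example, and concept matching is left out entirely.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # One posting set per query term; a term that was never indexed
    # contributes an empty set, so the intersection is empty too.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical documents, used only to show the shape of the data.
docs = {
    "d1": "cheap dvd player reviews",
    "d2": "dvd player in stock",
    "d3": "pizza delivery near you",
}
index = build_index(docs)
candidates = retrieve(index, "dvd player")
```

Here candidates holds the pool of eligible documents (d1 and d2) that would then be passed on to ranking.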

Ranking Process

In ranking, search engines use an algorithm that considers hundreds of factors to rank the pool of eligible documents. Note that not all signals are equal, and not all of them are used by every search engine – for instance, we know that Google does not currently use the social graph in its ranking algorithm, but some engines (especially the search engines of social networks themselves) do.

Link graph – search engines maintain a graph of every link and citation between documents on the Internet. Trust, relevance, and authority flow through each link to give each document a link-based ‘equity’ score (for example, Google’s PageRank). Traditionally, these link-based scores have had a big influence on the order in which documents were ranked. Link equity scores are explored in later sections, but to give an example: a document about an astronaut which has no links is less likely to rank than one linked from BBC Science and NASA.
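
To make the idea of link equity more concrete, the sketch below computes a PageRank-style score by power iteration over a tiny link graph. It is a simplified illustration only: the function name link_scores, the damping value of 0.85, the iteration count and the page names are assumptions for the example, and real link-based scores involve far more than this.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Power-iteration sketch of a PageRank-style score.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's score on, split evenly across its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical graph: two sites link to astronaut_a, none to astronaut_b.
links = {
    "bbc_science": ["astronaut_a"],
    "nasa": ["astronaut_a"],
    "astronaut_a": [],
    "astronaut_b": [],
}
scores = link_scores(links)
```

In this toy graph the linked document (astronaut_a) ends up with a higher score than the unlinked one (astronaut_b), which mirrors the astronaut example in the text.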

Social graph – if the user’s friends or influencers have shared or positively engaged with a specific document, or recommended a website in the past, that document is more likely to rank highly on that specific occasion (social signals do not typically influence rankings universally – this mainly happens on a personalised basis).

Engagement graph – search engines are able to monitor the behaviour of users in many different ways: whether it’s monitoring how many users return to the search results after clicking a result, getting data from toolbars and browsers, or embedding code directly into webpages (such as by offering analytics or social sharing tools). This data is then used to decide how useful each document is in relation to the query and thus assign an ‘engagement’ score.
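
One of the behaviours mentioned above, users returning to the search results after clicking a result, could be turned into a score along the following lines. This is a hypothetical sketch: the function name engagement_score, the 10-second threshold and the input format are assumptions for the example, and how search engines actually calculate engagement is not public.

```python
def engagement_score(dwell_times, quick_return_seconds=10.0):
    """Fraction of clicks that did not end in a quick return to the results.

    dwell_times holds one entry per click on the document: the number of
    seconds before the user came back to the results page, or None if
    they never came back. Returns None when there are no clicks.
    """
    if not dwell_times:
        return None
    satisfied = sum(
        1
        for dwell in dwell_times
        if dwell is None or dwell >= quick_return_seconds
    )
    return satisfied / len(dwell_times)


# Hypothetical click log for one document and one query:
# two satisfied visits and two quick returns.
score = engagement_score([None, 45.0, 3.0, 2.0])
```

A score close to 1 would suggest that most users found what they needed on the document, while a score close to 0 would suggest that most of them went straight back to try another result.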

Semantic graph – search engines use their semantic graph to determine related concepts which can be included in the results for ‘serendipitous’ discovery – pointing the user to related concepts or information that they did not originally consider, but that are likely to assist them on their search journey.

Domain expertise – if the website has previously and consistently provided a positive user experience for a specific topic, new relevant documents from that domain may be more likely to rank higher.

Other factors – as explored earlier, during the indexing process search engines make a number of assumptions about each document and assign scores across a range of factors. During ranking, these factors are weighted and compared across every document in the subset to rank each one efficiently.
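
The weighting of factors described above can be pictured as a weighted sum of per-document signal scores. The sketch below is illustrative only: the function name rank, the signal names, the weights and the scores are all made up for the example, and real ranking algorithms combine many more factors in ways that are not public.

```python
def rank(documents, weights):
    """Order document ids by the weighted sum of their signal scores.

    documents maps a document id to a dict of signal name -> score.
    weights maps a signal name to its weight; unknown signals count as 0.
    """
    def total(signals):
        return sum(
            weights.get(name, 0.0) * value for name, value in signals.items()
        )

    return sorted(documents, key=lambda doc: total(documents[doc]), reverse=True)


# Hypothetical weights and scores: not all signals are weighted equally.
weights = {"link": 0.4, "engagement": 0.3, "relevance": 0.3}
documents = {
    "page_a": {"link": 0.9, "engagement": 0.2, "relevance": 0.5},
    "page_b": {"link": 0.3, "engagement": 0.8, "relevance": 0.9},
    "page_c": {"link": 0.1, "engagement": 0.1, "relevance": 0.4},
}
ordered = rank(documents, weights)
```

With these numbers page_b comes first: its strong engagement and relevance scores outweigh the stronger link score of page_a, and page_c trails on every signal.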

The overall aim of ranking is to return the best result for the user’s intent, which means that search engines are quickly becoming knowledge engines. For many queries, answers are displayed directly on the search result page.


Once a set of documents has been defined and ranked, the last step in the process is to determine if there are alternatives to each document that may be even better to serve to the user. For instance, it’s more appropriate to send a user on a mobile device to the device-specific version of a page than it is to send them to the desktop page.

The other situation where alternatives may be returned is when there is a better geographic version of the page. For example, when a page aimed at users in the USA has been ranked but the user is located in Australia, search engines will try to return the Australian-specific alternative if one has been defined.
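
The selection of alternatives by device and location can be sketched as follows. This is a minimal sketch under stated assumptions: the function name choose_alternative, the dictionary layout and the example.com URLs are hypothetical, and the matching rules used by real search engines may differ.

```python
def choose_alternative(page, device, country):
    """Return the URL of the best-matching version of a ranked page.

    page is a dict with a 'url' and an optional 'alternates' list. Each
    alternate has a 'url' plus an optional 'device' and 'country'; a
    missing value means the alternate is not specific to that dimension.
    """
    best_url, best_matches = page["url"], 0
    for alt in page.get("alternates", []):
        # Skip alternates that explicitly target a different device or country.
        if alt.get("device") not in (None, device):
            continue
        if alt.get("country") not in (None, country):
            continue
        # Count how many of the user's attributes this alternate targets.
        matches = (alt.get("device") == device) + (alt.get("country") == country)
        if matches > best_matches:
            best_url, best_matches = alt["url"], matches
    return best_url


# Hypothetical ranked page aimed at US desktop users, with two alternates.
page = {
    "url": "https://example.com/us/",
    "alternates": [
        {"url": "https://example.com/au/", "country": "AU"},
        {"url": "https://m.example.com/us/", "device": "mobile"},
    ],
}
australian = choose_alternative(page, "desktop", "AU")
mobile = choose_alternative(page, "mobile", "US")
default = choose_alternative(page, "desktop", "US")
```

In this sketch the originally ranked URL is kept whenever no defined alternative fits the user, which matches the ‘if it is defined’ condition in the text.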