Copy: Decoding the Google Answer Box Algorithm – a SERP Research on 10.353 Keywords

Last time we looked at Google’s Answer Boxes, we came up with quite a handful of interesting observations. However, we couldn’t quite give you the best explanation of what it takes to get your website into position zero, as some have named it. We gathered that you needed to be regarded as an authority site, but what does that really translate into?

The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny…’ – Isaac Asimov

So we set out to find out more about the issue, only this time with a more scientific outlook on things. This meant that, while we could still look at only a sample of examples, we could make that sample much bigger. What’s a big enough sample? Well, in statistics a couple of thousand is usually enough. So just to make sure, we looked at about 10 000 keywords. Of course, we didn’t have one person (or more) look at every scenario; rather, we devised an algorithm that would do the job for us.


The algorithm did automatic keyword research. It looked for phrases such as “what is…” and “who is…”, adding just one letter after the phrase (“what is a”, “what is b”, and so on up to “what is z”) and taking into account the autocomplete suggestions (since those are supposed to be the most popular searches, and therefore the ones most likely to elicit answer boxes).


To have a standardized cutoff point, we only took into account the first 10 autocomplete suggestions for each generated keyword. Using this method, we selected a sample of keywords that are most likely to return answer boxes.
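The keyword generation described above can be sketched roughly as follows. This is a minimal illustration, not the crawler actually used for this research: the function names are hypothetical, and `fetch_suggestions` stands in for whatever autocomplete source is available (no real API is assumed).

```python
import string

# Seed phrases named in the article; the real crawl may have used more.
SEED_PHRASES = ["what is", "who is"]
MAX_SUGGESTIONS = 10  # cutoff of 10 suggestions per generated query


def build_seed_queries(phrases=SEED_PHRASES):
    """Return every seed phrase followed by a single letter a-z."""
    return [
        f"{phrase} {letter}"
        for phrase in phrases
        for letter in string.ascii_lowercase
    ]


def collect_keywords(fetch_suggestions, phrases=SEED_PHRASES, limit=MAX_SUGGESTIONS):
    """Expand each seed query into at most `limit` autocomplete suggestions.

    `fetch_suggestions` is a hypothetical callable (query -> list of strings)
    supplied by the caller. Duplicate suggestions are kept only once.
    """
    keywords = []
    seen = set()
    for query in build_seed_queries(phrases):
        for suggestion in fetch_suggestions(query)[:limit]:
            if suggestion not in seen:
                seen.add(suggestion)
                keywords.append(suggestion)
    return keywords
```

Passing the suggestion source in as a function keeps the sketch independent of any particular autocomplete service and makes it easy to try out with canned data.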

Google Answer Box Appearance Ratio on 10k Keywords

This foray into the search engine came up with about 10 000 search terms (10 353 to be more precise). Of those, only 1 792 returned answer boxes, which is roughly 17% of the total number of searches. So the first straightforward observation is that the share of searches that return an answer box is fairly small.


We can say that this claim is true in general, since our sample size of 10 000 is enough to extrapolate for a population of pretty much any size with a high confidence level. While this may sound pretty unbelievable, that’s just how statistics works. Admittedly, we haven’t really been using a perfectly random sample, so let’s just say that the claim we made earlier is true of all searches that could potentially yield an answer box: of all that could, rather few actually do.
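To put a number on that confidence, here is a small sketch that computes the share of keywords with an answer box and a 95% margin of error. It uses the normal approximation for a proportion, which assumes a simple random sample; as noted above, our sample is not perfectly random, so treat the margin as indicative only. The function name is ours, chosen for illustration.

```python
import math

TOTAL_KEYWORDS = 10_353   # keywords in the sample
WITH_ANSWER_BOX = 1_792   # keywords that returned an answer box


def proportion_with_margin(successes, total, z=1.96):
    """Return (proportion, margin of error) using the normal approximation.

    z=1.96 corresponds to a 95% confidence level.
    """
    p = successes / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p, margin


share, margin = proportion_with_margin(WITH_ANSWER_BOX, TOTAL_KEYWORDS)
print(f"Answer box share: {share:.1%} (margin of error: {margin:.1%})")
```

With a sample this large the margin stays below one percentage point, which is why a few thousand observations are usually considered enough.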

Google Answer Box Appearance Ratio


Google Answer Box Types

We have already established that answer boxes are a certain type of rich answer and can come in many shapes and sizes. There are also different ways to trigger an answer box. Theoretically, answer boxes are triggered by the featured snippet you set, but that is not a given. Mostly, Google selects the information it considers relevant for a specific query. So we instructed the algorithm to also figure out what kind of answer box it received.


Most of the answer boxes included definitions or descriptions that were the result of various website extractions; there were 1 236 of them, which amounts to almost 69% of the answer boxes. This means that all the other types of answer boxes (Web Definitions, Video Widgets, Google Widgets such as conversions and maps, or Google Dictionary Definitions) taken together account for less than a third of the answer boxes. But this is good news for SEO. If you mark up your content with structured data, you may be able to appear in Google’s answer boxes. Google’s Natural Language API helps webmasters find all the entities on their website and get more rich snippets, a better click-through rate and maybe some answer box integrations.


If the answers only consisted of Google widgets, Google definitions or web definitions, you would have little to contribute to the landscape. As things are now, your website could be the source of a definition or description for the vast majority of the answers in the box.

Extracted Google Answer Box Types


Before continuing, let’s clarify the definition types as they appear in the answer box.

Google Dictionary Definition

Google Dictionary was an online dictionary service of Google, originating in its Google Translate service. The Dictionary website was terminated in 2011, but after that, part of its functionality was integrated into Google Search, and now it looks like it’s integrated into the paragraph snippet. When Google provides direct answers from the Google Dictionary, you won’t see any URL near the generated content. It answers the question and kind of gives you the feeling that Google “knows for sure” that the info is accurate and doesn’t need to give any extra explanation.

Google Dictionary Definition

Google Widget

Google has an impressive number of helpful widgets, including translation, weather, driving directions and currency conversion. These widgets really improve the user’s experience, sparing them lots of clicks and time. For instance, a user who needs to know how much 300 meters is in kilometers doesn’t have to visit several sites to look up the conversion for one meter and multiply it by 300. All they have to do is “ask” Google “how long is 300 meters” and they will get the answer instantly.

Google Widget Definition

Google Video Widget

Also, if you want to impress your friends with some new dance moves, or you are looking for a particular type of move that you want to reproduce, Google understands this need and gives you a video result directly in the answer box.

Google Video Widget Definition

Google Web Definition

The quick answer boxes provided from web definitions are quite a basic way of presenting information extracted from URLs containing glossary and dictionary words. This kind of definition relies neither on entities nor on a dynamic process of extracting data; rather, a static procedure is involved. Although a high number of answer boxes come from web definitions, they are not always the best answers that Google provides, time and again offering inaccurate or unrelated data.

Google Web Definition

Google Web Extraction

The definitions provided in the answer box from web extractions are, as we will see later on, more reliable, more dynamic and more accurate than the web definitions. They usually come from sites that have high authority and that also include the search query on their site. For instance, in the example below, where we want to find out what an atom is composed of, the answer box extracted the information from a third-party website. If we follow the link, we see that the landing page has content dedicated to this matter, with the matching title “What is an atom? What are atoms made of?”

Google Web Extraction

Unique Domains Used for Data Extraction

It seems that in the world of search engines the rich get richer. When we analyzed the answers that were extracted from websites, we found out that they came from only 342 websites. That is, on average, about 3.6 answers per website. But averages can be deceiving, and in this case they actually are. Of those 342 websites (mainly Wikipedia, dictionaries and glossaries), not all got the lion’s share.

Unique Domains Used For Data Extraction Google Answer Box

Top 10 Ranking Position Distribution in the Organic SERPs for the Answer Box URL

Of the many factors that might influence the “distribution”, one that comes to mind almost instantly is the SERP ranking. So we split the websites according to this criterion and, lo and behold, websites that were found in the first position in the organic results accounted for a third (33%) of all answer box information. The top 5 pages accounted for more than three quarters (77%) of the answers.


There was just 1 answer out of the total 1 236 that came from a page not in the top 10 (statistically, that’s less than 0.1%). So rankings matter. And while you would be right to suggest that the relationship implied by this correlation may be more complicated here than what we are seeing from the numbers, you’d be taking a pretty serious chance to bet on being that 1 case in 1 236 that doesn’t need to be up high in the rankings to make it to the answer box. Or, to quote XKCD “scienc-y” web comic creator Randall Munroe, “Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there.’ ”

Answer Box Extraction URL SERP Distribution

Trusted Sites Distribution in Google Answer Boxes

In all fairness, we are inclined to cut you some slack and say that it’s not necessarily (or solely) the SERP rankings that matter, but that’s only because the SERP positioning might simply be an indicator of some other measure of your website’s quality: referring domains. This is a case where more is actually more and better. Domains with more than 10 000 referring domains make up exactly half of all the domains that serve as answer sources.


Interestingly enough, a lot more answers (twice the number) come from domains with between 1 000 and 5 000 referring domains than from domains with between 5 000 and 10 000 referring domains. That may very well be due to the arbitrary split, or due to a lot of the values being around the cutoff point. Despite this, however, the 1 K mark is a fairly good predictor: more than 80% of all answers come from places that have 1 000 referring domains or more. But that means there’s still a reasonable chance of popping up in answer boxes even with fewer than that. However, if you drop under 100 you are on your own: less than a 3% chance of hitting the jackpot.

Trusted Sites Distribution Google Answer Boxes

Google Answer Box Crawl – No. of Results based on Words per Page Intervals

In fact, how the information is structured may have a lot to do with your chances of being considered a trustworthy source for answers. A helpful element is having a title that is roughly the same as the question and an answer that immediately follows the title. That speaks to the structure part. What about rich content? This is where, unlike before, less is actually more. Pages with fewer than 2 000 words (the rough equivalent of 5 pages typed in Times New Roman, font size 12, single spacing) account for close to 70% of all answer boxes.


As the number of words grows larger, the number of answer box results shrinks: 20% for pages with between 2 000 and 5 000 words, 5.5% for pages with between 5 000 and 10 000 words and only 2% for pages with over 10 000 words! Whether adding information automatically makes it harder for Google to look at that page for answers, or it simply makes it harder for us to keep things simple and straightforward, one thing is certain: leaner is better.
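If you want to run the same breakdown on your own pages, the word-count intervals above can be reproduced with a short sketch like the one below. The interval labels and the handling of boundary values (for instance, whether a page with exactly 2 000 words falls into the lower or upper interval) are our assumptions; the research only names the ranges.

```python
from collections import Counter

# (exclusive upper bound, label) for each interval, in ascending order.
BUCKETS = [
    (2_000, "under 2 000"),
    (5_000, "2 000 to 5 000"),
    (10_000, "5 000 to 10 000"),
]
LAST_BUCKET = "10 000 and over"


def word_bucket(word_count):
    """Return the label of the interval a page's word count falls into."""
    for upper, label in BUCKETS:
        if word_count < upper:
            return label
    return LAST_BUCKET


def bucket_shares(word_counts):
    """Return the share of pages in each interval as a dict label -> fraction."""
    counts = Counter(word_bucket(n) for n in word_counts)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```

For example, `bucket_shares([100, 200, 3000, 20000])` puts half of the pages in the "under 2 000" interval and a quarter in each of two others.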

Google Answer Box Crawl Words Intervals

Google Answer Box Characteristic – Title vs No-Title Answer Boxes

As you go around searching for different queries on Google, you might notice that there are two types of answer boxes, if we take the title into consideration: answer boxes which have a title and answer boxes which don’t have one. Just like in the case of any piece of content, the title can make a great difference. Let’s take a look at the screenshot below!

Title vs No Title Google Answer Box

We are not talking only about the title’s purpose of garnering attention and enticing people to start reading your content, but about the basic purpose of titles: functionality. Above all, people need to know what the content is about. Of all the analyzed answer boxes, about 30% have a title, while the rest provide the information directly in the box, without any introduction. Is a title beneficial? It definitely is, as it greatly improves the user’s experience. With a majority of “no title” answer boxes, it is hard to say that Google is on the right track in this matter. Yet things might change in the future, and having titles in 100% of answer boxes in a couple of years might not be far from the truth.

Google Answer Box Characteristic Title vs No Title

Answer Box Stability and Freshness

There seems to be quite a lot of hard work you need to do to get into the much-coveted answer boxes. But the effort is likely to pay off, perhaps even in ways that were not necessarily intended. We are now further still into “that’s interesting” territory. The interesting thing is that the answer box functionality seems to be rather static: once a website gets there, it might be a long time before it is removed. Not even, say, the website going down will shake the answer box. This turned out to be the case for a variety of queries, such as “what are lr14 batteries”, “where to buy plan b”, “what are k1 tax forms”, “what is seo spam”, or even “who is john endler and what did he study” (vertebrates, he studied vertebrates). So there is a very slight chance that an answer box will buy your website a little bit of post-mortem remembrance.

Expired Domains Rank No.1 in Google Answer Boxes

Being in the SEO industry and trying to make our way through all of Google’s guidelines, we are often asked “what is a natural link?” We’ve tried to give the best answer to this question, but what better place to ask about this than Google? Yet, as we tried to figure out the exact definition of Google-friendly links, the answer box failed to provide us with a satisfying explanation. What is even more interesting is that when we tried to follow the link from the answer box for more info, we stumbled upon an expired domain:

Web Definition Broken Unregistered Site

Gripped by “researcher’s fever”, we decided to buy this domain and analyze the ranking data we might get from Google.

Registered The Domain

Even though it was dropped, this link stayed in the answer box from 16 May until 6 July. This means that for almost two months an unregistered domain ranked number 1 for the “what is a natural link” search query. And it would probably have stayed even longer if we hadn’t bought it. Quite ironic, isn’t it? Google, the great unnatural link “slayer”, providing us with a broken link at the top of its results while trying to explain to us what a natural link is.

Unregistered Registered Method

We decided to re-create the page based on its previous content and remove any extra data. So, with the help of the Wayback Machine, we extracted the content of that page and recreated it exactly.

WayBackMachine Information Extraction

And this is the content that was put on the site, based on the content that was there years ago.

SEO Terms Example

What is left to do now (besides enjoying the status of owning a website listed in the answer box)? Track the traffic and enjoy the ride. We are still analyzing this site’s situation and, once we gather enough valuable information, we will let you know what happened to our expired-domain site in the Answer Box results.

Watch Traffic Roll In

But some website definitions bring out even more issues, as not only do they hit the jackpot, they do so multiple times. Wikipedia’s entry for “Search engine optimization (SEO)” brings all the SEO-curious people to its yard. It’s the source for no fewer than 14 answer boxes, including information for questions such as “what is seo expert”, “what is seo consulting”, “what is seo industry”, “what is seo definition”, “what is seo marketing” and more. But do not be fooled by this “rich and well structured” content that provided so many answer boxes. What really happens is that for all the mentioned queries, whether they are about SEO experts or SEO marketing, we are provided with the same, identical answer box. Not so impressive anymore, right?


Then again, there is a much greater chance that this static character of answer boxes will impact you negatively, since it can prevent your perfectly well-functioning website from entering the ranks because some defunct authority, which no one even knows still exists, is taking up the space.

Website Extraction vs Website Definition Answer Boxes

I invite you to take a look at another interesting finding, regarding the Website Definitions.

It looks like none of the URLs for website definitions are found in the top 10 SERP results.

For instance, for the search query “what is a link description”, the URL suggested in the answer box is not to be found in the first 10 pages of results. This raises two legitimate questions:

  • how can a site that Google doesn’t consider worthy of being listed in the first 10 results be given as a resource in the answer box?
  • shouldn’t we worry about the quality of the information found in the answer box, given this situation?


As we analyzed other answer boxes extracted from web definitions, we found that the majority of them seem to be low quality and sometimes even unrelated. Let’s take for instance the query “what is 360 link”. Even though the web definition provided by the answer box comes from Wikipedia (the source of 51% of all web definitions), it cannot be found in the top 10 results. What is more, the content provided is unrelated and has a commercial flavor (it refers to a product from the ProQuest company). This is the exact opposite of what John Mueller from Google said about “branding” the answer boxes:

we need to watch out […] so it doesn’t turn out to an advertisement for a web site but rather that it brings more information to the search results about this general topic.

Therefore, having so many issues, answer boxes generated from Web Definitions don’t look very reliable. Yet, in the case of website extraction, things are more settled and we don’t encounter the same problems. Judging by the fact that the data are shown differently, we can assume that extraction from web definitions is done in a completely different way from the Entity Extraction done using the Knowledge Graph. The Website Extraction seems more precise, while the Website Definition seems more basic.


Nevertheless, mysterious are the ways of Google, but equally determined are the people from cognitiveSEO to find answers. As we browsed so many queries with answer boxes, we identified a pattern in the web definition extractions. It looks like the majority of definitions that are not coming from Wikipedia have a similar URL pattern, using the words “glossary” or “dictionary” (and other variations).
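The URL pattern is easy to check for on your own list of URLs. The sketch below flags any URL whose host or path contains one of the pattern words. The word list only covers the two words named above (the “other variations” are not enumerated in our data), and the function name and example URLs are ours, for illustration only.

```python
from urllib.parse import urlparse

# Pattern words observed in web definition URLs; extend as needed.
PATTERN_WORDS = ("glossary", "dictionary")


def looks_like_definition_url(url):
    """Return True if the URL's host or path contains a pattern word."""
    parsed = urlparse(url.lower())
    haystack = parsed.netloc + parsed.path
    return any(word in haystack for word in PATTERN_WORDS)


# Example with placeholder URLs:
for candidate in (
    "https://example.com/seo-glossary/natural-link",
    "https://dictionary.example.org/term",
    "https://example.com/blog/post",
):
    print(candidate, looks_like_definition_url(candidate))
```

A plain substring match is deliberately loose, so it also catches variations such as “glossary-of-terms” in the path.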

Google Website Definition Characteristic Patternable Words

Google SERP Re-crawl – 1 Week Later

As we tried to keep things as accurate as we could and assure ourselves and our readers that the data used in this research are representative, we re-ran the analysis one week after the initial research. The results made us think even more about how the answer box algorithm really works (as if we weren’t already) but at the same time confirmed the correctness of the initial investigation. After this re-crawl of the 10 353 keywords, we found 120 new answer boxes, 127 disappearances and 13 answer boxes with their status changed. Of the new answer boxes, a large majority (about 80%) are Web Extractions and just a few are Google Widgets. Judging by the fact that in our sample alone we found more than 100 new answer boxes, we might say that the answer box is a growing “industry” and that Google might soon offer answer boxes for more queries.
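The comparison between the two crawls boils down to a simple diff. Here is a minimal sketch, assuming each crawl is stored as a mapping from keyword to answer box type, with `None` when no answer box was shown. The data structure, the function name and the sample data are our assumptions for illustration, not the format used in the research.

```python
def diff_crawls(before, after):
    """Compare two crawls and return (new, disappeared, changed) keyword lists.

    `before` and `after` map keyword -> answer box type (a string),
    or None when the query returned no answer box.
    """
    new, disappeared, changed = [], [], []
    for keyword in before.keys() | after.keys():
        old_type = before.get(keyword)
        new_type = after.get(keyword)
        if old_type is None and new_type is not None:
            new.append(keyword)          # answer box appeared
        elif old_type is not None and new_type is None:
            disappeared.append(keyword)  # answer box vanished
        elif old_type != new_type:
            changed.append(keyword)      # answer box type changed
    return sorted(new), sorted(disappeared), sorted(changed)


# Toy example with made-up crawl data:
first_crawl = {
    "what is seo": "web extraction",
    "what is a link": None,
    "who is x": "web definition",
    "what is y": "google widget",
}
second_crawl = {
    "what is seo": "web extraction",
    "what is a link": "web extraction",
    "who is x": None,
    "what is y": "web definition",
}
new, disappeared, changed = diff_crawls(first_crawl, second_crawl)
```

In the toy data, one keyword gains an answer box, one loses it and one changes type, mirroring the three categories counted in the re-crawl.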


Let’s shift our focus a bit to the answer boxes that disappeared. There might be multiple reasons for the 127 disappearances, and we cannot be 100% sure what really happened. But we have some well-founded assumptions. The first one is that Google is doing some A/B testing. It’s very likely that the big G is taking into consideration the bounce rate, the click-through rate and the user’s experience overall, and choosing to keep or remove the answer boxes depending on these factors. I think that they are actively doing A/B testing on the Google Answer Boxes because sometimes they appear and sometimes they do not for the same search. Google is doing a lot of testing in the SERPs and, with answer boxes being such an important part of them right now, they might apply the same tactics.


Our second supposition is based on a situation that we encounter quite often: Google does not always return the same results for the same search query, answer box included. This means that for the same “what is…” query, keeping the same search settings, sometimes we received an answer box and sometimes we didn’t. Therefore, the mysterious vanishing of the 127 answer boxes may originate from here.

As for the answer boxes with a changed status, we can see that a very small number underwent modifications. Most of these few adjustments concern transformations from Web Definitions into Google Widgets and vice versa.

1 Week Google Answer Box Changes



Google Answer Boxes might be quite controversial, as the Google Search user interface lets Google’s users view and copy content without visiting the content provider’s website. In addition to losing organic traffic, webmasters might also be a bit “upset” by the fact that their perfectly well-functioning website doesn’t appear in the answer box while some broken site that doesn’t exist anymore is taking up the space. Double-ouch for Google answer boxes!


Yet, we cannot help seeing things from the user’s point of view. If the answer box has accurate information, it provides the user with better usability by sparing them another click or providing a shortcut to the final action they need to take. If, for instance, you urgently need to make a payment and you want to know how much 127$ is in Euros, all you have to do is “tell” Google to “convert 127$ to euro” and you’ll have the result in an instant. Not long ago, for the same operation you needed to consult a currency exchange site and then manually calculate the amount you were interested in.


Having 80% of the newly emerged answer boxes in our second analysis come from Web Extractions gives webmasters a breath of fresh air. Judging by this information, we can say that Google is looking more and more at the definitions provided by high-quality websites, giving webmasters the chance to have their site mentioned in the first row, above all the search results. As we showed previously, answer boxes extracted from websites are more accurate and provide the user with a better experience. Therefore, Google increasingly considering various websites as a source for the answer box is a win-win situation.


As we mentioned in this blog post, there are several issues with the answer boxes. The most important ones we feel the need to emphasize are that the results generated are quite static and sometimes not relevant, even though they are mostly reserved for high-quality sites. These issues can be a big enough obstacle for webmasters who wish (and maybe deserve) to be listed in the answer box. It is indeed hard work, but not impossible. Proving to Google that your site is trustworthy and an authority in its field is easier said than done, but it pays off in the long term. Moreover, following some of the tips from our previous post on how to optimize for the Google Answer Box might also be really helpful.

The post Copy: Decoding the Google Answer Box Algorithm – a SERP Research on 10.353 Keywords appeared first on SEO Blog | cognitiveSEO Blog on SEO Tactics & Strategies.
