
    The Conversation

    Social Technology

December 13, 2016

Testing Underlines the Importance of Facebook Topic Data

Facebook Topic Data is perhaps the most important, yet most underestimated, sea of consumer data in existence. One notable C-level executive at a social listening software company told me recently that they weren't focusing on Facebook Topic Data because their clients hadn't shown much interest in it.

I, for one, hope that changes quickly. And the reason is simple math.

In several separate tests over the last six months, we at Conversation Research Institute have entered standard brand and topic queries into popular social listening platforms, including NetBase, Nuvi, Brandwatch, Sysomos and Mention. In each test, we counted how many conversations surfaced over a given time period, then entered the exact same search threads into Facebook Topic Data to compare.

Almost to a decimal place, we could predict how many conversations would surface on Facebook based on the number we found on the open web. Would you be surprised to hear that Facebook surfaces nearly 1.5 times as many?

That's right. Across a dozen or so different search terms, our testing found that Facebook accounts for around 60% of the online conversation. In some cases, it's even higher.
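The 1.5-times figure and the 60% figure are the same claim stated two ways. Writing F for Facebook conversation volume and W for open-web volume (shorthand introduced here, not part of the original testing), the conversion works out as follows:

```latex
\frac{F}{W} \approx 1.5
\quad\Longrightarrow\quad
\frac{F}{F + W} = \frac{F/W}{F/W + 1} \approx \frac{1.5}{1.5 + 1} = \frac{1.5}{2.5} = 0.6 = 60\%
```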

A recent search for a client in the pest control industry turned up 58,000 interactions on Facebook between Nov. 22 and Dec. 12. When you use Facebook Topic Data, the validity of the posts you receive is managed by DataSift, Facebook's exclusive data provider. In other words, DataSift handles the disambiguation, so the posts you get are the posts you want rather than ones about irrelevant topics.

Out of those 58,000 interactions, we estimate that only about 10% are irrelevant, meaning they slipped past DataSift's processing. (In all fairness, two articles shared during the period referred to politicians, lawyers and even a religious group as "termites," which is difficult to filter out without manual analysis.)

Over the same period, however, searching the open web with the exact same Boolean thread turned up only 8,690 mentions. Some 4,560 of them were on Twitter, with Reddit (352) a distant second. And those numbers are before any disambiguation, meaning the relevant open-web count would be far lower.
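To put the pest control numbers on the same footing as the 60% average, here is a small Python calculation that uses only the figures reported above. The 10% noise rate is our estimate for the Facebook results; the open-web count is left undisambiguated, as in the text, so the resulting Facebook share is, if anything, conservative. The constant and function names are illustrative.

```python
# Figures from the pest control search (Nov. 22 to Dec. 12).
FACEBOOK_INTERACTIONS = 58_000
OPEN_WEB_MENTIONS = 8_690        # before any disambiguation
TWITTER_MENTIONS = 4_560
REDDIT_MENTIONS = 352
FACEBOOK_IRRELEVANT_RATE = 0.10  # estimated share of posts DataSift missed


def facebook_share(facebook: float, open_web: float) -> float:
    """Return Facebook's fraction of the combined Facebook + open-web volume."""
    return facebook / (facebook + open_web)


raw_share = facebook_share(FACEBOOK_INTERACTIONS, OPEN_WEB_MENTIONS)

# Drop the estimated irrelevant Facebook posts and recompute.
relevant_facebook = FACEBOOK_INTERACTIONS * (1 - FACEBOOK_IRRELEVANT_RATE)
adjusted_share = facebook_share(relevant_facebook, OPEN_WEB_MENTIONS)

twitter_share_of_web = TWITTER_MENTIONS / OPEN_WEB_MENTIONS

print(f"Facebook share (raw):            {raw_share:.1%}")
print(f"Facebook share (less 10% noise): {adjusted_share:.1%}")
print(f"Twitter share of open web:       {twitter_share_of_web:.1%}")
```

On these inputs Facebook comes out at roughly 87% of the combined volume (about 86% after removing the estimated noise), well above the 60% average. Twitter accounts for a little over half of the open-web mentions.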

Twitter is the darling of the social analytics industry because it's free and open to analyze. Facebook is a walled garden that protects its users' posts from one-by-one cataloging and analysis by the social media software of the world. Facebook Topic Data does have a limitation: it doesn't provide post-level data, so you can't index and analyze every single post individually. Still, the sheer volume of conversation makes it important.

But make no mistake about it: Facebook is where most online conversations happen. And Facebook Topic Data is going to be essential research fodder for anyone interested in understanding their customers.

Need help finding and analyzing Facebook Topic Data for your company? Drop us a line. We would love to help you understand the conversation.

October 4, 2016

The Achilles' Heel of Social Listening Software

If you use social listening software, there's a good chance you share a frustration with thousands just like you: you can never get the right data. Disambiguating online conversation searches is part Boolean logic mastery, part linguistics and part voodoo. Or so it seems.

Disambiguation refers to weeding out all the data your search query returns that isn't relevant. It is a fundamental skill in the practice of conversation research. Type in the brand name "Square," for instance, and you're going to have a hard time finding anything about Square, the credit card processing app and hardware. Instead, you'll find a sea of mentions of the word "square," including stories about Times Square, square roots and 1950s parents whose children called them squares.
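To make the "Square" example concrete, here is a minimal Python sketch of the logic a Boolean string expresses: the brand term AND at least one context term, NOT any known false positive. Every term list below is a hypothetical illustration, and real platforms each use their own query syntax; this mimics only the logic, not any vendor's engine.

```python
import re

# Hypothetical terms for the payments brand "Square".
CONTEXT_TERMS = ["card reader", "payment", "payments", "point of sale", "pos", "merchant"]
EXCLUDE_TERMS = ["times square", "square root", "square feet", "town square"]


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE) is not None


def is_relevant(post: str) -> bool:
    """square AND (any context term) AND NOT (any exclude term)."""
    if not contains_term(post, "square"):
        return False
    if any(contains_term(post, term) for term in EXCLUDE_TERMS):
        return False
    return any(contains_term(post, term) for term in CONTEXT_TERMS)


posts = [
    "My Square card reader stopped syncing payments again",
    "New Year's Eve in Times Square was freezing",
    "Finally understand how to take a square root",
    "My parents were total squares in the 1950s",
]

for post in posts:
    print(is_relevant(post), "|", post)
```

Only the first post survives. Note that the exclude list is a running guess: every new false positive you spot ("square feet," "town square") means another clause, which is exactly where the voodoo creeps in.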

Disambiguation is a big problem for social listening platforms, yet most of them completely ignore the end user's need for help. Some have built Boolean logic helpers into their software; Sysomos and NetBase have nice ones. But the only marketing professionals (the very people this software is built for) who understand Boolean logic are the ones who switched majors in college.

What happens when someone who isn't fluent in Boolean logic searches for conversation topics? They get a lot of results they aren't interested in. And sadly, most end users of these platforms don't know any better. They see results, know they can output a couple of charts or graphs for the monthly report, and they're done.

But the results they're looking at are still littered with irrelevant posts. You can tweak your Boolean string all you want, but you're likely to end up with something that looks right but isn't. And we haven't even gotten to the Achilles' heel yet!

Case in point: last week I did a brand search for a major consumer company. It was a simple brand benchmarking project: I was trying to identify all the online conversations that mentioned the brand, then decipher what major topics or themes emerged in them.

My first return from the software was 21,000 conversations. As I reviewed them, I realized there was a lot of spam. After three hours of Boolean revisions, I narrowed the automatic results list to 1,654 conversations. But guess what? While they were all valid mentions of the brand, many of them were job board postings, stock analysis and retweets of news items mentioning the brand. None of these categories (which will likely show up in automated searches for any sizable brand) are relevant to what the client asked of me: What are the topics of conversation when people talk about us?

So I manually scored the 1,654 conversations, creating categories and sub-categories myself. I also manually scored sentiment for any that made it to the “relevant” list. Here’s what I found:

  • 339 relevant conversations (*the Achilles' heel is coming)
  • 50% negative, 32% positive and 18% neutral (compared to the automated read of 92% neutral, 5% positive and 3% negative)
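The funnel and the sentiment gap are easier to see as numbers. This small Python sketch uses only the counts and percentages reported above; the function names are illustrative.

```python
# Counts reported in the brand benchmarking project.
funnel = [
    ("Initial automated return", 21_000),
    ("After Boolean revisions", 1_654),
    ("Relevant after manual scoring", 339),
]

# Sentiment splits in percent: manual scoring vs. the platform's automated read.
manual = {"negative": 50, "positive": 32, "neutral": 18}
automated = {"negative": 3, "positive": 5, "neutral": 92}


def share_of_initial(stages: list[tuple[str, int]]) -> list[float]:
    """Fraction of the first stage's count that survives at each stage."""
    initial = stages[0][1]
    return [count / initial for _, count in stages]


def sentiment_gaps(manual: dict[str, int], automated: dict[str, int]) -> dict[str, int]:
    """Percentage-point difference (manual minus automated) per sentiment label."""
    return {label: manual[label] - automated[label] for label in manual}


for (label, count), share in zip(funnel, share_of_initial(funnel)):
    print(f"{label:<32}{count:>7,}  {share:6.1%} of initial")

for label, gap in sentiment_gaps(manual, automated).items():
    print(f"{label:<9} manual {manual[label]:>3}%  automated {automated[label]:>3}%  gap {gap:+d} pts")
```

Only about 1.6% of the original 21,000 results, and roughly one in five of the 1,654 post-Boolean results, were relevant. The automated read understated negative sentiment by 47 percentage points.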

And here's the Achilles' heel (some topics redacted for client anonymity):


Despite the manual scoring and categorizing, the majority of the results I found fell into a category I called "News Reaction." These were almost all retweets of people reacting to a news article. The article itself had been removed in my automated disambiguation pass, but the reactions to it remained. For this exercise, the client doesn't care about the news article; it cares about what consumers are saying.

The Achilles' heel of social listening platforms is that they generally don't disambiguate your data well automatically, and even when you score it manually, the results still include reactions to, and by-products of, original posts that you don't care about. (There are probably also relevant posts that get left out, but my guess is those are of less concern if your search terms are set well.)
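One partial remedy, sketched here as an assumption rather than a feature of any platform named above, is a rule-based pass that flags the by-products described earlier (retweets, job postings, stock chatter) before a human scores the rest. The patterns, brand name and cashtag below are hypothetical and deliberately simple. The sketch flags posts instead of deleting them, because a retweet can still carry a genuine consumer reaction; that judgment stays with a person.

```python
import re

# Hypothetical heuristics for common by-products; tune per brand and per source.
BYPRODUCT_RULES = {
    "retweet": re.compile(r"^RT @\w+", re.IGNORECASE),
    "job_posting": re.compile(r"\b(hiring|job opening|apply now)\b", re.IGNORECASE),
    # Cashtags like $ABC plus a few finance phrases; case-sensitive on purpose.
    "stock_analysis": re.compile(r"\$[A-Z]{1,5}\b|\b(price target|earnings call|shares)\b"),
}


def flag_byproducts(post: str) -> list[str]:
    """Return the names of every rule the post matches (empty list if none)."""
    return [name for name, pattern in BYPRODUCT_RULES.items() if pattern.search(post)]


def triage(posts: list[str]) -> tuple[list[str], list[str]]:
    """Split posts into (needs_review, ready_for_scoring); nothing is discarded."""
    needs_review, ready = [], []
    for post in posts:
        (needs_review if flag_byproducts(post) else ready).append(post)
    return needs_review, ready


sample = [
    "RT @newsdesk: Brand X announces product recall",
    "We're hiring! Apply now for Brand X store manager",
    "$BRX price target raised ahead of earnings call",
    "Honestly my Brand X blender is the best thing I own",
]

review, ready = triage(sample)
print("Flagged for review:", review)
print("Ready for manual scoring:", ready)
```

Rules like these only shrink the pile a person has to read; a reply that paraphrases a headline without the "RT" prefix sails right through, which is why the manual pass still matters.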

This is the primary reason conversation research cannot be left to machines alone: left to themselves, the platforms will make you believe something that isn't true.

For more on how conversation research can help your brand or agency, give us a call or drop us a line.



Here's where deep conversation research comes in. This is a topic chart for a major consumer company's online conversations over a three-month span.
