Filtering Facebook search spiders/bots and other automated requests (FB_IAB)

TL;DR: Filter out requests whose user agent includes the keyword "FB_IAB" to get an accurate picture of who actually visits your page from Facebook Ads. Doing so reduces the gap between the ad clicks Facebook reports and the requests you record in your own tracking tool.

At Firmhouse, one of our services is supporting startups and corporates in running product landing page experiments. Usually, we drive traffic to these landing pages with paid advertisements on Facebook, Google AdWords, and Twitter.

Lately, we've switched from using Google Analytics and Mixpanel to our on-premise tracking product called Airstrip (more about this soon). With our on-premise solution, we don't have to load JavaScript or give privacy-sensitive data to third parties (which are usually hosted in jurisdictions we don't agree with). An added advantage is that we aren't affected by ad blockers blocking JavaScript calls.

There is one problem with combining server-side tracking and Facebook Ad campaigns: Facebook sends far more automated requests to your URLs than there are real users clicking your ads. We figured this out during one of our latest proposition testing campaigns for a corporate client. We saw tens of thousands of unique visitors tracked in our reporting, while the Facebook Ad reporting dashboard only showed us a few hundred clicks.

It turns out there are two types of additional requests that Facebook sends to URLs. You can safely ignore these in your tracking tool. These requests are:

  1. Spider/Bot requests. These are pretty straightforward and recognizable by the "facebookexternalhit" string in the user-agent. I assume this is what Facebook uses to index and verify the authenticity of the ad's target URL.
  2. Requests that come in when the Facebook mobile app preloads ads in the user's timeline.

This second request type took a while to track down, and eventually we figured out it was responsible for the thousands of erroneous events tracked. What made spotting these requests difficult was that their user-agents look like those of regular mobile phones. So at first glance, they looked like regular visits coming from Facebook, and we deemed Facebook's Ad reporting faulty.

But eventually, I figured it out. When I compared the user-agent of someone who converted into a signup with the user-agents of all the other requests, I noticed the difference. Almost all non-signup visitors had something like this as their user agent string:

Mozilla/5.0 (Linux; Android 6.0.1; D6503 Build/23.5.A.1.291; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/54.0.2840.68 Mobile Safari/537.36 [FB_IAB/FB4A;FBAV/101.0.0.18.70;]  

An actual signup's user agent string looked like this:

Mozilla/5.0 (Linux; Android 6.0; EVA-L09 Build/HUAWEIEVA-L09) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.91 Mobile Safari/537.36  

The difference is the addition of the "[FB_IAB/FB4A;FBAV/101.0.0.18.70;]" string at the end of the first user agent sample. It seems that the Facebook mobile apps make requests carrying this extra segment from inside the app's timeline. This results in thousands of additional requests on top of those from people who actually clicked the ad.
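To make the difference concrete, here is a minimal Python sketch that pulls the bracketed Facebook segment out of a user agent string and splits it into key/value pairs. The function name `parse_fb_segment` and the regular expression are my own illustration, not part of Airstrip, and the pattern is based only on the single sample above, so treat it as a starting point rather than a complete parser.

```python
import re

# Matches a bracketed segment that starts with "FB", for example
# "[FB_IAB/FB4A;FBAV/101.0.0.18.70;]". Assumption: derived from one sample only.
FB_SEGMENT = re.compile(r"\[(FB[^\]]*)\]")


def parse_fb_segment(user_agent):
    """Return the Facebook key/value pairs found in a user agent, or {}."""
    match = FB_SEGMENT.search(user_agent)
    if match is None:
        return {}
    pairs = {}
    for part in match.group(1).split(";"):
        # Skip the empty piece left by the trailing ";".
        if "/" in part:
            key, value = part.split("/", 1)
            pairs[key] = value
    return pairs


# The two user agent strings quoted in this post.
preload_ua = (
    "Mozilla/5.0 (Linux; Android 6.0.1; D6503 Build/23.5.A.1.291; wv) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 "
    "Chrome/54.0.2840.68 Mobile Safari/537.36 "
    "[FB_IAB/FB4A;FBAV/101.0.0.18.70;]"
)
signup_ua = (
    "Mozilla/5.0 (Linux; Android 6.0; EVA-L09 Build/HUAWEIEVA-L09) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/55.0.2883.91 Mobile Safari/537.36"
)

print(parse_fb_segment(preload_ua))  # contains an "FB_IAB" key
print(parse_fb_segment(signup_ua))   # empty dict
```

Checking for the "FB_IAB" key in the result is then enough to tell the two kinds of requests apart.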

So long story short: when I filtered out every event whose user agent included "FB_IAB" from my event data, the event numbers suddenly dropped to a realistic level that matched well with the clicks the Facebook Ad dashboard was showing me.
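For anyone who wants to apply the same filter, here is a rough Python sketch of it. The event structure (a list of dicts with a "user_agent" key) and the names `is_facebook_in_app` and `filter_events` are hypothetical; your tracking tool will store events differently, and this is not how Airstrip implements it.

```python
FB_IN_APP_MARKER = "FB_IAB"  # the keyword from the user agent samples above


def is_facebook_in_app(user_agent, marker=FB_IN_APP_MARKER):
    """True if the user agent carries the Facebook in-app marker."""
    # Plain, case-sensitive substring check, mirroring the filter described above.
    return marker in user_agent


def filter_events(events):
    """Return only the events whose user agent lacks the marker."""
    # Assumption: each event is a dict with an optional "user_agent" key.
    return [
        event
        for event in events
        if not is_facebook_in_app(event.get("user_agent", ""))
    ]


# Hypothetical events, using shortened versions of the two samples above.
events = [
    {"path": "/landing", "user_agent": "Mozilla/5.0 (Linux; Android 6.0.1; wv) "
                                       "[FB_IAB/FB4A;FBAV/101.0.0.18.70;]"},
    {"path": "/landing", "user_agent": "Mozilla/5.0 (Linux; Android 6.0; EVA-L09) "
                                       "Chrome/55.0.2883.91 Mobile Safari/537.36"},
    {"path": "/landing"},  # an event without a recorded user agent is kept
]

kept = filter_events(events)
print(len(events), "events before filtering,", len(kept), "after")
```

I kept the check as a simple substring match because that is all the filter needed in our case; if you store raw request logs instead of events, the same function can be applied per log line.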