
Know Your Audience: a tool that helps you discover demographic bias on your website


Technology today has made it really easy to create your own website. With open-source content management systems like WordPress, you can publish your blog posts or showcase your products with just a few clicks. Meanwhile, thanks to Google Analytics, you can easily track traffic to your website: how the number of visitors changes over time, where they come from, what kind of device they use, and so on. You feel in control. But there is still a lot you don’t know about that traffic. I’ll explain.

Suppose you just published a new article or launched a new product and shared it on social media. You see a big jump in traffic. Everything looks great. However, it doesn’t bring in the number of shares, comments or sales you expected. You may also find a very low retention rate, meaning people tend not to come back to your website. What could possibly be wrong?

The thing is, you do see visitors coming, but you don’t know who they are. Unless you ask them to sign up and provide personal information (or you work for Google/Facebook and already have that kind of information), you do not know anything about them besides the devices they are using. You might want to know things like gender, age and profession, which say more about what might interest them and how you could keep them on your website. That is why we built a tool to provide some insights here.

A conceptual diagram showing which data, connected by Acxiom and LiveRamp technology, is used to build “Know your audience”.

 

The tool introduced here, “Know your audience”, is the result of a week-long project during LiveRamp’s Hackweek XLIII. By combining the gigantic offline dataset Acxiom has been cultivating since the 1960s with the online dataset LiveRamp (now an Acxiom company) has been developing for over a decade, we were able to match online browsing history with various demographic information, including gender, education, ethnicity and so on. This demonstration of “Know your audience” will give you a sense of the power of our data and how it could help you build a better website.

For now we only have a Jupyter Notebook interface, which looks like this:

 

The first argument, “keywords”, takes the keywords of your website. You can cherry-pick the words you think are most relevant, or you can simply paste in an intro paragraph from your blog or sales pitch. No need to worry about punctuation, other symbols or URLs; the function drops them automatically.
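To make that clean-up step concrete, here is a minimal sketch of how pasted text could be reduced to plain keywords. The function name `clean_keywords` and the two regular expressions are assumptions made for illustration; the notebook's actual implementation is not shown in this post and may differ.

```python
import re

# Hypothetical patterns, chosen for this sketch only.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")  # whole URLs
SYMBOL_PATTERN = re.compile(r"[^\w\s]+")             # punctuation and other symbols


def clean_keywords(text):
    """Lower-case the text, remove URLs, drop punctuation, and split into words."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)     # remove URLs first, before their dots and slashes are touched
    text = SYMBOL_PATTERN.sub(" ", text)  # replace remaining symbols with spaces
    return text.split()


print(clean_keywords("Japanese anime!!! Reviews & news: https://example.com/blog"))
```

URLs are removed before punctuation so that a link does not break apart into stray fragments such as "https" and "com".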

Then you need to specify the demographic variable you would like to look at. For this demo we only enabled four variables: ‘age’, ‘education’, ‘ethnicity’ and ‘gender’, but it is worth mentioning that Acxiom has information on over 8,000 demographic and purchase-habit-related variables. In the above example I put in ‘education’.

The last argument, ‘display’, specifies how you want to display the result. If the value is ‘compare’, then for each category of the specified variable, the percentage of visitors in that category is compared between your website and the average of all websites. Naturally, if the column for your website (the red one) is taller, that category is over-represented among the visitors coming to your website. Here we can see that for a website with keywords ‘Japanese anime’, people with a high school diploma are over-represented while people with college degrees and higher are under-represented.

 

The alternative option for ‘display’ is ‘ratio’, which simply shows the ratio between the percentage for your website and the average percentage over all websites. If the ratio is >1, the category is over-represented; if it is <1, the category is under-represented.
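The ‘ratio’ calculation can be sketched in a few lines. This is an illustration only: the function name `representation_ratio`, the category names and the percentages below are made up for the example and are not output from the tool.

```python
def representation_ratio(site_pct, baseline_pct):
    """Divide each category's share on your site by its average share over all sites."""
    return {category: site_pct[category] / baseline_pct[category]
            for category in site_pct}


# Made-up percentages, for illustration only (not real results).
site = {"category_a": 60.0, "category_b": 40.0}
baseline = {"category_a": 50.0, "category_b": 50.0}

for category, ratio in representation_ratio(site, baseline).items():
    label = "over-represented" if ratio > 1 else "under-represented"
    print(f"{category}: ratio {ratio:.2f} ({label})")
```

The ‘compare’ view plots the two percentages side by side, while ‘ratio’ folds them into a single number per category, which makes categories with very different baseline sizes easier to compare.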

If your keywords are ‘Football’, men are over-represented in your visitors and women under-represented.

 

Another interesting example is for the keyword “cartoon” using the demographic variable “age”:

 

Apparently both young and old people are very interested in “cartoon”-related websites, while middle-aged people are less so. If the keyword is “children”, the result is completely reversed:

 

Middle-aged people are all about children while younger and older people are not so crazy about that topic.

In summary, “Know your audience” gives you a way to check whether the visitors to your website are really who you think they are. Even better, it can help you adjust the theme of your website to attract the population you want. Of course, this is just a tiny example of the possible applications of the data here at LiveRamp. We are excited to explore more and would love your input!

How we did it

We developed a minimalistic distributed web crawler to fetch and parse keywords from the top one million web pages that send web traffic to LiveRamp’s pixel server. Using a technology enabled by Abilitec, we were able to associate demographic information with each web browser and then with the keywords. We then processed the keywords with Natural Language Processing techniques. Using tf-idf-weighted word frequencies as input features, we trained a linear model for each demographic variable to predict, from the keywords on a website, the proportions of the different demographic categories among its visitors.
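For readers unfamiliar with tf-idf, the following is a minimal sketch of one common variant (term frequency multiplied by the log of inverse document frequency), written in plain Python. The function name `tf_idf` and the toy pages are assumptions for illustration; the exact weighting, the vocabulary and the linear model used in the project are not specified in this post. In practice the resulting vectors would serve as input features to the per-variable linear models.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (a list of tokens)."""
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # term frequency * inverse document frequency
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy keyword lists standing in for crawled pages (illustrative only).
pages = [
    ["football", "scores", "news"],
    ["anime", "news"],
    ["anime", "cartoon"],
]
for vector in tf_idf(pages):
    print(vector)
```

A word that appears on few pages gets a higher weight than one that appears on many, so distinctive keywords such as "football" count for more than generic ones such as "news".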