Jun 03 2007
Relationships are a thing of the past…
Or are they? I would argue that at the heart of statistics is a line: the line that best fits a sample of data. What is this line? It’s called a Least Squares Regression Line.
A linear regression line can be used to describe a collection of data points that have a linear relationship … no more, no less. There is power in knowing whether your data has a linear relationship or not and, if it does, how good a fit this line really is to the data.
Say you wanted to understand why you were seeing various spikes in bounce rates on your site, and you had a hunch that the paid keywords you recently added were the culprit. To prove, or better yet hopefully disprove, this hypothesis, run a regression on a few numbers to see if there is a relationship between the two.
If you were awake in Algebra II back in high school, you may remember the equation of a line, but just in case you don’t, here it is…
y = mx + b where m = slope and b = y-intercept
In most statistics books you probably won’t find the line written with the same letters, but what they substitute still retains the same meaning. Since statisticians like to separate themselves from mathematicians, we like to have our own way of writing an equation. So from here on out I’ll refer to the slope as b1 and the y-intercept as b0, which makes the line y = b0 + b1x.
So, back to my problem: I think the new keywords I added to my SEM campaign are driving my site bounce rate up, but I am not really sure. To check this out, I took % of Searches (delivery of impressions) and Bounce Rates for the 10-week period since the new keywords were added and regressed site bounce rates against % of searches. What I found out was what I feared: there was a direct relationship between the two. So in essence, these new keywords had a negative impact on my site traffic. Yikes! I am going to take those down right now!
OK, back to the regression details. To run your regression, there are a few simple calculations you need to prepare that can then be inserted into the bigger formula. In my Excel file, Least Squares Regression, you can follow each step with formulas as I walk through them here:
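For reference, these are the standard textbook least squares formulas written in the b0/b1 notation from above. Here x is % of Searches, y is the bounce rate, and n is the number of weekly data points; the spreadsheet may arrange the pieces differently, so treat this as a sketch of the usual form rather than a copy of the file.

```latex
b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
\qquad
b_0 = \frac{\sum y - b_1 \sum x}{n}
\qquad
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)
                \left(n\sum y^2 - \left(\sum y\right)^2\right)}}
```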
In the Excel file all the variables and calculations are laid out piece by piece. Initially it helps to calculate each X*Y, X*X (aka X²), etc., then sum or multiply through where needed. Excel even provides shortcut formulas, but it helps to understand what they are doing before you use them. I have also included a few Excel shortcut formulas for calculating the slope, y-intercept, and the correlation coefficient.
Take some time to dig through the Excel file, and next time I am going to go further into what all the pieces mean.
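If you would rather see the piece-by-piece calculation outside of a spreadsheet, here is a minimal Python sketch of the same idea: build the sums of X, Y, X*Y, X*X and Y*Y, then plug them into the usual least squares formulas. The function name `least_squares` and the weekly numbers are made up for illustration; they are not the data from the Excel file.

```python
import math


def least_squares(xs, ys):
    """Return (slope b1, y-intercept b0, correlation r) for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    n = len(xs)
    # The building blocks: column sums, as you would lay them out in a sheet.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    # Pieces shared by the slope and the correlation coefficient.
    sxy = n * sum_xy - sum_x * sum_y
    sxx = n * sum_xx - sum_x ** 2
    syy = n * sum_yy - sum_y ** 2
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary for a line to be fitted")

    b1 = sxy / sxx                   # slope
    b0 = (sum_y - b1 * sum_x) / n    # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return b1, b0, r


# Hypothetical 10 weeks of data (NOT the numbers from the spreadsheet).
pct_searches = [10, 12, 15, 18, 20, 22, 25, 27, 30, 33]
bounce_rate = [35, 36, 38, 41, 40, 43, 46, 45, 49, 52]

b1, b0, r = least_squares(pct_searches, bounce_rate)
print(f"slope b1 = {b1:.3f}, intercept b0 = {b0:.3f}, r = {r:.3f}")
```

Excel's built-in SLOPE, INTERCEPT, and CORREL functions compute these same three values; whether those are the exact shortcuts used in the file is my assumption.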
Until next time… safe analyzing.
3 Responses to “Relationships are a thing of the past…”
Hi Wendi. Wouldn’t it be a whole lot easier and more valid to look at bounce rates by keyword/source? Furthermore (and I am so not a statistician, that’s why I read your stuff, to learn), wouldn’t you really just learn about correlation and not causation? So bounce rates increased during those weeks when you ran that new campaign but maybe it was your competitor’s great new pricing, everyone was aware of it, and as soon as they landed, they saw that they couldn’t get the same price?
Hi Robbin, all very great points and questions. In a business setting data is not always “nice”: you can’t control for everything, and you may run into confounding issues. But if you know that 1. you have a semi-controlled environment (you didn’t just launch a redesign of your site during the same time period) and 2. there is an underlying business relationship between the two factors, then correlation makes sense. Of course you don’t want to compare changes in wind speed to your online SEM campaigns; even though you could possibly prove some kind of relationship, it just wouldn’t make logical sense. You are right that correlation doesn’t equal causation (and never will), but knowing that there is a statistical relationship and a business relationship (obviously you know that running an SEM campaign should ultimately affect your site traffic, otherwise you wouldn’t pay), you can make inferences from those relationships. I am not suggesting that you completely drop your SEM campaign, but I would certainly probe more into testing the existence of these new keywords in my campaign.
And yes, I would certainly suggest looking into the details as well and taking a hard look at bounce rates by keyword. For smaller campaigns this may be easier to go straight to, but what if you added over 2000 new keywords? I would first suggest seeing whether there was an impact at all as a whole. If not, then move on, but if there was a slight correlation and linear relationship, then work your way down the segment chain, first by source, then ad group, then keyword, so that you can isolate the possible culprit.
Thanks for the comments and thought-provoking questions. I hope I answered some of them.
Wendi
[…] promise, I wanted to take a deeper dive into the regression data posted in my previous discussion. To bring you back to the topic, I was discussing the relationship between the percent of […]