Jun 11 2007

Relationships Take 2

Published by Wendi at 7:33 pm under statistics

As promised, I wanted to take a deeper dive into the regression data posted in my previous discussion. To bring you back to the topic, I was discussing the relationship between the percent of searches for new SEM keywords and site bounce rate. In my example I regressed these two variables and determined that there appeared to be a strong linear relationship between them. Of course there are other ways of analyzing the response to new keywords in your paid search campaign, but with this approach you can derive a statistically sound relationship at a high campaign level, which can point you toward where to look deeper. That way you do not have to look at every single keyword, you cut out unnecessary work, and more importantly you can save time in deciding on your course of action.

Initially, I calculated the parameters of a regression line: the slope and the y-intercept. From a calculation standpoint it is easier to calculate the slope first and then derive the y-intercept from it, since the intercept is the mean of Y minus the slope times the mean of X. But you don’t have to do this by hand using long algebra as I have in my Excel sample. Excel has built-in formulas that make your life so much easier: SLOPE(known Y’s, known X’s) and INTERCEPT(known Y’s, known X’s).
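For anyone who would rather see the long algebra outside of a spreadsheet, here is a minimal Python sketch of the same least-squares calculation, slope first and then the intercept. The function name fit_line and the five data points are made up for illustration; they are not the numbers from my Excel sample.

```python
def fit_line(xs, ys):
    """Least-squares slope and y-intercept for paired data.

    Follows the order described in the post: slope first,
    then the intercept derived from the slope.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-deviations and sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # derived from the slope
    return slope, intercept


# Hypothetical illustration data (NOT the author's data set).
pct_new_keyword_searches = [10.0, 20.0, 30.0, 40.0, 50.0]  # X
bounce_rate = [32.0, 35.0, 41.0, 44.0, 48.0]               # Y

slope, intercept = fit_line(pct_new_keyword_searches, bounce_rate)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

On this made-up data the slope comes out at about 0.41 and the intercept at about 27.7, which is what SLOPE() and INTERCEPT() should give for the same two columns.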

In addition to the linear regression parameters, Excel provides a quick formula for deriving the Pearson r correlation coefficient: CORREL(known Y’s, known X’s). From this value it is very easy to calculate the coefficient of determination, which tells you how much of the variation in one variable is accounted for by its linear relationship with the other. Most people can eyeball the strength from r, but if you want an actual measurement of strength I would advise you to calculate the coefficient of determination, especially since it’s so easy to compute. All you have to do is square the CORREL() value and you’re done. Easy as that. So not only do you know the relationship being expressed between the two variables, you also know how strong that relationship really is.
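The same two steps, Pearson r and then its square, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: pearson_r is a hypothetical helper name, and the data points are invented rather than taken from my sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data (NOT the author's data set).
pct_new_keyword_searches = [10.0, 20.0, 30.0, 40.0, 50.0]
bounce_rate = [32.0, 35.0, 41.0, 44.0, 48.0]

r = pearson_r(pct_new_keyword_searches, bounce_rate)
r_squared = r ** 2  # coefficient of determination: just square r
print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
```

Note that r comes out the same whichever column you pass first, so the order of the two ranges does not matter for CORREL() the way it does for SLOPE() and INTERCEPT().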

OK, back to my example. In my sample I calculated the correlation coefficient and found that r = 0.9297. A rough rule of thumb for correlation strength (it applies to the size of r, so to negative and positive values alike) is:

· Between .8 and 1.0 very strong

· Between .6 and .8 moderately strong

· Between .4 and .6 moderate

· Between .2 and .4 moderately weak

· Between 0 and .2 very weak
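If you want to apply the rule of thumb without eyeballing it, a small lookup like the sketch below will do. Two things here are my assumptions rather than part of the rule itself: the moderate band is taken to run from .4 up to .6 so the bands leave no gap, and a value sitting exactly on a boundary is put in the stronger band. The function name correlation_strength is made up for this example.

```python
def correlation_strength(r):
    """Map a correlation coefficient to the rule-of-thumb label.

    Assumptions: the 'moderate' band covers .4 up to .6, and a value
    exactly on a boundary goes into the stronger band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")

    size = abs(r)  # the rule applies to negative and positive r alike
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "moderately strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "moderately weak"
    return "very weak"


print(correlation_strength(0.9297))
print(correlation_strength(-0.35))
```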

In my case, I have a pretty strong relationship by initial review, but how strong is it really? The coefficient of determination tells us: squaring r = 0.9297 gives r² ≈ 0.864. This means that 86.4% of the variance in bounce rate can be explained by the percent of searches for the new SEM terms. Or you can look at it the opposite way and say that 13.6% of the variability in bounce rate is unexplained at this point. 86% coverage of variability is pretty darn good. This would give me enough reason to dig a little deeper into the new search terms to find the true culprits behind the increasing bounce rate on my site.
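The arithmetic behind the explained and unexplained shares is short enough to check in a couple of lines. This sketch starts from the rounded r = 0.9297 reported above, so the last decimal place can differ slightly from a spreadsheet that squares the unrounded value.

```python
r = 0.9297  # rounded correlation coefficient reported in the post

explained = r ** 2           # share of bounce-rate variance explained
unexplained = 1 - explained  # share left unexplained

print(f"explained:   {explained:.1%}")
print(f"unexplained: {unexplained:.1%}")
```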

Until next time… safe analyzing.
