Archive for the 'web analytics' Category

Aug 17 2007

Google Analytics – Ups & Downs from an Analyst’s Perspective

Published by Wendi under web analytics

There have been numerous discussions about Free vs. Fee Web Analytics tools. I have to agree with Marshall Sponder’s opinion on the whole topic.

My boss recently had a conversation with Yahoo! about web analytics tools, specifically Google Analytics, so I am curious to find out whether they too will be releasing a “free” tool soon. They wanted to know the strategies behind our use of Google Analytics and our general opinion of its capabilities.

Here was my response:

There is a time and place for everything, and that applies to choosing an analytics vendor for a website. Google Analytics is NOT an enterprise solution, nor should it be used as a long-term solution by businesses investing in their web presence. However, I do believe there are reasons why some companies should opt for a free solution like Google Analytics. Small to medium businesses are the target audience for this type of solution: sites with small traffic and a small online commerce presence. Ignorance breeds bad decisions, and if resources (including funding) are limited, I will always recommend the minimum investment of implementing a free solution. Having had the opportunity to implement GA on small business sites as well as on large sites where we are pushing its limits, I have seen the strengths and weaknesses of GA become apparent. There are ups and downs to every story:

Ups

· Free & Easy to Implement

· Direct integration into Google AdWords (no need for additional tagging)

· Flexibility of setting up multiple profiles to track multiple domains or support testing needs

· GA Filters enable advanced insights

· Large community willing to share customizations for free (GA filtering techniques, Cart tracking customizations, etc…)

· Interface is easy to use/navigate and load times are relatively quick

Downs

· Inability to customize reports and set up auto-delivery

· Limited commerce tracking (only 4 success events can be tracked)

· Inability to define customized metrics and integrate into reports

· Tracking exit links and downloads takes additional coding

· Inability to customize Page-Overlay click reports; they don’t display actual values and offer limited metrics

· Inability to integrate with third-party systems easily (CRM, Email, Ad Serving, etc…)

· Customer support is inadequate; you must rely on the GA community for most answers and customizations

You really need to take the time to weigh all the pros and cons of each tool in your price range, but if you are looking for flexibility, Free is probably not where you want to be. The “Free” tier is growing as we eagerly await the release of Microsoft’s “Gatineau”, and these tools are working to provide enhanced features, but they still can’t replace the functionality a paid solution can provide.

Until next time… Safe Analyzing.

6 responses so far

Jul 09 2007

The Butterfly Effect.. or is it just coincidence?

The NY Times had an interesting article today about the eerie postings of death predictions in Wikipedia, like the most recent one regarding the death of Nancy Benoit. The article moves into discussing the fine line between real-time, late-breaking news and predicting future events. I have to admit that I find this article disturbing, yet on some level intriguing. The article gets better once you get past all the weird notable death mentions. One thing it reminded me of was Bill Tancer’s ‘searchonomics’ theory.

Bill Tancer, GM of Hitwise, initially proposed thoughts back in 2005 on ‘searchonomics’ and on predicting consumer interest in, or rather public fear of, a possible epidemic outbreak based on search history for the technical term “H5N1” and its more consumer-friendly version, “bird flu”. He has also dabbled with more fun data, predicting winners for American Idol and the UK version of Dancing with the Stars, and he was right on the money with both.

I have found ‘searchonomics’ such an interesting phenomenon that I thought I’d start my own predictions to see if there is any predictive power for the 2008 presidential candidates.

Unfortunately I don’t work for Hitwise, nor do I have a membership, so I am limited to free versions of similar data, which limits my visibility a little. Using Google Trends you can see the first few months of the year for some of the top Democratic candidates:

[Chart: Google Trends, Democratic presidential candidates]

Based on the traffic so far, it looks like it’s going to be a close race at this point. I’ll wait to make my predictions on the Democratic side until Google Trends decides to update their data a bit more (or, if someone is willing to pull data from some fancier tool with more up-to-date data and send it my way, I might be able to make my prediction sooner).

Is this ‘searchonomics’ phenomenon the result of a “Butterfly Effect” or is it just a set of data points that are merely related by coincidence?

Until next time… safe analyzing.

No responses yet

Jun 17 2007

Comparing Population Proportions – A/B Testing

Published by Wendi under statistics, web analytics

Many metrics in web analytics are conveyed as percentages, or population proportions as statisticians like to call them. As I mentioned in my previous post on Statistics for People Who (Think They) Hate Statistics, percentages are useful in the real world (business data), and I was surprised there was not a section dedicated to this topic. So I thought I would write a post on comparing population proportions, namely conversion rates for landing pages.

Landing page optimization is one aspect of testing in web analytics. It’s great. You can test almost anything: layout, content, color, tag line, call to action, media, etc. In my scenario we were testing the tag line. Since we are testing only one aspect of the page, you can refer to this testing methodology as A/B testing. This is very different from multivariate testing, where numerous “variables” or parts of the page are tested at once; I’ll leave that for another conversation. So for now, we tested one variable: the tag line. The call to action defined as the measure of success was the submission of an online lead generation form. Since the form was small, it was part of the landing page itself, and there were no interim steps or pages that might have increased conversion failure.

In testing, you must first define your hypothesis. The hypothesis in this case is that landing page #1 outperformed landing page #2. In metrics terms, we are saying that the conversion rate for landing page #1 was better than that of landing page #2 (with statistical significance).

Null Hypothesis H0: p1 = p2 (or can be written as p1 - p2 = 0)

“conversion was not different”

Alternative Hypothesis Ha: p1 ≠ p2 (or can be written as p1 - p2 ≠ 0)

“conversion is different”

Alpha = .05

The two pages were delivered roughly equally often, but there were slight differences in volume, and those differences are included in our calculation.

Landing page #1

Delivered 6,906 times

Conversion yield = 1.71%

Landing page #2

Delivered 6,534 times

Conversion yield = 1.44%

Some might just stop here, say that landing page #1 outperformed landing page #2, and move on. But is that really a valid inference? Let’s see.

To test two population proportions you use the following equation:

[Image: two population proportions test formula]

The p with the ^ on top is referred to as “p-hat”. “P-hat” is the sample proportion (the %’s from your data) and is used to estimate the true population proportion.

All the calculations from the above formula can be easily done in Excel and can be seen in a sample file here.

The calculated z-score is 1.2563. In Excel you can calculate the p-value with the NORMSDIST() function. You can determine the critical region (sometimes referred to as the “rejection region”) for the null hypothesis from the z-score alone, but from an interpretation standpoint it’s easier to compare the p-value to the previously defined alpha. Calculating the p-value helps determine whether the difference between the two percentages is statistically significant. Because the alternative hypothesis is two-sided, the p-value is 2*[1 - NORMSDIST(z-score)]:

= 2*[1 - NORMSDIST(1.2563)]

= 2*[0.1045]

= .209

Now, our alpha value was set at .05 per our testing criteria listed previously. Since our p-value > alpha (0.209 > .05), we Fail to Reject the Null Hypothesis. What does that mean? It means that the difference between the two conversion rates is not statistically significant. Thus technically, even though the conversion rate for landing page #1 was higher than for landing page #2, the difference is not large enough to conclude that one page has a “better” tag line.
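For readers who prefer code to a spreadsheet, the test above can be sketched in Python. This is a minimal sketch, assuming the pooled-variance form of the two-proportion z-test, which may differ slightly from the exact formula used in the Excel file; the function name is hypothetical, and the inputs are the rounded rates and delivery counts quoted above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test of H0: p1 = p2 (assumes the pooled-variance form)."""
    # Pooled estimate of the common conversion rate under the null hypothesis.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference between the two sample proportions.
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Two-sided p-value, the equivalent of 2*(1 - NORMSDIST(|z|)) in Excel.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Rounded conversion rates and delivery counts for landing pages #1 and #2.
z, p_value = two_proportion_z_test(0.0171, 6906, 0.0144, 6534)
alpha = 0.05
print(f"z = {z:.4f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With the rounded inputs, the z-score comes out at roughly 1.26 and the p-value at roughly 0.21, in line with the spreadsheet figures, so the conclusion is the same: fail to reject the null hypothesis.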

Until next time… safe analyzing.

4 responses so far

Jun 03 2007

Relationships are a thing of the past…

Published by Wendi under statistics, web analytics

Or are they? I would argue that at the heart of statistics is a line: a line that best fits a sample of data. What is this line? Well, it’s called a Least Squares Regression Line.

A <linear> regression line can be used to describe a collection of data points that have a linear relationship … no more, no less. There is power in knowing whether your data has a linear relationship and, if it does, how well the line really fits the data.

Say you wanted to understand why you were seeing various spikes in bounce rates on your site and you had a hunch that the paid keywords you recently added were the culprit. To prove, or better yet hopefully disprove, that this hypothesis is true, run a regression on a few numbers to see if there is a relationship between the two.

If you were awake in Algebra II back in high school, you may remember the equation of a line, but just in case you don’t, here it is…

y = mx + b where m = slope and b = y-intercept

In most statistics books you probably won’t find a line written with the same letters, but what they substitute still retains the same meaning. Since statisticians like to separate themselves from mathematicians, we like to have our own way of writing an equation. So from here on out I’ll refer to the slope as b1 and the y-intercept as b0.

So, back to my problem: I think the new keywords I added to my SEM campaign are driving my site bounce rate up, but I am not really sure. To check this out, I took % of Searches (delivery of impressions) and Bounce Rates for the 10-week period since the new keywords were added and regressed site bounce rates against % of searches. What I found was what I feared: there was a direct relationship between the two. So in essence, these new keywords had a negative impact on my site traffic. Yikes! I am going to take those down right now!

Ok, back to the regression details. To run your regression, there are a few simple calculations you need to prepare that can then be inserted into the bigger formula. In my Excel file, Least Squares Regression, you can follow each step with formulas as I walk through them here:

Regression Variables

In the Excel file all the variables and calculations are laid out piece by piece. Initially it helps to calculate each X*Y, X*X (aka X²), etc., then sum or multiply through where needed. Excel even provides shortcut formulas, but it helps to understand what they are doing before you use them. I have also included a few Excel shortcut formulas for calculating the slope, y-intercept, and correlation coefficient.
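The same piece-by-piece calculation can be sketched in Python, assuming the standard least squares formulas for the slope (b1), the y-intercept (b0), and the correlation coefficient (r). The function name is hypothetical, and the weekly figures are made-up placeholders for illustration only, not the data from the Excel file.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (b0, b1, r) for the least squares line y = b0 + b1*x."""
    n = len(xs)
    # The building blocks: plain sums plus sums of X*Y, X*X, and Y*Y.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    # Slope: comparable to Excel's SLOPE(known_y's, known_x's).
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Y-intercept: comparable to Excel's INTERCEPT(known_y's, known_x's).
    b0 = (sum_y - b1 * sum_x) / n
    # Correlation coefficient: comparable to Excel's CORREL(array1, array2).
    r = (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return b0, b1, r


# Hypothetical 10 weeks of data (made up for this sketch).
pct_searches = [1.2, 1.8, 2.5, 3.1, 3.6, 4.0, 4.4, 5.1, 5.7, 6.3]
bounce_rate = [31.0, 33.5, 34.2, 36.8, 38.1, 38.9, 41.0, 42.3, 44.9, 46.0]

b0, b1, r = least_squares(pct_searches, bounce_rate)
print(f"bounce rate = {b0:.2f} + {b1:.2f} * (% of searches), r = {r:.3f}")
```

A positive slope with r close to 1 would point to the kind of direct relationship described above; an r near 0 would suggest the keywords are not the culprit.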

Take some time to dig through the Excel file; next time I am going to go further into what all the pieces mean.

Until next time… safe analyzing.

3 responses so far