Aug 17 2007

Google Analytics – Ups & Downs from an Analyst’s Perspective

Published by Wendi under web analytics

There have been numerous discussions about Free vs. Fee Web Analytics tools. I have to agree with Marshall Sponder’s opinion on the whole topic.

My boss recently spoke with Yahoo! about web analytics tools, specifically Google Analytics, so I am curious to find out whether they too will be releasing a “free” tool soon. They wanted to know how and why we use Google Analytics and our general opinion of its capabilities.

Here was my response:

There is a time and place for everything, and that applies to choosing an analytics vendor for a website. Google Analytics is NOT an enterprise solution, nor should it be used as a long-term solution for businesses investing in their web presence. However, I do believe there are reasons why some companies should opt for a free solution like Google Analytics. Small to medium businesses are the target audience for this type of solution: small traffic, small online commerce presence. Ignorance breeds bad decisions, and if resources (including funding) are limited, I will always recommend the minimum investment of implementing a free solution. Having had the opportunity to implement GA on small business sites as well as large sites where we are pushing its limits, the strengths and weaknesses of GA become apparent. There are ups and downs to every story:

Ups

· Free & Easy to Implement

· Direct integration into Google AdWords (no need for additional tagging)

· Flexibility of setting up multiple profiles to track multiple domains or support testing needs

· GA Filters enable advanced insights

· Large community willing to share customizations for free (GA filtering techniques, Cart tracking customizations, etc…)

· Interface is easy to use/navigate and load times are relatively quick

Downs

· Inability to customize reports and setup auto-delivery

· Limited commerce tracking (only 4 success events can be tracked)

· Inability to define customized metrics and integrate into reports

· Tracking exit links and downloads takes additional coding

· Inability to customize Page-Overlay click reports; they don’t display actual values and offer only limited metrics

· Inability to integrate with third-party systems easily (CRM, Email, Ad Serving, etc…)

· Customer support is inadequate, so you must rely on the GA community for most answers and customizations

You really need to take the time to weigh all the pros and cons of each tool in your price range, but if you are looking for flexibility, free is probably not where you want to be. The “free” tier is growing as we eagerly await the release of Microsoft’s “Gatineau,” and these tools keep adding features, but they still can’t replace the functionality a paid solution provides.

Until next time… Safe Analyzing.

6 responses so far

Jul 29 2007

Regional Online Marketing Summit – Stop #6: Houston, TX

Published by Wendi under best practices, marketing, seo/sem

I first want to send a “Thank You” to the Web Analytics Association for the free pass to the Houston, TX Online Marketing Summit.  If you are a member of the WAA and don’t read the monthly newsletter, you should.  Case in point – free passes to conferences (summits, forums, seminars, etc…) and other great discounts for just being a member.    

The conference was packed with great presentations covering a vast amount of information.  Between my co-worker and me, we attempted to attend every talk across both tracks.  The Houston, TX location was set up in two tracks, one focusing on “Search Marketing & Website Strategies” and the other on “Email Marketing, Analytics, & Social Media.”  Below are some highlights of the talks I sat in on personally.  Overall the conference was great, I learned a great deal, and I would recommend it to anyone in the vicinity of the remaining locations. 

Highlights from the talks I joined in on:

·         Google Website Optimizer; Dave Underwood, CEO, TopSpot:  Test minor changes, but don’t test things you already know don’t work.  Top variables to test include headline, image position, ‘call-to-action’ placement/look & feel, length of page, registration requirement for downloads, and contact form field list.  Key points from Dave: listen to your audience while testing, plan ahead and identify what you want to test up front, run the test long enough but don’t overrun it (you are only hurting yourself in the long run if you allow the ‘bad’ versions to run longer than needed), and lastly, “Just Test It.” 

·         Top 10 Email Campaigns; Joel Book, Dir. of eStrategy, ExactTarget:  Joel’s presentation focused on permission-based email and covered far more than I can do justice to in a few short lines, but here goes.  Joel stressed that email strategies should be used to maintain customer engagement with your company, while search is used to attract and the landing page is designed to convert.  Within a plan (step 1), design a communication that gives your customers a reason to opt in.  Test, test, test, and integrate web analytics to understand the whole picture.  Among the top things to test were landing page copy, A/B testing the offer, subject lines, and A/B testing email creative.  Focus on understanding what your customers want and leverage a customer preference center to deliver customized content that fits their needs.   Joel mentioned a few supporting tools and resources that can assist with designing email campaigns – PivotalVeracity.com, EyeTools.com, and EmailExperience.org.  I haven’t personally used them, but they sound promising. 

·         Beyond Google – Vertical Search; Chris Hulse, Business.com:  Not surprisingly, there is no common list of available vertical search engines, but here are some that were mentioned during the presentation – business.com / ThomasNet.com / GlobalSpec.com / CitySearch.com / SourceTool.com / Shopzilla.com / Shopping.com / KnowledgeStorm.com / VerticalSearch.com.  GYM is the new acronym for the top 3 search engines – “Google, Yahoo, MSN.”  Vertical search engines should be used to enhance, not replace, online paid placement marketing.  Your marketing plan should include “core & other.”  Scan the landscape of your users and understand their needs and the other resources they use day to day to enhance placements. 

·         Social Media – Beyond the Buzz; Jason Breed, Vice President, Neighborhood America:  When integrating social media into an existing marketing strategy, it should enhance, not replace, traditional online media outlets.  Start by identifying the goals and selecting the right technology.  Ensure that the infrastructure can support the anticipated response, times ten.  Create the right environment for your community.  Develop a community that provides value for its members.  Ensure they have a common interest and that the environment is trustworthy and safe.  And most importantly, establish clear expectations up front.  Measure metrics that matter: those that increase revenue, decrease cost, and help drive those two faster.  Lastly, ensure scalability and reliability of the community – then repeat.  Some social media sites mentioned were MySpace.com, Facebook.com, and Digg.com. 

·         What’s next: 21st Century Lead Cultivation; Nate Pruitt, Regional VP, Eloqua Corp:  Nate brought to the table the idea of lead scoring. Lead scoring is the process of assigning a numerical value to each incoming lead, which is then used to rank leads for priority processing.  Developing a lead score is pretty straightforward: identify the interest indicators that best predict behavior, align those indicators to lead quality, and then assign each a weight (positive or negative, since you need both accelerators and decelerators). A rough sketch of this idea in code follows below.  Nate also discussed the idea of lead nurturing and the process of marketing to so-called “bad” or dead leads. 
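To make the lead-scoring idea concrete, here is a minimal Python sketch based on my own assumptions; the indicators, weights, and sample leads are purely illustrative placeholders, not anything from Nate’s presentation.

# Hypothetical interest indicators and their weights
# (positive = accelerator, negative = decelerator).
WEIGHTS = {
    "visited_pricing_page": 15,
    "downloaded_whitepaper": 10,
    "opened_last_email": 5,
    "unsubscribed": -25,
}

def score_lead(lead):
    """Sum the weights of every indicator this lead has triggered."""
    return sum(w for indicator, w in WEIGHTS.items() if lead.get(indicator))

leads = [
    {"name": "Lead A", "visited_pricing_page": True, "opened_last_email": True},
    {"name": "Lead B", "downloaded_whitepaper": True, "unsubscribed": True},
]

# Rank leads for priority processing, highest score first.
for lead in sorted(leads, key=score_lead, reverse=True):
    print(lead["name"], score_lead(lead))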

 

After reading all this, you may wonder, what does any of this have to do with statistics?  Well, if you think about it, it has everything to do with statistics.  In each and every discussion, metrics and measurement came up in some form or fashion.  Whatever the strategy, campaign, or initiative, measurement of success is at the heart of each and every plan.

Until next time… safe analyzing. 

No responses yet

Jul 09 2007

The Butterfly Effect… or is it just coincidence?

The NY Times had an interesting article today about the eerie posting of death predictions on Wikipedia, like the most recent one regarding the death of Nancy Benoit. The article moves on to discuss the fine line between real-time breaking news and predicting future events. I have to admit that I find the article disturbing, yet on some level intriguing. It gets better once you can get past all the weird notable-death mentions. One thing it reminded me of was Bill Tancer’s ‘searchonomics’ theory.

Bill Tancer, GM of Hitwise, first proposed his thoughts on ‘searchonomics’ back in 2005, predicting consumer interest – or rather public fear – of a possible epidemic outbreak based on search history for the technical term “H5N1” and its more consumer-friendly version, “bird flu.” He has also dabbled with more fun data, predicting winners for American Idol and the UK version of Dancing with the Stars – and in both cases he was right on the money.

I have found ‘searchonomics’ such an interesting phenomenon that I thought I’d start my own predictions to see if there is any predictive power around the 2008 presidential candidates.

Unfortunately I don’t work for Hitwise, nor do I have a membership, so I am limited to free versions of similar data – which limits my visibility a little. Using Google Trends you can see the first few months of the year for some of the top Democratic candidates:

Google Trends Democratic Presidential Candidates

 

Based on the traffic so far, it looks like it’s going to be a close race. I’ll wait to make my predictions on the Democratic side until Google Trends decides to update its data a bit more (or, if someone is willing to pull data from a fancier tool with more up-to-date numbers and send it my way, I might be able to make my prediction sooner).

Is this ‘searchonomics’ phenomenon the result of a “Butterfly Effect,” or is it just a set of data points that are merely related by coincidence?

Until next time… safe analyzing.

No responses yet

Jul 05 2007

Determining your Sample Size

Published by Wendi under statistics, A/B testing

Robbin Steif asked me today how long she needed to let her test run before she could call it a day and assume that there is really no difference between the treatments (since she isn’t seeing one right now). She sent over the following screen shot of her outcomes in Google’s Website Optimizer from the last two weeks:

GA Website Optimizer A/B Test

As you can see, right now she isn’t seeing any lift in her conversion rate. Actually she is seeing similar values and a small drop. But is the drop significant? Does she have enough data to support an outcome at this point?


Before you run a test of significance you first need to know if you have enough data to support the test in the first place. For population proportions the formula for sample size “n” is:

 

n = z²(pq/δ²)

where

p = % of Success (conversions in this example)

q = % of Failures (i.e. 1 – p)

*note: use the conversion rate from your control landing page

To finish out this equation you need to make a few assumptions.

Assumptions

1. The Significance Level - α (alpha): the risk of a false positive you are willing to accept (i.e., one minus your confidence level)

2. Error - δ (delta): the margin of error that you are willing to accept

With these assumptions set, you lastly need to calculate the Z value from α. It’s easy to do in Excel with the NORMSINV() formula. Since we are determining whether there is any “difference” among the conversion rates, rather than whether the conversion rate is specifically higher or lower than the control, we divide alpha in half for a two-sided test structure.

=ABS(NORMSINV(α/2))

In this example our Z = 1.96. Now we have all the pieces in our formula to calculate the needed sample size.

n = z²(pq/δ²)

= (1.96)² * [(.0472 * .9528)/(.01)²]

 

= 1728

Thus Robbin is going to need 1,728 page views before she can determine whether the treatments she is testing did or did not make a difference in her conversion rate. You can download this Excel file I put together (nothing fancy) where you can toggle the alpha and delta values and see how each one impacts the needed sample size.

I also included a reference to the maximum sample size one would need if you don’t have a control to set your “p” and “q” values. It’s rather astonishing, but if you want to be conservative you can always fall back on this calculation and know that if you get approximately 10,000 samples you are good to go.
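If you prefer code to a spreadsheet, here is a minimal Python sketch of the same calculation; the function name and the use of SciPy’s normal quantile are my own choices, not part of the Excel file.

from scipy.stats import norm

def sample_size(p, alpha=0.05, delta=0.01):
    """Sample size needed to detect a difference in a proportion (two-sided)."""
    q = 1 - p                       # % of failures
    z = abs(norm.ppf(alpha / 2))    # e.g. 1.96 for alpha = .05
    return (z ** 2) * (p * q) / (delta ** 2)

print(round(sample_size(0.0472)))   # ~1728, using the control conversion rate
print(round(sample_size(0.5)))      # ~9604, the conservative maximum when p = q = .5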

Part II of this question is: is there a difference? This is different from asking how many samples you need to determine whether there is a difference. A hypothesis test is needed to actually determine whether the difference in the conversion rates is statistically significant. You can read more about how to do this in my previous post about A/B testing. That post has a downloadable Excel file in which you can toggle various sample sizes and determine whether the conversion rates are different – statistically speaking.

 

Until next time… safe analyzing.

 

*UPDATE* 7/16/07:  Make sure that you use the first page view per visitor - “Unique Page Views” in Google Analytics terms - when making this calculation.  The sample size calculation assumes the events are independent of one another.

Thank You to Mike & Chris!

15 responses so far

Jul 02 2007

Stop Collecting So Much Data…

Published by Wendi under best practices, data mining

… and stop misusing data mining - that is Peter Fader’s message to CIOs. CIO Insight’s interview with Peter highlights the strengths and weaknesses of applied data mining in the business world, and I have to agree with some of his thoughts, especially on the topic of using probabilities to measure the propensity of behavior.

Measuring the probability of users’ actions can be powerful if done properly, and it can easily be done in Excel.

The trap I see so many people fall into is trying to analyze too many variables at once and not taking the time to even look at what they are throwing into the model. If you really wanted to, you could probably find a relationship between how fast the sun rises and stock market closing rates, but does that make any logical sense? Then why would you try to build a relationship between buying behavior and the fact that someone owns an Apple iPhone if you are selling shoes? So many marketers want to know every little detail about their customers – demographics, psychographics, what kind of car they drive, etc…

When you throw too much data at a problem you will have a hard time with independence, and you need to take the structure of your data into careful consideration; otherwise your predictions can lead to false outcomes.

Some rules of thumb from my perspective:

  1. Enhance your data with the VOC – take surveys online or by telephone (mailed surveys are costly and too time consuming). This is a great way to get anecdotal data you don’t see in clickstream data.
  2. Familiarize yourself with all the variables and truly understand what they mean – not what you think they mean.
  3. Don’t use variables that you can’t reproduce easily. If it’s too hard to calculate, find, or collect from the database then you probably shouldn’t use it. It’s impractical.
  4. Only include variables that make sense; add questionable variables later and determine whether they degrade or enhance the predictability. In the end you may not even find a reason to test those questionable variables. *Make sure not to include variables that are variations of each other: if you include % of visits this month, don’t also include the frequency of visits this month. This can cause problems with multicollinearity.
  5. Save enough data for testing! The minimum split is 90/10, but I recommend at least an 80/20 split. That is, at a minimum, 90% of your data is used for development and the remaining 10% is held back to validate the model. You need to know how predictive your model is before you take it to market. (A quick sketch of such a split is shown below.)
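As an illustration of rule #5, here is a minimal Python sketch using scikit-learn; the data is a made-up toy stand-in for real customer variables, not anything from an actual model.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Toy stand-in for customer data: two behavioral variables and a purchase flag.
rng = np.random.default_rng(42)
X = rng.random((1000, 2))
y = (X[:, 0] + rng.normal(0, 0.3, 1000) > 0.6).astype(int)

# Hold back 20% of the data for validation (an 80/20 split).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))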

Bonus Point -

  1. If you want to get fancy, look at a repeated measures DOE structure for analyzing transactional data.

Until next time… safe analyzing.

3 responses so far

Jun 17 2007

Comparing Population Proportions – A/B Testing

Published by Wendi under statistics, web analytics


Many metrics in web analytics are conveyed as percentages, or population proportions as statisticians like to call them. As I mentioned in my previous post on Statistics for People Who (Think They) Hate Statistics, percentages are useful in the real world (business data), and I was surprised there was not a section dedicated to this topic. So I thought I would write a post on comparing population proportions, namely conversion rates for landing pages.

Landing page optimization is one aspect of testing in web analytics. It’s great: you can test almost anything – layout, content, color, tag line, call to action, media, etc… In my scenario we were testing the tag line. Since we are only testing one aspect of the page, you can refer to this testing methodology as A/B testing. This is very different from multivariate testing, where numerous “variables” or parts of the page are tested at once; I’ll leave that for another conversation. So for now, we tested one variable – the tag line. The call to action defined as the measure of success was the submission of an online lead generation form. Since the form was small, it was part of the landing page itself, and there were no interim steps/pages that might have increased conversion failure.

In testing, you must first define your hypothesis. The hypothesis in this case is that landing page #1 outperformed landing page #2. In metrics terms, we are saying that the conversion rate for landing page #1 was better than landing page #2’s (with statistical significance).

Null Hypothesis H0: p1 = p2 (or can be written as p1 - p2 = 0)

“conversion was not different”

Alternative Hypothesis Ha: p1 ≠ p2 (or can be written as p1 - p2 ≠ 0)

“conversion is different”

Alpha = .05

The delivery of traffic was split evenly between the two pages, but there were slight differences in the counts, and those differences are accounted for in our calculation.

Landing page #1

Delivered 6,906 times

Conversion yield = 1.71%

Landing page #2

Delivered 6,534 times

Conversion yield = 1.44%

Some might just stop here and say, landing page #1 out performed landing page #2 and move on. But is that really a valid inference? Let’s see.

To test two population proportions you use the following equation:

z = (p̂1 - p̂2) / sqrt[ p̂(1 - p̂) * (1/n1 + 1/n2) ]   where p̂ = (x1 + x2) / (n1 + n2) is the pooled proportion

The p with the ^ on top is referred to as “p-hat.” P-hat is the sample proportion (the percentages from your data) and is used to estimate the true population proportion; the pooled p-hat combines the conversions from both pages.

All the calculations from the above formula can easily be done in Excel and can be seen in a sample file here.

The calculated z-score is 1.2563. In Excel you can calculate the p-value using the NORMSDIST() formula. You can determine the critical region, sometimes referred to as the “rejection region,” for the null hypothesis from the z-score alone, but from an interpretation standpoint it’s easier to compare the p-value to the previously defined alpha. Calculating the p-value helps you understand whether the difference between the two percentages is statistically significant. For our two-sided test, the p-value is 2*[1 - NORMSDIST(z-score)].

= 2*[1 - NORMSDIST(1.2563)]

= 2*[0.105]

= .209

Now, our alpha value was set at .05 per our testing criteria listed previously. Since our p-value > alpha (0.209 > .05), we fail to reject the null hypothesis. What does that mean? It means that the difference between the two conversion rates is not statistically significant. Thus, even though the conversion rate for landing page #1 was higher than landing page #2’s, there really wasn’t enough of a difference to declare one tag line “better.”
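For anyone who would rather script this than build the Excel file, here is a minimal Python sketch of the same pooled two-proportion z-test; the conversion counts are approximations back-calculated from the delivery and conversion-rate numbers above.

from math import sqrt
from scipy.stats import norm

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns the z-score and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                       # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))                 # two-sided
    return z, p_value

# Roughly 1.71% of 6,906 deliveries and 1.44% of 6,534 deliveries.
z, p = two_proportion_ztest(118, 6906, 94, 6534)
print(f"z = {z:.4f}, p-value = {p:.3f}")                 # about z = 1.26, p = 0.21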

Until next time… safe analyzing.

5 responses so far

Jun 17 2007

Statistics Book Review

Published by Wendi under statistics

I have had a few weeks to read through the book I purchased and wanted to share my thoughts on its readability, coverage, and ease.  At first glance, it grabs the reader’s attention and puts you at ease.  Statistics isn’t scary, and the authors remind the reader of that throughout the book.  They throw in warm, fuzzy happy faces as a difficulty rating for each chapter/topic (cute for high school, maybe not so much for adult readers).

The tips intertwined in the chapters are nice and sometimes inform the reader of advanced topics beyond the book, but I might have found some of them confusing had I not had prior knowledge of the topic at hand (easy enough, though: just ignore what you don’t grasp - those tips are not essential to the content of the book anyway).  All in all, the book is basic in nature, but it does go beyond what I was expecting.  For example, the book covers factorial ANOVAs (analysis of variance).  The authors don’t go into deep detail on factorial ANOVA, but I was surprised there was a dedicated section on this advanced method.

One thing I found missing was testing with population proportions.  From an applied statistics perspective, a chapter on population proportions would be very helpful in the business world.  I wouldn’t say this missing chapter is a showstopper for recommending the book, but I might hold off and see whether it is added in the third edition due out this year.

Also, the ordering of the chapters seems a little odd to me: they jump into correlation coefficients in chapter 5, then skip around and don’t get to linear regression until chapter 14 (reintroducing correlation coefficients in chapter 13).  This may be something that changes in the third edition (per some comments I see on Amazon.com).

Overall, the content is easy to read and comprehend, but there is certainly some room for improvement (as with most books, it is always a work in progress).  If you are looking to understand how to do everything in Excel for work, I might suggest getting the Excel edition, but keep in mind that Excel doesn’t have the tools for advanced statistical analysis.

Until next time… safe analyzing.

No responses yet

Jun 11 2007

Relationships Take 2

Published by Wendi under statistics

As promised, I wanted to take a deeper dive into the regression data posted in my previous discussion. To bring you back to the topic: I was discussing the relationship between the percent of searches for new SEM keywords and site bounce rate. In my example I regressed these two variables and determined there appeared to be a strong linear relationship between the two. Of course there are other ways of analyzing the response to new keywords in your paid search campaign, but with this approach you can derive a statistically sound relationship at a high campaign level that points you toward where to look deeper. That way you do not have to look at every single keyword, you minimize unnecessary work, and, more importantly, you can save time in determining your course of action.

Initially, I calculated the variables of a regression line: the slope and the y-intercept. From a calculation standpoint it is easier to calculate the slope first and then derive the y-intercept from it. But you don’t have to do this by hand with long algebra as I have in my Excel sample; Excel has built-in formulas that make your life so much easier: SLOPE(known Y’s, known X’s) and INTERCEPT(known Y’s, known X’s).

In addition to the linear regression variables, Excel provides a quick formula for deriving the Pearson r correlation coefficient: CORREL(known Y’s, known X’s). From this calculation it is very easy to compute the coefficient of determination, which tells you how strong the relationship expressed by the correlation coefficient really is. Most people can eyeball the strength, but if you want an actual measurement I would advise calculating the coefficient of determination, especially since it’s so easy to compute: all you have to do is square the CORREL() value and you’re done. So not only do you know the exact relationship being expressed between two variables, you also know how strong that relationship really is.

OK, back to my example. In my sample I calculated the correlation coefficient and found that r = 0.9297. A rough rule of thumb for correlation strength (this applies to both negative and positive r values) is:

· Between .8 and 1.0 very strong

· Between .6 and .8 moderately strong

· Between .4 and .6 moderate

· Between .2 and .4 moderately weak

· Between 0 and .2 very weak

In my case, I have a pretty strong relationship on initial review, but how strong is it really? Calculating the coefficient of determination will tell us: r² = 0.8645. Interpreting this value, 86.4% of the variance in bounce rate can be explained by the percent of searches from the new SEM terms. Or you can look at it in the opposite fashion and say that 13.6% of the variability in bounce rate is unexplained at this point. Technically speaking, explaining 86% of the variability is pretty darn good. This gives me enough reason to dig a little deeper into the new search terms to find the true culprits behind the increasing bounce rate on my site.
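For readers who want to follow along outside of Excel, here is a minimal Python sketch of the same calculations; the weekly numbers are made-up placeholders, since the real data lives in the Excel file.

import numpy as np

# Toy stand-in: weekly % of searches from the new SEM keywords (x) and site bounce rate (y).
x = np.array([0.02, 0.05, 0.08, 0.11, 0.15, 0.18, 0.22, 0.26, 0.29, 0.33])
y = np.array([0.31, 0.33, 0.35, 0.38, 0.41, 0.44, 0.46, 0.50, 0.52, 0.55])

slope, intercept = np.polyfit(x, y, 1)   # same roles as Excel's SLOPE() and INTERCEPT()
r = np.corrcoef(x, y)[0, 1]              # same role as Excel's CORREL()

print(f"y = {slope:.3f}x + {intercept:.3f}")
print(f"r = {r:.4f}, r-squared = {r**2:.4f}")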

Until next time… safe analyzing.

No responses yet

Jun 03 2007

Relationships are a thing of the past…

Published by Wendi under statistics, web analytics

Or are they? I would argue that at the heart of statistics is a line - a line that best fits a sample of data. What is this line? Well, it’s called the least squares regression line.

A <linear> regression line can be used to describe a collage of data points that have a linear relationship … no more, no less. There is power in knowing whether your data has a linear relationship or not, and if it does, how good a fit this line really is to the data.

Say you wanted to understand why you were seeing various spikes in bounce rate on your site, and you had a hunch that the paid keywords you recently added were the culprit. To prove - or better yet, hopefully disprove - that this hypothesis is true, run a regression on a few numbers to see if there is a relationship between the two.

If you were awake in Algebra II back in high school, you may remember the equation of a line, but just in case you don’t, here it is…

y = mx + b where m = slope and b = y-intercept

In most statistics books you probably won’t find a line written with the same letters, but what they substitute still retains the same meaning. Since statisticians like to separate themselves from mathematicians, we like to have our own way of writing an equation. So from here on out I’ll refer to the slope as b1 and the y-intercept as b0.

So, back to my problem: I think the new keywords I added to my SEM campaign are driving my site bounce rate up, but I am really not sure. To check this out, I took % of searches (delivery of impressions) and bounce rates for the 10-week period since the new keywords were added and regressed site bounce rate against % of searches. What I found out was what I feared: there was a direct relationship between the two. So in essence, these new keywords had a negative impact on my site traffic. Yikes! I am going to take those down right now!

OK, back to the regression details. To run your regression there are a few simple calculations you need to prepare that can then be inserted into the bigger formula. In my Excel file, Least Squares Regression, you can walk through each step with formulas as I walk through them here:

Regression Variables:

b1 = [n·Σ(XY) - ΣX·ΣY] / [n·Σ(X²) - (ΣX)²]     (slope)

b0 = [ΣY - b1·ΣX] / n     (y-intercept)

In the Excel file, all the variables and calculations are laid out piece by piece; initially it helps to calculate each X*Y, X*X (aka X²), etc., and then sum or multiply through where needed. Excel even provides shortcut formulas, but it helps to understand what they are doing before you use them. I have also included a few Excel shortcut formulas for calculating the slope, y-intercept, and correlation coefficient.
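Here is a minimal Python sketch of the same piece-by-piece calculation; the ten weekly values are made-up placeholders standing in for the data in the Excel file.

import numpy as np

# Toy data: % of searches from the new keywords (x) vs. site bounce rate (y).
x = np.array([0.02, 0.05, 0.08, 0.11, 0.15, 0.18, 0.22, 0.26, 0.29, 0.33])
y = np.array([0.31, 0.33, 0.35, 0.38, 0.41, 0.44, 0.46, 0.50, 0.52, 0.55])
n = len(x)

# The long-algebra pieces: each X*Y and X*X, then the sums.
sum_x, sum_y = x.sum(), y.sum()
sum_xy, sum_xx = (x * y).sum(), (x * x).sum()

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)   # slope
b0 = (sum_y - b1 * sum_x) / n                                   # y-intercept

print(f"bounce rate = {b1:.3f} * (% of searches) + {b0:.3f}")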

Take some time to dig through the Excel file; next time I will go further into what all the pieces mean.

Until next time… safe analyzing.

3 responses so far

Jun 03 2007

Books, To be continued…

Published by Wendi under statistics

I have had several inquiries about good books on introductory statistics, and to be honest I don’t really own any basic statistics books. The majority of the books I have collected over the years were chosen by my professors and read like, well, math books (sorry to all my wonderful professors - but really, if you had chosen anything less, I guess I would have had no reason to come listen to all of your insightful lectures). So from the long list of books I have studied from, I am not sure I would recommend any of them as a starter.

I do know of a great online resource published by StatSoft that I have referenced in the past. The great thing is the online book is free, and it walks you through basic concepts of statistical terminology and basic statistics and then goes into more advanced methodologies as well. The downside is that the examples included in the material are focused on the StatSoft software, but the theories and practices are a great starter, especially if you are not sure you are ready to invest in a hardback resource.

In an effort to stock my library with more “user friendly” statistics books, I recently purchased Statistics for People Who (Think They) Hate Statistics with SPSS Student Version 13.0, 2nd Edition. Once it arrives and I have had some time to thumb through it a bit, I will let you know how the book reads. One reason I chose this book was its title… very catchy. OK, not really - but it is catchy. I actually chose it because it comes with supporting software that is a great tool for analyzing web data from a behavioral standpoint. SPSS stands for Statistical Package for the Social Sciences, and I have always believed that analyzing web data is the same as analyzing a data set from a social sciences experiment. I look forward to reviewing the book and will let you know my thoughts very soon. Note: if the book is good, there is a 3rd edition soon to be released, so if you are interested in this book you may want to wait for it.

For those of you who are eager to purchase now: as a rule of thumb, I find that books on “business” statistics tend to be fairly basic in nature.

Happy reading.

Until next time… safe analyzing.

4 responses so far
