<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.1.3" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Stop Collecting So Much Data…</title>
	<link>http://coremarkanalytics.com/blog/2007/07/02/stop-collecting-so-much-data%e2%80%a6/</link>
	<description>Web Analytics Blog - Paving the way to understanding web data as it relates to statistics and other methodologies.</description>
	<pubDate>Thu, 20 Nov 2008 20:58:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.1.3</generator>

	<item>
		<title>By: Jacques Warren</title>
		<link>http://coremarkanalytics.com/blog/2007/07/02/stop-collecting-so-much-data%e2%80%a6/#comment-22</link>
		<author>Jacques Warren</author>
		<pubDate>Tue, 03 Jul 2007 12:39:22 +0000</pubDate>
		<guid>http://coremarkanalytics.com/blog/2007/07/02/stop-collecting-so-much-data%e2%80%a6/#comment-22</guid>
					<description>Hi Wendi,

Nice post. I just read that article in CIO Insight after reading Jim Novo's blog. I think you illustrate well how false dependencies can be made. 

Your point:

1. is right on: I have long advocated adding attitudinal analysis to behavioral analysis. But wouldn't "reasons" for visits add to the extra variables? Isn't there a risk of adding more false dependencies? That said, I am a big proponent of free will, and I believe we consumers still have some left. That is why I find a lot of explanation in the "why" people give for what they do.

3. Nice advice. Too much friction definitely doesn't help the whole process.

4. "Variables that make sense", yes, but I think this is the whole question here: how does one recognize what makes sense when it is not obvious (e.g. the weather)? Isn't there an element of discovery and learning?

5. Could you explain a little more?

6. ?</description>
		<content:encoded><![CDATA[<p>Hi Wendi,</p>
<p>Nice post. I just read that article in CIO Insight after reading Jim Novo&#8217;s blog. I think you illustrate well how false dependencies can be made. </p>
<p>Your point:</p>
<p>1. is right on: I have long advocated adding attitudinal analysis to behavioral analysis. But wouldn&#8217;t &#8220;reasons&#8221; for visits add to the extra variables? Isn&#8217;t there a risk of adding more false dependencies? That said, I am a big proponent of free will, and I believe we consumers still have some left. That is why I find a lot of explanation in the &#8220;why&#8221; people give for what they do.</p>
<p>3. Nice advice. Too much friction definitely doesn&#8217;t help the whole process.</p>
<p>4. &#8220;Variables that make sense&#8221;, yes, but I think this is the whole question here: how does one recognize what makes sense when it is not obvious (e.g. the weather)? Isn&#8217;t there an element of discovery and learning?</p>
<p>5. Could you explain a little more?</p>
<p>6. ?</p>
]]></content:encoded>
				</item>
	<item>
		<title>By: Wendi</title>
		<link>http://coremarkanalytics.com/blog/2007/07/02/stop-collecting-so-much-data%e2%80%a6/#comment-23</link>
		<author>Wendi</author>
		<pubDate>Tue, 03 Jul 2007 13:42:22 +0000</pubDate>
		<guid>http://coremarkanalytics.com/blog/2007/07/02/stop-collecting-so-much-data%e2%80%a6/#comment-23</guid>
					<description>Hi Jacques, Thanks for the thoughts.  

4. When building models you don't want to include variables that may not make sense when you try to explain the model. In some cases you also have to be cognizant of classes protected by law: you can't build a credit score with demographics such as age or race. I am also thinking of this from a practicality standpoint. Try to make the model as simple as possible, which makes it easier to apply to future events. But you are right that it does take away some of the element of surprise.

5. When you use data mining techniques to build models you need to test the predictive accuracy, computational speed, robustness, scalability, and interpretability (point #4 above). Think of it as taking a test twice. With the same questions, you'd expect a better score the second time around. With a different set of questions on the same topic, you may or may not do better. Scoring a model on the data it was built from is like repeating the same questions. Many people don't hold back enough data to conduct a sound validation of the model. Validating on held-back data shows how accurate the overall model really is.

6. There is a DOE (design of experiments) approach called Repeated Measures that I think applies better to transactional data, or any data that has an order or sequence. This technique is not used enough (in my opinion). Many people just use aggregate data, which may be enough, but being able to predict more precisely when something will happen is golden. Peter Fader touches on this in his interview.
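
A rough sketch of the hold-back idea from point 5, in Python. Everything in it is made up for illustration: the visitor ids, the coin-flip labels, the 20% hold-out size, and the fit_memorizer helper, which is a deliberately naive "model" that only memorizes its training rows. It is not a recommended modeling approach, just a way to show why accuracy on the data a model was built from can mislead.

```python
import random

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    hits = sum(1 for x, y in rows if model(x) == y)
    return hits / len(rows)

def fit_memorizer(train_rows):
    """Hypothetical 'model' that memorizes every training row."""
    seen = dict(train_rows)
    labels = [y for _, y in train_rows]
    fallback = max(set(labels), key=labels.count)  # majority label for unseen ids
    return lambda x: seen.get(x, fallback)

rng = random.Random(0)
# Made-up data: unique visitor ids with coin-flip labels (nothing real to learn).
rows = [(i, rng.choice([0, 1])) for i in range(1000)]
rng.shuffle(rows)

holdout_size = len(rows) // 5  # hold back 20% (an arbitrary choice)
holdout, train = rows[:holdout_size], rows[holdout_size:]

model = fit_memorizer(train)
train_acc = accuracy(model, train)      # same questions: always 1.0 here
holdout_acc = accuracy(model, holdout)  # new questions: the honest estimate
print(f"train accuracy:   {train_acc:.2f}")
print(f"holdout accuracy: {holdout_acc:.2f}")
```

The training score is perfect only because the model has already seen those exact rows. The hold-out score is the one that says something about future data.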

Hope this helps!  
Wendi</description>
		<content:encoded><![CDATA[<p>Hi Jacques, Thanks for the thoughts.  </p>
<p>4. When building models you don&#8217;t want to include variables that may not make sense when you try to explain the model. In some cases you also have to be cognizant of classes protected by law: you can&#8217;t build a credit score with demographics such as age or race. I am also thinking of this from a practicality standpoint. Try to make the model as simple as possible, which makes it easier to apply to future events. But you are right that it does take away some of the element of surprise.</p>
<p>5. When you use data mining techniques to build models you need to test the predictive accuracy, computational speed, robustness, scalability, and interpretability (point #4 above). Think of it as taking a test twice. With the same questions, you&#8217;d expect a better score the second time around. With a different set of questions on the same topic, you may or may not do better. Scoring a model on the data it was built from is like repeating the same questions. Many people don&#8217;t hold back enough data to conduct a sound validation of the model. Validating on held-back data shows how accurate the overall model really is.</p>
<p>6. There is a DOE (design of experiments) approach called Repeated Measures that I think applies better to transactional data, or any data that has an order or sequence. This technique is not used enough (in my opinion). Many people just use aggregate data, which may be enough, but being able to predict more precisely when something will happen is golden. Peter Fader touches on this in his interview.</p>
<p>Hope this helps!<br />
Wendi</p>
]]></content:encoded>
				</item>
</channel>
</rss>
