<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Remodeling Precision</title>
	<atom:link href="http://comonad.com/reader/2009/remodeling-precision/feed/" rel="self" type="application/rss+xml" />
	<link>http://comonad.com/reader/2009/remodeling-precision/</link>
	<description>types, (co)monads, substructural logic</description>
	<lastBuildDate>Sat, 15 Oct 2022 17:33:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Aide Chatfield</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-91959</link>
		<dc:creator>Aide Chatfield</dc:creator>
		<pubDate>Fri, 23 Dec 2011 16:53:22 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-91959</guid>
		<description>Hello, you used to write excellent, but the last few posts have been kinda boring… I miss your great writings. Past few posts are just a little bit out of track! come on!</description>
		<content:encoded><![CDATA[<p>Hello, you used to write excellent, but the last few posts have been kinda boring… I miss your great writings. Past few posts are just a little bit out of track! come on!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward Kmett</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11828</link>
		<dc:creator>Edward Kmett</dc:creator>
		<pubDate>Wed, 16 Sep 2009 22:33:06 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11828</guid>
		<description>Hello Bob,

Thanks for the references! The TREC IR Measures overview seems to be exactly what I was looking for.</description>
		<content:encoded><![CDATA[<p>Hello Bob,</p>
<p>Thanks for the references! The TREC IR Measures overview seems to be exactly what I was looking for.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11812</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Wed, 16 Sep 2009 18:01:31 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11812</guid>
		<description>For IR, no one ever measures true precision or recall, precisely because the denominator is so large that you can&#039;t annotate all the docs as relevant or not.  As &quot;Pseudonym&quot; said, IR researchers often measure the precision of the top results using measures like precision-at-N (the precision after N documents), or mean average precision (MAP), a mean of precisions-at-N for a sequence of N.  Sometimes they measure area under the precision/recall or ROC curves (aka AUC).  

The big web search engines, in particular, are concerned with precision &quot;above the fold&quot; (in the newspaper sense).  That is, if you take a default install of IE or Firefox and do a search on Bing, Yahoo, or Google, what&#039;s the precision for the number of results you can see on the screen.  The chance to continue browsing is not continuous.  Basically, that would induce a kink in your probability of keeping going to the next items.

There are also applications which are highly recall oriented, like curating databases of protein interactions from the literature or intelligence analysis over the news.  It&#039;d still fit your model, it&#039;d just give you a different likelihood of looking at another item.  We&#039;re particularly interested in precision at 99% recall or 99.9% recall for these situations.  

The other major issue to consider is diversity of results.  If I send you ten different versions of the same information, it&#039;s not very useful even if they&#039;re all &quot;relevant&quot; in the binary relevant/not relevant sense.  The problem is that to measure this idea, you need a notion of relative information contribution of a new result given a set of other results.

What the IR folks call precision is what the epidemiologists call &quot;positive predictive value&quot;.  It&#039;s basically the likelihood that you have a condition if you test positive for it, and it&#039;s very useful exactly as stated in that context.

You might want to consult the section of the Wikipedia entry &lt;a href=&quot;http://en.wikipedia.org/wiki/Information_retrieval#Performance_measures&quot; rel=&quot;nofollow&quot;&gt;Information Retrieval&lt;/a&gt; about performance measures, or the &lt;a href=&quot;http://trec.nist.gov/pubs/trec15/appendices/CE.MEASURES06.pdf&quot; rel=&quot;nofollow&quot;&gt;TREC IR Measures&lt;/a&gt; overview.</description>
		<content:encoded><![CDATA[<p>For IR, no one ever measures true precision or recall, precisely because the denominator is so large that you can&#8217;t annotate all the docs as relevant or not.  As &#8220;Pseudonym&#8221; said, IR researchers often measure the precision of the top results using measures like precision-at-N (the precision after N documents), or mean average precision (MAP), a mean of precisions-at-N for a sequence of N.  Sometimes they measure area under the precision/recall or ROC curves (aka AUC).  </p>
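<p>As a rough illustration of the measures named above, here is a minimal Python sketch of precision-at-N and average precision over one ranked list of binary relevance judgments. The function names are made up for this example, and the average-precision variant assumes every relevant document appears somewhere in the list; evaluation tools normally divide by the total number of relevant documents in the collection instead. MAP would then be the mean of this value over a set of queries.</p>

```python
from typing import Sequence


def precision_at(relevance: Sequence[int], n: int) -> float:
    """Fraction of the first n results judged relevant (1) rather than not (0)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision-at-k taken at each rank k that holds a relevant result.

    Assumption: all relevant documents are present in the list, so the
    divisor is the number of relevant results seen.
    """
    hits = 0
    total = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # precision at this rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    judged = [1, 0, 1, 1, 0]
    print(precision_at(judged, 3))
    print(average_precision(judged))
```

<p>For the list [1, 0, 1, 1, 0], the relevant results sit at ranks 1, 3 and 4, so the average is taken over the precisions 1/1, 2/3 and 3/4.</p>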
<p>The big web search engines, in particular, are concerned with precision &#8220;above the fold&#8221; (in the newspaper sense).  That is, if you take a default install of IE or Firefox and do a search on Bing, Yahoo, or Google, what&#8217;s the precision for the number of results you can see on the screen.  The chance to continue browsing is not continuous.  Basically, that would induce a kink in your probability of keeping going to the next items.</p>
<p>There are also applications which are highly recall oriented, like curating databases of protein interactions from the literature or intelligence analysis over the news.  It&#8217;d still fit your model, it&#8217;d just give you a different likelihood of looking at another item.  We&#8217;re particularly interested in precision at 99% recall or 99.9% recall for these situations.  </p>
<p>The other major issue to consider is diversity of results.  If I send you ten different versions of the same information, it&#8217;s not very useful even if they&#8217;re all &#8220;relevant&#8221; in the binary relevant/not relevant sense.  The problem is that to measure this idea, you need a notion of relative information contribution of a new result given a set of other results.</p>
<p>What the IR folks call precision is what the epidemiologists call &#8220;positive predictive value&#8221;.  It&#8217;s basically the likelihood that you have a condition if you test positive for it, and it&#8217;s very useful exactly as stated in that context.</p>
<p>You might want to consult the section of the Wikipedia entry <a href="http://en.wikipedia.org/wiki/Information_retrieval#Performance_measures" rel="nofollow">Information Retrieval</a> about performance measures, or the <a href="http://trec.nist.gov/pubs/trec15/appendices/CE.MEASURES06.pdf" rel="nofollow">TREC IR Measures</a> overview.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward Kmett</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11810</link>
		<dc:creator>Edward Kmett</dc:creator>
		<pubDate>Wed, 16 Sep 2009 16:57:17 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11810</guid>
		<description>Jim, that sounds promising. I&#039;ll take a look!</description>
		<content:encoded><![CDATA[<p>Jim, that sounds promising. I&#8217;ll take a look!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jim F</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11805</link>
		<dc:creator>Jim F</dc:creator>
		<pubDate>Wed, 16 Sep 2009 15:35:28 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11805</guid>
		<description>Check out the work by Dr. John Wilbur at the National Center for Biotechnology Information at the National Library of Medicine.  He was writing papers ca. 1991-93 about measuring performance of document searching, mostly dealing with the medical literature in MEDLINE.

What he ended up with was a similar idea, but based on the information-theoretic entropy.  He called it relevance information.

If we assume R relevant documents in a corpus of D documents, the probability of any given document being relevant is uniformly R/D.  If we score and then rank them by some procedure we can then assume the probability is no longer uniform, but decreasing (hopefully) sharply over the rankings. The effectiveness of the scoring scheme is measured by the decrease in entropy over the whole distribution.

He examined some of the properties of this measure and felt that it captured the best of both precision and recall in one number and was reasonably robust, but AFAIK it never caught on in the literature.  

I can&#039;t find the original paper, and he seems to have moved away from it in his more recent papers.

(Disclosure -- I used to work for him)

Wait -- I found it:

 An Information Measure of Retrieval Performance (1992)
by W J Wilbur 

http://citeseerx.ist.psu.edu/showciting?cid=1837654</description>
		<content:encoded><![CDATA[<p>Check out the work by Dr. John Wilbur at the National Center for Biotechnology Information at the National Library of Medicine.  He was writing papers ca. 1991-93 about measuring performance of document searching, mostly dealing with the medical literature in MEDLINE.</p>
<p>What he ended up with was a similar idea, but based on the information-theoretic entropy.  He called it relevance information.</p>
<p>If we assume R relevant documents in a corpus of D documents, the probability of any given document being relevant is uniformly R/D.  If we score and then rank them by some procedure we can then assume the probability is no longer uniform, but decreasing (hopefully) sharply over the rankings. The effectiveness of the scoring scheme is measured by the decrease in entropy over the whole distribution.</p>
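<p>To make the idea concrete, here is one possible reading of it in Python. This is only a sketch and not the definition from the paper: it assumes each document is an independent yes/no relevance variable, takes the uniform prior R/D to be the mean of the per-rank probabilities, and reports the total drop in binary entropy (in bits) from that prior to the ranked probabilities. The function names are invented for the example.</p>

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no variable that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_decrease(rank_probs: Sequence[float]) -> float:
    """Total entropy lost when the uniform prior is replaced by per-rank probabilities.

    Assumption (not taken from the paper): the prior R/D is the mean of
    rank_probs, i.e. R is the expected number of relevant documents.
    """
    d = len(rank_probs)
    if d == 0:
        return 0.0
    prior = sum(rank_probs) / d
    return sum(binary_entropy(prior) - binary_entropy(p) for p in rank_probs)
```

<p>Under these assumptions a scoring scheme that leaves every rank at R/D gains nothing, while a perfect ranking of 2 relevant documents out of 4 removes all 4 bits of uncertainty.</p>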
<p>He examined some of the properties of this measure and felt that it captured the best of both precision and recall in one number and was reasonably robust, but AFAIK it never caught on in the literature.  </p>
<p>I can&#8217;t find the original paper, and he seems to have moved away from it in his more recent papers.</p>
<p>(Disclosure &#8212; I used to work for him)</p>
<p>Wait &#8212; I found it:</p>
<p> An Information Measure of Retrieval Performance (1992)<br />
by W J Wilbur </p>
<p><a href="http://citeseerx.ist.psu.edu/showciting?cid=1837654" rel="nofollow">http://citeseerx.ist.psu.edu/showciting?cid=1837654</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward Kmett</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11800</link>
		<dc:creator>Edward Kmett</dc:creator>
		<pubDate>Wed, 16 Sep 2009 12:10:50 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11800</guid>
		<description>Pseudonym:

I agree, although the precision metric for a given recall level can be calculated using the same machinery.

I&#039;m happy to learn though. References are welcome! I hardly suspect that I came up with anything novel during a 2 hour whiteboard session two years back and a similar amount of time hacking things up in perl. ;) 

I just haven&#039;t seen anything similar written up.</description>
		<content:encoded><![CDATA[<p>Pseudonym:</p>
<p>I agree, although the precision metric for a given recall level can be calculated using the same machinery.</p>
<p>I&#8217;m happy to learn though. References are welcome! I hardly suspect that I came up with anything novel during a 2 hour whiteboard session two years back and a similar amount of time hacking things up in perl. ;) </p>
<p>I just haven&#8217;t seen anything similar written up.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pseudonym</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11791</link>
		<dc:creator>Pseudonym</dc:creator>
		<pubDate>Wed, 16 Sep 2009 07:52:16 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11791</guid>
		<description>Precision isn&#039;t usually interesting on its own.  What&#039;s more important for a ranked search (like Google) is what proportion of the first n results are relevant for ALL n.  This is why in real IR papers, you usually see a precision-recall curve.

You will occasionally also see precision-at-n for various n which are multiples of a hypothetical screenful (e.g. n=20).</description>
		<content:encoded><![CDATA[<p>Precision isn&#8217;t usually interesting on its own.  What&#8217;s more important for a ranked search (like Google) is what proportion of the first n results are relevant for ALL n.  This is why in real IR papers, you usually see a precision-recall curve.</p>
<p>You will occasionally also see precision-at-n for various n which are multiples of a hypothetical screenful (e.g. n=20).</p>
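<p>A minimal Python sketch of how the points on such a curve could be computed from one ranked list, assuming binary relevance judgments and a known count of relevant documents for the query (the function name and the total_relevant parameter are made up for this example):</p>

```python
from typing import List, Sequence, Tuple


def precision_recall_points(
    relevance: Sequence[int], total_relevant: int
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each of the first n results, for every n."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points = []
    hits = 0
    for n, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
        # recall: share of all relevant docs found so far; precision: share of results so far that are relevant
        points.append((hits / total_relevant, hits / n))
    return points
```

<p>Reading off the precision component at n=20 gives the precision-at-n figure for a 20-result screenful.</p>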
]]></content:encoded>
	</item>
	<item>
		<title>By: Edward Kmett</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11782</link>
		<dc:creator>Edward Kmett</dc:creator>
		<pubDate>Wed, 16 Sep 2009 02:34:39 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11782</guid>
		<description>Wren: 

Yeah, you&#039;ve got the metric. I used this once a couple of years back to pretty good effect. 

That is a very good point about diversity. The metric I actually used did a much simpler version of what you proposed in that I just considered a pre-ranked training set. And if you found an item that should be ranked up to, say, 4th in the 3rd slot you earned 3/4ths of the points, clamped to 1. Ties can then be allowed in the list of expected rankings when the distinctions between them aren&#039;t clear. This doesn&#039;t address diversity directly, but it does ensure that you can reach 100% precision given perfect ordering.</description>
		<content:encoded><![CDATA[<p>Wren: </p>
<p>Yeah, you&#8217;ve got the metric. I used this once a couple of years back to pretty good effect. </p>
<p>That is a very good point about diversity. The metric I actually used did a much simpler version of what you proposed in that I just considered a pre-ranked training set. And if you found an item that should be ranked up to, say, 4th in the 3rd slot you earned 3/4ths of the points, clamped to 1. Ties can then be allowed in the list of expected rankings when the distinctions between them aren&#8217;t clear. This doesn&#8217;t address diversity directly, but it does ensure that you can reach 100% precision given perfect ordering.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: wren ng thornton</title>
		<link>http://comonad.com/reader/2009/remodeling-precision/comment-page-1/#comment-11775</link>
		<dc:creator>wren ng thornton</dc:creator>
		<pubDate>Tue, 15 Sep 2009 23:37:50 +0000</pubDate>
		<guid isPermaLink="false">http://comonad.com/reader/?p=151#comment-11775</guid>
		<description>So you&#039;re suggesting \sum_n (p^n)*R(n) where R is a function taking the nth document to a relevance score on the 0..1 interval? That sounds very familiar, though I can&#039;t quite pull up a name or reference for it. In particular it&#039;s similar to measures of effective utility. The (true) utility of a particular state is fixed over time, but the effective utility of an action resulting in that state will be lesser the longer it takes between the action and the resulting payoff (due to lost opportunity cost, random chance of not reaching the goal, etc). This sort of model is used often in game theory and similar approaches to AI and complex systems; so that may be somewhere to start looking.

Another interesting enhancement for this metric is that the R function needn&#039;t select only from {0,1}. One particular direction to take this is to define R(n) dynamically in terms of R(0)..R(n-1) such that, for example, duplicate content is not considered relevant. This would allow for a more information theoretic approach where we want to maximize the diversity of content up front rather than presenting the same content over and over again. This is especially important if we assume the searcher will read from multiple documents, but it also provides some stability if we don&#039;t trust our ability to discern relevance.</description>
		<content:encoded><![CDATA[<p>So you&#8217;re suggesting \sum_n (p^n)*R(n) where R is a function taking the nth document to a relevance score on the 0..1 interval? That sounds very familiar, though I can&#8217;t quite pull up a name or reference for it. In particular it&#8217;s similar to measures of effective utility. The (true) utility of a particular state is fixed over time, but the effective utility of an action resulting in that state will be lesser the longer it takes between the action and the resulting payoff (due to lost opportunity cost, random chance of not reaching the goal, etc). This sort of model is used often in game theory and similar approaches to AI and complex systems; so that may be somewhere to start looking.</p>
<p>Another interesting enhancement for this metric is that the R function needn&#8217;t select only from {0,1}. One particular direction to take this is to define R(n) dynamically in terms of R(0)..R(n-1) such that, for example, duplicate content is not considered relevant. This would allow for a more information theoretic approach where we want to maximize the diversity of content up front rather than presenting the same content over and over again. This is especially important if we assume the searcher will read from multiple documents, but it also provides some stability if we don&#8217;t trust our ability to discern relevance.</p>
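<p>A minimal Python sketch of both ideas, under a few assumptions: ranks start at n = 0 as in the sum above, p is the probability of continuing to the next result, and "duplicate content" is approximated very crudely by two results sharing the same key. The function names are hypothetical.</p>

```python
from typing import Hashable, List, Sequence, Set


def discounted_relevance(scores: Sequence[float], p: float) -> float:
    """Compute sum over n of p**n * R(n), where scores[n] is R(n) in [0, 1]."""
    return sum((p ** n) * r for n, r in enumerate(scores))


def novelty_scores(docs: Sequence[Hashable], relevant: Set[Hashable]) -> List[float]:
    """R(n) defined in terms of the earlier results: a repeated key scores 0.

    Assumption: equal keys stand in for duplicate content; real duplicate
    detection would need a similarity measure.
    """
    seen: Set[Hashable] = set()
    scores = []
    for doc in docs:
        is_new_and_relevant = doc in relevant and doc not in seen
        scores.append(1.0 if is_new_and_relevant else 0.0)
        seen.add(doc)
    return scores


if __name__ == "__main__":
    ranked = ["a", "a", "b"]
    print(discounted_relevance(novelty_scores(ranked, {"a", "b"}), 0.5))
```

<p>With p = 0.5 the list a, a, b scores 1 + 0 + 0.25, so the repeated result earns nothing and the later distinct result still counts, just at a discount.</p>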
]]></content:encoded>
	</item>
</channel>
</rss>
