[ncdnhc-discuss] [long] The NCDNHC's .org report is numerically inconsistent.

Thomas Roessler roessler at does-not-exist.org
Tue Aug 20 22:48:29 CEST 2002


On 2002-08-20 15:18:42 -0400, Milton Mueller wrote:

>Annex 5 (an Excel file) was not formatted for public view, it was 
>used purely for calculation purposes. That is why it appears  
>confusing. We didn't sort the data by the final scores, which is 
>what you complained about in your first message, because we didn't 
>think it would show up in the final report.

Well, the table on page 49 is in fact a bit worse than just that:  
The names column is sorted by the actual final score.  The  
responsiveness column is sorted in decreasing order (oops, not  
precisely - neustar and register.org fall out of that pattern - ok,  
it's just not sorted in any sensible way at all ;).  The support and 
differentiation columns are once again sorted by the actual final  
score.  Consequently, the "total" column of that table is just  
garbage, since it adds up scores coming from different applications.

The more I think about this, the more this looks like someone did  
serious damage by trying to sort this table.

Mh.  Looking more closely, it seems like the same numbers also made  
it into the table on page 47, but in a different order.  This entire 
annex 5 is extremely confusing and should be cleaned up and  
re-published.

>ISOC really is a 21.25, check your arithmetic. I think you made 
>the arithmetic mistake this time ;-)

I don't think so (I hope this makes it to you in reasonably readable 
form):

ISOC      3      3       5      5      3      5      2
weight    2.0    0.25    0.5    1.0    1.0    1.0    0.5
product   6    + 0.75  + 2.5  + 5    + 3    + 5    + 1    = 23.25

I'm attaching a spreadsheet for you.  Column I has your results,  
column J has a formula.  They _should_ agree...
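
For the record, here is the same check as a few lines of Python (a
minimal sketch; the scores and weights are the ones from the little
table above, in the same column order):

  # ISOC's component scores and the corresponding weights, as in the
  # table above.
  scores  = [3, 3, 5, 5, 3, 5, 2]
  weights = [2.0, 0.25, 0.5, 1.0, 1.0, 1.0, 0.5]

  total = sum(s * w for s, w in zip(scores, weights))
  print(total)   # -> 23.25, not 21.25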

>I don't agree with your critique of averaging the
>rankings. The problem is that the different dimensions
>of evaluation - differentiation, public support, and 
>responsiveness/governance - are not commensurable. 
>In ranking applications on each of the three dimensions,
>we created distinct numerical scales - one for "public 
>support" one for differentiation, and one for governance. 
>Each of these scales is SOMEWHAT arbitrary, but 
>does have internal consistency in measuring the specific
>thing it is measuring.  

Agreed, with the exception of "public support".  More about that  
further below.

>But to then treat all 3 of those scales as if they could be 
>measured against each other takes the arbitrariness well past the 
>breaking point. We don't really know HOW a score of 21.75 in 
>"responsiveness" relates to a score of 84 in "public support." And 
>it is, in my opinion, bad practice to "normalize" or combine them 
>in any way. So the ONLY useful measure of an overall ranking, 
>imho, is to average the rankings themselves, or simply to look at 
>the three rankings together.

Looking at the three rankings together - in the way the Gartner  
people did this with the technical evaluation - is probably the best 
approach: Give the respective scores in a matrix, and color-code the 
fields according to the tiers in the individual categories (see  
results.xls, attached).  That makes for a very nice, graphical  
presentation to the board which clearly, and at a glance,  
demonstrates who has what strengths and weaknesses.
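
To make that concrete, here is a very rough Python sketch of what
such a matrix could look like in plain text (the applicant names,
scores and tier cut-offs below are invented, purely to illustrate
the layout; in a spreadsheet, the tier letter would simply become a
cell colour):

  # Per-category tier cut-offs and per-applicant scores; all numbers
  # here are invented, purely to illustrate the layout.
  cutoffs = {
      "responsiveness":  [("A", 20), ("B", 15), ("C", 0)],
      "support":         [("A", 70), ("B", 40), ("C", 0)],
      "differentiation": [("A", 15), ("B", 10), ("C", 0)],
  }
  scores = {
      "applicant-1": {"responsiveness": 23.25, "support": 84, "differentiation": 17},
      "applicant-2": {"responsiveness": 17.50, "support": 31, "differentiation":  9},
  }

  def tier(category, score):
      # Return the first tier whose lower bound the score reaches.
      for label, minimum in cutoffs[category]:
          if score >= minimum:
              return label

  categories = ["responsiveness", "support", "differentiation"]
  print("".ljust(16) + "".join(c.ljust(22) for c in categories))
  for name, row in scores.items():
      cells = ["%s (tier %s)" % (row[c], tier(c, row[c])) for c in categories]
      print(name.ljust(16) + "".join(cell.ljust(22) for cell in cells))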

Of course, you are right that arbitrary normalizations are a  
considerable problem.  However, you are in that business anyway by 
summing up scores for various aspects of governance and 
responsiveness in table 2, for instance.  Your judgement on the 
relative importance of these aspects is encoded in the weights you 
apply.

Pretending that you don't make that judgement by averaging ranks 
just doesn't work: averaging ranks gives the same weight to  
relatively unimportant, minor differences (for instance, the  
difference between Unity and GNR in the responsiveness scoring;  
same tier) as to major differences (like the one between Unity and  
register.org on the differentiation criteria scale; different  
tiers).  

On their respective scales, you even go so far as to put these  
applications into different tiers, and then, in the final  
evaluation, you treat both kinds of difference as if they were the  
same size.  That doesn't make much sense to me.
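
A tiny numeric illustration of that point (the scores here are
invented; only the pairs of applications are the ones I named
above):

  # A 0.25-point gap within one tier and a 9-point gap across tiers
  # both collapse to a difference of exactly one rank.
  responsiveness  = {"Unity": 21.75, "GNR": 21.50}          # same tier
  differentiation = {"Unity": 17.00, "register.org": 8.00}  # different tiers

  def ranks(scores):
      ordered = sorted(scores, key=scores.get, reverse=True)
      return {name: position + 1 for position, name in enumerate(ordered)}

  print(ranks(responsiveness))   # {'Unity': 1, 'GNR': 2}
  print(ranks(differentiation))  # {'Unity': 1, 'register.org': 2}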



Now, for the "public support" category (see my blog for a much more  
 - possibly too - polemic version of this): In this category, I have  
the strongest reservations.  To begin with, the other categories  
may include some arbitrary judgement on your side, but there is  
certainly a lot of valuable information in them, at the very least  
on the "tier" level - in particular since tiers have a reasonable  
safety margin between them in most cases.

The only source you can draw on for public support is the set of  
postings to ICANN's .org forum.  The problem with this is that it's  
a classic, self-selected (or, even worse, orchestrated) survey.   
What do these numbers really tell us?  The answer is: They only tell 
us how good the various proponents were at mobilizing their  
respective "fan clubs".  We just don't know how representative these 
results may be for the population of .org domain name holders.   
Further, we can be pretty sure that these are not informed comments  
 - who actually has the time to read all that material?

(I notice that you tried to find out about this last point in your 
verification e-mail, but I don't seem to be able to find the results 
of this undertaking.)

Thus, this is, from the very beginning, the weakest and least  
significant input you have.  To make matters worse, you have to  
_estimate_ the number of ISOC class B responses on page 23, because  
you can't reasonably make the distinction between class As and class 
Bs for this application (which may indicate that the distinction was 
the wrong approach to this particular problem).  Also, in the 
evaluation in that chapter, you give five class B responses the 
same weight as a single class A response (you even write that this 
weighting is arbitrary).  That's the starting point for the scores 
and values on page 22, and for the "averaged rating" evaluation.

But for the "score-based" approach, you then use some kind of  
(supposed to be) pseudo-logarithmic rating with additional weights,  
described on page 43.  There is no rationale for this, and it leads  
to interesting results.  Just look at the assignment of applications 
to tiers: In the score-based approach, GNR (like Neustar) has a  
score of 3, but it's in the C tier (Neustar is in B), while dotorg  
foundation has 1, and is in the B tier.  That doesn't make sense.
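
One way to see how odd that is: whatever the exact boundaries are,
a tier assignment should at least be monotone in the underlying
score.  A quick check along those lines, using just the three data
points cited above (the ordering of the tiers, A above B above C,
is my assumption):

  # Score and tier for the three applications mentioned above.  A
  # higher score should never end up in a lower tier than a lower
  # score does.
  assignments = {
      "Neustar":           (3, "B"),
      "GNR":               (3, "C"),
      "dotorg foundation": (1, "B"),
  }
  order = {"A": 1, "B": 2, "C": 3}   # assumed: A is the top tier

  for x, (score_x, tier_x) in assignments.items():
      for y, (score_y, tier_y) in assignments.items():
          if score_x > score_y and order[tier_x] > order[tier_y]:
              print("%s (score %s, tier %s) is tiered below %s (score %s, tier %s)"
                    % (x, score_x, tier_x, y, score_y, tier_y))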


Now, what does this mess tell us, ultimately?  That you have tried  
to make sense out of numbers which don't make any.  You are trying  
to force some meaning onto these numbers which just may not be  
there.

My suggestion: Don't even try.  Don't try to use these particular  
numbers as a basis for anything - except, perhaps, in the case that  
_all_ other scores are equal.  And, in particular, please don't  
intermingle them with the numbers from the evaluation of the other  
criteria.  Mixing these numbers in _any_ way has a strong smell of  
manipulation around it.  That smell doesn't make your evaluation  
stronger - in fact, it even weakens the stronger points of your  
evaluation, responsiveness and differentiation.



Finally, one question which doesn't have anything to do with  
numbers: I have looked a bit at your evaluation of the probable  
"winner" of the entire process, ISOC.  In the "responsiveness"  
category, you write:

>ISOC proposes a number of very innovative services designed to  
>respond to the needs of noncommercial entities, not just  
>registrants generally. ISOC therefore received a High rating in  
>this category.  Finally, the Committee notes that although it has  
>made no commitment to support good works, profits from the  
>registry will go to ISOC.  On the arguable proposition that  
>support for IAB/IETF standards processes constitutes good works we 
>awarded ISOC a Low ranking in this category rather than a None. 

I'm sorry, but I fail to find these services.  I find services which 
are generally useful for registrants, and services useful for IP  
owners.  But none specifically targeted at noncommercial entities.  
Maybe you can shed some light on this?


I apologize for the length of this letter.

Kind regards,
-- 
Thomas Roessler                        <roessler at does-not-exist.org>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sc.xls
Type: application/excel
Size: 14848 bytes
Desc: not available
URL: <http://lists.ncuc.org/pipermail/ncuc-discuss/attachments/20020820/a2265784/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: results.xls
Type: application/excel
Size: 8192 bytes
Desc: not available
URL: <http://lists.ncuc.org/pipermail/ncuc-discuss/attachments/20020820/a2265784/attachment-0001.bin>

