[discuss] Possible approaches to solving "problem no. 1"

Mon Feb 17 13:45:14 UTC 2014

Steve:
Thanks for taking the time to provide this extremely useful breakdown of the issues. I am munching on it for breakfast....

From: Steve Crocker [mailto:steve at shinkuro.com]
Sent: Sunday, February 16, 2014 7:36 PM
To: Milton L Mueller
Cc: Steve Crocker; Patrik Fältström; Ian Peter; George Sadowsky; discuss at 1net.org
Subject: Re: [discuss] Possible approaches to solving "problem no. 1"

Milton,

I think different people are focused on different risks.  Let me offer a framework that will allow us to describe the different risks and the various ways to mitigate them.

There are essentially two kinds of errors, false positives and false negatives, and three possible sources of these errors.

The false positive form of error is when a change is made that shouldn't have been made.  This may be through innocent error, hardware or software malfunction or deliberate, malicious intent.

The false negative form of error is when a change that should have been made wasn't.  As with the false positives, this may be through innocent error, hardware or software malfunction or deliberate, malicious intent.

The three sources of either of these errors are, broadly:

  *   Within the operations of the TLD, i.e. by the staff or the systems within the registry.
  *   Within the management of the root zone.  This includes ICANN, Verisign and NTIA.  For purposes of this note, these all function together.  I would have included the root server operators if I had written this a few years ago, but with DNSSEC now operational, the root server operators really don't have an opportunity to introduce errors into the system.  Well, I suppose they might change or omit entries, but that would be detected instantly, because the altered responses would fail DNSSEC validation.
  *   External to either the TLD operator or the root zone managers, i.e. by a third party interfering with the communications between them or inserting false communications.

For each of the six combinations of source and type of error, it's relevant to consider the likelihood that the error will occur, the consequences of such an error, and the possible ways to prevent or mitigate it.

The first thing to consider is the difference in consequences of false positives versus false negatives.  False positives are generally very bad.  False negatives are generally no more than minor inconveniences.  A false negative is when a requested change is not put into effect.  Hence, the status remains the same, and the parties get another chance to attempt a change.  Thus, false negatives are delays in making a requested change.  As I indicated earlier, changes are typically fairly rare.  It's considered common practice among the TLD operators to plan changes well in advance of the actual time they're needed and then proceed through the changes carefully and slowly.  Addition or removal of an NS record corresponds to putting a new name server into operation or removing one from operation.  A normal TLD operator usually allows some number of weeks to do this.  It can, of course, be done much more quickly, and in very rare circumstances it has been, but the norm is *much* slower.  So, three of the six combinations are essentially benign, and we can focus our attention on the three sources of false positives.
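The framework above can be written down as a small table.  The following Python sketch is purely illustrative: the enum names and the hypothetical `needs_attention` rule simply encode the argument that false negatives amount to delays while false positives are the dangerous case.

```python
from enum import Enum
from itertools import product


class ErrorType(Enum):
    FALSE_POSITIVE = "change made that should not have been"
    FALSE_NEGATIVE = "change not made that should have been"


class Source(Enum):
    TLD_OPERATOR = "staff or systems within the registry"
    ROOT_MANAGEMENT = "ICANN, Verisign and NTIA acting together"
    THIRD_PARTY = "interference with or forgery of communications"


def needs_attention(error: ErrorType) -> bool:
    # A false negative leaves the entry unchanged, so the parties can simply
    # try again; only false positives are treated as serious here.
    return error is ErrorType.FALSE_POSITIVE


# All six (source, error type) combinations.
combinations = list(product(Source, ErrorType))

# The three combinations worth focusing on.
focus = [(s, e) for s, e in combinations if needs_attention(e)]

if __name__ == "__main__":
    for source, error in focus:
        print(f"{source.name}: {error.value}")
```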

Recall that a false positive is when a change happens that should not have.  The change might be the addition of a new name server, the change of address associated with a name server, or the removal of a name server.  I am leaving out the creation of a new TLD or the complete removal of a TLD.  These could also be fit into this framework, but I think it's useful to treat them as qualitatively distinct.

If the error occurs within the TLD operator, it's a bit difficult to catch.  How is one to know whether a requested change is coming from the proper person within the TLD operator or whether there's been some sort of system failure?  Although there's no perfect solution, the root operations team, i.e. ICANN's IANA team, Verisign's team and NTIA, all exert some level of oversight and raise a flag if they see something that seems amiss.  ICANN's IANA team is the one that receives requests, and they do the primary vetting before passing them onward, so they're the ones that catch most of the possible false positives originating from the TLD operator.

Errors created by third parties have been attempted occasionally but not often and, to my knowledge, never successfully.  The IANA folks check the credentials of the person requesting the change and cross check the communications carefully.
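As a rough illustration of this kind of cross-checking, the Python sketch below accepts a change request only if the requester is a contact on record for the TLD and every contact on record has confirmed exactly the same change.  The record layout, the contact names and the confirmation rule are all hypothetical; this is not a description of IANA's actual procedure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeRequest:
    tld: str
    requester: str   # who submitted the request
    change: str      # hypothetical free-text description of the change


@dataclass
class TldRecord:
    # Hypothetical: contacts on file whose approval is required for changes.
    contacts: set[str]
    # Confirmations received, keyed by contact, each naming the change confirmed.
    confirmations: dict[str, str] = field(default_factory=dict)


def vet(request: ChangeRequest, records: dict[str, TldRecord]) -> bool:
    record = records.get(request.tld)
    if record is None:
        return False  # unknown TLD
    if request.requester not in record.contacts:
        return False  # requester has no credentials on file
    # Confirmations are assumed to have arrived over a channel separate from
    # the request itself; this function only checks that they all match.
    return all(record.confirmations.get(c) == request.change
               for c in record.contacts)
```

With two contacts on file, a request from one of them returns `False` until both have confirmed the identical change, and a request from anyone not on file is always rejected.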

The last possible source of false positive error is within the root managers.  You and others have focused on the potential that politically motivated actors might force a change.  To my knowledge, this has never happened; there is a strong ethic in place against attempting it, and there is really little value in creating such an error.  The usual nightmare scenario that's discussed is to imagine the U.S. government decides to take country X off the Internet by removing its entry from the root zone, and that it does so by bringing irresistible force against ICANN and Verisign.  Root zone entries have a 48-hour time-to-live (TTL), which means it will take up to two days before all of the caches around the world are drained of the old entry.  Put another way, the impact would spread at about 2% per hour.  However, the change would be noted very quickly, probably within minutes, and alarms would sound around the world.  System administrators and other technical people would quickly adjust the operations of their systems to mitigate the impact.  And the damage to the credibility of the United States would be quick and brutal.
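The 2%-per-hour figure follows from the 48-hour TTL if one assumes, as a simplification, that the remaining lifetimes of cached copies are spread evenly between 0 and 48 hours.  Under that assumption the share of caches that have dropped the old entry grows linearly, as this small Python sketch shows.

```python
TTL_HOURS = 48  # TTL of a TLD's entry in the root zone, per the text


def fraction_expired(hours_since_change: float,
                     ttl_hours: float = TTL_HOURS) -> float:
    """Share of caches no longer holding the old entry after the given time.

    Assumes the remaining TTLs of cached copies are uniformly distributed
    over [0, ttl_hours]; this is an idealization, not a measurement.
    """
    if hours_since_change <= 0:
        return 0.0
    return min(hours_since_change / ttl_hours, 1.0)


if __name__ == "__main__":
    print(f"after 1h:  {fraction_expired(1):.2%}")   # 1/48, about 2%
    print(f"after 24h: {fraction_expired(24):.0%}")
    print(f"after 48h: {fraction_expired(48):.0%}")
```

Real resolvers may cap TTLs or refetch early, so the linear curve is only a first approximation of how quickly such a change would take effect.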

A more likely scenario is an accidental error, whether human or system.  About a decade ago, part of one entry was published erroneously.  It did not disable the TLD and it was fixed quickly.  It also led to some improvements in the coordination and checking among the root managers.

Turning to your points, yes, it is theoretically possible for someone within the root zone management process to make unauthorized changes to the root zone, and there may be value in strengthening the technical processes to make it impossible to make a change without the cooperation of the TLD operator or an extraordinary process that requires enough people to be safe from capture.
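One way to read "an extraordinary process that requires enough people to be safe from capture" is as a k-of-n approval rule.  The Python sketch below is hypothetical: it lets a root zone change proceed only if the TLD operator approves it, or if at least `quorum` distinct members of a designated panel do.  The panel, its size and the threshold are assumptions for illustration, not an existing mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    panel: frozenset[str]   # hypothetical independent approvers
    quorum: int             # how many panel members must agree


def change_allowed(tld_operator_approved: bool,
                   panel_approvals: set[str],
                   policy: Policy) -> bool:
    # Normal path: the TLD operator itself cooperates in the change.
    if tld_operator_approved:
        return True
    # Extraordinary path: count only approvals from actual panel members,
    # so outsiders cannot inflate the tally; a set counts each member once.
    valid = panel_approvals & policy.panel
    return len(valid) >= policy.quorum
```

Raising `quorum` toward the panel size makes capture harder, but it also makes legitimate emergency action harder to organize.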

The other thing you briefly allude to is that Verisign makes changes to the COM zone quickly and frequently with far simpler procedures.  That's true, but it is also true that some of the names registered under COM have been hijacked.  The COM zone is large.  It contains more than 100,000,000 names -- far more; I'm using an old and easily remembered estimate -- and the number of hijacks is fairly small, so the rate of hijacking is quite low.  Nonetheless, it does happen, and the consequences are usually quite unpleasant.  Most of the incidents are not reported publicly.  However, see SSAC's reports SAC 007 [1], SAC 028 [2] and SAC 040 [3] for some examples and analyses.  The root zone is, of course, much, much smaller, roughly five orders of magnitude smaller even after the addition of the new gTLDs, and much greater care is taken to avoid accidental or malicious changes to a TLD operator's root zone entry.
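The "five orders of magnitude" comparison is simple arithmetic.  The Python sketch below uses the 100,000,000 estimate for COM from the text; the root zone size of 1,000 TLDs is a hypothetical round figure chosen for illustration, not a count taken from this message.

```python
import math

COM_NAMES = 100_000_000   # the text's deliberately low estimate for COM
ROOT_TLDS = 1_000         # hypothetical round figure for the root zone


def orders_of_magnitude(larger: int, smaller: int) -> float:
    # Base-10 logarithm of the size ratio.
    return math.log10(larger / smaller)


if __name__ == "__main__":
    print(round(orders_of_magnitude(COM_NAMES, ROOT_TLDS), 1))
```

Because the real COM figure is "far more" than the estimate, the actual ratio would be correspondingly larger for any given root zone size.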

I hope this helps you and others to understand the various places where errors might occur, their consequences and their possible mitigations.

Steve

[1] SAC 007: http://archive.icann.org/en/announcements/hijacking-report-12jul05.pdf

[2] SAC 028: http://www.icann.org/en/groups/ssac/documents/sac-028-en.pdf

[3] SAC 040: http://www.icann.org/en/groups/ssac/documents/sac-040-en.pdf

On Feb 16, 2014, at 4:09 PM, Milton L Mueller <mueller at syr.edu> wrote:

Steve:

-----Original Message-----
From: Steve Crocker [mailto:steve at shinkuro.com]

Add and remove are obviously sensitive operations and require careful approval by authorities outside of IANA's clerical role.

I would question this, or at least ask for elaboration.

Obviously there is a security risk in any process that alters RZF data, in that it could be exploited by someone to achieve mischievous or malevolent objectives, especially when such changes are a result of an automated process.

What I don't understand is how adding a "careful approval by authorities" makes anything more secure - if by that you mean manual review of every change by an essentially political entity such as occurs now. So question #1 is "What kind of 'authorities' are you talking about?" Question #2 is "what risks are added or magnified by circumventing root zone changes via a political entity?"

I would note that Verisign updates its TLD zones many times a day. Some of the zones under .com and .net are more significant economically and structurally than many entries in the root zone. And yet no 'authority' other than VRSN engages in "careful approval" of those changes. Verisign can, however, be sued if it mucks something up and damages organizations or people, can it not? Are we confusing the internal security of IANA/ICANN's process with a governance process? In that respect, I think David Conrad's suggestion of structural separation of certain functions makes more sense than a notion of "careful approval by authorities."

Milton Mueller
Professor, Syracuse University School of Information Studies
http://faculty.ischool.syr.edu/mueller/
