Jun 4, 2007

Magic Numbers or Snake Oil?

The Common Vulnerability Scoring System

Can a single number sum up the full significance of a security vulnerability? The CVSS attempts to prove that it can, but it has its weak points.

The Common Vulnerability Scoring System (CVSS) is a relatively new attempt at consistent, vendor-independent ranking of software and system vulnerabilities. Originally a research project of the US National Infrastructure Advisory Council, the CVSS was officially launched in February 2005 and has been hosted publicly by the Forum of Incident Response and Security Teams (FIRST) since April 2005. By the end of 2006 it had been adopted by 31 organisations and put into operational service by 16 of these, including (significantly) the US-CERT, a primary source of vulnerability information.

The CVSS attempts to reduce the complicated multi-variable problem of security vulnerability ranking to a single numerical index in the range 0 to 10. It uses a total of 12 input variables, each of which can take from two to four pre-determined alternative values. The calculation is broken into three cascaded stages, the first two of which yield visible intermediate results that are each fed into the following stage: an absolute Base Score calculation that describes ease of access and scale of impact; a Temporal Score calculation that applies a zero-to-moderate negative bias depending on the current exploitability and remediation position (both of which may well change over time); and, finally, an Environmental Score calculation, performed by end users, that takes into account their individual exposure landscape (target space and damage potential).
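To make the cascade concrete, here is a minimal Python sketch of how the three stages chain together. The numeric weights passed in are simply the values behind whichever natural-language alternatives were chosen; the equations follow the v1 formulas published by FIRST as I read them, so treat the exact constants and rounding as assumptions rather than as a definitive implementation.

```python
# Sketch of the CVSS v1 cascade: Base -> Temporal -> Environmental.
# The metric weights passed in are the numeric equivalents of the
# natural-language choices (e.g. AccessVector "remote" = 1.0); the
# equations follow the FIRST v1 specification as I read it, so treat
# them as illustrative rather than authoritative.

def base_score(access_vector, access_complexity, authentication,
               conf_impact, integ_impact, avail_impact,
               conf_bias=0.333, integ_bias=0.333, avail_bias=0.333):
    """Absolute measure of ease of access and scale of impact (0-10).

    The default bias weights are the "normal" impact weighting as I
    understand the v1 tables.
    """
    impact = (conf_impact * conf_bias
              + integ_impact * integ_bias
              + avail_impact * avail_bias)
    return round(10 * access_vector * access_complexity * authentication * impact, 1)

def temporal_score(base, exploitability, remediation_level, report_confidence):
    """Applies a zero-to-moderate downward bias to the Base score.

    Each factor is 1.0 at worst (functional exploit, no fix, confirmed
    report) and below 1.0 otherwise, so the Temporal score can only
    equal or undercut the Base score.
    """
    return round(base * exploitability * remediation_level * report_confidence, 1)

def environmental_score(temporal, collateral_damage, target_distribution):
    """Adjusts the Temporal score for the user's own exposure landscape.

    collateral_damage runs 0-0.5 (none to high) and target_distribution
    0-1.0 (no vulnerable hosts to all hosts vulnerable).
    """
    return round((temporal + (10 - temporal) * collateral_damage) * target_distribution, 1)
```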

The third (Environmental) stage has the greatest influence on the final result, and without it a CVSS ranking is really only a partial index. It must therefore be recognised that a published CVSS score, unlike the public conception of common vendor rankings (e.g. Microsoft "Critical"), is not the final answer. Of course, in reality vendor rankings are not final indices either. The end user should always expect to complete the ranking process by applying some kind of environmental calculation to any published index to allow for local priorities, and that task becomes very difficult where vendor-specific rankings are derived using differing proprietary methods. In the case of the CVSS, a maximal Temporal score of 10 may be reduced by the Environmental calculation even to zero, or, alternatively, a very low Temporal score may be raised to around 5, once the user's exposure landscape is taken into account. The second possibility is significant: while nobody would ignore, for example, a Microsoft "Critical" rating, vulnerabilities classified as low priority by vendors can have a major impact on particular users, depending on how critical the vulnerable systems are to their specific business processes.
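Plugging those two extremes into the environmental formula sketched above makes the point numerically (the 0.5 weight for "high" collateral damage and the 0.0/1.0 weights for an empty versus fully vulnerable target population are my reading of the v1 tables):

```python
# Two extremes of the Environmental adjustment, using the
# environmental_score() sketch above.
print(environmental_score(10.0, collateral_damage=0.5, target_distribution=0.0))  # -> 0.0
print(environmental_score(0.0,  collateral_damage=0.5, target_distribution=1.0))  # -> 5.0
```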

The Good, the Bad and the Ugly

So what are the pros and cons of the CVSS? On the positive side, it attempts to formalise and objectify the decision process applied to a very complicated problem, potentially improving consistency both over time and across platforms, vendors and products. It is quite simple, and the input variables and their pre-determined alternative numerical values appear, in the main, well chosen. It is transparent, in that its mechanism is publicly documented. And it breaks new ground in attempting to include formal recognition of the user's all-important exposure landscape. On the other hand, no system is better than its inputs. A choice has to be made as to which value of each variable to select, and the quality of the result depends entirely on the quality of all the choices that lead to it. These choices are expressed externally, in natural language, in the available calculators. Fortunately, the alternatives contributing to the Base and Temporal scores are expressed relatively unambiguously, and as these decisions will normally be made by experienced security specialists in reporting organisations, the opportunity for significant error is minimised.

However, while the inclusion of the environmental component in the calculation is one of the greatest potential strengths of the CVSS, it could also prove to be its Achilles' heel. Not only does the Environmental calculation have the greatest single influence on the final score, but the values of the two variables that contribute to it (collateral damage potential and target distribution) are expressed as "low", "medium" and "high": a notoriously subjective classification scheme. Poor decisions here will lead to serious errors that can completely undermine the quality of the more objective earlier stages. Furthermore, the techno-centric thinking of the originators of the CVSS is most apparent here. The guidance notes describe these two environmental variables solely in terms of the percentage of hosts that are vulnerable and the potential for physical damage. This completely misses the point that individual systems differ in business criticality, something that cannot in the real world be assessed "off the cuff" by technical personnel alone.

How can I use the CVSS?

Given the above, how can you currently use the CVSS in the real world? In its most basic application (ignoring for now the questionable Environmental parameters), the published Base or Temporal scores for the vulnerabilities in hand at any given moment should simply be sorted into descending numerical order and addressed as swiftly as possible from the top of the list downwards, whatever the actual range or absolute values of the scores. Treat it as a relative rather than an absolute ranking system and get on with the job of patching on a continuous basis, as sketched below. The list and its order do have to be updated regularly as new bugs are announced. This is a completely different approach from the widely advocated calendar-interval regime ("patch Tuesday", "medium severity = one to four weeks"), which in reality is patch-team workload management rather than corporate exposure minimisation (we all know that, even if we take the easy way out in practice).
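As a trivial illustration of that working practice, the running priority list is nothing more than a sort on the chosen score; the identifiers and scores below are invented purely for the example.

```python
# Hypothetical open-vulnerability list: (identifier, Temporal score).
open_vulns = [
    ("VULN-A", 4.2),
    ("VULN-B", 8.9),
    ("VULN-C", 6.1),
    ("VULN-D", 8.9),
]

# Relative ranking: highest score first, patched first. Re-run whenever
# new advisories arrive or existing Temporal scores are revised.
for name, score in sorted(open_vulns, key=lambda v: v[1], reverse=True):
    print(f"{score:4.1f}  {name}")
```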

Whichever of the two you choose, it is important in such a simple application to be consistent in always using either the Base or the Temporal score; the Temporal score is to be preferred, as it partially reflects whether a fix is available to be implemented. Despite the familiar tendency to bracket ratings into categories such as "critical", "medium" and "low", doing so is not useful given the extra detail offered by the numerical scoring: how do you prioritise among a dozen simultaneous "criticals"? The quite granular numerical scoring makes it much less likely that a significant number of vulnerabilities on your current list will have exactly the same ranking. It is also a transparent system: you can often see how a score was arrived at, so you might learn something of use for the future.

At a more sophisticated level, the relationship between the Base and Temporal scores can be used to extract further guidance. If the two scores are essentially identical (within 5 per cent or so), this generally indicates that you are more exposed than if the Temporal score is lower than the Base score by, say, 10 to 30 per cent: it means that a viable exploit exists and that limited (or no) remediation is available. A bigger difference between the scores indicates that exploits are to some degree unproven or imperfect and/or that a fix at some level is available. So diverging Base and Temporal scores are a flag that the vulnerability should be reviewed to find out the current state of play, after which it may have to be moved up or down your priority list. This obviously depends on your sources of intelligence updating the Temporal scores, but provided the information is available, somewhat better prioritisation can result.
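A crude way to surface that signal automatically is to look at the percentage gap between the two scores. The function below is only a sketch: the 5 and 30 per cent thresholds mirror the rule of thumb above, and the wording of the flags is mine.

```python
def exposure_flag(base, temporal):
    """Rough triage hint from the gap between Base and Temporal scores."""
    if base <= 0:
        return "no ranking possible"
    gap = (base - temporal) / base
    if gap <= 0.05:
        # Scores essentially identical: viable exploit, little or no fix.
        return "high exposure: review and patch urgently"
    if gap <= 0.30:
        # Temporal discounting in play: exploit unproven and/or fix available.
        return "moderate exposure: re-check exploit and fix status"
    # Gap larger than the usual temporal range: verify the inputs.
    return "check the Temporal inputs before re-ranking"

print(exposure_flag(9.0, 8.8))  # essentially identical pair
print(exposure_flag(9.0, 7.0))  # roughly 22 per cent lower
```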

The Environmental score, although primitively implemented at present, can be used to some extent, but the existing parameters will tend to return scores on the low side in non-homogeneous environments, where individual systems are business critical or where the landscape is not dominated by a small number of platforms or products. Those new to the system should apply it only where too many of the Base (or Temporal) scores in the sorted list share the same value and are therefore not effectively ranked, and then only with caution, as local homework will be needed to validate the results.

Better use can be made of the Environmental score if you are prepared to redefine its input parameters to suit your business context. Selection of the appropriate collateral damage parameter must include the cost to the business of a successful exploit, not just the cost of technical damage and remediation. Choice of the target distribution parameter must include the business significance of the breached asset: it may be the only server in a couple of hundred that is running a given system, but if that system is business critical the extent of the exposure is much greater than 0.5 per cent. However, unless you already have considerable detailed business intelligence at your fingertips, it is probably dangerous at present to rely on the Environmental score, given its large effect on the final result. This is where we most look to the CVSS developers to improve the system. For now, environmental considerations will for the most part remain "seat of the pants". But if revision of the Environmental score calculation gets due attention, it promises to become a very powerful tool.
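One way to formalise that local redefinition, purely as a sketch and on the assumption that you maintain per-asset business-criticality and cost estimates, is to derive the two environmental weights from business data rather than from raw host counts. The bands, weights and the max() rule below are illustrative choices, not part of the CVSS specification.

```python
# Sketch: choosing the two Environmental inputs from business context.
# All thresholds and weight values here are illustrative assumptions.

def collateral_damage_weight(exploit_cost_to_business, annual_revenue):
    """Map the estimated cost of a successful exploit (business
    disruption included, not just technical clean-up) onto the
    0-0.5 collateral damage range used by the v1 calculation."""
    ratio = exploit_cost_to_business / annual_revenue
    if ratio >= 0.01:
        return 0.5          # treat as "high"
    if ratio >= 0.001:
        return 0.3          # treat as "medium"
    return 0.1 if ratio > 0 else 0.0

def target_distribution_weight(vulnerable_hosts, total_hosts,
                               includes_business_critical_system):
    """Blend the raw host fraction with business significance: one
    business-critical server in a couple of hundred should not be
    diluted to a 0.5 per cent exposure."""
    fraction = vulnerable_hosts / total_hosts
    if includes_business_critical_system:
        return max(fraction, 0.75)   # rank it at least as "medium/high"
    return fraction
```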

The way forward for the CVSS

So the CVSS has considerable potential as a simple and effective method for vulnerability ranking, but it needs further work to make it more user-friendly and to render the Environmental score more robust and meaningful. The Environmental score parameters need to be redefined to include business impact, something that should ideally be done by the CVSS developers rather than ad hoc by individual end users. It is likely that the Environmental score calculation will have to become more sophisticated before its true worth emerges. But from the functional perspective, probably the most significant omission is that all the approved calculators currently expect the whole calculation to be performed in a single operation, by selection of the complete set of natural-language parameters. None of them allows the end user simply to enter a published numerical Base or Temporal score from which to derive a local Environmental score. At this time the calculation that matters most to the end user must be done "by hand", unless an advisory happens to list the parameters used to derive the published score.

Overall, the CVSS is a relatively untried system but one which, by virtue of its transparency, potentially contains less snake oil than the closed ranking systems we are used to. We must hope that it will evolve over time into a robust universal standard: something that is much needed in this field.
