How to Guide and Assess Risk Reduction using Risk Characterization Indicators

Problem statement: The risk level predictions in risk assessment often suffer from uncertainties and may, thus, overlook some adverse effects. This problem can be reduced by using risk reduction strategies that continuously guide activities toward the lowest possible risk. Approach: This study suggested a method to guide and assess such risk reduction strategies using multi-indicator risk characterization. It was a challenge for the method to secure robustness against unavoidable high uncertainty and to secure flexibility that embraced multiple indicators for different aspects governing the risk level. The methodology aimed to protect real existing targets, denoted Protection Units (PU), against adverse effects and applied knowledge about all PUs, or a representative fraction of those. A set of risk indicators described different aspects of the risk level for each PU. A scenario in this context contained the set of PUs, each having its risk level described by the set of different risk indicator values. Results: The result was a multi-criterion solution that was analyzed using partial order ranking, where ambiguities between single-criterion predictions of risk level as either higher or lower were analyzed and mapped. Conclusion/Recommendations: Risk level hotspots, in which several criteria simultaneously predicted a higher risk level for specific PUs, were used as key elements to provide guidance and assessment of the need for risk reduction and the method was, therefore, called Hotspot ruled Ranking (HotsRank).


INTRODUCTION
This study suggests an aggregation method for risk indicators that can guide and assess risk reduction. The methodology is developed to provide guidance for risk assessment of chemicals and biocides including pesticides; however, its usefulness is much broader in the area of handling risk. The use of chemicals, including pesticides and biocides (collectively denoted "chemicals" in the following), may result in adverse effects, even though the risk assessment predicts them to be harmless. This is not a consequence of insufficient work by the risk assessors, but an unavoidable problem arising from the highly complicated task of risk level prediction. Because this is often the case in risk assessment, it is advisable to apply risk reduction to limit the likelihood of unpredicted adverse effects and, thereby, to support the risk assessment.
Most risk indicators quantify real conditions of risk and this will often be in contrast to the risk assessment, which is calculated using estimated fictive scenarios. The risk indicators need to include as many factors as possible to gain validity and each indicator involves some degree of numeric uncertainty. This problem is reinforced by the fact that risk reduction is most important to apply when the risk assessment includes a critical degree of uncertainty.
In this study, the dilemma is handled by defining the indicators solely on a relative basis, where the indicators only can predict a risk level in one place as higher/lower than the risk level in another place. It is a widely accepted statement that relative analyses, in general, are more certain in the conclusion compared to quantitative analyses. This is closely related to the statement that a rough prediction is more certain about the "limited" amount of information that is delivered compared to the certainty of the "extended" information delivered by a more detailed prediction.
The risk hotspot is central in risk assessment as a realistic combination of factors that together yield the highest risk level. The risk indicator needs to include risk hotspots in the same way as is done in the risk assessment in order to avoid a conceptual mismatch between what the risk assessment protects and what the risk indicator protects. This also involves the temporal and spatial scaling of the risk parameters. A larger, but not very large, scale could be a single field application of a pesticide active ingredient (e.g., eco-toxicity assessment of a single active ingredient), while a very large scale could be bioaccumulation of a chemical in the food chain in a sea area. Risk assessment of chemicals analyzes the risk level for risk hotspots, where harmful effects are most likely to take place. As an example, the EU's Technical Guidance Document (TGD) states: "For existing substances, the rapporteur should initially make the generic 'reasonable worst-case' exposure assessment based on modeling, to derive an EU environmental concentration" (European Commission) [1]. In this study, the wording "reasonable worst-case" and risk hotspots are considered synonyms. In more general terms, the risk level in the hotspots is estimated based on a set of risk parameter values, each assumed to describe central properties of the risk level. A realistic combination of risk parameter values that jointly yields the highest risk level makes a realistic worst-case scenario. Clearly, the worst-case scenario controls the outcome of the risk assessment and, in the case of chemicals, a discussion has been ongoing for nearly 20 years between the involved partners (legislation, industry, NGOs) about realistic value setting of risk parameters and about the most important set of parameters. The actual setting for realistic worst-case, thus, reflects a large amount of expert knowledge, including a comprehensive degree of consensus.
However, existing risk indicators for chemicals are often based on mean values, or accumulated value settings, where heterogeneous conditions reflecting hotspots are leveled out and, thus, removed from the result. The total production volume of a chemical is an example of an accumulated value type of risk indicator that is applied as criterion to guide chemical risk assessment according to the TGD. The total production volume may indicate a likelihood for a human or an eco-system to come into contact with the chemical. However, if the use and/or production resulting in emission takes place in a local area, it may enhance exposure locally, also in case of limited total volume production. Therefore, in this case, there is a mismatch between the total production volume used as risk indicator and the principle of risk assessment focusing on risk hotspots.
This study suggests a method to apply risk indicators in a way that fits risk assessment by including the similar concept of risk hotspots. The name of the method is Hotspot ruled Ranking (HotsRank).

MATERIALS AND METHODS
The basic concept in the approach is to analyze realistic worst case conditions on a relative basis as described by several risk indicators. If a benchmark condition is defined or estimated, it is possible to derive predictions that can trace the fulfillments of objectives for risk minimization.
The purpose of risk assessment is to protect a defined target that has some value for protection. A specific real physically existing target is denoted as a Protection Unit (PU). If the target is humans, then the number of PUs could be the number of humans to protect, or, if the target is lakes, then the number of PUs could be the number of lakes in the geographical area that is covered by the risk assessment activity (e.g., EU). The principle of the HotsRank method is to identify all, or at least a representative fraction, of PUs and to set up risk indicator values for each of them. E.g., the risk assessment could consider the adverse eco-toxicological effects due to application of pesticide active ingredients on the ecosystem close to a field, where the pesticide is sprayed. In this case, the PU could be the eco-systems that are close to agriculture activity such as ponds, streams or hedge rows and the risk indicators have to describe the "real" conditions of risk level for a representative fraction of those fields in the whole country or region that is being considered for protection.
A PU-scenario contains all the PUs that are described by the set of risk indicators. It is, thus, possible for two PU-scenarios to be different in two ways: (1) Not all the PUs are equivalent, so the two scenarios are not protecting the same PUs; (2) The two scenarios include the same PUs, but the risk indicator values are not necessarily the same between the two scenarios for the same PU. In case (2), the PU-scenarios have the same set of PUs in common and they are, thus, considered as belonging to the same class of PU-scenarios, while in case (1), the difference between the two PU-scenarios is more fundamental and they are not considered as belonging to the same class. It is nearly meaningless to compare two scenarios that do not protect the same PUs, as in case (1), because it would then be necessary to assign some degree of importance to every PU. Because of this, the HotsRank method can only rank two PU-scenarios under the assumption that they belong to the same class (include the same set of PUs, as in case (2)) and that all PUs are equally valuable to protect.
Figure 1 shows the principle of the HotsRank methodology. Two different PU-scenarios contain the same set of three PUs and they all have values for the same three risk indicators, as shown by different shapes. The values of these in PU-scenario A and PU-scenario B, respectively, are illustrated by different shades of darkness/lightness. The values of each risk indicator are compared with the values of the same indicator for the PUs in the other PU-scenario. There are no comparisons between two different risk indicators, as they describe different and, thus, incomparable properties. In real cases, the number of PUs will be high, so the number of comparisons between two PUs for the same indicator will, typically, be very high.
A PU that has large risk indicator values for several indicators simultaneously in one PU-scenario will be ranked above many PUs in the other PU-scenario by this principle, which makes the method highly sensitive to the existence of risk hotspots. The HotsRank method aggregates single risk indicator values based only on ranking, and thus not on numerical weightings, to predict a ranking between two PU-scenarios. Aggregation of information, including application of benchmarks, is a well known principle in multi-criteria methods, e.g., discordance-concordance analysis; Figuere et al. [2] present a comprehensive description of multi-criteria methods. Selecting the principle for multi-criteria analysis is often a matter of judgment and, thus, open to discussion, where different schools argue for their approach as being superior. The method presented in this study is based on the Partial Order Theory (POT) as mathematically described, e.g., by Davey and Priestly [3]. Figuere et al. [2] do not explicitly describe this form of multi-criteria analysis, as the POT is fundamentally different from most multi-criteria methods. A brief description of the multi-criteria methodology based on POT is given in Brüggemann and Voigt [4]. In general terms, the conventional multi-criteria methods focus on how to aggregate different criteria that conflict in their predictions. Differences between methods are highly dominated by differences in the way aggregation of the conflicting information takes place. In contrast, the focus in POT is, primarily, to conclude based on the non-conflicting fraction of information [4][5][6][7][8]. The strength of using POT for identification of risk hotspots is due to its ability to handle a larger set of PUs using highly transparent rules of ranking, which makes it possible to describe variations in risk level ranking and, in a very transparent way, identify hotspots.
The basic idea of using POT for ranking scenarios was for the first time presented by Sørensen et al. [9] .
Let $PU_z$ be the z'th PU out of a total of $Z$ PUs. Let $d_m^z$ be the value of the m'th risk indicator out of a total of $M$ risk indicators for the z'th PU. Two different PU-scenarios must differ with respect to at least one value for at least one risk indicator and at least one PU. This is shown in Table 1.
The risk indicators need to be ordinal, but there are no further restrictions on the type. The POT ranks the PUs in relation to each other, as shown in Fig. 1, based on the risk indicator values shown in Table 2. A simple example shows the principle of HotsRank in the following paragraph. Only three PUs are included for illustration, but in reality the number of PUs is much higher than shown in Table 2, which increases the methodological decision power.
The indicator values in Table 2 are used to make a partial order, where a ranking is made only in case of no disagreement in the ranking among the risk indicators. In this partial order, PU 1 for PU-scenario B (B1) is ranked above PU 2 in PU-scenario A (A2) because the indicators $d_1$, $d_2$ and $d_3$ all predict this ranking between them ($d_1$: 7.2 > 1.6; $d_2$: 2 > 1; $d_3$: 2.2 > 1.5). Thus, there are no disagreements between the indicator values for the ranking B1 > A2. The Hasse Diagram (HD), shown in Fig. 2, displays all the rankings between PUs, where there is no disagreement among the indicators.
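The concordance rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the handling of ties (a PU is ranked above another when no indicator is lower and at least one is higher) is an assumption, since the text does not state how equal indicator values are treated. Only the B1 and A2 values are taken from the text; the third PU is invented for illustration.

```python
from typing import Sequence


def ranked_above(x: Sequence[float], y: Sequence[float]) -> bool:
    """True when no indicator of x is below y and at least one is above (assumed tie rule)."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def compare(x: Sequence[float], y: Sequence[float]) -> str:
    """Classify a PU pair as 'above', 'below', 'equal' or 'incomparable'."""
    if ranked_above(x, y):
        return "above"
    if ranked_above(y, x):
        return "below"
    if tuple(x) == tuple(y):
        return "equal"
    return "incomparable"  # the indicators disagree, so no rank is assigned


b1 = (7.2, 2.0, 2.2)     # B1, indicator values quoted in the text
a2 = (1.6, 1.0, 1.5)     # A2, indicator values quoted in the text
other = (0.5, 3.0, 1.0)  # hypothetical PU, not taken from Table 2

print(compare(b1, a2))     # above
print(compare(other, a2))  # incomparable
```

Only pairs classified as "above" or "below" would appear as connected in a Hasse Diagram; the "incomparable" pairs are the conflicts that the method leaves unranked.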
It is now a simple task to count the number of ranked pairs in the HD, where a PU in one PU-scenario is ranked above a PU in another PU-scenario. The result is: A>B occurs 3 times and B>A occurs 2 times. So, based on a simple 'voting' algorithm, as visualized in the HD, this analysis indicates that scenario A, in general, tends to have higher risk than scenario B. A principle of making such rankings between different groups in a HD is presented by Restrepo et al. [10] as the dominance degree method based on the equation:

$$\mathrm{Dom}(s_1, s_2) = \frac{\sum_{x \in s_1,\; y \in s_2} [x > y]}{N_{s_1} N_{s_2}}$$

where the sum is the number of times a rank exists in the HD in which an object $x$ belonging to group number $s_1$ is ranked above an object $y$ belonging to group number $s_2$, and $N_{s_1}$ and $N_{s_2}$ are the numbers of objects belonging to group number $s_1$ and $s_2$, respectively. In the context of ranking scenarios, the number of objects (PUs) is equal for all groups, so $N_{s_1} = N_{s_2} = Z$, and the groups are similar scenarios. The total budget for the rankings between a PU in one scenario and a PU in another scenario in the simple example is shown in Table 3 together with the calculated Dom(,) values. There are 3 PUs and, thus, 9 different comparisons ($3^2$). The Dom(,) values in Table 3 are both below 0.5, which shows that so many conflicts remain that either scenario could be claimed to dominate the other if some of the conflicts were resolved in its favor.
The special case of $s_1 = s_2$ yields the number of times a PU in PU-scenario $s_1$ is ranked above a PU in the same PU-scenario. The Indicator Ordering matrix, $I$, is defined as: [equation missing in the source]. The $I_{i,j}$ value is in the interval between -1 and 1; $I_{s_1,s_2} > 0$ indicates that PU-scenario $s_1$ is ranked above $s_2$, and the reverse holds for $I_{s_1,s_2} < 0$. In this way, the $I$ matrix describes the pairwise ranking between each pair of PU-scenarios, but a final consistent ranking of every PU-scenario in relation to all the other PU-scenarios is only possible to derive if the rankings, as defined by the $I$ matrix, are transitive, as explained in the next section. This is not necessarily true, and a partially ordered set (POset) for the set of PU-scenarios, denoted PS, is defined using the following statement:
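The counting step can be sketched as follows. This is a hedged illustration: `count_ranked` and `dom` are hypothetical names, the tie rule is the same assumption as before, and the scenario values are invented (apart from B1 and A2, which are quoted in the text), so the counts below do not reproduce Table 3.

```python
from typing import List, Tuple

PU = Tuple[float, ...]


def ranked_above(x: PU, y: PU) -> bool:
    """Concordant ranking: no indicator lower, at least one higher (assumed tie rule)."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def count_ranked(s1: List[PU], s2: List[PU]) -> int:
    """Number of pairs in which a PU of s1 is ranked above a PU of s2."""
    return sum(ranked_above(x, y) for x in s1 for y in s2)


def dom(s1: List[PU], s2: List[PU]) -> float:
    """Dominance degree: ranked pairs divided by all N_s1 * N_s2 comparisons."""
    return count_ranked(s1, s2) / (len(s1) * len(s2))


# Hypothetical indicator values; only (1.6, 1.0, 1.5) and (7.2, 2.0, 2.2) come from the text.
scenario_a = [(3.0, 3.0, 3.0), (1.6, 1.0, 1.5), (5.0, 1.0, 4.0)]
scenario_b = [(7.2, 2.0, 2.2), (2.0, 2.0, 2.0), (1.0, 4.0, 1.0)]

print(count_ranked(scenario_a, scenario_b))  # 1 pair with A above B
print(count_ranked(scenario_b, scenario_a))  # 2 pairs with B above A
print(dom(scenario_a, scenario_b), dom(scenario_b, scenario_a))
```

With these invented values, six of the nine PU pairs remain unranked, which illustrates how both Dom values can lie well below 0.5.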

Statement of transitivity:
The pair $s_1$, $s_2$ is an order relation in PS if and only if: $I_{s_1,s_2} > 0 \wedge I_{s_1,s} < 0 \wedge I_{s_2,s} < 0$ for all $s = 1..S$
The reasoning behind this statement is that all rankings of PU-scenarios need to be consistent: If the PU-scenario $s_2$ is ranked below PU-scenario $s_1$ ($I_{s_1,s_2} > 0$) and the PU-scenario $s_1$ is ranked below PU-scenario $s$ ($I_{s_1,s} < 0$), then it must be true that PU-scenario $s_2$ is ranked below the PU-scenario $s$ ($I_{s_2,s} < 0$) in order to obey consistency. Another way to apply the $I$ matrix is to define a PU-scenario as a reference (target) scenario and then rank all the PU-scenarios in relation to this, where all scenarios are characterized using the $I$ matrix by the values in relation to this specific scenario. The calculation of the $I$ matrix only includes pairs of PUs that can be ordered in the partial ordering by concordant rankings; i.e., the $I$ matrix excludes discordant rankings between the single indicator values. It is, however, important to quantify the fraction of PU pairs where such conflicting rankings exist. A so-called Ranked Fraction matrix (RF) is defined in this study for the purpose of quantifying the degree of discordance. The element $RF_{s_1,s_2}$ is equal to the ratio between the number of concordant rankings between the scenarios $s_1$ and $s_2$ and the total maximum number of different comparisons between two PUs and two scenarios ($Z^2$).
Both positive and negative correlation can take place between the risk indicator values and this will rule the existence or non-existence of risk hotspots. Such correlation is described by a characteristic figure, in this study called the Aggregated Correlation matrix AC and defined as: [equation missing in the source]. $AC_{s_1,s_2} > 1$ shows a positive aggregated correlation between the different risk indicators for two PU-scenarios $s_1$ and $s_2$, while $AC_{s_1,s_2} < 1$ shows a negative correlation between the indicators. Thus, $AC_{s_1,s_2} > 1$ indicates that the risk indicator values are clustered and risk hotspots will tend to form, where some PUs are much more likely to be at risk compared to others. Conversely, in case of $AC_{s_1,s_2} < 1$, the risk indicators tend to level out the difference in risk levels between the PUs and the problem of risk hotspots is more limited.
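The consistency requirement in the prose above can be checked mechanically once a pairwise matrix is available. The sketch below assumes only the sign convention given in the text (a positive entry means the row scenario is ranked above the column scenario); it does not compute the I matrix itself, and the two example matrices are hypothetical.

```python
from itertools import permutations
from typing import List


def is_transitive(i_matrix: List[List[float]]) -> bool:
    """Check that a > b and b > c always implies a > c, reading a positive entry as 'ranked above'."""
    n = len(i_matrix)
    for a, b, c in permutations(range(n), 3):
        if i_matrix[a][b] > 0 and i_matrix[b][c] > 0 and not i_matrix[a][c] > 0:
            return False
    return True


# Hypothetical pairwise values for three scenarios (antisymmetric, entries in [-1, 1]).
consistent = [
    [0.0, 0.4, 0.6],
    [-0.4, 0.0, 0.2],
    [-0.6, -0.2, 0.0],
]
cyclic = [
    [0.0, 0.4, -0.6],   # scenario 0 above 1, but below 2
    [-0.4, 0.0, 0.2],   # scenario 1 above 2
    [0.6, -0.2, 0.0],   # scenario 2 above 0, which closes a cycle
]

print(is_transitive(consistent))  # True
print(is_transitive(cyclic))      # False
```

A matrix that fails this check cannot be turned into a single consistent ranking of all scenarios, which is the situation the statement of transitivity is meant to exclude.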
For two PUs ($PU_1$ and $PU_2$), two PU-scenarios (1 and 2) and two risk indicators ($d_1^1(1)$, $d_2^1(1)$ and $d_1^1(2)$, $d_2^1(2)$, respectively), nine possible stages of ranking exist: [list missing in the source]. A pair of risk indicators can be analyzed in relation to how they rank two PU-scenarios by counting the number of events for each of the listed stages. It is a matter of judgment how to interpret the weak discordant pairs, as this depends on the conditions described by the indicators. In this study, only the concordant and discordant rankings are included in the following correlation analysis using the Kendall Tau type correlation [11], from which the correlation matrix $\tau$ is defined as: [equation missing in the source], where $C$ and $D$, respectively, are the number of concordant and discordant rankings between two PUs for the indicators $I_i$ and $I_j$. Rank correlation and partial order are further described by Sørensen et al. [11]. If two risk indicators show high positive correlation, then they will tend to reproduce each other in the rankings between scenarios and there will not be a dramatic change in results if one of them is removed. So the individual importance of an indicator is low if such a correlation exists in relation to at least one other indicator, while an indicator that has low correlation to all the other indicators will tend to have high influence on the scenario ranking. If the aggregated correlation matrix shows that there is a low number of concordant rankings between PUs, then the $\tau$ matrix can identify the indicator(s) that are the major reason for this. Subsets of risk indicators that together make risk hotspots can be identified using this correlation matrix.
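The pairwise counting behind such a rank correlation can be sketched as follows. The formula used here, (C - D) / (C + D), is an assumption: it is the common Kendall-type form when only strictly concordant and strictly discordant pairs are counted, but the exact definition used in this study is not recoverable from the text. Pairs with a tie in either indicator (the weak cases) are simply skipped, and the data are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (value of indicator i, value of indicator j) for one PU


def concordance_counts(s1: List[Pair], s2: List[Pair]) -> Tuple[int, int]:
    """Count PU pairs across two scenarios on which indicators i and j agree (C) or disagree (D)."""
    c = d = 0
    for xi, xj in s1:
        for yi, yj in s2:
            product = (xi - yi) * (xj - yj)
            if product > 0:
                c += 1   # both indicators rank the pair the same way
            elif product < 0:
                d += 1   # the indicators rank the pair in opposite ways
            # product == 0: a tie in at least one indicator, left out (assumption)
    return c, d


def tau_type(s1: List[Pair], s2: List[Pair]) -> float:
    """Assumed Kendall-type correlation (C - D) / (C + D)."""
    c, d = concordance_counts(s1, s2)
    if c + d == 0:
        raise ValueError("no strictly concordant or discordant pairs")
    return (c - d) / (c + d)


# Hypothetical indicator pairs for two small scenarios.
scenario_1 = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
scenario_2 = [(1.5, 2.5), (4.0, 4.0)]

print(concordance_counts(scenario_1, scenario_2))  # (5, 1)
print(tau_type(scenario_1, scenario_2))
```

A value near 1 would suggest that the two indicators largely reproduce each other's rankings, while a negative value points to the kind of conflict discussed for the "Substitution" scenario in the results.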

RESULTS
The results gained by using the HotsRank method are illustrated by the following example, which considers planning and evaluation of strategies for risk reduction in the area of ecotoxicological effects due to pesticide usage in agriculture. This research includes a total set of 18 different risk indicators, three of which are selected for illustration in this example, representative of the geographical area of Denmark. The three selected indicators describe adverse effects on the terrestrial eco-system close to the agricultural field. In accordance with approved pesticide laboratory testing procedures, the selected organism groups are bees ($d_1$), other terrestrial invertebrates ($d_2$) and plants ($d_3$). The position of the agricultural fields is estimated using GIS and combined with information about the position of a series of relevant terrestrial habitats. The exposure of the habitats is assumed only to be a result of spray drift and to follow a Ganzelmeier type of relation [12]. The indicators are calculated for every 1 km² grid that contains agricultural fields and the total set for Denmark includes 41,400 such grids, each of which is considered to be a single PU having three indicator values. The argument for having every single km² as a PU is that the ecological risk assessment of pesticides considers local ecosystems like hedgerows and meadows, whose scale is local (a few hundred meters), so 1 km grids will mimic this scale. A more detailed description of the pretreatment of data behind and calculation of all 18 risk indicators will be given in a future paper under preparation by the first author.
The purpose of this example is to illustrate the application of HotsRank to analyze strategies that can limit the risk of adverse effects on the terrestrial habitats close to agricultural fields. The analysis is based on pesticide application that took place during the year 2007. Different means for risk reduction are tested based on the usage during this year. The amounts of active ingredients in pesticides used in agriculture during 2007 are distributed on 8 crop types using expert knowledge and national use and sale statistics as reported by the Danish EPA [13]. The area of each crop type is calculated for each grid using the General Agriculture Register from the Ministry of Food and Agriculture in Denmark. The following equation is used to calculate the risk indicators for each grid: [equation missing in the source]
Where:
$d_m^z$ = The m'th risk indicator that describes the condition of the z'th PU
$A$ = The area of the grid ($10^6$ m²) and $a$ is the index for the a'th m²
$x_{a,z}$ = The closest distance to the boundary to a terrestrial habitat outside the agricultural field, e.g., a hedgerow or an edge of a wood, for the a'th m² in the z'th PU
$AR_{j,a}$ = The mean application rate during one year (kg/(m²·year)) for pesticide active ingredient j on the a'th m²
$Tox_{m,j}$ = The toxicity in terms of the standard tested lethal concentration ($LC_{50}$) killing 50% of the population for the m'th species type and the j'th pesticide active ingredient
This equation describes a toxic "pressure", with data organized in such a manner that a higher value indicates increased toxic pressure [15].
Three different risk indicators are included in the analysis based on three different species types as shown in Table 4.
Three risk reduction strategies are defined and used to simulate PU-scenarios for the pesticide applications at field scale. The 2007 use scenario and the three risk reduction strategies are shown in Table 5.
Two PU-scenarios are ranked using all 41,400 PUs (1 km² grids). HotsRank counts the number of cases where a PU from scenario 1 is ranked above/below a PU from scenario 2; this ranking is done for each indicator separately and for all three indicators simultaneously as a partial order. The number of discordant rankings is also counted, where at least one indicator predicts a rank that contradicts at least one other indicator. The results are shown in Table 6, where all 6 combinations of ranking scenarios 1 and 2 are listed in rows. The first column from the left shows Ids for the possible ranking combinations as referred to in the following discussion. The next two columns from the left show the scenarios selected from Table 5 that are assigned to PU-scenarios 1 and 2, respectively.

Table 5: The 2007 use scenario and the three risk reduction strategies
Scenario: Description
Use 2007: The condition of application and the agriculture structure for year 2007
Substitution: Substitution of replaceable and more toxic active ingredients with less toxic ones. Usage and agriculture structure like the condition for 2007
10 m zone: Assuming a 10 m unsprayed zone along all agricultural field edges
Red Insect: Reduced application of insecticides

Table 6: Ranking of the 4 scenarios defined in Table 5 [table body missing in the source]

Obviously, the PU-scenario "Use 2007" has the highest risk of adverse effects compared to all the other PU-scenarios, because they are all designed to limit the risk of adverse effects. This is shown in Table 6 as negative I values every time the PU-scenario "Use 2007" is assigned to be PU-scenario 1 in the analysis and when all indicators are used in the partial order (column #7). However, the three strategies behave differently when they are ranked in relation to the "Use 2007" PU-scenario.
For Id 1, there are many conflicts ($1112 \cdot 10^6$) and the reason for this is seen by consulting the ranking of the single indicators, where $d_2$ predicts highest rank to "Substitution" (= Scenario 2) (negative I value) in contrast to $d_1$, while $d_3$ is close to being neutral by having a value close to 0. The correlation results for Id 1 confirm the discrepancy between the rankings of $d_1$ and $d_2$ with a negative correlation. This shows that only the bee toxicity ($d_1$) is improved, while the arthropod toxicity ($d_2$) increases as a result of the substitution. The substitution should be reconsidered for improvements in order to avoid the observed increase in arthropod toxicity. For the Ids 2 and 3 in Table 6, there are fewer conflicts compared to Id 1 and this is also confirmed by the positive correlation between all three indicators in these cases.
The Ids 4-6 in Table 6 rank the PU-scenarios reflecting the alternative risk reduction strategies in relation to each other. Such a ranking is meaningful in order to find the best strategy to use in future activities for limitation of adverse effects on terrestrial ecosystems close to agricultural fields. However, the fact that the PU-scenario "Substitution" creates many conflicting rankings, due to negative correlation between the indicators, may violate the statement of transitivity. In Table 6, Id 4 shows that (Substitution) < (10 m zone), where the notation "(higher risk level) < (lower risk level)" is used. In the same way, Id 5 shows that (Substitution) > (Red insect) and Id 6 shows (10 m zone) > (Red insect). In short, this means A < B, A > C → B > C and the set of the three reduction strategies is, thus, transitive. It is, therefore, possible to estimate a complete rank of the alternative scenarios as shown in Fig. 3. The result is useful in order to decide the best risk reduction strategy for best possible limitation of adverse effects.
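The step from three consistent pairwise relations to one complete rank can be sketched with a topological sort. This is an illustration only: `complete_rank` is a hypothetical helper, the relations are copied from Ids 4-6 using the "<" symbol exactly as written in the text, and the standard-library `graphlib` module (Python 3.9 or later) is used for the sorting.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Iterable, List, Optional, Tuple


def complete_rank(relations: Iterable[Tuple[str, str]]) -> Optional[List[str]]:
    """Order items so that every (left, right) pair, read as 'left < right', is respected.

    Returns None when the relations contain a cycle, i.e. when they are not transitive.
    """
    predecessors = {}
    for left, right in relations:
        predecessors.setdefault(right, set()).add(left)
        predecessors.setdefault(left, set())
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


# Pairwise relations as stated for Ids 4-6, written in the '<' direction.
relations = [
    ("Substitution", "10 m zone"),   # Id 4: (Substitution) < (10 m zone)
    ("Red insect", "Substitution"),  # Id 5: (Substitution) > (Red insect)
    ("Red insect", "10 m zone"),     # Id 6: (10 m zone) > (Red insect)
]

print(complete_rank(relations))
# ['Red insect', 'Substitution', '10 m zone']
```

Had the three comparisons formed a cycle, the function would return `None`, corresponding to a violation of the statement of transitivity where no complete rank can be drawn.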

DISCUSSION
This study suggests a method that supports risk reduction strategies using risk indicators. The risk indicators are integrated to support the risk assessment concept based on worst case or risk hotspot analysis. It is obvious that the validity of the sets of risk indicators is critical and they have to be carefully evaluated before being applied in any model that uses them as input. The name given to the method is HotsRank from the wish to reflect the governing principle of using risk indicators to focus on risk hotspots.
A Protection Unit (PU) is defined as a real existing target that is protected by regulatory approval schemes. The definition of PU is general, in the sense that all activities of risk assessment aim to protect something that is real, so some kind of PUs will always exist. The hotspot of risk is estimated by setting up risk indicator values for each single PU without any aggregation of risk indicator values. In this way the multi-criteria methodology HotsRank attempts to avoid hiding extreme risk indicator value combinations reflecting risk hotspots. Furthermore, evaluation and interpretation of the ranking results can be performed with direct reference to the input risk indicator values and, in this way, the methodology has a high degree of transparency. A major condition for the HotsRank method is that it is possible to make a representative description using a set of different risk indicators for every PU or for a representative fraction of them. This includes simultaneous handling of multiple risk indicators and investigation of discordant (conflicting) information about the relative risk level between the single indicators. The latter is a technical and mathematical challenge for any method; the HotsRank method focuses on the fraction of information that can be gathered from the set of risk indicators without doing any aggregation of different indicator values. This is done by counting the rankings between two PUs, where there is no discordance (conflict) within the sets of risk indicators about how to rank the two PUs. An argument against this approach is that information is ignored when all the PU pairs that have conflicting rankings between the risk indicators are disregarded. It is important to make clear that there are two "classes" of rankings between two PUs: (1) Concordant rankings, where the risk indicators agree; (2) Discordant rankings, where the risk indicators disagree.
The result of the concordant ranking is certain because all risk indicators point to the same rank of the two PUs. But the result of the discordant rankings is more uncertain because, in this case, a decision about a rank of the two PUs will depend on additional assumptions about the importance and weighting of each indicator in relation to each other in order to resolve the conflicting rankings. So, the concordantly ranked PU pairs are more certainly ranked than the discordantly ranked PU pairs. The argument behind the HotsRank method is that the more certain concordant rankings deliver a decision regarding the PU-scenario ranking without being "polluted" by the more uncertain discordant rankings of PUs. Furthermore, assuming that the indicators are valid predictors of the relative risk, the concordant rankings identify hotspots of risk with the highest power of certainty; i.e., where several indicators simultaneously agree about the PU as being associated with the highest risk level. The drawback of this approach is that only a fraction of all potential rankings between two PUs are included and this induces uncertainty about the ranking of the PU-scenarios. But this uncertainty, due to the discordance between the risk indicators, can be mapped and evaluated, as shown in the example about pesticide risk. This evaluation of discordant rankings is a valuable property of the HotsRank method, as it may form the basis for elucidation of underlying factors governing the conflicting rankings for some scenarios and/or PUs.

CONCLUSION
Definition and application of Protection Units (PU) is a good basis for the development of risk indicators, where each PU is described by a set of indicators that can rank the risk level. This yields a multi-criterion problem that needs to be handled. For this purpose, the HotsRank method is valuable because it builds on the concept of avoiding risk hotspots. HotsRank is useful both as a stand-alone analysis and, in many cases, as a first-step assessment tool, where other, more complex multi-criterion methods are applied to find rankings between the discordantly ranked pairs of PUs. In this case, HotsRank can analyze the concordant rankings (higher quality of information) to evaluate and guide the handling of discordant rankings (lower quality of information) by the more complex multi-criterion method.