VAR-models for Predicting Biodiversity

The aim of this work is to introduce a methodology for monitoring diversity of a biological population by using the tool of diversity profiles. In particular, we develop a theory for studying th e diversity profile along the temporal axis. We propo se forecasting techniques for diversity profile by using the VAR model on multivariate time series of abundance vector and ARMA models both on univariate time series of abundances and on the agg regated series of diversity profiles. A Monte Carlo simulation is performed in order to test the goodne ss of forecasts obtained applying the three differe nt methods proposed.


INTRODUCTION
The concept of diversity arises both in ecological and non-ecological subject areas. Diversity is related to the apportionment of some quantity into a number of categories. What the actual quantity is, depends by the problem on hand. When we measure diversity we should take into account different aspects such as the number of different species and the relative abundance of different species. Most diversity measures can be classified as being heavily dependent on rare species (species richness) or on the abundance of the commonest (dominance). Consider a population of s species for which N k and p k denotes the abundance and the relative abundance of species k, for k=1,2,…,s, respectively.
Pielou [1] gave two characteristics an index of diversity should possess: * for given s, the index should be a maximum when the p k are equal; * if the p k are equal, the index should be an increasing function of s. Patil and Taillie [2] proposed a general class of diversity index allowing all diversity measures to be encompassed into a single diversity spectrum. They started by defining diversity as the "average rarity of species within a community". More formally, given a community as a measure of rarity for a species k, then the average rarity of species in the community is given by so that we get the diversity profile for community C as The value β denotes the relative importance of richness and evenness. For β =-1 we get the richness index, for β=0 the Shannon diversity index and for β=1 the Simpson index. Both Simpson and Shannon indexes are affected by the number of species and the evenness of species abundance, but they are affected differently. Thus, diversity profile can be plotted to compare communities in space and/or time over a range of different evenness emphasis. In the following we will be interested in a range of values of β belonging to the set picture of the structure of the community under study [2] . Monitoring biodiversity is a growing concern of environmental agencies. While species and habitat are disappearing, it is crucial to be able to evaluate, even roughly, the biodiversity loss and predicting it. A lot of works in the literature deal with abundance and biomass prediction [3] . Nevertheless, a few works attempt to predict biodiversity [4,5] . There are no specific mathematical tools for predicting biodiversity but techniques used for predicting abundance also could be used for predicting biodiversity by using a wide range of multivariate techniques. This approach has the shortcoming to assume a linear relationship between the level of biodiversity and species abundances or any other measurable biological variable. Indeed, the literature data have failed to detect simple and linear relationship between the studied variables and diversity. For a thorough and critical review of the matter [6,7] .
In this work we propose to understand, predict and manage biodiversity by viewing it as a function of species interaction or any other additional environmental variables. In particular, instead of using one single index we focus our attention to the diversity profiles of Patil and Taillie [2] in order to better describe the diversity of a community. Finally, we examine the behaviour of forecasts of the diversity profile along the temporal axes by using multivariate VAR model and univariate ARMA model.

DIVERSITY INDEX ESTIMATION
Let us suppose that the ecological population is made up of N units and is partitioned into s species. Moreover, let N k be the abundance of the k-th species (k=1,2,…,s). Hence, Accordingly, since 1 n − = p n , then p turns out to be an unbiased and consistent estimator for p . The variancecovariance matrix of p is given by Finally, the use of the Central Limit Theorem provides that

Being
( ) β ∆ p function of a random variable, we may consider it also as a random variable. Moreover, by using the Delta method Tong [8] proves that as are defined in a neighbourhood of p and non-null at p, Suppose that for each of s species of a given community C the time series of absolute abundances is available. It is plausible to think that information about abundance ˆk t n observed at time t=1,2,…,T for the species k=1,2,…,s is contained in the past values observed for the abundance vector.
In other words, we suppose that time series of absolute abundance of the specie k is a realisation of the discrete stochastic processes { } ( ) 1, 2,..., k t t T = N and the set of series observed in the whole community is a realisation of a vector stochastic process In particular, we assume that absolute abundances are realisation of a stable Vector Autoregressive Process of order l (VAR(l)) [9] , such that: In particular, we focus our attention on the abundance vector of (2), so that we say that t n is the abundance vector observed at time t. Consequently, once t n has been observed, we can straightforwardly derive the relative abundance vector observed at time t, say ˆt p and the diversity profile observed at time t, say To simplify the notation let ( ) Then (2) can be written compactly as 1 1 ...

T h T h l T h l
A A + + − + − = + + + n v n n (6) which is the optimal forecast obtained from a vector autoregressive model of the form (4).
Naturally, we might focus our attention to diversity profiles. Under the assumption that absolute abundances are realisations of a VAR(l) process, the β -diversity profile is a non linear combination of abundance vector observed at time t, so it is also a random variable. In particular, for a fixed value where m is the cardinality of the set Β . The function Obviously, for a any fixed value Β β ∈ , ( ) t β ∆ p , t=1,2,…,T, is a stochastic temporal process. For simplicity we assume that time series of β -profile is a realisation of a linear ARMA(p,d,q) process [10] . For a fixed B β ∈ , using an ARMA(p,d,q) model , forecast of diversity for period (T+h) might be expressed as: under the assumption that order of integration d is estimated outside the model for non stationary mean time series. Suppose that the goal is to forecast the diversity profile (1) in order to analyse the dynamic structure of a given biological community. When temporal observations for absolute abundances are available, it is theoretically possible to obtain a forecast for the diversity profile in three different ways.
First of all, using the multivariate forecasting model in (5) and (6) it is possible to obtain forecasts for absolute abundances, which aggregated by using (1) lead to forecasts for β-diversity profile. Obviously, the process under study is unknown and in practice the coefficients of assumed VAR(l) process must be estimated from a given multiple time series. Criteria for determining the order l of the model and for checking the assumptions underlying a VAR analysis have to be followed [9] .
Second, using an univariate ARIMA(p,d,q) model, forecasts for each time series of abundances can be estimated and aggregated to obtain forecasts for diversity profile (1). Finally, forecasts could be directly obtained for time series of β-diversity profiles by using an univariate ARIMA(p,d,q) model.
When univariate model is applied the order is obviously unknown and have to be estimated using one of the automatic selection criterion proposed in the literature [10] . In the next section, by means of a Monte Carlo simulation we evaluate the performance of the three different methods for forecasting diversity.

MONTE CARLO EXPERIMENT
The sample behaviour of forecasts for diversity profile is investigated by a Monte Carlo experiment. The process used in the simulation is a stable VAR process of order l=1, where the disturbance are i.i.d. 3,4,5,6,7,8. For every parameterisation 300 replications are considered. In order to test the performance of estimators, length time series is fixed at 105 simulated observations, since a stabilisation of mean square error of estimates is obtained when length is not less then 100 simulated observations. Forecasts for different hypothesis described earlier are computed as follows. Using multivariate technique a VAR(l) model is estimated for the simulated data of absolute abundances by ordinary least square; order of model is specified using AIC criterion for multiple time series whit a maximum number of parameters equal to 4 [9] .
When univariate technique is applied, an ARIMA(p,d,q) model is estimated respectively for each univariate series of simulated absolute abundances and for the related time series of diversity profiles; order of the model is specified using AIC criterion for univariate time series whit a maximum number of parameters equal to 6 and estimation is performed by ordinary least square.
For each of the above methods, estimation is performed using the first T=100 observations; the estimated model is used to produce a sequence of 5 forecasted values for β-profile and Root MSE of forecasting is computed. The results for increasing number of species are shown in Fig. 1. Results highlight that VAR models are suitable for predicting diversity with respect to ARMA models as the performance of multivariate temporal models is higher when the variables are highly dependent on each other. In fact, it is known that species of biological community are highly correlated.
Finally, we point out that, recently different case studies have implemented monitoring systems for the construction of panel-data for abundance vector of biological populations. Since our methodology suitably should fit these new research fields, our next goal is to apply the methodology proposed to a real data set.