Constructing Fuzzy Time Series Model Based on Fuzzy Clustering for a Forecasting

Problem statement: This study introduces a fuzzy time series model based on fuzzy clustering. It addresses the problem that membership values are simply assumed in the Song and Chissom model, and it aims to improve the performance of fuzzy time series forecasting. Approach: The proposed model employs seven main procedures for time-invariant and time-variant fuzzy time series models: (1) cluster the data; (2) determine the membership values for each cluster; (3) define the universe of discourse; (4) partition the universe of discourse into equal intervals; (5) fuzzify the historical data; (6) build fuzzy logical relationships; and (7) calculate the forecasted outputs. Results: In the evaluations, the proposed model produced more accurate forecasts than the other models compared. Conclusion: The proposed model is a good model for forecasting values. Selecting membership functions based on fuzzy clustering offers an alternative approach that lets the data determine the nature of the membership functions. Our results showed that this approach can lead to satisfactory performance for fuzzy time series.


INTRODUCTION
Song and Chissom (1993b) presented the concept of fuzzy time series based on the historical enrollments of the University of Alabama. They presented the time-invariant fuzzy time series model and the time-variant fuzzy time series model, both based on fuzzy set theory, for forecasting the enrollments of the University of Alabama.
Fuzzy forecasting methods can forecast data with linguistic values. Fuzzy time series do not need to turn a non-stationary series into a stationary one, and they require neither large amounts of historical data nor assumptions such as normality. Although fuzzy forecasting methods are suitable for incomplete data situations, their performance is not always satisfactory (Kirchgassner and Wolters, 2007; Palit and Popovic, 2005).
The proposed fuzzy time series model is introduced to handle forecasting problems and improve forecasting accuracy. Each value (observation) is represented by a fuzzy set. The transition between consecutive values is taken into account in order to model the time series data.
Forecast accuracy is compared using the Normalized Root Mean Square Error (NRMSE). In statistics, the NRMSE is the square root of the sum of the squared deviations between actual and predicted values divided by the sum of the squares of the actual values:

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{t} (a_t - f_t)^2}{\sum_{t} a_t^2}}$$

where $a_t$ is the actual value and $f_t$ is the predicted value at time $t$.

Fuzzy Clustering (FCMI): The Fuzzy C-Means Iterative (FCMI) algorithm assumes the existence of a pattern space $X = \{x_1, x_2, \ldots, x_m\}$ and c fuzzy clusters whose centers have initial values $y_{10}, y_{20}, \ldots, y_{c0}$. In every iteration, the membership function values and the cluster centers are updated. The process terminates when the difference between two consecutive sets of cluster centers does not exceed a given tolerance (Friedman and Kandel, 1999).
Step 1: Determine the maximum number of iterations N. At iteration k = 0, initialize $y_i = y_{i0}$, $1 \le i \le c$.
Step 2: For $1 \le i \le c$ and $1 \le j \le m$:
Step 3: For $1 \le i \le c$ and $1 \le j \le m$:
Step 4:
Step 5: If: and $1 \le j \le m$ and stop.
Step 6: If k = N then stop; otherwise set k = k + 1 and go to Step 2.
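The iteration described above can be illustrated in code. The following is a minimal Python sketch, assuming one-dimensional data and the standard fuzzy c-means update rules with a fuzzifier of 2. These update rules, the function name `fuzzy_c_means_1d` and its parameters are illustrative assumptions, not necessarily the exact FCMI equations of Friedman and Kandel (1999).

```python
def fuzzy_c_means_1d(data, initial_centers, max_iter=100, tol=1e-6, fuzzifier=2.0):
    """Sketch of standard fuzzy c-means on 1-D data (assumed update rules).

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which data[j] belongs to cluster i, computed from the centers that
    were current at the start of the last iteration.
    """
    centers = list(initial_centers)
    power = 2.0 / (fuzzifier - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Update the membership of every pattern j in every cluster i.
        memberships = [[0.0] * len(data) for _ in centers]
        for j, x in enumerate(data):
            dists = [abs(x - y) for y in centers]
            zero = [i for i, d in enumerate(dists) if d == 0.0]
            if zero:
                # The pattern coincides with one or more centers:
                # split full membership among those clusters.
                for i in zero:
                    memberships[i][j] = 1.0 / len(zero)
            else:
                for i, d_i in enumerate(dists):
                    memberships[i][j] = 1.0 / sum((d_i / d_k) ** power for d_k in dists)

        # Update every center as a membership-weighted mean of the data.
        new_centers = []
        for i, row in enumerate(memberships):
            weights = [u ** fuzzifier for u in row]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[i])  # no support: keep old center
            else:
                new_centers.append(sum(w * x for w, x in zip(weights, data)) / total)

        # Stop when no center moved by more than the tolerance.
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, memberships


if __name__ == "__main__":
    # Hypothetical data with two well-separated groups.
    sample = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
    final_centers, u = fuzzy_c_means_1d(sample, [0.0, 5.0])
    print(final_centers)
```

The loop bound `max_iter` plays the role of N, and `tol` plays the role of the tolerance on the change in cluster centers.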
Fuzzy time series: Fuzzy time series are used to handle forecasting problems. Song and Chissom (1993b) presented the concept of fuzzy time series based on the historical enrollments of the University of Alabama. They presented the time-invariant fuzzy time series model and the time-variant fuzzy time series model, both based on fuzzy set theory, for forecasting the enrollments of the University of Alabama. The definitions and processes of fuzzy time series presented by Song and Chissom (1993a) are described as follows (Liu, 2007; Huarng, 2001).

Definition 2 (FTSRs): If there exists a fuzzy logical relationship R(t-1, t) such that F(t) = F(t-1) × R(t-1, t), where "×" represents an operation, then F(t) is said to be induced by F(t-1). The logical relationship between F(t) and F(t-1) is:

$$F(t-1) \rightarrow F(t)$$

Definition 3 (FLR): Suppose $F(t-1) = A_i$ and $F(t) = A_j$. The relationship between two consecutive observations, F(t) and F(t-1), referred to as a fuzzy logical relationship, can be denoted by:

$$A_i \rightarrow A_j$$

Definition 5 (IFTS and VFTS): Assume that F(t) is a fuzzy time series, that F(t) is caused by F(t-1) only and that F(t) = F(t-1) × R(t-1, t). If R(t-1, t) is independent of t for any t, then F(t) is called a time-invariant fuzzy time series; otherwise it is called a time-variant fuzzy time series.

Song and Chissom (1993a) employed five main procedures in time-invariant and time-variant fuzzy time series models, as follows:

Define the universe of discourse U: Define the universe of discourse for the observations. According to the issue domain, the universe of discourse for the observations is defined as:

$$U = [D_{\min} - D_1,\; D_{\max} + D_2] \qquad (6)$$

Where:
$D_{\min}$ = The minimum value
$D_{\max}$ = The maximum value
$D_1$ and $D_2$ = Positive real numbers chosen so that U can be divided into n equal-length intervals

Partition the universe of discourse U into equal intervals: After the length of the intervals is determined, U can be partitioned into equal-length intervals $u_1, u_2, \ldots, u_n$.
Define the linguistic terms: Each linguistic observation $A_k$ is defined as a fuzzy set on the intervals $u_1, u_2, \ldots, u_n$.
Fuzzify the historical data: Each historical observation can be fuzzified into a fuzzy set.
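The first procedures above (defining U, partitioning it and locating each observation in an interval) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the universe is taken as the interval from D_min - D_1 to D_max + D_2, an observation is fuzzified to the interval that contains it, and all function names and sample numbers are hypothetical.

```python
def universe_of_discourse(data, d1, d2):
    """Return (low, high) = (D_min - D_1, D_max + D_2) for the observations."""
    return min(data) - d1, max(data) + d2


def partition(universe, n):
    """Split the universe into n equal-length intervals u_1, ..., u_n."""
    low, high = universe
    width = (high - low) / n
    return [(low + k * width, low + (k + 1) * width) for k in range(n)]


def interval_index(x, intervals):
    """Return the 0-based index of the interval containing x.

    A value on an interior boundary is assigned to the upper interval;
    the right end of the universe belongs to the last interval.
    """
    low = intervals[0][0]
    high = intervals[-1][1]
    if not low <= x <= high:
        raise ValueError("value lies outside the universe of discourse")
    width = (high - low) / len(intervals)
    return min(int((x - low) / width), len(intervals) - 1)


if __name__ == "__main__":
    # Hypothetical observations, not the enrollment data.
    observations = [12, 18, 25, 31]
    u = universe_of_discourse(observations, d1=2, d2=4)
    intervals = partition(u, n=5)
    print(u, intervals)
    print([interval_index(x, intervals) for x in observations])
```

Choosing D_1 and D_2 so that the endpoints of U are round numbers keeps the interval boundaries easy to read, which is the usual motivation for padding the observed range.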

MATERIALS AND METHODS
Proposed model: We introduce a fuzzy time series model based on fuzzy clustering. Most authors in the fuzzy time series field have followed the fuzzy time series procedures presented by Song and Chissom (1993a). We introduce this novel model to address the problem that the membership values are assumed in the Song and Chissom model, even though these membership values play an important role in the forecasted values. The proposed model employs seven main procedures for time-invariant and time-variant fuzzy time series models, as follows:
Step 1: Cluster data into c clusters: Apply fuzzy clustering to a time series Y(t) with n observations to cluster it into c ($2 \le c \le n$) clusters. FCMI is used because it is the most popular and best-known method in the fuzzy clustering field. At iteration k = 0, initialize:

$$y_i = y_{i0}, \quad 1 \le i \le c, \qquad y_{i0} = D_{\min} / 10 \times i$$

Where:
$D_{\min}$ = The minimum value
c = The number of clusters
Step 2: Determine the membership values for each cluster.
Step 3: Define the universe of discourse U: In this step, the proposed model defines the universe of discourse as Song and Chissom (1993b) defined it in Eq. 6.
Step 4: Partition the universe of discourse U into equal intervals: In this step, the proposed model partitions the universe of discourse into c intervals.
Step 5: Fuzzify the historical data: In this step, the proposed model fuzzifies the historical data by determining the best fuzzy cluster for each actual value.
Step 6: Build fuzzy logical relationships: In this step, the proposed model builds fuzzy logical relationships as in Definition 3. If $F(t-1) = A_i$ and $F(t) = A_j$, then the relationship between the two consecutive observations is:

$$A_i \rightarrow A_j$$

Step 7: Calculate forecasted outputs: The proposed model forecasts values based on the fuzzy logical relationships: if $A_i \rightarrow A_j$, then the forecasted value is the midpoint of $A_j$.
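Steps 5 to 7 can be illustrated with a short Python sketch. It assumes that the "best" cluster for an observation is the one with the highest membership value, that cluster i corresponds to interval i (for example, with clusters ordered by center), and that the midpoint of A_j means the midpoint of its interval. The function names and the sample membership values are hypothetical.

```python
def fuzzify_by_cluster(memberships):
    """Step 5 (assumed rule): label each observation with the index of the
    cluster in which it has the highest membership.

    memberships[i][j] is the membership of observation j in cluster i.
    """
    n_clusters = len(memberships)
    n_obs = len(memberships[0])
    return [
        max(range(n_clusters), key=lambda i: memberships[i][j])
        for j in range(n_obs)
    ]


def build_flrs(labels):
    """Step 6: pair consecutive labels into relationships A_i -> A_j."""
    return list(zip(labels[:-1], labels[1:]))


def forecast(flrs, intervals):
    """Step 7: for each A_i -> A_j, forecast the midpoint of the interval
    associated with A_j."""
    return [(intervals[j][0] + intervals[j][1]) / 2 for _, j in flrs]


if __name__ == "__main__":
    # Hypothetical memberships for 3 clusters and 3 observations.
    u = [
        [0.9, 0.2, 0.1],
        [0.1, 0.7, 0.3],
        [0.0, 0.1, 0.6],
    ]
    intervals = [(0, 10), (10, 20), (20, 30)]
    labels = fuzzify_by_cluster(u)
    relationships = build_flrs(labels)
    print(labels, relationships, forecast(relationships, intervals))
```

Because each relationship pairs observation t-1 with observation t, the list returned by `forecast` has one entry fewer than the number of observations.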

RESULTS AND DISCUSSION
Empirical study: Previous studies on fuzzy time series have often used the enrollment data of the University of Alabama as the forecasting target.
Based on the enrollments of the University of Alabama from 1971-1992, we obtain the universe of discourse U = [13055, 16919] with $D_1$ = 31 and $D_2$ = 55, and partition U into 7 equal intervals $u_1, u_2, \ldots, u_7$. The forecasted value for each year is a fuzzy set $A_i$. The forecasted value for 1971 is 13159, while the actual value was 13055. Figure 1 shows the linguistic terms and forecasted values deduced by the proposed model.
This study uses the same data to evaluate the proposed model and compare it with other methods. The forecasted values of (Jilani and Burney, 2008; Tsaur et al., 2005; Yu, 2005; Chen et al., 2008; Cheng et al., 2008) and of the proposed model are given for comparison in Table 2. Figure 2 compares the forecasting results of these models in a line chart.
The NRMSE is 0.0171 for (Jilani and Burney, 2008), 0.0393 for (Tsaur et al., 2005), 0.0349 for (Yu, 2005), 0.2955 for (Chen et al., 2008), 0.0332 for (Huarng, 2001) and 0.0158 for the proposed model. In these evaluations, the proposed model achieved the lowest NRMSE of the models compared. Figure 3 compares the NRMSE of the different models.
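For readers who want to reproduce such a comparison, the NRMSE can be computed as in the following minimal Python sketch. It reads the definition as the square root of the ratio between the sum of squared deviations and the sum of squared actual values; the function name `nrmse` is an illustrative choice, and the only data used is the single 1971 pair reported above.

```python
import math


def nrmse(actual, predicted):
    """Normalized root mean square error:
    sqrt( sum((a - f)^2) / sum(a^2) ) over paired actual/predicted values.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_error = sum((a - f) ** 2 for a, f in zip(actual, predicted))
    squared_actual = sum(a ** 2 for a in actual)
    return math.sqrt(squared_error / squared_actual)


if __name__ == "__main__":
    # Single pair from the text: actual 13055, forecast 13159 (year 1971).
    print(nrmse([13055], [13159]))
```

With a single observation the measure reduces to the absolute relative error, which makes the one-pair call above a convenient sanity check before applying the function to a full series.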