Bayesian and Maximum Likelihood Solutions: An Asymptotic Comparison Related to Cost Function

Abstract: Problem statement: Wald showed that the minimax solution is the Bayesian solution with respect to the least favorable a priori law. We try to establish a similar result by comparing the Bayesian solution and the maximum likelihood solution when the parameter space is a compact metrizable group. Approach: We take the Haar measure as the a priori law, because we reduce the problem by invariance. We construct a sequence of cost functions for which we obtain a sequence of Bayesian solutions which converges to the maximum likelihood solution. Results: We show that the two solutions are asymptotically equal. Conclusion/Recommendation: A possible generalization is the case where the parameter space is a locally compact group.


INTRODUCTION
Problem position: The fundamental problem of statistical decision theory can be summarized as follows. We are given a triplet (Θ, D, C), where Θ is the parameter space, D the space of decision rules and C a cost function, together with a random element ω ∈ Ω whose distribution P_θ depends on a parameter θ ∈ Θ.
Which decision rule δ(ω) ∈ D should the statistician choose? The space Ω above is called the space of observations; we equip it with a σ-algebra a. The risk function associated with the rule δ is defined by: R(θ, δ) = E_θ[C(θ, δ(ω))]. R(θ, δ) represents the average cost when θ is estimated by δ(ω). The issue is then to choose a decision rule that is optimal in the sense that R(θ, δ) is uniformly minimum. Since θ is unknown, such a rule does not exist in general. To circumvent this difficulty, we order the decision rules according to other principles, such as the minimax principle or the Bayes principle, or we consider only decision rules based on intuitive methods such as the maximum likelihood principle. We shall assume the family (P_θ), θ ∈ Θ, is dominated by a measure Q on (Ω, a). We shall denote by L(θ, ω) the density of P_θ with respect to Q. The maximum likelihood estimator of θ is the value of θ which maximizes L(θ, ω) given ω. It is easy to point out that this definition is wholly intuitive. A decision rule δ_M is called minimax if it minimizes (among all δ) the maximum risk, that is: sup_θ R(θ, δ_M) = inf_δ sup_θ R(θ, δ). In the Bayesian framework, the parameter θ is considered to be a random variable and a distribution µ is allocated to it. This law is called the a priori law. A decision rule δ_B is called a Bayes rule if it minimizes the Bayes risk, that is: r(µ, δ_B) = inf_δ r(µ, δ), where r(µ, δ) = E_µ(R(θ, δ)) is computed with respect to the law µ. The expression r(µ, δ) is called the Bayes risk. For a more detailed description of these rules, the reader can see (Berger, 1980) or (Fergusson, 1969). The three estimation methods which we have just presented do not necessarily give the same estimators for the unknown parameter θ.
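The three principles above can be made concrete on a toy finite problem. This is a minimal sketch, not taken from the text: the two-point parameter space, the observation probabilities and the 0-1 cost are all illustrative assumptions.

```python
import itertools

# Hypothetical toy problem: Theta = {0, 1}, observations Omega = {0, 1},
# decisions D = Theta, and 0-1 cost C(theta, d) = 1 if d != theta.
# P[theta][omega] is the probability of observing omega under P_theta.
P = {0: {0: 0.8, 1: 0.2},
     1: {0: 0.3, 1: 0.7}}

def cost(theta, d):
    return 0.0 if d == theta else 1.0

def risk(theta, delta):
    # R(theta, delta) = E_theta[ C(theta, delta(omega)) ]
    return sum(P[theta][w] * cost(theta, delta[w]) for w in (0, 1))

# Enumerate all non-randomized decision rules delta: Omega -> D.
rules = [dict(zip((0, 1), d)) for d in itertools.product((0, 1), repeat=2)]

# Minimax rule: minimize the maximum risk over theta.
minimax = min(rules, key=lambda r: max(risk(t, r) for t in (0, 1)))

# Bayes rule for the uniform prior mu = (1/2, 1/2):
# minimize r(mu, delta) = E_mu[ R(theta, delta) ].
bayes = min(rules, key=lambda r: 0.5 * risk(0, r) + 0.5 * risk(1, r))
```

In this particular toy problem the Bayes rule for the uniform prior and the minimax rule coincide (both decide δ(ω) = ω), which is the kind of coincidence Wald's theorem explains via the least favorable prior.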
This need not hold even in an asymptotic framework; see (Cam, 1953), for instance, for an asymptotic comparison of the performances of Bayes methods and maximum likelihood methods. Wald (1950), however, showed that if a minimax decision rule exists, it is also a Bayesian decision rule with respect to the least favorable law, that is to say, a Bayes rule with respect to the law which maximizes (among all laws) the Bayes risk. The purpose of the present study is to establish a result similar to that of Wald between the Bayes estimator and the maximum likelihood estimator. We shall try to find a framework (Θ, D, C) for which the Bayes estimator and the maximum likelihood estimator are equal or asymptotically equal. Before undertaking the construction of this framework, let us notice two important things. We reduce the problem by invariance. This simplification is motivated by the following considerations: in the case of equality of the Bayes estimator and the maximum likelihood estimator, we can say that the a priori law µ of the parameter θ has no influence. In that case µ can be interpreted as a "non-informative" a priori law. (Jeffreys, 1998) justifies the lack of information in µ by its invariance property; see (Florens, 1978).
It is possible to obtain Bayesian solutions for cost functions more general than those encountered in the statistical decision model of Wald, where the cost function is expressed as the sum of two positive costs, the first one relating to the observation performed and the second one to taking a final decision. We can consider a total cost C(θ, z, ω, t) of a final decision z when the parameter is worth θ, after observation of the realization ω up to time t; see (Lanery, 1984). In our case, the observed system is not dynamic: the duration of observation is fixed and the available information is expressed by the σ-algebra a. We have a cost function: C : (θ, δ, ω) ∈ Θ × D × Ω → C(θ, δ, ω) ∈ ℝ. Description of results: In the first part, we recall all the preliminary notions which we need, such as statistical decision theory, topological groups and the Haar measure.
In the second part, we compare the Bayesian solution and the maximum likelihood solution when the parameter space is a compact metrizable group. For this purpose we construct a sequence of bounded cost functions. We prove the existence of Bayesian solutions and maximum likelihood solutions under regularity conditions. We finally show that these two types of solutions are asymptotically identical. The sequence of cost functions is constructed using Urysohn's lemma. The measurability of the solutions (Bayesian and maximum likelihood) is established using the measurable selection theorem of K. Kuratowski and Ryll-Nardzewski.

Preliminaries:
We recall here some preliminary notions, such as statistical decision theory, topological groups and the Haar measure, which will be used in the next chapters. These notions are elementary and most of the proofs are omitted; they are given in order to keep the presentation clear. We consider the mapping L : (θ, ω) → L(θ, ω), the density of P_θ with respect to a dominating measure Q:

Recalls
where Q is a positive σ-finite measure on (Ω, α). If Q << v, where v is a positive σ-finite measure on (Ω, α), then a density of P_θ with respect to v is L(θ, .) (dQ/dv). We can always, and it is often convenient for theoretical calculations, choose the dominating measure of the family (P_θ), θ ∈ Θ, to be a probability (Barra, 1971).

Statistical decision theory:
This section recalls certain notions of decision theory already mentioned above; we repeat them here in more detail. We have: (Ω, α), a measurable space (the space of observations); (Θ, τ), a measurable space (the space of parameters); (D, D), a measurable space (the space of possible decisions).
(P_θ), θ ∈ Θ, is a family of transition probabilities on (Ω, α) defined on (Θ, τ); P_θ(.) governs the observation ω in Ω when θ is the value of the parameter. C is a measurable function, where C(θ, δ, ω) represents the cost of the decision δ when the parameter is θ and ω is observed. A decision rule (or strategy) δ is a measurable mapping from (Ω, α) into (D, D); it consists in deciding δ(ω) having observed ω. We shall denote by C(θ, ω, δ(ω)) the cost of the decision δ(ω) when ω is observed and the parameter is θ, and by ∆ the set of decision rules. Solving a problem of statistical decision consists in choosing a decision rule in ∆ according to some criteria acknowledged to be reasonable. The risk function is defined by: R(θ, δ) = ∫ C(θ, ω, δ(ω)) P_θ(dω).
3) A third approach is to assume that on the parameter space Θ we have a probability µ, which weights the possible values of the parameter. This is Bayesian decision theory. In this case, we consider the Bayes risk defined as: r(µ, δ) = ∫ R(θ, δ) µ(dθ). Such situations are common in statistics. For more details, see (Fougereaud and Fuchs, 1967) and (Ulmo and Bernier, 1973). Topological groups: As pointed out in the introduction, we are led to study a priori laws satisfying an invariance property; hence the importance of recalling some notions on topological groups and on the Haar measure. It is the aim of the present section. Let G denote a multiplicative group (not necessarily commutative) and e its neutral element.

Definition 2.3.1:
We say that a topology on G is compatible with the group structure if the two following mappings are continuous: (a) the mapping (x, y) ∈ G × G → xy ∈ G; (b) the mapping x ∈ G → x⁻¹ ∈ G. A group endowed with a topology compatible with its group structure is called a topological group.
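A concrete instance of this definition, as a minimal sketch (the choice of the circle group and of the arc-length metric are illustrative assumptions, not taken from the text): the circle group S¹ of unit complex numbers is a compact metrizable group, with the two compatibility mappings given by complex multiplication and conjugation.

```python
import cmath
import math

def elem(t):
    """Point of the circle group S^1 with angle t (radians)."""
    return cmath.exp(1j * t)

def mul(x, y):
    """Group operation (x, y) -> xy."""
    return x * y

def inv(x):
    """Inversion x -> x^{-1}; for |x| = 1 the inverse is the conjugate."""
    return x.conjugate()

def d(x, y):
    """Arc-length distance, invariant under translation."""
    return abs(cmath.phase(mul(x, inv(y))))

x, y, a = elem(0.4), elem(1.1), elem(2.0)
# Translation invariance of the metric: d(ax, ay) = d(x, y).
assert math.isclose(d(mul(a, x), mul(a, y)), d(x, y))
```

Both structure mappings are continuous here (they are restrictions of continuous maps on ℂ), which is exactly the compatibility required by the definition.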

Remarks 2.3.2:
(1) For any a ∈ G, the left translation x → ax (respectively the right translation x → xa) is continuous. It is a homeomorphism of G onto itself.
(2) For any a, b ∈ G, the mapping x → axb is a homeomorphism of G onto itself. Definition 2.3.6: Let G be a group and E a set. An operation (operation to the left, or action) of G on E is a mapping (s, x) ∈ G × E → s.x ∈ E such that: 1) if e is the neutral element of G, then e.x = x, ∀x ∈ E; 2) ∀s, t ∈ G, s.(t.x) = (st).x, ∀x ∈ E. We then say that G operates on E.
Definition 2.3.7: Let G be a topological group and E a topological space. We shall say that G operates continuously on E if the mapping (s, x) → s.x is continuous.
Lemma 2.3.7: If G is a topological group operating continuously on a topological space E, then for any s ∈ G the mapping x → s.x is a homeomorphism of E onto itself.
Remarks 2.3.8: An operation to the right of a group G on a set E is a mapping (x, s) ∈ E × G → x.s ∈ E. All the definitions and properties stated above remain valid in the case of an operation to the right.
Haar measure notations: Let G be a topological group operating continuously to the left on a locally compact space E. We shall denote by γ(s) the homeomorphism of E onto E defined by γ(s)(x) = s.x. If µ is a measure defined on E, we shall denote by γ(s)µ the image measure of µ by γ(s). We have: γ(s)µ(A) = µ(γ(s)⁻¹(A)), where A is a measurable subset of E.
Definition 2.4.1: Let µ be a measure on E. We say that µ is invariant by G if γ(s)µ = µ, ∀s ∈ G. Remark 2.4.2: If G is a topological group operating continuously to the right on E, we shall denote by δ(s) the homeomorphism of E onto E defined by δ(s)(x) = x.s. If µ is a measure defined on E, we shall denote by δ(s)µ the image measure of µ by δ(s). Definition 2.4.3: Let µ be a measure defined on E. We say that µ is invariant to the right by G if δ(s)µ = µ, ∀s ∈ G. If G is a locally compact group operating on itself by translation to the left and to the right, then we can define on G the notions of left and right invariant measures.
Definition 2.4.4: Let G be a locally compact group. We call a left (respectively right) Haar measure of G a positive, non-zero, σ-finite measure on G which is invariant on the left (respectively on the right). The existence and uniqueness of such a measure are given by the following theorem; the proof of this result is given in (Halmos, 1974). Theorem 2.4.5: On any locally compact group there exists a Haar measure on the left (respectively on the right), and it is unique up to a multiplicative constant.

Remark2.4.6:
If G is compact, there exists one and only one Haar measure µ on G such that µ(G) = 1. It is called the normalized Haar measure.
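For a finite group the normalized Haar measure is simply the uniform measure µ({g}) = 1/|G|, and both invariance properties can be checked directly. A minimal sketch (the choice of the symmetric group S3, a non-commutative compact group in the discrete topology, is an illustrative assumption):

```python
from itertools import permutations

# Elements of S3 as permutation tuples; op is composition.
G = list(permutations(range(3)))

def op(s, t):
    """Group operation: the composition s o t."""
    return tuple(s[t[i]] for i in range(3))

# Normalized Haar measure on a finite group: uniform weights, mu(G) = 1.
mu = {g: 1.0 / len(G) for g in G}

def translate(A, s, left=True):
    """Image of the set A under the translation x -> s.x (or x -> x.s)."""
    return {op(s, x) if left else op(x, s) for x in A}

A = set(G[:4])  # an arbitrary subset, all subsets being measurable here
for s in G:
    for left in (True, False):
        # gamma(s)mu(A) = mu(A) and delta(s)mu(A) = mu(A)
        assert sum(mu[g] for g in translate(A, s, left)) == sum(mu[g] for g in A)
```

Invariance holds because each translation is a bijection of G, so the translated set always has the same number of equally weighted points.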

Multivalued mappings: definitions and notations:
We use multivalued mappings to establish certain results, so it seems useful to devote a short paragraph to them. For more details the reader can consult (Berge, 1966).
Let X and Y be two sets, and to any x ∈ X let us match a subset Φ(x) of Y. We say that the correspondence Φ : x → Φ(x) is a multivalued mapping of X into Y. If Φ(x) is composed of only one element for every x, we say that Φ is a single-valued mapping of X into Y. The upper inverse of Φ is defined by Φ⁺(B) = {x ∈ X : Φ(x) ⊂ B}, for B ⊂ Y. Theorem 2.5.1: For any family (B_i) of subsets of Y we have Φ⁺(∩_i B_i) = ∩_i Φ⁺(B_i). The proof is immediate. To end this part, we quote a theorem of Kuratowski and Ryll-Nardzewski which will be used later. Before that, let us give the following definition.
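The upper inverse Φ⁺, and the lower inverse Φ⁻(B) = {x : Φ(x) ∩ B ≠ ∅} which appears implicitly in the measurability condition of Theorem 2.5.3 below, can be sketched on a finite example (the sets used are illustrative assumptions):

```python
# Hypothetical finite multivalued mapping Phi: X -> subsets of Y.
Phi = {1: {10, 20}, 2: {20}, 3: {30, 40}}

def upper_inverse(B):
    """Phi+(B) = {x : Phi(x) is contained in B}."""
    return {x for x, img in Phi.items() if img <= set(B)}

def lower_inverse(B):
    """Phi-(B) = {x : Phi(x) meets B}."""
    return {x for x, img in Phi.items() if img & set(B)}

B = {10, 20}
assert upper_inverse(B) == {1, 2}   # Phi(1) and Phi(2) are contained in B
assert lower_inverse(B) == {1, 2}   # Phi(3) does not meet B
```

One checks directly on this example that the upper inverse commutes with intersections, as stated in Theorem 2.5.1.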
Definition 2.5.2: A Polish space is a separable metrizable space on which there exists a metric, compatible with the topology, for which the space is complete.
Theorem 2.5.3: Let X be a Polish space and (U, u) a measurable space. We consider a multivalued mapping Φ of U into X such that, ∀u ∈ U, Φ(u) is closed and non-empty in X. We assume that {u ∈ U : Φ(u) ∩ G ≠ ∅} ∈ u for all G open in X. Then there exists a measurable mapping φ of (U, u) into (X, B_X) such that φ(u) ∈ Φ(u), ∀u ∈ U. For the proof, the reader can consult (Kuratowski and Ryll-Nardzewski, 1965). Assumptions, notations and results: H1: Let (Ω, a) be a measurable space (the space of observations).
H2: Θ is the parameter space; we assume that Θ is a compact metrizable group (written multiplicatively). We denote by e its unit element, by d a translation-invariant distance on Θ and by τ the Borel σ-algebra on Θ. Let (ε_n)_n be a sequence of positive real numbers such that: 1) (ε_n) is decreasing to 0; 2) 2ε_1 ≤ diameter(Θ). The sequence (V_n = B(e, ε_n))_{n∈ℕ} is then a neighborhood base of e such that B*(e, ε_{n+1}) ⊂ B(e, ε_n), where B*(e, ε_n) is the closed ball with center e and radius ε_n.
H6: When θ is the unknown value of the parameter, it is estimated by d, called the final decision, which belongs to D. We assume that D = Θ (a usual regularity condition).

H7:
We assume the uniqueness of the maximum likelihood solution, i.e., for every ω ∈ Ω there is a unique θ̂(ω) ∈ Θ such that L(θ̂(ω), ω) = sup_θ L(θ, ω). To prove that θ̂ is measurable, it suffices to prove that θ̂⁻¹(F) ∈ a for any F closed in Θ, because the closed sets generate τ.

The mapping ω → sup_{θ∈Θ} L(θ, ω) is measurable (it is a countable supremum of measurable functions, by continuity of L(., ω) and separability of Θ). It is the same for ω → sup_{θ∈F} L(θ, ω): this supremum equals the supremum over D_F, where D_F is a countable dense subset of F. By Urysohn's lemma, there exists a continuous mapping γ_{n,ω} of Θ into [0, 1], with support in U_{n,ω} and equal to 1 at e. Since we have to evaluate quantities such as ∫ C_n(θ, z, ω) L(θ, ω) µ(dθ), it is important that C_n(., ., ω) be measurable in ω. To show that C_n is measurable in ω amounts to proving the measurability of ρ_n(ω) with respect to ω, since C_n(θ, z, ω) is written in the form (1.1). We must then prove that the mapping ρ_n of (Ω, a) into ℝ is measurable; the problem amounts to proving the measurability of the two following mappings.

It remains to prove that we can choose θ̃_n(ω) measurable in ω. For this, consider the multivalued mapping Φ_n of Ω into Θ. We check the conditions of the theorem of K. Kuratowski and Ryll-Nardzewski in the same way as for the mapping Φ (Lemma 3.2.1). We can thus say that there is a measurable mapping θ̃_n of (Ω, a) into (Θ, τ) such that θ̃_n(ω) ∈ Φ_n(ω), ∀ω ∈ Ω. Remarks: 1) The result holds for any sequence of functions θ̃_n of Ω into Θ, not necessarily measurable, such that θ̃_n(ω) ∈ Φ_n(ω). 2) We prove that θ̂ is measurable because it is the pointwise limit of measurable functions.
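The selection step above can be sketched in a discretized setting, as a minimal illustration only: here Φ_n(ω) is the set of grid points maximizing a toy likelihood (the grid, the likelihood and the "smallest index" rule are all assumptions, not the paper's construction), and choosing the maximizer of smallest index is a selection defined by countably many measurable comparisons, in the spirit of the Kuratowski-Ryll-Nardzewski argument.

```python
import math

# Dense grid of angles standing in for the compact group Theta.
N = 360
grid = [2 * math.pi * k / N for k in range(N)]

def L(theta, omega):
    """Toy likelihood, peaked at omega (illustrative assumption)."""
    return math.exp(math.cos(theta - omega))

def selection(omega):
    """One measurable choice in the argmax set Phi(omega):
    the maximizer of smallest grid index."""
    values = [L(t, omega) for t in grid]
    return grid[values.index(max(values))]

omega = grid[45]
assert selection(omega) == omega  # here the peak lies on the grid
```

The point is only that the choice rule is defined by comparisons indexed by a countable set, which is what makes the selected function measurable.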
Asymptotic comparison between θ̃_n and θ̂ when the estimator θ̂ is not unique: In this part, we generalize Theorem 3.1.1, in the sense that we no longer require hypothesis H7.
There exists a continuous mapping γ_{n,ω} of Θ into [0, 1], with support in B(e, ρ_n(ω)) and equal to 1 at e. It is clear that, ∀n ∈ ℕ, C_n is continuous in θ and z, bounded and invariant by translation, i.e.: C_n(xθ, xz, ω) = C_n(θ, z, ω), ∀θ, z, x ∈ Θ. The measurability of C_n in ω is deduced from that of the two following mappings. First, the proof of (3.5.1) is in the appendix. Secondly, the mapping ψ : (θ, ω) ∈ Θ × Ω → d(θ, φ(ω)) ∈ ℝ is measurable; the proof of the measurability of ψ is in the appendix. The corresponding set-valued mapping ω → φ(ω) is then also measurable. We have thus constructed a sequence of cost functions which satisfies the conditions of the theorem. It remains to show that, for this type of cost function, the two estimators are asymptotically equal.
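The mechanism behind the asymptotic equality can be illustrated numerically on a discretized circle group. This is a sketch under several assumptions not taken from the text: a particular toy likelihood, a piecewise-linear bump in place of the Urysohn function γ_{n,ω}, and a fixed grid playing the role of Θ. The prior is the normalized Haar (uniform) measure; the Bayes rule for the cost C_n(θ, z) = 1 − γ_n(θ⁻¹z) maximizes the γ_n-smoothed posterior, and as the bump width ε_n shrinks it approaches the posterior mode, which under the Haar prior is the maximum likelihood estimate.

```python
import math

N = 360
grid = [2 * math.pi * k / N for k in range(N)]  # discretized circle group

def circ_dist(a, b):
    """Translation-invariant distance on the circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def L(theta, omega):
    """Toy likelihood, peaked at omega (illustrative assumption)."""
    return math.exp(2 * math.cos(theta - omega))

def bayes_estimate(omega, eps):
    """Minimizer of the posterior cost for C(theta, z) = 1 - bump_eps(theta^{-1} z),
    with a uniform (Haar) prior; equivalently, maximizer of the smoothed posterior."""
    def smoothed_posterior(z):
        return sum(max(0.0, 1 - circ_dist(t, z) / eps) * L(t, omega)
                   for t in grid)
    return max(grid, key=smoothed_posterior)

omega = grid[100]
mle = max(grid, key=lambda t: L(t, omega))  # maximum likelihood estimate
est = bayes_estimate(omega, 0.05)           # Bayes estimate, small bump width
assert circ_dist(est, mle) < 0.05           # the two estimates agree closely
```

Shrinking eps through a decreasing sequence (ε_n) reproduces, in this toy setting, the convergence of the Bayesian solutions to the maximum likelihood solution that the theorem asserts.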

CONCLUSION
Our study deals with the maximum likelihood estimator as a limit of Bayesian estimators. We think that, conceptually, this is one of the first times that a result is driven by a sequence of cost functions. Usually the cost is an intrinsic datum of the decision problem, and within a Bayesian framework attention focuses on the dependence of the optimal solution on the a priori law. For us this law is fixed and it is the cost function that varies.
The quantity δ(F, G) = max(sup_{g∈F} inf_{g'∈G} d(g, g'), sup_{g'∈G} inf_{g∈F} d(g, g')), where g and g' run over the compact sets F and G, defines a distance on K(Θ), called the Hausdorff distance. With the topology induced by δ, the space K(Θ) defined above is a metric space; see (Christensen, 1974). It remains to show that the corresponding mapping is measurable.
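The Hausdorff distance used above can be sketched directly for finite sets, where the suprema and infima become maxima and minima (the sets and the metric |x − y| are illustrative assumptions):

```python
def hausdorff(A, B, d):
    """Hausdorff distance between non-empty finite sets A, B
    for a metric d: max of the two one-sided excesses."""
    h1 = max(min(d(a, b) for b in B) for a in A)  # sup_A inf_B d
    h2 = max(min(d(a, b) for a in A) for b in B)  # sup_B inf_A d
    return max(h1, h2)

d = lambda x, y: abs(x - y)
assert hausdorff({0.0, 1.0}, {0.0, 1.0, 2.0}, d) == 1.0  # excess of 2.0 over {0, 1}
assert hausdorff({0.0}, {0.0}, d) == 0.0
```

Note that the distance is zero only when the two (closed) sets coincide, which is why it metrizes K(Θ).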