Bayesian Framework in Repeated-Play Decision Making

Problem statement: Much has been reported on decisions from experience, also referred to as decisions in a complete ignorance fashion. Approach: This note lays out a Bayesian decision-theoretical framework that provides a computable account of decisions from experience. Results: To make the framework more tractable, this note sets up and examines decisions in an incomplete ignorance fashion. The discussion asserts that well-known behavioural effects, such as the hot stove effect, and the Bayesian framework may lead to different predictions. Conclusion/Recommendations: The framework is applied in its continuous form to predict a probability from the decision makers' experience. We conclude that a reasonable prediction sometimes leads them to unreasonable conditions.


INTRODUCTION
The Bayesian decision-theoretical framework is useful for examining behavioural tendencies in decisions under ambiguity. It is one of the normative frameworks widely used by behaviourists to provide a computable account of behavioural tendencies in "decisions from experience". It asserts that the Bayesian decision maker's ultimate goal is to judge the likelihood of events by updating her or his subjective probabilities in the face of new evidence obtained through a sequential search process (Fujikawa, 2007).
The study of decisions from experience (also referred to as "decisions in a complete ignorance fashion") is a fast-moving area. Much research based on laboratory experiments has been presented (Barron and Erev, 2003; Hertwig et al., 2004; Weber et al., 2004; Erev and Barron, 2005; Yechiam and Busemeyer, 2006; Fujikawa, 2009; Barron and Yechiam, 2009). For example, Table 1 shows two choice problems presented in Fujikawa (2009), where the participants chose, at each period t (t = 1, 2, …, 400), between two unmarked buttons that provided outcomes sampled from two distributions, "R" and "S". Let (v, p) denote a distribution in which the outcome v occurs with probability p (and zero otherwise). The right-hand column of Table 1 ($P_R$) shows the aggregated proportion of R choices over the 400 trials. The maximisation rate over the 400 trials was 0.28 in Problem 2, for example. This result suggested that deviations from maximisation (i.e., the participants' infrequent selection of R) in a state of complete ignorance were the consequence of the "hot stove effect" that could lead to a bias toward … Fujikawa (2009) indicated the existence of the hot stove effect in decisions from experience by analysing the results of his experiments, which involved the state of complete ignorance.
The participants did not receive prior information on the payoff structure; the feedback they received was limited to the payoff obtained at each round t. That is, the experiments were run in a state of complete ignorance, in which neither the possible payoffs nor their likelihoods were disclosed to the participants. This apparatus, however, makes it challenging to examine the existence of the hot stove effect in light of the Bayesian framework, which combines prior information on the payoff distributions with data to obtain a posterior estimate. The participants in the state of complete ignorance were likely to fail to use the Bayesian framework, as they were not provided with any prior information on the payoff distributions.
This note extends Fujikawa (2009) by laying out a Bayesian decision-theoretical framework that accounts for decisions from experience. Instead of the state of complete ignorance employed by Fujikawa (2009), we employ a state of "incomplete ignorance" in which the Decision Makers (DMs) can obtain posterior estimates calculated by the Bayesian framework. A state of incomplete ignorance is defined as one in which the possible payoffs of the available options are disclosed to the DMs, but the likelihoods of those payoffs are not. Making the possible payoffs available to the DMs (participants in laboratory experiments) could allow them to update their beliefs from data through the Bayesian framework.

MATERIALS AND METHODS
A state of complete ignorance: We define a state of complete ignorance as the state in which the DM's prior probability distribution (i.e., the distribution at t = 0) is uniform. A state of complete ignorance was experimentally manipulated in the previous studies on decisions from experience introduced above. The authors used pairwise choice problems, such as the following Problem X:
• Problem X. Choose between
• $A_X$: γ points with probability $P^*_{AX}$; 0 otherwise
• $B_X$: θ points with probability $P^*_{BX}$; 0 otherwise
We let $\gamma, \theta > 0$, $P^*_{AX}, P^*_{BX} \in [0,1]$ and $P^*_{AX} > P^*_{BX}$. The DM is usually not provided with any prior information on the payoff structure, and she is repeatedly asked to make decisions, relying on the feedback obtained in past rounds. Thus, it is unknown to the DM that one selection of $A_X$ ($B_X$) yields γ (θ) points with probability $P^*_{AX}$ ($P^*_{BX}$) and zero points with probability $1 - P^*_{AX}$ ($1 - P^*_{BX}$). It is, however, known to her that one selection of each option yields certain payoffs with unknown probabilities. Suppose that the DM is asked to choose either $A_X$ or $B_X$ t (0 ≤ t ≤ T) times in Problem X. Given that each of the mutually exclusive and exhaustive outcomes x is equally likely, the prior probability distribution of $A_X$ is uniform over these outcomes. In a state of complete ignorance, $x_1$ may vary widely among the DMs, as they do not have any prior information on the possible outcomes and probabilities. For example, some participants in Fujikawa (2009) were likely to have a high $x_1$, while others had a low one. Thus, it seems that the prior probability distribution on $A_X$ would vary widely among the participants.
A state of incomplete ignorance: We now introduce the Bayesian framework to examine behavioural tendencies in decisions from experience. To this end, we present a state of incomplete ignorance, in which the DMs have incomplete information on the payoff structure. Consider the following Problem Y:
• Problem Y. Choose between
• $A_Y$: γ points with probability $P^*_{AY}$; 0 otherwise
• $B_Y$: θ points with probability $P^*_{BY}$; 0 otherwise
We let $\gamma, \theta > 0$ and $P^*_{AY}, P^*_{BY} \in [0,1]$. The DM makes a choice between $A_Y$ and $B_Y$ at each period t (t = 1, 2, …, T). She is informed of the possible payoffs of each option, but not of the corresponding probabilities. That is, she knows that one selection of $A_Y$ ($B_Y$) yields γ (θ) points with an unknown probability. Thus, her a priori formulated beliefs about this probability are described by a density function $p_1(x)$ on $[0, 1]$. A goal of Bayesian DMs is to compute a posteriori probabilities from a priori probabilities. Since the DM possesses no objective prior information on $P^*_{AY}$, she computes the a posteriori probabilities of $P^*_{AY}$ from her a priori probabilities of $P^*_{AY}$ and her experienced probability $P_{exp}$. Below, we apply the framework to the case of $P^*_{AY}$; the same argument holds for $P^*_{BY}$. To compute the a posteriori probabilities, we consider n events: we define $R_k$ (k = 1, 2, …, n) as the event that $P^*_{AY}$ falls within region k, $(k-1)/n \le P^*_{AY} < k/n$, at period t. Each $R_k$ has a probability $P_t(R_k) = S_t(k)/n$ that defines the likelihood of $R_k$. Each event is mutually exclusive of all the other events. Thus (Eq. 1):
$$P_t(R_x \cap R_y) = 0, \qquad x = 1, 2, \ldots, n,\ \ y = 1, 2, \ldots, n,\ \ x \neq y \tag{1}$$
Note that (Eq. 2):
$$\sum_{k=1}^{n} P_t(R_k) = \sum_{k=1}^{n} \frac{S_t(k)}{n} = 1 \tag{2}$$
as $P^*_{AY}$ belongs to one of the regions. For example, if n = 2, there are two possible events, $R_1$ ($0 \le P^*_{AY} < 0.5$) and $R_2$ ($0.5 \le P^*_{AY} \le 1$), and their probabilities are $P_t(R_1) = S_t(1)/2$ and $P_t(R_2) = S_t(2)/2$.
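$P^*_{AY}$ belongs to one of the two regions, so the probabilities sum to $P_t(R_1) + P_t(R_2) = P_t(R_1 \cup R_2) = 1$. We can now calculate the probability $P_{t+1}(W_{t+1})$, where $W_{t+1}$ is the event that the highest payoff, γ, is realised at period t+1. To calculate it, we condition on each event $R_k$ and write $P_{t+1}(W_{t+1} \mid R_k)$ for the probability of $W_{t+1}$ when $R_k$ occurs. From Eq. 1 and 2, we can calculate $P_{t+1}(W_{t+1})$ as Eq. 3:
$$P_{t+1}(W_{t+1}) = \sum_{k=1}^{n} P_{t+1}(W_{t+1} \mid R_k)\, P_t(R_k) = \sum_{k=1}^{n} P_{t+1}(W_{t+1} \mid R_k)\, \frac{S_t(k)}{n} \tag{3}$$
We next estimate $P_{t+1}(W_{t+1} \mid R_k)$, which we approximate as Eq. 4:
$$P_{t+1}(W_{t+1} \mid R_k) \approx \frac{k}{n} \tag{4}$$
Having observed the outcome at period t+1, the DM updates the possibility $S_t(k)/n$ of the event $R_k$ by using the Bayesian framework. Thus (Eq. 5):
$$\frac{S_{t+1}(k)}{n} = \begin{cases} \dfrac{P_{t+1}(W_{t+1} \mid R_k)\, S_t(k)/n}{\sum_{j=1}^{n} P_{t+1}(W_{t+1} \mid R_j)\, S_t(j)/n} & \text{if } W_{t+1} \text{ is realised} \\[2ex] \dfrac{\left[1 - P_{t+1}(W_{t+1} \mid R_k)\right] S_t(k)/n}{\sum_{j=1}^{n} \left[1 - P_{t+1}(W_{t+1} \mid R_j)\right] S_t(j)/n} & \text{if } \overline{W}_{t+1} \text{ is realised} \end{cases} \tag{5}$$
Equation 5 is discrete, and we now move towards a continuous representation. Substituting Eq. 4, we get Eq. 6:
$$\frac{S_{t+1}(k)}{n} = \begin{cases} \dfrac{\frac{k}{n}\, \frac{S_t(k)}{n}}{\sum_{j=1}^{n} \frac{j}{n}\, \frac{S_t(j)}{n}} & \text{if } W_{t+1} \text{ is realised} \\[2ex] \dfrac{\left(1 - \frac{k}{n}\right) \frac{S_t(k)}{n}}{\sum_{j=1}^{n} \left(1 - \frac{j}{n}\right) \frac{S_t(j)}{n}} & \text{if } \overline{W}_{t+1} \text{ is realised} \end{cases} \tag{6}$$
Letting $n \to \infty$, we treat $S_t(k)/n$ as a continuous probability distribution. We then define the continuous probability density function $p_t(x)$ with the following properties (Eq. 7 and 8):
$$\int_0^1 p_t(x)\, dx = 1 \tag{7}$$
$$p_t(x) = S_t(k) \quad \text{for } x = \frac{k}{n} \tag{8}$$
where k = 1, 2, …, n and x is a hypothesised value of $P^*_{AY}$. Thus, Eq. 6 becomes Eq. 9:
$$p_{t+1}\!\left(\tfrac{k}{n}\right) = \begin{cases} \displaystyle\lim_{n \to \infty} \dfrac{\frac{k}{n}\, p_t\!\left(\frac{k}{n}\right)}{\sum_{j=1}^{n} \frac{j}{n}\, p_t\!\left(\frac{j}{n}\right) \frac{1}{n}} & \text{if } W_{t+1} \text{ is realised} \\[2ex] \displaystyle\lim_{n \to \infty} \dfrac{\left(1 - \frac{k}{n}\right) p_t\!\left(\frac{k}{n}\right)}{\sum_{j=1}^{n} \left(1 - \frac{j}{n}\right) p_t\!\left(\frac{j}{n}\right) \frac{1}{n}} & \text{if } \overline{W}_{t+1} \text{ is realised} \end{cases} \tag{9}$$
From Eq. 8, we have Eq. 10:
$$p_{t+1}(x) = \begin{cases} \dfrac{x\, p_t(x)}{\int_0^1 y\, p_t(y)\, dy} & \text{if } W_{t+1} \text{ is realised} \\[2ex] \dfrac{(1 - x)\, p_t(x)}{\int_0^1 (1 - y)\, p_t(y)\, dy} & \text{if } \overline{W}_{t+1} \text{ is realised} \end{cases} \tag{10}$$
As shown in Eq. 8, $p_t(x)$ is the probability density underlying $P_t(R_k)$, so that Eq. 11 holds:
$$P_t(R_k) = \int_{(k-1)/n}^{k/n} p_t(x)\, dx \tag{11}$$
Without losing much generality or accuracy, it can be said that the DMs update the probability density $p_t(x)$ from experienced results. Simply put, we define Eq. 12:
$$f_{t+1}(x) = \begin{cases} x & \text{if } W_{t+1} \text{ is realised} \\ 1 - x & \text{if } \overline{W}_{t+1} \text{ is realised} \end{cases} \tag{12}$$
Equation 10 is then transformed into Eq. 13:
$$p_{t+1}(x) = \frac{f_{t+1}(x)\, p_t(x)}{\int_0^1 f_{t+1}(y)\, p_t(y)\, dy} \tag{13}$$
Fig. 1: The process of probability density change in the cases where $W_{t+1}$ and $\overline{W}_{t+1}$ are realised. Axes: $P^*_{AY}$ on the x-axis and $p_t(x)$ on the y-axis.
Equation 13 holds true at any time period t, thus:
$$p_{t+1}(x) = \frac{\left[\prod_{s=2}^{t+1} f_s(x)\right] p_1(x)}{\int_0^1 \left[\prod_{s=2}^{t+1} f_s(y)\right] p_1(y)\, dy}$$
Letting r be the total number of times the highest payoff has been realised, we have Eq. 14:
$$p_{t+1}(x) = \frac{x^{r} (1 - x)^{t - r}\, p_1(x)}{\int_0^1 y^{r} (1 - y)^{t - r}\, p_1(y)\, dy} \tag{14}$$
Finally, writing $P_{exp} = r/t$, we obtain Eq. 15:
$$p_{t+1}(x) = \frac{x^{t P_{exp}} (1 - x)^{t (1 - P_{exp})}\, p_1(x)}{\int_0^1 y^{t P_{exp}} (1 - y)^{t (1 - P_{exp})}\, p_1(y)\, dy} \tag{15}$$
Thus, for a given t, $p_{t+1}(x)$ depends only on $P_{exp}$ (i.e., the proportion of trials up to period t in which the highest payoff was realised) and on the initial probability density $p_1(x)$.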
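As a rough illustration of the region-wise update described above, the following Python sketch keeps one probability per region $R_k$ and revises the list after each observed outcome. It assumes that the chance of the highest payoff within region k can be approximated by k/n (an assumption of this sketch); the function names, the choice n = 1000 and the uniform starting list are likewise illustrative and not taken from the original experiments.

```python
def predictive_win_probability(region_probs):
    """Total probability of the highest payoff: sum_k P(W | R_k) * P(R_k).

    Assumption of this sketch: P(W | R_k) is approximated by k/n.
    """
    n = len(region_probs)
    return sum((k / n) * p for k, p in enumerate(region_probs, start=1))


def update_regions(region_probs, win):
    """Apply one Bayesian update to the region probabilities after one outcome."""
    n = len(region_probs)
    # Likelihood of the observed outcome in region k: k/n for the highest
    # payoff, 1 - k/n otherwise (hypothetical approximation, see lead-in).
    likelihoods = [(k / n) if win else (1 - k / n) for k in range(1, n + 1)]
    unnormalised = [lik * p for lik, p in zip(likelihoods, region_probs)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


n = 1000
prior = [1 / n] * n                                # flat start: S_1(k) = 1 for all k
after_win = update_regions(prior, win=True)
after_win_loss = update_regions(after_win, win=False)

# Rescaling by n turns a region probability back into a density value S_t(k).
middle = n // 2 - 1                                # region ending at x = 0.5
print(predictive_win_probability(prior))           # close to 0.5
print(n * after_win[middle])                       # close to 2x = 1.0 at x = 0.5
print(n * after_win_loss[middle])                  # close to 6x(1 - x) = 1.5 at x = 0.5
```

With a fine grid the rescaled values approach the continuous densities, which is the sense in which the discrete update can be read as a density update.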
If we assume that the initial probability density function is constant, $p_1(x) = 1$, then for a given t Eq. 15 depends only on $P_{exp}$, as in Eq. 16:
$$p_{t+1}(x) = \frac{x^{t P_{exp}} (1 - x)^{t (1 - P_{exp})}}{\int_0^1 y^{t P_{exp}} (1 - y)^{t (1 - P_{exp})}\, dy} \tag{16}$$
Figure 1 shows the early stage of this process: it shows how the probability density changes with the outcome of each trial. At period t = 1, there is no prior information on the probability of the highest payoff, so its density is constant, $p_1(x) = 1$. After the decision is made at t = 1, the DM will face one of two events in choosing an option at t = 2:

RESULTS
(1) an event that the highest payoff is realised (i.e., an event $W_2$); (2) an event that the lowest payoff is realised (i.e., an event $\overline{W}_2$). If the highest payoff is realised, the probability density function is updated to $p_2(x) = 2x$. If, on the other hand, the lowest payoff is realised, the probability density function is updated to $p_2(x) = 2 - 2x$. At t = 3, the following three cases are possible: first, $p_3(x) = 3x^2$ if events $W_2$ and $W_3$ are realised.
Second, $p_3(x) = 6x(1-x)$ if either of two conditions is met: (i) $\overline{W}_2$ and $W_3$ are realised; (ii) $W_2$ and $\overline{W}_3$ are realised. Third, $p_3(x) = 3(1-x)^2$ if $\overline{W}_2$ and $\overline{W}_3$ are realised. Figure 2 shows the early stage of the population histogram for the case of $P^*_{AY} = 0.5$. In this case, the population of DMs divides symmetrically between two situations: (1) a situation in which the lucky DMs face an event $W$ and (2) a situation in which the unlucky DMs face an event $\overline{W}$. Stochastically and approximately, the population gathers at the centre of the histogram, so that the predicted probability density function is similar to a Gaussian.
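The density updates listed for t = 2 and t = 3 can be checked numerically. The sketch below is a minimal illustration, assuming the flat initial density $p_1(x) = 1$: after t outcomes with r realisations of the highest payoff, the normalised density $x^r (1-x)^{t-r} / \int_0^1 y^r (1-y)^{t-r}\, dy$ is a Beta(r + 1, t − r + 1) density, whose normalising constant can be written with gamma functions. The function name `posterior_density` and the test point x = 0.25 are illustrative choices. Note that normalisation requires the two-loss case to be $3(1-x)^2$.

```python
import math


def posterior_density(x, r, t):
    """Density at x after t outcomes with r highest-payoff realisations,
    starting from the flat prior p_1(x) = 1 (assumption of this sketch)."""
    if not 0 <= r <= t:
        raise ValueError("r must lie between 0 and t")
    # 1 / B(r + 1, t - r + 1), computed in log space for numerical stability.
    log_norm = math.lgamma(t + 2) - math.lgamma(r + 1) - math.lgamma(t - r + 1)
    return math.exp(log_norm) * x**r * (1 - x) ** (t - r)


x = 0.25
print(posterior_density(x, 1, 1), 2 * x)              # one win
print(posterior_density(x, 0, 1), 2 - 2 * x)          # one loss
print(posterior_density(x, 2, 2), 3 * x**2)           # two wins
print(posterior_density(x, 1, 2), 6 * x * (1 - x))    # one win, one loss
print(posterior_density(x, 0, 2), 3 * (1 - x) ** 2)   # two losses
```

Each printed pair should agree up to floating-point rounding. The order of wins and losses does not matter, which is why the two mixed cases at t = 3 share one density.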

DISCUSSION
On the other hand, Fig. 3 shows the early stage of the population histogram for the case of $P^*_{AY} = 0.1$. In this case, the population of DMs divides asymmetrically.
Stochastically and approximately, the population gathers around $P_{exp} = 0.1$, predicting a probability density function with its maximum at x = 0.1. Thus, by using the Bayesian framework, we can predict $P^*_{AY}$ on a mathematical basis without any prior information, and the confidence in this prediction depends stochastically on the number of trials, increasing as the number of trials grows.
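The population behaviour described for $P^*_{AY} = 0.5$ and $P^*_{AY} = 0.1$ can be explored with a small simulation. The sketch below is illustrative only: it draws a hypothetical population of 2000 DMs who each observe t independent outcomes of the same option, and records each DM's experienced proportion $P_{exp} = r/t$. With a flat initial density, the posterior is proportional to $x^r (1-x)^{t-r}$ and so peaks at $x = r/t$, which means the histogram of $P_{exp}$ is also the histogram of posterior maxima. The population size, trial counts, seed and function name are assumptions of this sketch, not values from the original study.

```python
import random
import statistics


def experienced_proportions(p_true, t, population, seed=0):
    """Simulate P_exp = r / t for `population` DMs who each observe t outcomes."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_true for _ in range(t)) / t
        for _ in range(population)
    ]


for p_true in (0.5, 0.1):
    for t in (10, 400):
        p_exp = experienced_proportions(p_true, t, population=2000)
        # Mean and spread of the experienced proportions across the population.
        print(p_true, t,
              round(statistics.mean(p_exp), 3),
              round(statistics.pstdev(p_exp), 3))
```

In runs of this kind the mean of $P_{exp}$ stays near the true probability while its spread shrinks as t grows, in line with the claim that confidence increases with the number of trials. For $P^*_{AY} = 0.1$ and small t the distribution is visibly skewed, because $P_{exp}$ cannot fall below zero.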

CONCLUSION
Future laboratory experiments should examine the robustness of the Bayesian decision-theoretical framework developed in this note, which could provide an alternative account of behavioural tendencies in decisions from experience. For example, Fujikawa (2009) presented experiments on decisions from experience and discussed the existence of the hot stove effect, the predictions of which differ from those implied by the Bayesian framework. Future experimental work will help us document whether people often make choices predicted by well-known behavioural effects (e.g., the hot stove effect and the effect of curiosity) or choices predicted by the Bayesian framework.