Notes on Probability

Remarks: For academic use only.

Probability theory

These notes will take for granted some familiarity with abstract (Lebesgue) integration theory. For further reading, see Chung (2001).

1 Probability spaces and random variables

A probability space is a triple (Ω, F, P) where Ω is a non-empty set, F is a σ-algebra of subsets of Ω, and P : F → [0, 1] is a (positive) measure such that P(Ω) = 1. An event is a set F ∈ F.
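For concreteness, here is a minimal Python sketch of a finite probability space; the die example and all names below are my own illustration, not part of the text:

    # A minimal finite probability space: Omega is the set of outcomes of a
    # fair die, F is the power set, and P is the uniform measure.
    from itertools import chain, combinations

    omega = {1, 2, 3, 4, 5, 6}

    def power_set(s):
        """All subsets of s: the largest sigma-algebra on a finite set."""
        s = list(s)
        return [set(c) for c in
                chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

    F = power_set(omega)

    def P(event):
        """Uniform probability measure: P(A) = |A| / |Omega|."""
        return len(event) / len(omega)

    assert omega in F               # Omega itself is an event
    assert P(omega) == 1.0          # P(Omega) = 1
    assert P(set()) == 0.0          # P(empty set) = 0
    print(P({2, 4, 6}))             # probability of an even roll: 0.5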
A random variable X : Ω → R is an F-measurable real-valued function on Ω. A random variable is said to have a property almost surely (P) if it has the property on a set F with P(F) = 1. We often abbreviate and write a.s.

The expected value E[X] of a random variable X is defined via

    E[X] = ∫_Ω X(ω) dP(ω) = ∫_Ω X(ω) P(dω).

1.1 Distribution measures and distribution functions

The distribution measure of a random variable X is a measure µ on the Borel σ-algebra of subsets of R that tells you what the probability is that X(ω) ∈ B ⊂ R. That is, for any Borel set B ⊂ R,

    µ(B) = P(X⁻¹(B)) = P({ω ∈ Ω : X(ω) ∈ B}).

Remark. We will sometimes write {X ∈ B} when we mean X⁻¹(B).

The distribution function F_X : R → [0, 1] of a random variable X is defined via

    F_X(x) = µ((−∞, x]).

That is, F_X(x) is the probability that X(ω) ≤ x. If F_X is absolutely continuous, then it has a density f_X : R → R₊ such that

    F_X(x) = ∫_{−∞}^x f_X(y) dy.

In particular, if F_X is differentiable everywhere, then f_X(x) = F′_X(x).
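As a hedged illustration of these definitions, the following Python sketch approximates the distribution measure and distribution function of an exponentially distributed X by simulation; the choice of distribution and all numerical details are mine:

    # Sketch: approximating mu(B) and F_X(x) for X ~ Exponential(1) by
    # simulation. The distribution is my own choice, purely for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.exponential(scale=1.0, size=100_000)  # draws of X(omega)

    # mu(B) = P(X in B) for the Borel set B = (1, 2]:
    mu_B = np.mean((samples > 1.0) & (samples <= 2.0))
    print(mu_B)                  # close to e^-1 - e^-2, about 0.2325

    # F_X(x) = mu((-inf, x]) estimated empirically, vs. the analytic CDF:
    x = 1.5
    F_emp = np.mean(samples <= x)
    F_true = 1.0 - np.exp(-x)
    print(F_emp, F_true)         # the two should agree to ~2 decimals

    # Here F_X is differentiable, so the density is f_X(x) = F'_X(x) = e^-x.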
1.2 Information and σ-algebras

When considering σ-algebras G ⊂ F one may interpret G as the amount of available information. Intuitively, if our information is given by G, we can distinguish between the events in G in the sense that for any event G ∈ G we know with perfect certainty whether or not it has occurred. Given this, it makes sense to say that if G ⊂ H, then H contains no less information than G.

Also, it is tempting to say that G = σ{singletons} corresponds to full information, since it should enable us to tell exactly what ω has been drawn. But this turns out to be an awkward way of defining full information in general, although admittedly it makes perfect sense when Ω is a finite set. Instead, we will define full information as G = F, since then our information enables us to forecast perfectly the realized value of every random variable. Finally, we will say that G = {Ω, ∅} (the trivial σ-algebra) corresponds to no information.

Alternatively, we might tell the following story. Suppose our σ-algebra G is generated by a finite partition P.

(i) Someone (Tyche, the norns, the dean, or whoever it is) chooses an outcome ω ∈ Ω without telling us which.

(ii) While we don't know which ω ∈ Ω has been chosen, we are, however, told (by an oracle, Hugin & Munin, or the Gazette, or whatever) in which component P_k ∈ P the outcome ω lies.

In practice, this could be arranged by allowing us to observe a stochastic variable defined via

    X(ω) = Σ_{k=1}^n k · I_{P_k}(ω).    (1)
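The following Python sketch acts out this story on a small finite Ω; the particular partition and labels are my own choices, purely for illustration:

    # A partition P of Omega generates a sigma-algebra (all unions of its
    # blocks), and observing X(omega) = sum_k k * I_{P_k}(omega) tells us
    # exactly which block omega fell in.
    from itertools import combinations

    omega = {1, 2, 3, 4, 5, 6}
    partition = [{1, 2}, {3, 4}, {5, 6}]   # P = {P_1, P_2, P_3}

    def sigma_from_partition(blocks):
        """All unions of blocks: the sigma-algebra generated by the partition."""
        unions = []
        for r in range(len(blocks) + 1):
            for combo in combinations(blocks, r):
                unions.append(frozenset().union(*combo))
        return set(unions)

    G = sigma_from_partition(partition)
    print(len(G))                          # 2^3 = 8 events, including {} and Omega

    def X(w):
        """X(omega) as in equation (1): k on block P_k."""
        return next(k for k, block in enumerate(partition, start=1) if w in block)

    print(X(3))                            # observing X = 2 reveals omega in P_2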
To flesh this out a little bit more, you may want to think that getting ‘more information’ in this context would correspond to having a ‘finer’ partition, where a partition Q finer than P arises from chopping up the components of P. It follows, of course, that σ(P) ⊂ σ(Q), which was our original (and more general) definition of ‘more information’.

In any case, notice that the axioms that characterize a σ-algebra accord well with our intuitions about information. Obviously, we should know whether Ω has occurred, since it always occurs by definition. Also, if we know whether A, we should know whether not-A too. Similarly, if we know whether A and whether B, we should know whether A ∪ B. Countable unions are perhaps a little trickier to motivate intuitively; they are there essentially for technical reasons. In particular, they allow us to prove various limit theorems, which are part of the point of the Lebesgue theory.

In economic modelling, it is plausible to allow decisions to depend only upon the available information. Mathematically, this means that if the agent’s information is given by G, then her decision must be a G-measurable random variable. The interpretation of this is that the information in G suffices to give us perfect knowledge of the decision. Thus when it is time for the agent to act, she knows precisely what to do.

At this stage it is worth thinking about what it means for a stochastic variable X to be G-measurable. Intuitively, it means that the information in G suffices in order to know the value X(ω). To make this more concrete, suppose that G is generated by a partition P. Then for X to be G-measurable, X has to be constant on each element P_k ∈ P. It follows that knowing which element P_k has occurred is enough to be able to tell what the value of X(ω) must be.
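A minimal sketch of this criterion in Python, assuming a finite Ω and a partition of my own choosing:

    # Checking G-measurability when G is generated by a finite partition
    # reduces to checking that X is constant on each block.
    partition = [{1, 2}, {3, 4}, {5, 6}]

    def is_measurable(X, blocks):
        """True iff X takes a single value on every block of the partition."""
        return all(len({X(w) for w in block}) == 1 for block in blocks)

    X_good = lambda w: 0 if w <= 2 else 1    # constant on each block
    X_bad = lambda w: w                      # distinguishes points inside a block

    print(is_measurable(X_good, partition))  # True: G suffices to know X(omega)
    print(is_measurable(X_bad, partition))   # False: needs finer information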
As a further illustration of the fact that σ-algebras do a good job in modelling information, we have the following result.

Definition. Let {X_α, α ∈ I} be a family of random variables. Then the σ-algebra generated by {X_α, α ∈ I}, denoted by σ{X_α, α ∈ I}, is the smallest σ-algebra G such that all the random variables in {X_α, α ∈ I} are G-measurable.

Remark. Such a σ-algebra exists. (Recall the proof: consider the intersection of all σ-algebras on Ω such that {X_α, α ∈ I} are measurable.)

Proposition. Let X = {X_1, X_2, ..., X_n} be a finite set of random variables. Let Z be a random variable. Then Z is σ{X}-measurable iff there exists a Borel measurable function f : Rⁿ → R such that, for all ω ∈ Ω,

    Z(ω) = f(X_1(ω), X_2(ω), ..., X_n(ω)).    (2)

Proof. The case when σ{X} is generated by a finite partition (i.e. when the mapping T : Ω → Rⁿ defined via T(ω) = (X_1, X_2, ..., X_n) is F-simple) is not too hard and is left as an exercise. For the rest, see Williams (1991). □

2 The conditional expectation

Intuitively, the conditional expectation is the best predictor of the realization of a random variable given the available information. By “best” we will mean the one that minimizes the mean square error.
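To preview this on a finite example, here is a Python sketch, assuming a uniform measure and a partition-generated G; all names and numbers are my own toy choices. It computes E[Z | G] as block averages of Z and checks that no other G-measurable predictor on a small grid of constants does better in mean square error:

    # E[Z | G] on a finite Omega with uniform P: block averages of Z,
    # compared in MSE against all G-measurable grid-valued predictors.
    import itertools

    omega = [1, 2, 3, 4, 5, 6]
    partition = [{1, 2}, {3, 4}, {5, 6}]
    Z = {1: 1.0, 2: 3.0, 3: 0.0, 4: 4.0, 5: 5.0, 6: 5.0}  # a random variable

    def cond_exp(w):
        """E[Z | G](omega): average of Z over the block containing omega."""
        block = next(b for b in partition if w in b)
        return sum(Z[v] for v in block) / len(block)

    def mse(predictor):
        """Mean square error E[(Z - predictor)^2] under uniform P."""
        return sum((Z[w] - predictor(w)) ** 2 for w in omega) / len(omega)

    # Any G-measurable predictor is one constant per block; sweep a grid:
    grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    best = min(
        mse(lambda w, c=c: c[next(i for i, b in enumerate(partition) if w in b)])
        for c in itertools.product(grid, repeat=3)
    )
    print(mse(cond_exp) <= best)   # True: block averages are never beaten

The comparison is tight here because the grid happens to contain the block averages themselves; the general optimality claim is what the notion of conditional expectation below makes precise.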