Learning Objectives¶
Understand probability spaces: Define sample spaces, events, and probability measures; apply the three axioms of probability to verify valid probability assignments.
Understand the definition of a random variable as a deterministic function mapping outcomes to real numbers.
Work with distribution functions: Compute and interpret cumulative distribution functions (CDFs) and probability density functions (PDFs); classify random variables as discrete, continuous, or mixed.
Calculate moments: Compute expectation (mean), variance, standard deviation, and higher moments; interpret these as summaries of a distribution’s location, spread, and shape.
Analyze dependence: Calculate covariance and correlation; recognize that zero correlation does not imply independence; apply properties of independent random variables to simplify calculations.
Apply conditional probability: Compute conditional distributions and expectations.
Use Bayes’ Rule: Interpret prior, likelihood, and posterior; apply Bayes’ Rule to update probabilities when new information arrives.
Motivation¶
Engineering, physics, and applied mathematics constantly face uncertainty. Probability provides a quantitative language for describing, analyzing, and predicting outcomes in the presence of uncertainty.
Some physicists believe that the universe itself is fundamentally probabilistic at the quantum level. Einstein famously protested: “God does not play dice with the universe.” But modern physics suggests that, in some situations, nature does roll the dice. Even if we prepare the same system in the same way, the outcome may vary. You don’t need to know any quantum mechanics for this course—the point is simply: Some randomness is believed to be built into the laws of physics.
Other kinds of uncertainty come from systems that are deterministic but too complex to predict exactly. Example: rolling a die. A die obeys Newton's laws, but tiny changes in position, velocity, friction, or air flow make the outcome unpredictable, so we treat the result as a random variable. Geophysics examples:
Unknown small‑scale Earth heterogeneity
Irregularities in seismic sources
Environmental and instrument noise
Complex Earth processes that amplify tiny variations
Even when the physics is deterministic, we model our lack of knowledge using probability.
In summary, probability is useful whether nature is truly random or merely too complicated. Either way, we have to “play dice” in our models.
Probability Spaces and Axioms¶
This section introduces the mathematical foundation of probability.
Sample Space $\Omega$¶
The sample space $\Omega$ is the set of all possible outcomes of an experiment.
Rolling a die: $\Omega = \{1, 2, 3, 4, 5, 6\}$
Toss two coins: $\Omega = \{HH, HT, TH, TT\}$
Choosing a random day of the week: $\Omega = \{\text{Mon}, \text{Tue}, \dots, \text{Sun}\}$
The sample space should include every outcome that could happen, and nothing else.
Also note that $\Omega$ can contain non-numerical values: it is a description of the real world we would like to model.
Events as Subsets of $\Omega$¶
An event is any subset of the sample space.
Examples (die roll):
“Even number”: $A = \{2, 4, 6\}$
“Number greater than 4”: $A = \{5, 6\}$
The empty event (impossible): $\emptyset$
The sure event (always happens): $\Omega$
Operations on events follow set operations:
Union: $A \cup B$ (“$A$ or $B$ occurs”)
Intersection: $A \cap B$ (“both $A$ and $B$ occur”)
Complement: $A^c = \Omega \setminus A$ (“$A$ does not occur”)
In a finite or countable setting, every subset is a valid event.
Probability Measure¶
A probability measure $P$ assigns a number $P(A)$ to each event $A \subseteq \Omega$, following three axioms (Kolmogorov, simplified). Note that $P$ takes a subset of $\Omega$ (an event) as input, not individual elements $\omega \in \Omega$.
Axiom 1 (Non-negativity): $P(A) \ge 0$ for every event $A$.
Axiom 2 (Normalization): $P(\Omega) = 1$.
Axiom 3 (Additivity, simplified): if $A$ and $B$ are disjoint ($A \cap B = \emptyset$), then $P(A \cup B) = P(A) + P(B)$.
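The axioms can be checked concretely for a finite sample space. Here is a minimal Python sketch, using the fair-die example above (the per-outcome probability of $1/6$ is the usual uniform assumption):

```python
from fractions import Fraction

# Fair die: sample space and per-outcome probabilities (uniform assumption).
omega = {1, 2, 3, 4, 5, 6}
p_outcome = {w: Fraction(1, 6) for w in omega}

def P(event):
    """Probability measure: takes an event (a subset of omega), not an outcome."""
    return sum(p_outcome[w] for w in event)

evens = {2, 4, 6}
odds = {1, 3, 5}

assert P(evens) >= 0                            # Axiom 1: non-negativity
assert P(omega) == 1                            # Axiom 2: normalization
assert evens & odds == set()                    # disjoint events ...
assert P(evens | odds) == P(evens) + P(odds)    # Axiom 3: additivity
```

Using exact `Fraction` arithmetic avoids floating-point round-off, so the axioms hold exactly rather than approximately.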
A Note on the Uncountable Case¶
So far we have focused on finite or countable sample spaces, where probabilities are assigned by listing the probability of each individual outcome. However, many real-world quantities in geophysics (and physics in general) take real-number values, so the sample space becomes uncountable, typically a subset of $\mathbb{R}$ or $\mathbb{R}^n$.
This requires more structure than the discrete case; done without care, one can run into contradictions and paradoxes. We will not deal with these mathematical complexities in this course. Despite skipping the full measure-theoretic machinery, everything we do in this course is completely rigorous for the kinds of distributions and integrals used in geophysics and engineering; the advanced theory is only needed for pathological cases we will rarely encounter.
Random Variables¶
Random variables allow us to translate uncertain outcomes into numerical quantities that we can analyze mathematically. They provide the connection between abstract probability spaces and the real-valued measurements used in geophysics and engineering. Although $\omega$ may be abstract, $X(\omega)$ is a real number we can compute with.
Definition¶
A random variable is a function
$$X : \Omega \to \mathbb{R},$$
assigning a real number $X(\omega)$ to each outcome $\omega \in \Omega$.
A random variable is often misunderstood because of the word random. Mathematically, a random variable is not itself random; it is a deterministic function from the sample space to the real numbers, $X : \Omega \to \mathbb{R}$.
The randomness comes entirely from the underlying experiment, which selects an outcome $\omega$ according to the probability measure $P$. Once $\omega$ is fixed, the value $X(\omega)$ is completely determined. The role of a random variable is simply to assign numerical values to outcomes. All probability statements about a random variable (such as $P(X \le x)$) are really probability statements about the underlying set of outcomes $\{\omega \in \Omega : X(\omega) \le x\}$.
Thus, a random variable is simply a function; its “randomness” comes entirely from the randomness of the experiment that selects $\omega$, not from the function itself.
Connecting Outcomes to Probabilities¶
Once we have a random variable $X$, statements about real numbers like “$X \le x$” or “$a < X \le b$” correspond to events in the original probability space:
$$\{X \le x\} = \{\omega \in \Omega : X(\omega) \le x\}.$$
The probability of these events is inherited from $P$ on $\Omega$. This is how we build a distribution for $X$: the assignment of probability to intervals and sets of real numbers. The formal description of this distribution is the subject of the next section.
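This correspondence can be made concrete in a short Python sketch. The two-coin experiment and the variable $X$ = “number of heads” are illustrative choices; the point is that $P(X \le x)$ is literally the probability of a set of outcomes:

```python
from fractions import Fraction

# Two coin tosses; X(w) = number of heads (an illustrative random variable).
omega = {"HH", "HT", "TH", "TT"}
p = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Deterministic function from outcomes to real numbers."""
    return w.count("H")

def prob_X_leq(x):
    """P(X <= x) is, by definition, P({w in omega : X(w) <= x})."""
    event = {w for w in omega if X(w) <= x}
    return sum(p[w] for w in event)

# The event {X <= 1} is the outcome set {HT, TH, TT}, so P(X <= 1) = 3/4.
assert prob_X_leq(1) == Fraction(3, 4)
```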
Distribution Functions¶
The distribution of a random variable describes how probability is assigned to different real values. We formalize this through the cumulative distribution function (CDF) and, when it exists, the probability density function (PDF).
Cumulative Distribution Function (CDF)¶
For any random variable $X$, the cumulative distribution function (CDF) is defined by
$$F_X(x) = P(X \le x).$$
Universal Properties:
$F_X$ is non‑decreasing: if $x_1 \le x_2$, then $F_X(x_1) \le F_X(x_2)$.
Limits: $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$.
Right‑continuity: $\lim_{t \downarrow x} F_X(t) = F_X(x)$.
Every random variable — discrete, continuous, or mixed — has a CDF. The CDF fully determines the probability distribution.
Probability Density Function (PDF)¶
If the CDF of $X$ is differentiable, $X$ can be described by a probability density function (PDF) satisfying
$$f_X(x) = \frac{d}{dx} F_X(x) \ge 0$$
and
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1.$$
Computing Probabilities from PDF:
Interval probabilities: $P(a < X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx$.
Individual points: $P(X = x) = 0$ for all $x$ (continuous distributions have no point masses).
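The two ways of computing an interval probability can be checked numerically. A sketch assuming an exponential distribution with rate $\lambda = 2$ (an illustrative choice; its CDF is $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$):

```python
import math

lam = 2.0
f = lambda x: lam * math.exp(-lam * x)      # PDF = derivative of the CDF
F = lambda x: 1.0 - math.exp(-lam * x)      # CDF

a, b = 0.5, 1.5

# Interval probability from the CDF: P(a < X <= b) = F(b) - F(a)
p_cdf = F(b) - F(a)

# The same probability by integrating the PDF over (a, b] (midpoint rule)
n = 100_000
h = (b - a) / n
p_pdf = sum(f(a + (i + 0.5) * h) for i in range(n)) * h

assert abs(p_cdf - p_pdf) < 1e-8
# Individual points carry no probability: P(X = b) = F(b) - F(b) = 0.
```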
Discrete, Continuous, and Mixed Random Variables¶
We classify by how its probability is distributed over , as revealed by its CDF.
Discrete¶
$X$ is discrete if it places probability on a finite or countable set $\{x_1, x_2, \dots\}$:
$$P(X = x_i) = p_i, \qquad \sum_i p_i = 1.$$
The CDF is a step function with jumps at the atoms $x_i$, where each jump has size $p_i = P(X = x_i)$.
Continuous¶
$X$ is continuous if its CDF is differentiable and has a probability density function $f_X$ such that
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt.$$
For such a distribution, all individual points have zero probability: $P(X = x) = 0$.
Mixed¶
$X$ is mixed if it has both discrete (jump) and continuous parts. The CDF has the form
$$F_X(x) = \sum_{x_i \le x} p_i + \int_{-\infty}^{x} f(t)\,dt,$$
where:
Jumps represent discrete masses (point probabilities $p_i$).
The integral represents the continuous part (with density $f$).
Example: Travel time in seismology may be continuous when a wave arrives, but with some probability the wave fails to arrive, contributing a discrete mass at a designated value (or equivalently, a point mass for “no arrival”).
Moments¶
Moments summarize key numerical properties of a random variable. They describe averages, variability, and dependence, and are central to modeling uncertainty in geophysics and engineering.
Expectation¶
The expectation (or mean) of a random variable $X$, written $E[X]$, is the “average value” of $X$ under repeated sampling.
For any random variable,
$$E[X] = \int_{-\infty}^{\infty} x \, dF_X(x),$$
where $F_X$ is the CDF of $X$.
This is interpreted as a Riemann–Stieltjes integral. It automatically covers discrete, continuous, and mixed cases. We expand on these cases below.
Discrete case¶
If $X$ takes values $x_1, x_2, \dots$ (finite or countable) with point probabilities $p_i = P(X = x_i)$, then
$$E[X] = \sum_i x_i\, p_i.$$
Continuous case (with a PDF)¶
If $X$ has a probability density function $f_X$, then
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx.$$
Mixed case¶
If $X$ has atoms at $x_i$ with masses $p_i$ and a continuous part with density $f$, then
$$E[X] = \sum_i x_i\, p_i + \int_{-\infty}^{\infty} x\, f(x)\,dx.$$
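The discrete and continuous formulas can be sketched side by side in Python. The fair die and the exponential distribution are illustrative choices, and the continuous integral is truncated and approximated with a midpoint rule:

```python
import math

# Discrete: fair die, E[X] = sum_i x_i p_i = 3.5
mean_discrete = sum(x * (1 / 6) for x in [1, 2, 3, 4, 5, 6])
assert abs(mean_discrete - 3.5) < 1e-12

# Continuous: exponential with rate lam has E[X] = 1/lam
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)
n, upper = 200_000, 40.0            # truncate the integral; the tail is negligible
h = upper / n
mean_continuous = sum((i + 0.5) * h * f((i + 0.5) * h) for i in range(n)) * h
assert abs(mean_continuous - 1 / lam) < 1e-4
```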
Expectation is linear¶
That is:
$$E[aX + bY] = a\,E[X] + b\,E[Y]$$
for any constants $a, b$ and random variables $X, Y$.
Second Moment¶
One can easily generalize the expectation to define the second moment of $X$:
$$E[X^2].$$
For discrete $X$: $E[X^2] = \sum_i x_i^2\, p_i$.
For continuous $X$ with PDF $f_X$: $E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx$.
The second moment captures the “typical squared magnitude” of . It plays a central role in defining variance.
Variance and Standard Deviation¶
The variance measures spread around the mean. It is the second central moment (i.e., the second moment about the mean):
$$\mathrm{Var}(X) = E\big[(X - E[X])^2\big].$$
By expanding the square and using linearity of expectation, we obtain a useful computational formula in terms of the first and second moments:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2.$$
Interpretation: Variance quantifies how “spread out” the values of are around the mean. A small variance means values cluster tightly; a large variance means wider dispersion.
The standard deviation is the square root:
$$\sigma_X = \sqrt{\mathrm{Var}(X)}.$$
Standard deviation has the same units as $X$, making it easier to interpret physically (e.g., “typical deviation from the mean”).
Scaling Properties of Variance and Standard Deviation¶
An important and frequently used property is how variance and standard deviation scale when a random variable is multiplied by a constant.
For any constant $c$ and random variable $X$:
$$\mathrm{Var}(cX) = c^2\, \mathrm{Var}(X), \qquad \sigma_{cX} = |c|\, \sigma_X.$$
Example: If a measurement has standard deviation $\sigma$ meters, and you measure the same quantity in centimeters (multiplying by $c = 100$), the new standard deviation is $100\,\sigma$ centimeters, and the variance becomes $10^4$ times larger (in squared centimeters).
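The scaling property is easy to verify numerically. A sketch using the fair die (an illustrative distribution) and $c = 100$, mimicking the meters-to-centimeters conversion:

```python
# Fair die (illustrative): check Var(cX) = c^2 Var(X) and sigma_cX = |c| sigma_X.
values = [1, 2, 3, 4, 5, 6]

def mean(xs):
    return sum(xs) / len(xs)            # equal weights 1/6

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

c = 100                                  # e.g. meters -> centimeters
scaled = [c * x for x in values]

assert abs(var(scaled) - c**2 * var(values)) < 1e-6
assert abs(var(scaled) ** 0.5 - abs(c) * var(values) ** 0.5) < 1e-9
```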
Higher Moments¶
More generally, the $n$‑th moment of $X$ is
$$E[X^n].$$
The $n$‑th central moment (moment about the mean) is
$$\mu_n = E\big[(X - E[X])^n\big].$$
Examples:
$n = 1$: First moment = mean ($E[X]$)
$n = 2$: Second central moment = variance ($\sigma_X^2$)
$n = 3$: Third central moment relates to skewness (asymmetry)
$n = 4$: Fourth central moment relates to kurtosis (tailedness)
Higher moments quantify shape features of the distribution beyond location and spread.
Covariance and Correlation¶
For random variables $X$ and $Y$ with finite second moments, the covariance is
$$\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].$$
Equivalent form:
$$\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y].$$
Covariance measures whether $X$ and $Y$ tend to increase together (positive), move oppositely (negative), or show no linear association (zero, which covers independence but also other cases).
The correlation coefficient is the normalized covariance:
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y},$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations.
Correlation satisfies $-1 \le \rho_{XY} \le 1$:
$\rho_{XY} = \pm 1$: perfect linear relationship
$\rho_{XY} = 0$: uncorrelated (not necessarily independent)
Random Variables in Multiple Dimensions¶
Many problems in geophysics and applied mathematics involve multiple uncertain quantities at once. For example:
The horizontal and vertical components of a velocity vector.
Velocity and density in an Earth model.
Travel time and amplitude of an arrival.
To model such situations, we introduce random vectors and their joint distributions.
Random Vectors¶
A random vector is a function
$$\mathbf{X} = (X_1, \dots, X_n) : \Omega \to \mathbb{R}^n.$$
Just as a single random variable is a deterministic function of $\omega$, a random vector assigns an $n$‑tuple of real numbers to each outcome. All randomness still comes from the selection of $\omega$.
We will focus mainly on the two‑dimensional case $(X, Y)$, but all definitions extend naturally to higher dimensions.
Joint Distributions¶
Joint Cumulative Distribution Function (Joint CDF)¶
For two random variables $X, Y$, the joint CDF is
$$F_{XY}(x, y) = P(X \le x,\ Y \le y).$$
This fully characterizes the joint distribution of $(X, Y)$.
Properties:
Non‑decreasing in each argument.
$F_{XY}(x, y) \to 0$ as $x \to -\infty$ or $y \to -\infty$.
$F_{XY}(x, y) \to 1$ as $x \to +\infty$ and $y \to +\infty$.
Right‑continuous in each variable.
Joint Probability Density Function (Joint PDF)¶
If the joint CDF is differentiable, we can define the joint PDF
$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}.$$
The joint PDF satisfies:
$$f_{XY}(x, y) \ge 0,$$
and the normalization condition:
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1.$$
Marginal Distributions¶
The individual (1D) distributions of and are obtained by integrating out the other variable from the joint PDF.
Marginal PDFs¶
If $(X, Y)$ has joint PDF $f_{XY}$, then:
Marginal PDF of $X$: $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$
Marginal PDF of $Y$: $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$
Interpretation:
The joint PDF describes the full 2D distribution, while a marginal PDF gives the distribution of one variable considered on its own.
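Marginalization can be checked numerically. A sketch assuming the illustrative joint density $f(x, y) = x + y$ on the unit square (it integrates to 1, and its marginal is $f_X(x) = x + 1/2$):

```python
# Illustrative joint density f(x, y) = x + y on [0, 1]^2; marginal f_X(x) = x + 1/2.
f_joint = lambda x, y: x + y

def marginal_x(x, n=10_000):
    """f_X(x) = integral over y in [0, 1] of f_joint(x, y) dy (midpoint rule)."""
    h = 1.0 / n
    return sum(f_joint(x, (j + 0.5) * h) for j in range(n)) * h

# Numerical marginal matches the analytic one at several test points.
for x in (0.0, 0.3, 0.9):
    assert abs(marginal_x(x) - (x + 0.5)) < 1e-9
```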
Independence¶
Definition via Joint CDF¶
Random variables $X$ and $Y$ are independent if and only if
$$F_{XY}(x, y) = F_X(x)\, F_Y(y) \quad \text{for all } x, y.$$
Equivalent Definition via Joint PDF¶
If densities exist, independence is equivalent to:
$$f_{XY}(x, y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y.$$
Independence means: knowing one variable tells you nothing about the other.
Important Example of Dependence Without Correlation¶
It is possible for $X$ and $Y$ to be dependent even if $\mathrm{Cov}(X, Y) = 0$. Thus:
Zero covariance does not imply independence.
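The classic counterexample ($X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$) can be verified with exact arithmetic:

```python
from fractions import Fraction

# Classic example: X uniform on {-1, 0, 1}, Y = X^2.
xs = [-1, 0, 1]
p = Fraction(1, 3)

E_X = sum(x * p for x in xs)             # 0 by symmetry
E_Y = sum(x ** 2 * p for x in xs)        # 2/3
E_XY = sum(x * x ** 2 * p for x in xs)   # E[X^3] = 0 by symmetry

assert E_XY - E_X * E_Y == 0             # Cov(X, Y) = 0: uncorrelated

# Yet X and Y are dependent: X = 0 forces Y = 0,
# so P(Y = 0 | X = 0) = 1 while P(Y = 0) = 1/3.
```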
Consequences of Independence¶
Independence dramatically simplifies many computations.
Expectation of Sums¶
Using linearity of expectation:
$$E[X + Y] = E[X] + E[Y].$$
(This holds whether or not $X$ and $Y$ are independent.)
Expectation of Products (requires independence)¶
If $X$ and $Y$ are independent:
$$E[XY] = E[X]\,E[Y].$$
This is not generally true without independence.
Variance of sum of random variables (requires independence)¶
If random variables $X_1, \dots, X_n$ are independent, then the variance of their sum is the sum of their variances:
$$\mathrm{Var}(X_1 + \cdots + X_n) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n).$$
This is not generally true without independence.
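A Monte Carlo sketch of the variance-of-sums rule, assuming two independent Uniform(0, 1) variables (each with variance $1/12$, so the sum should have variance close to $1/6$):

```python
import random

# X, Y independent Uniform(0, 1): Var(X + Y) should be near 1/12 + 1/12 = 1/6.
random.seed(0)
n = 200_000
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]

def sample_var(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

v = sample_var([x + y for x, y in zip(xs, ys)])
assert abs(v - 1 / 6) < 0.01
```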
Conditional Probability¶
Conditional probability allows us to update or refine probability assessments when new information becomes available. It is fundamental in statistics, inversion theory, and Bayesian inference used throughout geophysics.
Conditional Probability of Events¶
For events $A$ and $B$ with $P(B) > 0$, the conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Interpretation: We restrict our sample space to $B$ (the information we now know has occurred), then renormalize the probability of $A$ within this reduced space.
Equivalently,
$$P(A \cap B) = P(A \mid B)\, P(B).$$
This identity is often used as the starting point for defining conditional densities.
Conditional PDFs¶
Let $(X, Y)$ be jointly continuous with joint PDF $f_{XY}(x, y)$ and marginal PDF $f_Y(y)$.
The conditional PDF of $X$ given $Y = y$ is defined by
$$f_{X \mid Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}, \qquad f_Y(y) > 0.$$
This parallels the discrete formula
$$P(X = x \mid Y = y) = \frac{P(X = x,\ Y = y)}{P(Y = y)},$$
but in density form.
It has the properties
Normalization: $\int_{-\infty}^{\infty} f_{X \mid Y}(x \mid y)\,dx = 1$ for each fixed $y$.
Relation to joint and marginal PDFs: $f_{XY}(x, y) = f_{X \mid Y}(x \mid y)\, f_Y(y)$.
Independence: If $X$ and $Y$ are independent, then $f_{X \mid Y}(x \mid y) = f_X(x)$,
so conditioning on $Y$ has no effect on the PDF of $X$.
Conditional Expectation¶
The conditional expectation of $X$ given $Y = y$ is the expectation taken with respect to the conditional PDF:
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\,dx.$$
This defines a function of the variable $y$.
In many statistical applications (including inversion and filtering), this function acts as the “best predictor” of $X$ given knowledge of $Y$.
We note the key identities
Law of total expectation:
$$E\big[E[X \mid Y]\big] = E[X].$$
Independence: If $X$ and $Y$ are independent,
$$E[X \mid Y = y] = E[X]$$
(conditioning on $Y$ provides no information about $X$).
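The law of total expectation can be verified exactly on a small discrete example. The joint pmf below is an illustrative choice, not one from the text:

```python
from fractions import Fraction

# Small illustrative joint pmf for (X, Y) with X, Y in {0, 1}.
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(3, 8)}

# Marginal pmf of Y: sum the joint over x.
p_Y = {y: sum(pr for (x, yy), pr in joint.items() if yy == y) for y in (0, 1)}

def E_X_given_Y(y):
    """E[X | Y = y] = sum_x x * P(X = x | Y = y)."""
    return sum(x * pr / p_Y[y] for (x, yy), pr in joint.items() if yy == y)

E_X = sum(x * pr for (x, _), pr in joint.items())

# Averaging E[X | Y = y] over the distribution of Y recovers E[X].
assert sum(E_X_given_Y(y) * p_Y[y] for y in (0, 1)) == E_X
```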
Bayes’ Rule¶
Conditional probability allows us to update beliefs when new information becomes available.
Bayes’ Rule formalizes how to reverse the conditioning: it expresses $P(A \mid B)$ in terms of $P(B \mid A)$.
Bayes’ Rule¶
For events $A$ and $B$ with $P(B) > 0$,
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.$$
This follows directly from the definition:
$$P(A \mid B)\, P(B) = P(A \cap B) = P(B \mid A)\, P(A).$$
Likelihood, Prior, and Posterior¶
Bayes’ Rule is often interpreted in terms of three components:
Prior $P(A)$: your initial belief about event $A$ before seeing new evidence.
Likelihood $P(B \mid A)$: how probable the evidence $B$ is if $A$ were true.
Posterior $P(A \mid B)$: your updated belief about $A$ after observing $B$.
Bayes’ Rule becomes:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}.$$
The denominator
$$P(B) = P(B \mid A)\, P(A) + P(B \mid A^c)\, P(A^c)$$
acts as a normalization constant.
Example: Medical Testing and Base Rates¶
Medical tests illustrate Bayes’ Rule clearly—especially how rare events can dramatically affect posterior probabilities.
Suppose:
Prior: the prevalence of the disease in the general population (1% of people have it). Write $D$ for “has the disease”: $P(D) = 0.01$.
Likelihood: test sensitivity (probability of a positive result if the patient has the disease): $P(+ \mid D) = 0.95$.
Test false positive rate (probability of a positive result if the patient does NOT have the disease): $P(+ \mid D^c) = 0.05$.
Posterior: We want the probability the patient actually has the disease given a positive test: $P(D \mid +)$.
Plug in the numbers:
Prior: $P(D) = 0.01$
Complement: $P(D^c) = 0.99$
Likelihoods: $P(+ \mid D) = 0.95$ and $P(+ \mid D^c) = 0.05$
Compute:
$$P(D \mid +) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} = \frac{0.0095}{0.0590} \approx 0.16.$$
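The same calculation as a short Python sketch, using the numbers from this section:

```python
prior = 0.01          # P(D): disease prevalence
sens = 0.95           # P(+ | D): sensitivity
fpr = 0.05            # P(+ | not D): false positive rate

# Total probability of a positive test (the normalization constant)
evidence = sens * prior + fpr * (1 - prior)

# Bayes' Rule: P(D | +) = P(+ | D) P(D) / P(+)
posterior = sens * prior / evidence

assert abs(evidence - 0.059) < 1e-9
assert abs(posterior - 0.161) < 1e-3   # about 16%
```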
Interpretation¶
Even though the test is quite accurate, a positive result gives only a 16% chance of actually having the disease.
This is because the disease is rare, and most positives come from false positives among the 99% of healthy patients.
This phenomenon—counterintuitive but ubiquitous—is the base-rate effect.
Key Points¶
Foundations
Probability is a quantitative framework for reasoning about uncertainty—whether from fundamental randomness, measurement noise, or complexity too large to predict.
A sample space $\Omega$ contains all possible outcomes of an experiment; events are subsets of $\Omega$; a probability measure $P$ assigns probabilities to events following three axioms (non-negativity, normalization, additivity).
Random Variables and Distributions
A random variable $X$ is a deterministic function mapping outcomes $\omega$ to real numbers $X(\omega)$. All randomness comes from selecting $\omega$, not from $X$ itself.
The cumulative distribution function (CDF) fully determines a distribution; every random variable has one.
If the CDF is differentiable, the probability density function (PDF) describes how probability is distributed over $\mathbb{R}$.
Discrete random variables concentrate probability on a countable set; continuous ones have a PDF; mixed variables have both.
Moments and Dependence
Expectation $E[X]$ is the mean; variance measures spread; standard deviation has the same units as $X$.
Covariance measures whether two variables increase/decrease together; correlation normalizes covariance to $[-1, 1]$—but zero correlation does not imply independence.
Independence means $F_{XY} = F_X F_Y$ or equivalently $f_{XY} = f_X f_Y$. Independence allows product rules: $E[XY] = E[X]E[Y]$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
Conditioning and Bayes’ Rule
Conditional probability and conditional distributions represent updated beliefs when information becomes available.
Bayes’ Rule reverses conditioning: posterior (updated belief) = prior (initial belief) × likelihood (data under hypothesis) / evidence (marginal probability of data).
Base-rate effect: Even accurate tests produce misleading posteriors if the prior probability is very low, because false positives among the large population of unaffected individuals outnumber true positives.