\documentclass[a4paper]{article}

\def\npart {III}
\def\nterm {Lent}
\def\nyear {2018}
\def\nlecturer {R.\ Bauerschmidt}
\def\ncourse {Stochastic Calculus and Applications}

\input{header}

\begin{document}
\maketitle
{\small
\setlength{\parindent}{0em}
\setlength{\parskip}{1em}
\begin{itemize}
  \item \textit{Brownian motion.} Existence and sample path properties.
  \item \textit{Stochastic calculus for continuous processes.} Martingales, local martingales, semi-martingales, quadratic variation and cross-variation, It\^o's isometry, definition of the stochastic integral, Kunita--Watanabe theorem, and It\^o's formula.
  \item \textit{Applications to Brownian motion and martingales.} L\'evy characterization of Brownian motion, Dubins--Schwartz theorem, martingale representation, Girsanov theorem, conformal invariance of planar Brownian motion, and Dirichlet problems.
  \item \textit{Stochastic differential equations.} Strong and weak solutions, notions of existence and uniqueness, Yamada--Watanabe theorem, strong Markov property, and relation to second order partial differential equations.
\end{itemize}
\subsubsection*{Pre-requisites}
Knowledge of measure theoretic probability as taught in Part III Advanced Probability will be assumed, in particular familiarity with discrete-time martingales and Brownian motion.
}
\tableofcontents

\setcounter{section}{-1}
\section{Introduction}
Ordinary differential equations are central in analysis. The simplest class of equations tends to look like
\[
  \dot{x}(t) = F(x(t)).
\]
\emph{Stochastic} differential equations are differential equations where we make the function $F$ ``random''. There are many ways of doing so, and the simplest way is to write it as
\[
  \dot{x}(t) = F(x(t)) + \eta(t),
\]
where $\eta$ is a random function. For example, when modeling noisy physical systems, our physical bodies will be subject to random noise. What should we expect the function $\eta$ to be like? We might expect that for $|t - s| \gg 0$, the variables $\eta(t)$ and $\eta(s)$ are ``essentially'' independent. If we are interested in physical systems, then this is a rather reasonable assumption, since random noise is random!

In practice, we work with the idealization, where we claim that $\eta(t)$ and $\eta(s)$ are independent for $t \not= s$. Such an $\eta$ exists, and is known as \emph{white noise}. However, it is not a function, but just a Schwartz distribution.

To understand the simplest case, we set $F = 0$. We then have the equation
\[
  \dot{x} = \eta.
\]
We can write this in integral form as
\[
  x(t) = x(0) + \int_0^t \eta(s)\;\d s.
\]
To make sense of this integral, the function $\eta$ should at least be a signed measure. Unfortunately, white noise isn't. This is bad news.

We ignore this issue for a little bit, and proceed as if it made sense. If the equation held, then for any $0 = t_0 < t_1 < \cdots$, the increments
\[
  x(t_i) - x(t_{i - 1}) = \int_{t_{i - 1}}^{t_i} \eta(s) \;\d s
\]
should be independent, and moreover their variance should scale linearly with $|t_i - t_{i - 1}|$. So maybe this $x$ should be a Brownian motion!

Formalizing these ideas will take up a large portion of the course, and the work isn't always pleasant. Then why should we be interested in this continuous problem, as opposed to what we obtain when we discretize time? It turns out that in some sense the continuous problem is easier. When we learn measure theory, there is a lot of work put into constructing the Lebesgue measure, as opposed to the sum, which we can just define. However, what we end up with is much easier to use: it is easier to integrate $\frac{1}{x^3}$ than to evaluate $\sum_{n = 1}^\infty \frac{1}{n^3}$. Similarly, once we have set up the machinery of stochastic calculus, we have a powerful tool to do explicit computations, which is usually harder in the discrete world.

Another reason to study stochastic calculus is that a lot of continuous time processes can be described as solutions to stochastic differential equations. Compare this with the fact that functions such as trigonometric and Bessel functions are described as solutions to ordinary differential equations!

There are two ways to approach stochastic calculus, namely via the It\^o integral and the Stratonovich integral. We will mostly focus on the It\^o integral, which is more useful for our purposes. In particular, the It\^o integral tends to give us martingales, which is useful.

To give a flavour of the construction of the It\^o integral, we consider a simpler scenario of the Wiener integral.

\begin{defi}[Gaussian space]\index{Gaussian space}
  Let $(\Omega, \mathcal{F}, \P)$ be a probability space. Then a subspace $S \subseteq L^2(\Omega, \mathcal{F}, \P)$ is called a \emph{Gaussian space} if it is a closed linear subspace and every $X \in S$ is a centered Gaussian random variable.
\end{defi}

An important construction is
\begin{prop}
  Let $H$ be any separable Hilbert space. Then there is a probability space $(\Omega, \mathcal{F}, \P)$ with a Gaussian subspace $S \subseteq L^2(\Omega, \mathcal{F}, \P)$ and an isometry $I: H \to S$. In other words, for any $f \in H$, there is a corresponding random variable $I(f) \sim N(0, (f, f)_H)$. Moreover, $I(\alpha f + \beta g) = \alpha I(f) + \beta I(g)$ and $(f, g)_H = \E[I(f) I(g)]$.
\end{prop}

\begin{proof}
  By separability, we can pick a Hilbert space basis $(e_i)_{i = 1}^\infty$ of $H$. Let $(\Omega, \mathcal{F}, \P)$ be any probability space that carries an infinite independent sequence of standard Gaussian random variables $X_i \sim N(0, 1)$. Then send $e_i$ to $X_i$, extend by linearity and continuity, and take $S$ to be the image.
\end{proof}
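
To make the phrase ``extend by linearity and continuity'' concrete, here is a sketch of the resulting formula, taking $(e_i)$ to be orthonormal and writing $f_i = (f, e_i)_H$ for the coefficients of $f$ (notation introduced only for this remark). For $f \in H$ we set
\[
  I(f) = \sum_{i = 1}^\infty f_i X_i,
\]
where the series converges in $L^2$ since $\sum_i f_i^2 = (f, f)_H < \infty$. Each partial sum is a centered Gaussian, and an $L^2$ limit of centered Gaussians is again a centered Gaussian, so by independence of the $X_i$,
\[
  \E[I(f) I(g)] = \sum_{i = 1}^\infty f_i g_i = (f, g)_H,\qquad I(f) \sim N(0, (f, f)_H).
\]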

In particular, we can take $H = L^2(\R_+)$.

\begin{defi}[Gaussian white noise]\index{Gaussian white noise}
  A \emph{Gaussian white noise} on $\R_+$ is an isometry $WN$ from $L^2(\R_+)$ into some Gaussian space. For $A \subseteq \R_+$, we write $WN(A) = WN(\mathbf{1}_A)$.
\end{defi}

\begin{prop}\leavevmode
  \begin{itemize}
    \item For $A \subseteq \R_+$ with $|A| < \infty$, $WN(A) \sim N(0, |A|)$.
    \item For disjoint $A, B \subseteq \R_+$, the variables $WN(A)$ and $WN(B)$ are independent.
    \item If $A = \bigcup_{i = 1}^\infty A_i$ for disjoint sets $A_i \subseteq \R_+$, with $|A| < \infty, |A_i| < \infty$, then
      \[
        WN(A) = \sum_{i = 1}^\infty WN(A_i)\text{ in $L^2$ and a.s.}
      \]
  \end{itemize}
\end{prop}

\begin{proof}
  Only the last point requires proof. Observe that the partial sum
  \[
    M_n = \sum_{i = 1}^n WN(A_i)
  \]
  is a martingale, and is bounded in $L^2$ as well, since
  \[
    \E M_n^2 = \sum_{i = 1}^n \E WN(A_i)^2 = \sum_{i = 1}^n |A_i| \leq |A|.
  \]
  So we are done by the martingale convergence theorem. The limit is indeed $WN(A)$ because $\mathbf{1}_A = \sum_{i = 1}^\infty \mathbf{1}_{A_i}$.
\end{proof}
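
For completeness, here is a sketch of why the first two points are immediate from the isometry property. For $A, B \subseteq \R_+$ of finite measure,
\[
  \E[WN(A) WN(B)] = (\mathbf{1}_A, \mathbf{1}_B)_{L^2(\R_+)} = |A \cap B|.
\]
Taking $B = A$ gives $WN(A) \sim N(0, |A|)$. If $A \cap B = \emptyset$, the covariance vanishes; since every linear combination of $WN(A)$ and $WN(B)$ lies in the Gaussian space, the pair is jointly Gaussian, and uncorrelated jointly Gaussian variables are independent.
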
The point of the proposition is that $WN$ really looks like a random measure on $\R_+$, except it is \emph{not}. We only have convergence almost surely above, which means we have convergence on a set of measure $1$. However, the set depends on which $A$ and $A_i$ we pick. For things to actually work out well, we must have a fixed set of measure $1$ for which convergence holds for all $A$ and $A_i$.

But perhaps we can ignore this problem, and try to proceed. We define
\[
  B_t = WN([0, t])
\]
for $t \geq 0$.
\begin{ex}
  This $B_t$ is a standard Brownian motion, except for the continuity requirement. In other words, for any $t_1, t_2, \ldots, t_n$, the vector $(B_{t_i})_{i = 1}^n$ is jointly Gaussian with
  \[
    \E[B_s B_t] = s \wedge t\text{ for }s, t \geq 0.
  \]
  Moreover, $B_0 = 0$ a.s.\ and $B_t - B_s$ is independent of $\sigma(B_r: r \leq s)$. Moreover, $B_t - B_s \sim N(0, t - s)$ for $t \geq s$.
\end{ex}
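
As a hint for the covariance, one can use the isometry directly: for $s, t \geq 0$,
\[
  \E[B_s B_t] = (\mathbf{1}_{[0, s]}, \mathbf{1}_{[0, t]})_{L^2(\R_+)} = |[0, s] \cap [0, t]| = s \wedge t,
\]
and hence, for $s \leq t$,
\[
  \E[(B_t - B_s)^2] = t - 2 (s \wedge t) + s = t - s.
\]
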
In fact, by picking a good basis of $L^2(\R_+)$, we can make $B_t$ continuous.

We can now try to define some stochastic integral. If $f \in L^2(\R_+)$ is a step function,
\[
  f = \sum_{i = 1}^n f_i \mathbf{1}_{[s_i, t_i]}
\]
with $s_i < t_i$, then
\[
  WN(f) = \sum_{i = 1}^n f_i (B_{t_i} - B_{s_i}).
\]
This motivates the notation
\[
  WN(f) = \int f(s)\; \d B_s.
\]
However, extending this to a function that is not a step function would be problematic.
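
As a small sanity check of the isometry on step functions (the particular $f$ is chosen only for illustration), take $f = 2 \cdot \mathbf{1}_{[0, 1]} - \mathbf{1}_{[1, 3]}$. Then
\[
  WN(f) = 2 (B_1 - B_0) - (B_3 - B_1),
\]
and since the two increments are independent with variances $1$ and $2$,
\[
  \E[WN(f)^2] = 4 \cdot 1 + 1 \cdot 2 = 6 = \int_0^\infty f(s)^2 \;\d s.
\]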

\section{The Lebesgue--Stieltjes integral}
In calculus, we are able to perform integrals more exciting than simply $\int_0^1 h(x) \;\d x$. In particular, if $h, a: [0, 1] \to \R$ are $C^1$ functions, we can perform integrals of the form
\[
  \int_0^1 h(x) \;\d a(x).
\]
For them, it is easy to make sense of what this means --- it's simply
\[
  \int_0^1 h(x) \;\d a = \int_0^1 h(x) a'(x) \;\d x.
\]
In our world, we wouldn't expect our functions to be differentiable, so this is not a great definition. One reasonable strategy to make sense of this is to come up with a measure that should equal ``$\d a$''.

An immediate difficulty we encounter is that $a'(x)$ need not be positive all the time. So for example, $\int_0^1 1 \;\d a$ could be a negative number, which one wouldn't expect for a usual measure! Thus, we are naturally led to think about \emph{signed measures}.

From now on, we always use the Borel $\sigma$-algebra on $[0, T]$ unless otherwise specified.
\begin{defi}[Signed measure]\index{signed measure}
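  A \emph{signed measure} on $[0, T]$ is a difference $\mu = \mu_+ - \mu_-$ of two positive measures on $[0, T]$ of disjoint support. The decomposition $\mu = \mu_+ - \mu_-$ is called the \term{Jordan decomposition}; the corresponding splitting of $[0, T]$ into the sets carrying $\mu_+$ and $\mu_-$ is the \term{Hahn decomposition}.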
\end{defi}

In general, given two measures $\mu_1$ and $\mu_2$ with not necessarily disjoint supports, we may still want to talk about $\mu_1 - \mu_2$.
\begin{thm}
  For any two finite measures $\mu_1, \mu_2$, there is a signed measure $\mu$ with $\mu(A) = \mu_1(A) - \mu_2(A)$.
\end{thm}

If $\mu_1$ and $\mu_2$ are given by densities $f_1, f_2$, then we can simply decompose $\mu$ as $(f_1 - f_2)^+\;\d t - (f_1 - f_2)^- \;\d t$, where $^+$ and $^-$ denote the positive and negative parts respectively. In general, they need not be given by densities with respect to $\d t$, but they are always given by densities with respect to some other measure.
\begin{proof}
  Let $\nu = \mu_1 + \mu_2$. By Radon--Nikodym, there are positive functions $f_1, f_2$ such that $\mu_i(\d t) = f_i(t) \nu(\d t)$. Then
  \[
    (\mu_1 - \mu_2)(\d t) = (f_1 - f_2)^+(t) \cdot \nu(\d t) - (f_1 - f_2)^- (t) \cdot \nu(\d t).\qedhere
  \]
\end{proof}

\begin{defi}[Total variation]\index{total variation}
  The total variation of a signed measure $\mu = \mu_+ - \mu_-$ is $|\mu| = \mu_+ + \mu_-$.
\end{defi}
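
For instance (an illustrative choice of measures), take $\mu_1(\d t) = \d t$ and $\mu_2(\d t) = 2t \;\d t$ on $[0, 1]$. Then $\mu = \mu_1 - \mu_2$ has density $1 - 2t$, so
\[
  \mu_+(\d t) = (1 - 2t)^+ \;\d t,\quad \mu_-(\d t) = (1 - 2t)^- \;\d t,
\]
which are concentrated on $[0, \frac{1}{2})$ and $(\frac{1}{2}, 1]$ respectively, and
\[
  \mu([0, 1]) = 0,\quad |\mu|([0, 1]) = \int_0^1 |1 - 2t| \;\d t = \frac{1}{2}.
\]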

We now want to figure out how we can go from a function to a signed measure. Let's think about how one would attempt to define $\int_0^t h(s)\;\d a(s)$ as a Riemann sum. A natural option would be to write something like
\[
  \int_0^t h(s) \;\d a(s) = \lim_{m \to \infty} \sum_{i = 1}^{n_m} h(t_{i - 1}^{(m)}) \Big(a (t_i^{(m)}) - a(t_{i - 1}^{(m)})\Big)
\]
for any sequence of subdivisions $0 = t_0^{(m)} < \cdots < t_{n_m}^{(m)} = t$ of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i - 1}^{(m)}| \to 0$.

In particular, since we want the integral of $h = 1$ to be well-behaved, the sum $\sum (a(t_i^{(m)}) - a(t_{i - 1}^{(m)}))$ must be well-behaved. This leads to the notion of
\begin{defi}[Total variation]\index{total variation}
  The \emph{total variation} of a function $a: [0, T] \to \R$ is
  \[
    V_a(t) = |a(0)| + \sup \left\{\sum_{i = 1}^n |a(t_i) - a(t_{i - 1})|: 0 = t_0 < t_1 < \cdots < t_n = t\right\}.
  \]
  We say $a$ has \term{bounded variation} if $V_a(T) < \infty$. In this case, we write $a \in BV$.\index{BV}
\end{defi}
We include the $|a(0)|$ term because we want to pretend $a$ is defined on all of $\R$ with $a(t) = 0$ for $t < 0$.
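
Two quick computations may help fix the definition (the functions are chosen only for illustration). If $a$ is increasing, every sum telescopes, so
\[
  V_a(t) = |a(0)| + a(t) - a(0).
\]
If $a(t) = \sin(2 \pi t)$ on $[0, 1]$, then $a$ is monotone on each of $[0, \frac{1}{4}]$, $[\frac{1}{4}, \frac{3}{4}]$ and $[\frac{3}{4}, 1]$. Refining a subdivision never decreases the sum, so we may include the turning points, and adding the variation over these pieces gives
\[
  V_a(1) = 0 + 1 + 2 + 1 = 4.
\]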

We also define
\begin{defi}[C\`adl\`ag]\index{c\`adl\`ag}
  A function $a: [0, T] \to \R$ is \emph{c\`adl\`ag} if it is right-continuous and has left-limits.
\end{defi}

The following theorem is then clear:
\begin{thm}
  There is a bijection
  \[
    \left\{\vphantom{\parbox{4.5cm}{a\\b}}\parbox{4.5cm}{\centering signed measures on $[0, T]$} \right\} \longleftrightarrow \left\{\parbox{4.5cm}{\centering c\`adl\`ag functions of bounded variation $a: [0, T] \to \R$}\right\}\\
  \]
  that sends a signed measure $\mu$ to $a(t) = \mu([0, t])$. To construct the inverse, given $a$, we define
  \[
    a_{\pm} = \frac{1}{2}(V_a \pm a).
  \]
  Then $a_{\pm}$ are both non-negative and increasing, and $a = a_+ - a_-$. We can then define $\mu_{\pm}$ by
  \begin{align*}
    \mu_{\pm}[0, t] &= a_{\pm}(t)\\
    \mu &= \mu_+ - \mu_-
  \end{align*}
  Moreover, $V_a(t) = |\mu|[0, t]$.
\end{thm}

\begin{eg}
  Let $a: [0, 1] \to \R$ be given by
  \[
    a(t) =
    \begin{cases}
      1 & t < \frac{1}{2}\\
      0 & t \geq \frac{1}{2}
    \end{cases}.
  \]
  This is c\`adl\`ag, and its total variation is $V_a(1) = 2$. The associated signed measure is
  \[
    \mu = \delta_0 - \delta_{1/2},
  \]
  and the total variation measure is
  \[
    |\mu| = \delta_0 + \delta_{1/2}.
  \]
\end{eg}

We are now ready to define the Lebesgue--Stieltjes integral.

\begin{defi}[Lebesgue--Stieltjes integral]\index{Lebesgue--Stieltjes integral}
  Let $a: [0, T] \to \R$ be c\`adl\`ag of bounded variation and let $\mu$ be the associated signed measure. Then for $h \in L^1([0, T], |\mu|)$, the \emph{Lebesgue--Stieltjes integral} is defined by
  \[
    \int_s^t h(r)\; \d a(r) = \int_{(s, t]} h(r) \mu (\d r),
  \]
  where $0 \leq s \leq t \leq T$, and
  \[
    \int_s^t h(r)\; |\d a(r)| = \int_{(s, t]} h(r) |\mu| (\d r).
  \]
  We also write
  \[
    h \cdot a(t) = \int_0^t h(r) \;\d a(r).
  \]
\end{defi}
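
As an illustration, take the step function $a$ on $[0, 1]$ from the example above, with $\mu = \delta_0 - \delta_{1/2}$, and any bounded Borel function $h$. Since the integral is taken over $(0, t]$, the atom at $0$ does not contribute, and
\[
  h \cdot a(t) = \int_{(0, t]} h(r) \;\mu(\d r) =
  \begin{cases}
    0 & t < \frac{1}{2}\\
    -h(\frac{1}{2}) & t \geq \frac{1}{2}
  \end{cases},
  \qquad
  \int_0^1 h(r) \;|\d a(r)| = h\left(\frac{1}{2}\right).
\]
At the other extreme, if $a$ is $C^1$, then $\mu(\d r) = a'(r) \;\d r$ on $(0, T]$, and we recover $\int_s^t h(r) a'(r) \;\d r$.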

To let $T = \infty$, we need the following notion:
\begin{defi}[Finite variation]\index{finite variation}
  A c\`adl\`ag function $a: [0, \infty) \to \R$ is of finite variation if $a|_{[0, T]} \in BV[0, T]$ for all $T > 0$.
\end{defi}

\begin{fact}
  Let $a: [0, T] \to \R$ be c\`adl\`ag and BV, and $h \in L^1([0, T], |\d a|)$, then
  \[
    \left|\int_0^T h(s) \;\d a (s) \right| \leq \int_0^T |h(s)|\;|\d a(s)|,
  \]
  and the function $h \cdot a: [0, T] \to \R$ is c\`adl\`ag and BV with associated signed measure $h(s) \;\d a(s)$. Moreover, $|h(s) \;\d a(s)| = |h(s)|\;|\d a(s)|$.
\end{fact}

We can, unsurprisingly, characterize the Lebesgue--Stieltjes integral by a Riemann sum:
\begin{prop}
  Let $a$ be c\`adl\`ag and BV on $[0, t]$, and $h$ bounded and left-continuous. Then
  \begin{align*}
    \int_0^t h(s) \;\d a(s) &= \lim_{m \to \infty} \sum_{i = 1}^{n_m} h(t_{i - 1}^{(m)}) \Big(a (t_i^{(m)}) - a(t_{i - 1}^{(m)})\Big)\\
    \int_0^t h(s) \;|\d a(s)| &= \lim_{m \to \infty} \sum_{i = 1}^{n_m} h(t_{i - 1}^{(m)}) \Big|a (t_i^{(m)}) - a(t_{i - 1}^{(m)})\Big|
  \end{align*}
  for any sequence of subdivisions $0 = t_0^{(m)} < \cdots < t_{n_m}^{(m)} = t$ of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i - 1}^{(m)}| \to 0$.
\end{prop}

\begin{proof}
  We approximate $h$ by $h_m$ defined by
  \[
    h_m(0) = 0,\quad h_m(s) = h(t_{i - 1}^{(m)})\text{ for }s \in (t_{i - 1}^{(m)}, t_i^{(m)}].
  \]
  Then by left continuity, we have
  \[
    h(s) = \lim_{m \to \infty}h_m(s)
  \]
  for all $s \in (0, t]$, and moreover
  \[
    \lim_{m \to \infty} \sum_{i = 1}^{n_m} h(t_{i - 1}^{(m)}) (a(t_i^{(m)}) - a(t_{i - 1}^{(m)})) = \lim_{m \to \infty} \int_{(0, t]} h_m(s) \mu (\d s) = \int_{(0, t]} h(s) \mu(\d s)
  \]
  by the dominated convergence theorem, since the $h_m$ are uniformly bounded and $|\mu|$ is a finite measure. The statement about $|\d a(s)|$ is left as an exercise.
\end{proof}
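
To see the proposition in action on a case that can be computed by hand (the choice of $a$ and $h$ is ours), take $a(s) = s^2$, $h(s) = s$ and the uniform subdivision $t_i = \frac{it}{n}$. Then $a(t_i) - a(t_{i - 1}) = \frac{(2i - 1) t^2}{n^2}$, and
\[
  \sum_{i = 1}^n h(t_{i - 1}) \big(a(t_i) - a(t_{i - 1})\big) = \frac{t^3}{n^3} \sum_{i = 1}^n (i - 1)(2i - 1) = \frac{t^3 (n - 1)(4n + 1)}{6n^2} \to \frac{2t^3}{3},
\]
which agrees with $\int_0^t s \cdot 2s \;\d s = \frac{2t^3}{3}$.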


\section{Semi-martingales}
The title of the chapter is ``semi-martingales'', but we are not even going to meet the definition of a semi-martingale till the end of the chapter. The reason is that a semi-martingale is essentially defined to be the sum of a (local) martingale and a finite variation process, and understanding semi-martingales mostly involves understanding the two parts separately. Thus, for most of the chapter, we will be studying local martingales (finite variation processes are rather more boring), and at the end we will put them together to say a word or two about semi-martingales.

From now on, $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \P)$ will be a filtered probability space. Recall the following definition:

\begin{defi}[C\`adl\`ag adapted process]\index{c\`adl\`ag adapted process}
  A \emph{c\`adl\`ag adapted process} is a map $X: \Omega \times [0, \infty) \to \R$ such that
  \begin{enumerate}
    \item $X$ is c\`adl\`ag, i.e.\ $X(\omega, \ph): [0, \infty) \to \R$ is c\`adl\`ag for all $\omega \in \Omega$.
    \item $X$ is adapted, i.e.\ $X_t = X(\ph, t) $ is $\mathcal{F}_t$-measurable for every $t \geq 0$.
  \end{enumerate}
\end{defi}

\begin{notation}
  We will write $X \in \mathcal{G}$ to denote that a random variable $X$ is measurable with respect to a $\sigma$-algebra $\mathcal{G}$.
\end{notation}

\subsection{Finite variation processes}
The definition of a finite variation function extends immediately to a finite variation process.
\begin{defi}[Finite variation process]\index{Finite variation process}
  A \emph{finite variation process} is a c\`adl\`ag adapted process $A$ such that $A(\omega, \ph): [0, \infty) \to \R$ has finite variation for all $\omega \in \Omega$. The \term{total variation process} $V$ of a finite variation process $A$ is
  \[
    V_t = \int_0^t |\d A_s|.
  \]
\end{defi}

\begin{prop}
  The total variation process $V$ of a finite variation process $A$ is also c\`adl\`ag, finite variation and adapted, and it is also increasing.
\end{prop}

\begin{proof}
  We only have to check that it is adapted. But that follows directly from our previous expression of the integral as the limit of a sum. Indeed, let $0 = t_0^{(m)} < t_1^{(m)} < \cdots < t_{n_m}^{(m)} = t$ be a (nested) sequence of subdivisions of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i - 1}^{(m)}| \to 0$. We have seen
  \[
    V_t = \lim_{m \to \infty} \sum_{i = 1}^{n_m} |A_{t_i^{(m)}} - A_{t_{i - 1}^{(m)}}| + |A(0)| \in \mathcal{F}_t.\qedhere
  \]
\end{proof}

\begin{defi}[$(H\cdot A)_t$]\index{$(H\cdot A)_t$}
  Let $A$ be a finite variation process and $H$ a process such that for all $\omega \in \Omega$ and $t \geq 0$,
  \[
    \int_0^t |H_s(\omega)|\;|\d A_s(\omega)| < \infty.
  \]
  Then define a process $((H \cdot A)_t)_{t \geq 0}$ by
  \[
    (H \cdot A)_t = \int_0^t H_s\;\d A_s.
  \]
\end{defi}
For the process $H \cdot A$ to be adapted, we need a condition.
\begin{defi}[Previsible process]\index{previsible process}
  A process $H: \Omega \times [0, \infty) \to \R$ is \emph{previsible} if it is measurable with respect to the \term{previsible $\sigma$-algebra} $\mathcal{P}$ generated by the sets $E \times (s, t]$, where $E \in \mathcal{F}_s$ and $s < t$. We call the generating set $\Pi$.
\end{defi}
Very roughly, the idea is that a previsible event is one where whenever it happens, you know it a finite (though possibly arbitrarily small) time before.

\begin{defi}[Simple process]\index{simple process}\index{$\mathcal{E}$}
  A process $H: \Omega \times [0, \infty) \to \R$ is \emph{simple}, written $H \in \mathcal{E}$, if
  \[
    H(\omega, t) = \sum_{i = 1}^n H_{i - 1}(\omega) \mathbf{1}_{(t_{i - 1}, t_i]}(t)
  \]
  for random variables $H_{i - 1} \in \mathcal{F}_{t_{i - 1}}$ and $0 = t_0 < \cdots < t_n$.
\end{defi}

\begin{fact}
  Simple processes and their limits are previsible.
\end{fact}

\begin{fact}
  Let $X$ be a c\`adl\`ag adapted process. Then $H_t = X_{t^-}$ defines a left-continuous process and is previsible.
\end{fact}
In particular, continuous processes are previsible.

\begin{proof}
  Since $X$ is c\`adl\`ag adapted, it is clear that $H$ is left-continuous and adapted. Since $H$ is left-continuous, it is approximated by simple processes. Indeed, let
  \[
    H_t^n = \sum_{i = 1}^{n 2^n} H_{(i - 1)2^{-n}} \mathbf{1}_{((i - 1)2^{-n}, i 2^{-n}]} (t) \wedge n \in \mathcal{E}.
  \]
  Then $H_t^n \to H_t$ for all $t$ by left continuity, and previsibility follows.
\end{proof}

\begin{ex}
  Let $H$ be previsible. Then
  \[
    H_t \in \mathcal{F}_{t^-} = \sigma(\mathcal{F}_s : s < t).
  \]
\end{ex}

\begin{eg}
  Brownian motion is previsible (since it is continuous).
\end{eg}

\begin{eg}
  A Poisson process $(N_t)$ is not previsible since $N_t \not \in \mathcal{F}_{t^-}$.
\end{eg}

\begin{prop}
  Let $A$ be a finite variation process, and $H$ previsible such that
  \[
    \int_0^t |H(\omega, s)|\;|\d A(\omega, s)| < \infty\text{ for all }(\omega, t) \in \Omega \times [0, \infty).
  \]
  Then $H \cdot A$ is a finite variation process.
\end{prop}

\begin{proof}
  The finite variation and c\`adl\`ag parts follow directly from the deterministic versions. We only have to check that $H \cdot A$ is adapted, i.e.\ $(H \cdot A)(\ph, t) \in \mathcal{F}_t$ for all $t \geq 0$.

  First, $H \cdot A$ is adapted if $H(\omega, s) = 1_{(u, v]}(s) 1_E(\omega)$ for some $u < v$ and $E \in \mathcal{F}_u$, since
  \[
    (H \cdot A)(\omega, t) = 1_E(\omega) (A(\omega, t \wedge v) - A(\omega, t \wedge u)) \in \mathcal{F}_t.
  \]
  Thus, $H \cdot A$ is adapted for $H = \mathbf{1}_F$ when $F \in \Pi$. Clearly, $\Pi$ is a $\pi$ system, i.e.\ it is closed under intersections and non-empty, and by definition it generates the previsible $\sigma$-algebra $\mathcal{P}$. So to extend the adaptedness of $H \cdot A$ to all previsible $H$, we use the monotone class theorem.

  We let
  \[
    \mathcal{V} = \{H: \Omega \times [0, \infty) \to \R: H \cdot A\text{ is adapted}\}.
  \]
  Then
  \begin{enumerate}
    \item $1 \in \mathcal{V}$
    \item $1_F \in \mathcal{V}$ for all $F \in \Pi$.
    \item $\mathcal{V}$ is closed under monotone limits.
  \end{enumerate}
  So $\mathcal{V}$ contains all bounded $\mathcal{P}$-measurable functions.
\end{proof}
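
For a simple process the integral is completely explicit, which makes adaptedness visible directly. As a sketch, if $H = \sum_{i = 1}^n H_{i - 1} \mathbf{1}_{(t_{i - 1}, t_i]}$ with each $H_{i - 1}$ bounded and $\mathcal{F}_{t_{i - 1}}$-measurable, then
\[
  (H \cdot A)_t = \sum_{i = 1}^n H_{i - 1} \big(A_{t_i \wedge t} - A_{t_{i - 1} \wedge t}\big).
\]
The $i$th term vanishes for $t \leq t_{i - 1}$, and for $t > t_{i - 1}$ it is a product of $\mathcal{F}_t$-measurable random variables, so $(H \cdot A)_t \in \mathcal{F}_t$.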

So the conclusion is that if $A$ is a finite variation process, then as long as reasonable finiteness conditions are satisfied, we can integrate functions against $\d A$. Moreover, this integral was easy to define, and it obeys all expected properties such as dominated convergence, since ultimately, it is just an integral in the usual measure-theoretic sense. This crucially depends on the fact that $A$ is a finite variation process.

However, in our motivating example, we wanted to take $A$ to be Brownian motion, which is \emph{not} of finite variation. The work we will do in this chapter and the next is to come up with a stochastic integral where we let $A$ be a martingale instead. The heuristic idea is that while martingales can vary wildly, the martingale property implies there will be some large cancellation between the up and down movements, which leads to the possibility of a well-defined stochastic integral.

\subsection{Local martingale}
From now on, we assume that $(\Omega, \mathcal{F}, (\mathcal{F}_t)_t, \P)$ satisfies the \term{usual conditions}, namely that
\begin{enumerate}
  \item $\mathcal{F}_0$ contains all $\P$-null sets
  \item $(\mathcal{F}_t)_t$ is right-continuous, i.e.\ $\mathcal{F}_t = \mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s$ for all $t \geq 0$.
\end{enumerate}

We recall some of the properties of continuous martingales.
\begin{thm}[Optional stopping theorem]
  Let $X$ be a c\`adl\`ag adapted integrable process. Then the following are equivalent:
  \begin{enumerate}
    \item $X$ is a martingale, i.e.\ $X_t \in L^1$ for every $t$, and
      \[
        \E(X_t \mid \mathcal{F}_s) = X_s \text{ for all }t > s.
      \]
    \item The \term{stopped process}\index{$X^T$} $X^T = (X^T_t) = (X_{T \wedge t})$ is a martingale for all stopping times $T$.
    \item For all stopping times $T, S$ with $T$ bounded, $X_T \in L^1$ and $\E(X_T \mid \mathcal{F}_S) = X_{T \wedge S}$ almost surely.
    \item For all bounded stopping times $T$, $X_T \in L^1$ and $\E(X_T) = \E(X_0)$.
  \end{enumerate}
  For $X$ uniformly integrable, (iii) and (iv) hold for all stopping times.
\end{thm}

In practice, most of our results will be first proven for bounded martingales, or perhaps square integrable ones. The point is that the square-integrable martingales form a Hilbert space, and Hilbert space techniques can help us say something useful about these martingales. To get something about a general martingale $M$, we can apply a cutoff $T_n = \inf \{t > 0: |M_t| \geq n\}$, and then $M^{T_n}$ will be a martingale for all $n$. We can then take the limit $n \to \infty$ to recover something about the martingale itself.

But if we are doing this, we might as well weaken the martingale condition a bit --- we only need the $M^{T_n}$ to be martingales. Of course, we aren't doing this just for fun. In general, martingales will not always be closed under the operations we are interested in, but local (or maybe semi-) martingales will be. In general, we define

\begin{defi}[Local martingale]\index{local martingale}
  A c\`adl\`ag adapted process $X$ is a \emph{local martingale} if there exists a sequence of stopping times $T_n$ such that $T_n \to \infty$ almost surely, and $X^{T_n}$ is a martingale for every $n$. We say the sequence $T_n$ \term{reduces} $X$.
\end{defi}

\begin{eg}\leavevmode
  \begin{enumerate}
    \item Every martingale is a local martingale, since by the optional stopping theorem, we can take $T_n = n$.
    \item Let $(B_t)$ be a standard Brownian motion in $\R^3$. Then
      \[
        (X_t)_{t \geq 1} = \left(\frac{1}{|B_t|}\right)_{t \geq 1}
      \]
      is a local martingale but not a martingale.

      To see this, first note that
      \[
        \sup_{t \geq 1} \E X_t^2 < \infty,\quad \E X_t \to 0.
      \]
      Since $\E X_t \to 0$ and $X_t \geq 0$, we know $X$ cannot be a martingale. However, we can check that it is a local martingale. Recall that for any $f \in C^2_b$,
      \[
        M^f_t = f(B_t) - f(B_1) - \frac{1}{2} \int_1^t \Delta f(B_s)\;\d s
      \]
      is a martingale. Moreover, $\Delta \frac{1}{|x|} = 0$ for all $x \not= 0$. Thus, if $\frac{1}{|x|}$ didn't have a singularity at $0$, this would have told us $X_t$ is a martingale. Thus, we are safe if we try to bound $|B_s|$ away from zero.

      Let
      \[
        T_n = \inf \left\{t \geq 1: |B_t| < \frac{1}{n}\right\},
      \]
      and pick $f_n \in C_b^2$ such that $f_n(x) = \frac{1}{|x|}$ for $|x| \geq \frac{1}{n}$. Then $X_t^{T_n} - X_1^{T_n} = M^{f_n}_{t \wedge T_n}$. So $X^{T_n}$ is a martingale.

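      It remains to show that $T_n \to \infty$ almost surely, and this follows from the fact that Brownian motion in $\R^3$ almost surely never hits the origin at positive times, so that $\inf_{1 \leq s \leq t} |B_s| > 0$ for every $t \geq 1$.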
  \end{enumerate}
\end{eg}
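
The analytic facts used in the second example can be checked by direct computation; the following is a sketch, assuming $B_0 = 0$ so that $B_t \sim N(0, tI)$ in $\R^3$. For $x \not= 0$,
\[
  \frac{\partial}{\partial x_i} \frac{1}{|x|} = -\frac{x_i}{|x|^3},\quad \frac{\partial^2}{\partial x_i^2} \frac{1}{|x|} = -\frac{1}{|x|^3} + \frac{3 x_i^2}{|x|^5},
\]
and summing over $i = 1, 2, 3$ gives $\Delta \frac{1}{|x|} = -\frac{3}{|x|^3} + \frac{3|x|^2}{|x|^5} = 0$. For the moments, passing to polar coordinates,
\begin{align*}
  \E X_t^2 &= \int_0^\infty \frac{1}{r^2} \frac{4\pi r^2}{(2\pi t)^{3/2}} e^{-r^2/2t} \;\d r = \frac{1}{t},\\
  \E X_t &= \int_0^\infty \frac{1}{r} \frac{4\pi r^2}{(2\pi t)^{3/2}} e^{-r^2/2t} \;\d r = \sqrt{\frac{2}{\pi t}},
\end{align*}
so $\sup_{t \geq 1} \E X_t^2 = 1$ and $\E X_t \to 0$ as $t \to \infty$.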

\begin{prop}
  Let $X$ be a local martingale and $X_t \geq 0$ for all $t$. Then $X$ is a supermartingale.
\end{prop}

\begin{proof}
  Let $(T_n)$ be a reducing sequence for $X$. Then by the conditional Fatou lemma (using $X \geq 0$),
  \begin{align*}
    \E(X_t \mid \mathcal{F}_s) &= \E \left(\liminf_{n \to \infty} X_{t \wedge T_n} \mid \mathcal{F}_s\right) \\
    &\leq \liminf_{n \to \infty} \E(X_{t \wedge T_n} \mid \mathcal{F}_s) \\
    &= \liminf_{n \to \infty} X_{s \wedge T_n} \\
    &= X_s.\qedhere
  \end{align*}
\end{proof}

Recall the following result from Advanced Probability:
\begin{prop}
  Let $X \in L^1 (\Omega, \mathcal{F}, \P)$. Then the set
  \[
    \chi = \{\E(X \mid \mathcal{G}): \mathcal{G} \subseteq \mathcal{F}\text{ a sub-$\sigma$-algebra}\}
  \]
  is uniformly integrable, i.e.
  \[
    \sup_{Y \in \chi} \E (|Y| \mathbf{1}_{|Y| > \lambda}) \to 0\text{ as } \lambda \to \infty.
  \]
\end{prop}

Recall also the following important result about uniformly integrable random variables:
\begin{thm}[Vitali theorem]\index{Vitali theorem}
  $X_n \to X$ in $L^1$ iff $(X_n)$ is uniformly integrable and $X_n \to X$ in probability.
\end{thm}

With these, we can state the following characterization of martingales in terms of local martingales:
\begin{prop}
  The following are equivalent:
  \begin{enumerate}
    \item $X$ is a martingale.
    \item $X$ is a local martingale, and for all $t \geq 0$, the set
      \[
        \chi_t = \{X_T: T\text{ is a stopping time with }T \leq t\}
      \]
      is uniformly integrable.
  \end{enumerate}
\end{prop}

\begin{proof}\leavevmode
  \begin{itemize}
    \item (i) $\Rightarrow$ (ii): Let $X$ be a martingale. Then by the optional stopping theorem, $X_T = \E(X_t \mid \mathcal{F}_T)$ for any bounded stopping time $T \leq t$. So $\chi_t$ is uniformly integrable.
    \item (ii) $\Rightarrow$ (i): Let $X$ be a local martingale with reducing sequence $(T_n)$, and assume that the sets $\chi_t$ are uniformly integrable for all $t \geq 0$. By the optional stopping theorem, it suffices to show that $\E(X_T) = \E(X_0)$ for any bounded stopping time $T$.

      So let $T$ be a bounded stopping time, say $T \leq t$. Then
      \[
        \E(X_0) = \E(X_0^{T_n}) = \E(X_T^{T_n}) = \E(X_{T \wedge T_n})
      \]
      for all $n$. Now $T \wedge T_n$ is a stopping time $\leq t$, so $\{X_{T \wedge T_n}\}$ is uniformly integrable by assumption. Moreover, $T_n \wedge T \to T$ almost surely as $n \to \infty$, hence $X_{T \wedge T_n} \to X_T$ in probability. Hence by Vitali, this converges in $L^1$. So
      \[
        \E(X_T) = \E(X_0).\qedhere
      \]%\qedhere
  \end{itemize}
\end{proof}

\begin{cor}
  Let $X$ be a local martingale. If $Z \in L^1$ is such that $|X_t| \leq Z$ for all $t$, then $X$ is a martingale. In particular, every bounded local martingale is a martingale.
\end{cor}

The definition of a local martingale does not give us control over what the reducing sequence $\{T_n\}$ is. In particular, it is not necessarily true that $X^{T_n}$ will be bounded, which is a helpful property to have. Fortunately, we have the following proposition:
\begin{prop}
  Let $X$ be a \emph{continuous} local martingale with $X_0 = 0$. Define
  \[
    S_n = \inf \{t \geq 0 : |X_t| = n \}.
  \]
  Then $S_n$ is a stopping time, $S_n \to \infty$ and $X^{S_n}$ is a bounded martingale. In particular, $(S_n)$ reduces $X$.
\end{prop}

\begin{proof}
  To see that $S_n$ is a stopping time, note that by continuity of $X$,
  \[
    \{S_n \leq t\} = \bigcap_{k \in \N} \left\{\sup_{s \leq t} |X_s| > n - \frac{1}{k}\right\} = \bigcap_{k \in \N} \bigcup_{s < t, s \in \Q} \left\{|X_s| > n - \frac{1}{k}\right\} \in \mathcal{F}_t.
  \]
  It is also clear that $S_n \to \infty$, since
  \[
    \sup_{s \leq t} |X_s| \leq n \Leftrightarrow S_n \geq t,
  \]
  and by continuity and compactness, $\sup_{s \leq t} |X_s|$ is finite for every $(\omega, t)$.

  Finally, we show that $X^{S_n}$ is a martingale. Let $(T_k)$ be a reducing sequence for $X$. By the optional stopping theorem, $X^{T_k \wedge S_n}$ is a martingale for every $k$, so $X^{S_n}$ is a local martingale. But it is also bounded by $n$. So it is a martingale by the previous corollary.
\end{proof}

An important and useful theorem is the following:
\begin{thm}
  Let $X$ be a continuous local martingale with $X_0 = 0$. If $X$ is also a finite variation process, then $X_t = 0$ for all $t$.
\end{thm}
This would rule out interpreting $\int H_s \;\d X_s$ as a Lebesgue--Stieltjes integral for $X$ a non-zero continuous local martingale. In particular, we cannot take $X$ to be Brownian motion. Instead, we have to develop a new theory of integration for continuous local martingales, namely the It\^o integral.

On the other hand, this theorem is very useful. We will later want to define the stochastic integral with respect to the sum of a continuous local martingale and a finite variation process, which is the appropriate generality for our theorems to make good sense. This theorem tells us there is a unique way to decompose a process as a sum of a finite variation process and a continuous local martingale (if it can be done). So we can simply define this stochastic integral by using the Lebesgue--Stieltjes integral on the finite variation part and the It\^o integral on the continuous local martingale part.
\begin{proof}
  Let $X$ be a finite-variation continuous local martingale with $X_0 = 0$. Since $X$ is finite variation, we can define the total variation process $(V_t)$ corresponding to $X$, and let
  \[
    S_n = \inf \{t \geq 0: V_t \geq n\} = \inf \left\{t \geq 0: \int_0^t |\d X_s| \geq n\right\}.
  \]
  Then $S_n$ is a stopping time, and $S_n \to \infty$ since $X$ is assumed to be finite variation. Moreover, by optional stopping, $X^{S_n}$ is a local martingale, and is also bounded, since
  \[
    |X_t^{S_n}| \leq \int_0^{t \wedge S_n} |\d X_s| \leq n.
  \]
  So $X^{S_n}$ is in fact a martingale.

  We claim its $L^2$-norm vanishes. Let $0 = t_0 < t_1 < \cdots < t_k = t$ be a subdivision of $[0, t]$. Using the fact that $X^{S_n}$ is a martingale and has orthogonal increments, we can write
  \[
    \E((X_t^{S_n})^2) = \sum_{i = 1}^k \E((X_{t_i}^{S_n} - X_{t_{i - 1}}^{S_n})^2).
  \]
  Observe that $X^{S_n}$ is finite variation, but the right-hand side is summing the \emph{square} of the variation, which ought to vanish when we take the limit $\max |t_i - t_{i - 1}| \to 0$. Indeed, we can compute
  \begin{align*}
    \E((X_t^{S_n})^2) &= \sum_{i = 1}^k \E((X_{t_i}^{S_n} - X_{t_{i - 1}}^{S_n})^2)\\
    &\leq \E\left(\max_{1 \leq i \leq k} |X_{t_i}^{S_n} - X_{t_{i - 1}}^{S_n}| \sum_{i =1 }^k |X_{t_i}^{S_n} - X_{t_{i - 1}}^{S_n}|\right)\\
    &\leq \E\left(\max_{1 \leq i \leq k} |X_{t_i}^{S_n} - X_{t_{i - 1}}^{S_n}| V_{t \wedge S_n}\right)\\
    &\leq \E\left(\max_{1 \leq i \leq k} |X_{t_i}^{S_n} - X_{t_{i - 1}}^{S_n}| n\right).
  \end{align*}
  Of course, the first factor is also bounded by the total variation, hence by $n$. Moreover, we can make further subdivisions so that the mesh size tends to zero, and then the first factor vanishes in the limit by continuity. So by dominated convergence, we must have $\E((X_t^{S_n})^2) = 0$. So $X_t^{S_n} = 0$ almost surely for all $n$, and letting $n \to \infty$ gives $X_t = 0$ almost surely for each fixed $t$. By continuity of $X$, we conclude that $X_t = 0$ for all $t$ almost surely.
\end{proof}
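
To spell out the uniqueness of decomposition mentioned before the proof (a short sketch, assuming the finite variation parts are continuous and everything starts at $0$), suppose a process can be written in two ways as
\[
  X = M + A = M' + A',
\]
with $M, M'$ continuous local martingales and $A, A'$ continuous finite variation processes. Then
\[
  M - M' = A' - A
\]
is a continuous local martingale that is also of finite variation and starts at $0$. By the theorem, $M - M' = 0$, and hence also $A = A'$.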

\subsection{Square integrable martingales}
As previously discussed, we will want to use Hilbert space machinery to construct the It\^o integral. The rough idea is to define the It\^o integral with respect to a fixed martingale on simple processes via a (finite) Riemann sum, and then by calculating appropriate bounds on how this affects the norm, we can extend this to all processes by continuity, and this requires our space to be Hilbert. The interesting spaces are defined as follows:

\begin{defi}[$\mathcal{M}^2$]\index{$\mathcal{M}^2$}
  Let
  \begin{align*}
    \mathcal{M}^2 &= \left\{X : \Omega \times [0, \infty) \to \R : X\text{ is a c\`adl\`ag martingale with } \sup_{t \geq 0} \E(X_t^2) < \infty\right\},\\
    \mathcal{M}^2_c &= \left\{X \in \mathcal{M}^2: X(\omega, \ph)\text{ is continuous for every }\omega \in \Omega\right\}.
  \end{align*}
  We define an inner product on $\mathcal{M}^2$ by
  \[
    (X, Y)_{\mathcal{M}^2} = \E(X_\infty Y_\infty),
  \]
  which in particular induces a norm
  \[
    \|X\|_{\mathcal{M}^2} = \left(\E(X_\infty^2)\right)^{1/2}.
  \]
  We will prove this is indeed an inner product soon. Here recall that for $X \in \mathcal{M}^2$, the martingale convergence theorem implies $X_t \to X_\infty$ almost surely and in $L^2$.
\end{defi}
Our goal will be to prove that these spaces are indeed Hilbert spaces. First observe that if $X \in \mathcal{M}^2$, then $(X_t^2)_{t \geq 0}$ is a submartingale by Jensen, so $t \mapsto \E X_t^2$ is increasing, and
\[
  \E X_\infty^2 = \sup_{t \geq 0} \E X_t^2.
\]
All the magic that lets us prove they are Hilbert spaces is Doob's inequality.
\begin{thm}[Doob's inequality]\index{Doob's inequality}
  Let $X \in \mathcal{M}^2$. Then
  \[
    \E \left(\sup_{t \geq 0} X_t^2\right) \leq 4 \E(X_\infty^2).
  \]
\end{thm}
So once we control the limit $X_\infty$, we control the whole path. This is why the definition of the norm makes sense, and in particular we know $\|X\|_{\mathcal{M}^2} = 0$ implies that $X = 0$.
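
Concretely, if $\|X\|_{\mathcal{M}^2} = 0$, then Doob's inequality gives
\[
  \E \left(\sup_{t \geq 0} X_t^2\right) \leq 4 \E(X_\infty^2) = 4 \|X\|_{\mathcal{M}^2}^2 = 0,
\]
so $\sup_{t \geq 0} |X_t| = 0$ almost surely, i.e.\ $X$ is indistinguishable from $0$.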

\begin{thm}
  $\mathcal{M}^2$ is a Hilbert space and $\mathcal{M}_c^2$ is a closed subspace.
\end{thm}

\begin{proof}
  We need to check that $\mathcal{M}^2$ is complete. Thus let $(X^n) \subseteq \mathcal{M}^2$ be a Cauchy sequence, i.e.
  \[
    \E((X_\infty^n - X_\infty^m)^2) \to 0\text{ as }n, m \to \infty.
  \]
  By passing to a subsequence, we may assume that
  \[
    \E((X_\infty^n - X_{\infty}^{n - 1})^2) \leq 2^{-n}.
  \]
  First note that
  \begin{align*}
    \E\left(\sum_n \sup_{t \geq 0} |X_t^n - X_t^{n - 1}|\right) &\leq \sum_n \E \left(\sup_{t \geq 0} |X_t^n - X_t^{n - 1}|^2 \right)^{1/2}\tag{CS}\\
    &\leq \sum_n 2 \E \left(|X_\infty^n - X_\infty^{n - 1}|^2\right)^{1/2}\tag{Doob's}\\
    &\leq 2 \sum_n 2^{-n/2} < \infty.
  \end{align*}
  So
  \[
    \sum_{n = 1}^\infty \sup_{t \geq 0} |X_t^n - X_t^{n - 1}|< \infty\text{ a.s.}\tag{$*$}
  \]
  So on this event, $(X^n)$ is a Cauchy sequence in the space $(D[0, \infty), \|\ph\|_\infty)$ of c\`adl\`ag functions. So there is some $X(\omega, \ph) \in D[0, \infty)$ such that
  \[
    \|X^n (\omega, \ph) - X(\omega, \ph)\|_\infty \to 0\text{ for almost all }\omega,
  \]
  and we set $X = 0$ outside this almost sure event $(*)$. We now claim that
  \[
    \E \left(\sup_{t \geq 0} |X^n - X|^2\right) \to 0\text{ as }n \to \infty.
  \]
  We can just compute
  \begin{align*}
    \E \left(\sup_t |X^n - X|^2\right) &= \E \left(\lim_{m \to \infty} \sup_t |X^n - X^m|^2\right)\\
    &\leq \liminf_{m \to \infty} \E\left(\sup_t |X^n - X^m|^2\right) \tag{Fatou}\\
    &\leq \liminf_{m \to \infty} 4 \E (X^n_\infty - X^m_\infty)^2 \tag{Doob's}
  \end{align*}
  and this goes to $0$ in the limit $n \to \infty$ as well.

  We finally have to check that $X$ is indeed a martingale. We use the triangle inequality to write
  \begin{align*}
    \|\E(X_t \mid \mathcal{F}_s) - X_s \|_{L^2} &\leq \|\E (X_t - X_t^n \mid \mathcal{F}_s)\|_{L^2} + \|X_s^n - X_s\|_{L^2}\\
    &\leq \E (\E ((X_t - X_t^n)^2 \mid \mathcal{F}_s))^{1/2} + \|X_s^n - X_s\|_{L^2}\\
    &= \|X_t - X_t^n\|_{L^2} + \|X_s^n - X_s\|_{L^2}\\
    &\leq 2 \E\left(\sup_t |X_t - X_t^n|^2\right)^{1/2} \to 0
  \end{align*}
  as $n \to \infty$. But the left-hand side does not depend on $n$. So it must vanish. So $X \in \mathcal{M}^2$.

  We could have done exactly the same with continuous martingales, so the second part follows.
\end{proof}
\subsection{Quadratic variation}
Physicists are used to dropping all terms above first-order. It turns out that Brownian motion, and continuous local martingales in general, oscillate so wildly that second-order terms become important. We first make the following definition:
\begin{defi}[Uniformly on compact sets in probability]\index{u.c.p.}\index{uniformly on compact sets in probability}
  For a sequence of processes $(X^n)$ and a process $X$, we say that $X^n \to X$ u.c.p. iff
  \[
    \P\left(\sup_{s \in [0, t]} |X_s^n - X_s| > \varepsilon\right) \to 0\text{ as }n \to \infty\text{ for all }t > 0, \varepsilon > 0.
  \]
\end{defi}

\begin{thm}
  Let $M$ be a continuous local martingale with $M_0 = 0$. Then there exists a unique (up to indistinguishability) continuous adapted increasing process $(\bra M\ket_t)_{t \geq 0}$ such that $\bra M\ket_0 = 0$ and $M_t^2 - \bra M\ket_t$ is a continuous local martingale. Moreover,
  \[
    \bra M \ket_t = \lim_{n \to \infty} \bra M\ket_t^{(n)},\quad \bra M\ket_t^{(n)} = \sum_{i = 1}^{\lceil 2^n t\rceil} (M_{i 2^{-n}} - M_{(i - 1)2^{-n}})^2,
  \]
  where the limit is u.c.p.
\end{thm}
\begin{defi}[Quadratic variation]\index{quadratic variation}
  $\bra M\ket$ is called the \term{quadratic variation} of $M$.
\end{defi}
It is probably more useful to understand $\bra M\ket_t$ in terms of the explicit formula, and the fact that $M_t^2 - \bra M\ket_t$ is a continuous local martingale is a convenient property.

\begin{eg}
  Let $B$ be a standard Brownian motion. Then $B_t^2 - t$ is a martingale. Thus, $\bra B \ket_t = t$.
\end{eg}
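
As a sanity check of both descriptions of $\bra B\ket$ (a sketch, using only the independence and Gaussianity of Brownian increments), for $s \leq t$ we have
\[
  \E(B_t^2 \mid \mathcal{F}_s) = \E((B_t - B_s)^2 \mid \mathcal{F}_s) + 2 B_s \E(B_t - B_s \mid \mathcal{F}_s) + B_s^2 = (t - s) + B_s^2,
\]
which is exactly the martingale property of $B_t^2 - t$. On the other hand, for a dyadic time $t = k 2^{-n}$, the increments $B_{i 2^{-n}} - B_{(i - 1)2^{-n}}$ are independent $N(0, 2^{-n})$ random variables, so
\[
  \E \bra B\ket_t^{(n)} = k 2^{-n} = t,\quad \operatorname{Var}\left(\bra B\ket_t^{(n)}\right) = k \cdot 2 \cdot 2^{-2n} = 2 t \cdot 2^{-n} \to 0\text{ as }n \to \infty,
\]
which is consistent with $\bra B\ket_t^{(n)} \to t$.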

The proof is long and mechanical, but not hard. All the magic happened when we used the magical Doob's inequality to show that $\mathcal{M}_c^2$ and $\mathcal{M}^2$ are Hilbert spaces.

\begin{proof}
  To show uniqueness, we use that finite variation and local martingale are incompatible. Suppose $(A_t)$ and $(\tilde{A}_t)$ obey the conditions for $\bra M\ket$. Then $A_t - \tilde{A}_t = (M_t^2 - \tilde{A}_t) - (M_t^2 - A_t)$ is a continuous adapted local martingale starting at $0$. Moreover, both $A_t$ and $\tilde{A}_t$ are increasing, hence have finite variation. So $A - \tilde{A} = 0$ almost surely.

  To show existence, we need to show that the limit exists and has the right property. We do this in steps.
  \begin{claim}
    The result holds if $M$ is in fact bounded.
  \end{claim}
  Suppose $|M(\omega, t)| \leq C$ for all $(\omega, t)$. Then $M \in \mathcal{M}_c^2$. Fix $T > 0$ deterministic. Let
  \[
    X_t^n = \sum_{i = 1}^{\lceil 2^n T \rceil} M_{(i - 1)2^{-n}} (M_{i 2^{-n} \wedge t} - M_{(i - 1) 2^{-n} \wedge t}).
  \]
  This is defined so that
  \[
    \bra M \ket_{k2^{-n}}^{(n)} = M_{k2^{-n}}^2 - 2 X_{k 2^{-n}}^n.
  \]
  This reduces the study of $\bra M\ket^{(n)}$ to that of $X_{k2^{-n}}^n$.
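
To see where this identity comes from, write $a = M_{i 2^{-n}}$ and $b = M_{(i - 1)2^{-n}}$ and use $(a - b)^2 = a^2 - b^2 - 2b(a - b)$. Summing over $i = 1, \ldots, k$, where we assume $k \leq \lceil 2^n T\rceil$, the first two terms telescope:
\[
  \sum_{i = 1}^k (M_{i 2^{-n}} - M_{(i - 1)2^{-n}})^2 = M_{k 2^{-n}}^2 - M_0^2 - 2 \sum_{i = 1}^k M_{(i - 1)2^{-n}} (M_{i 2^{-n}} - M_{(i - 1)2^{-n}}).
\]
Since $M_0 = 0$ and the terms of $X^n_{k 2^{-n}}$ with $i > k$ vanish, the right-hand side is $M_{k2^{-n}}^2 - 2 X_{k2^{-n}}^n$.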

  We check that $(X_t^n)$ is a Cauchy sequence in $\mathcal{M}_c^2$. The fact that it is a martingale is an immediate computation. To show it is Cauchy, for $n \geq m$, we calculate
  \[
    X_\infty^n - X_\infty^m = \sum_{i = 1}^{\lceil 2^n T\rceil} (M_{(i - 1)2^{-n}} - M_{\lfloor (i - 1) 2^{m - n}\rfloor 2^{-m}})(M_{i2^{-n}} - M_{(i - 1)2^{-n}}).
  \]
  We now take the expectation of the square to get
  \begin{align*}
    \E (X_\infty^n - X_\infty^m)^2 &= \E\left(\sum_{i = 1}^{\lceil 2^n T\rceil} (M_{(i\!-\!1)2^{-\!n}} - M_{\lfloor\!(i\!-\!1) 2^{m\!-\!n}\!\rfloor 2^{-\!m}})^2(M_{i 2^{-\!n}} - M_{(i\!-\!1)2^{-\!n}})^2\right)\\
    &\leq \E \left(\sup_{|s - t| \leq 2^{-m}} |M_t - M_s|^2 \sum_{i = 1}^{\lceil 2^n T\rceil} (M_{i2^{-n}} - M_{(i - 1)2^{-n}})^2\right)\\
    &= \E \left(\sup_{|s - t| \leq 2^{-m}} |M_t - M_s|^2 \bra M\ket_T^{(n)}\right)\\
    &\leq \E \left(\sup_{|s - t| \leq 2^{-m}}|M_t - M_s|^4\right)^{1/2}\E \left((\bra M\ket_T^{(n)})^2\right)^{1/2}\tag{Cauchy--Schwarz}
  \end{align*}
  We shall show that the second factor is bounded, while the first factor tends to zero as $m \to \infty$. These are both not surprising --- the first term vanishing in the limit corresponds to $M$ being continuous, and the second term is bounded since $M$ itself is bounded.

  To show that the first term tends to zero, we note that we have
  \[
    |M_t - M_s|^4 \leq 16 C^4,
  \]
  and moreover
  \[
    \sup_{|s - t| \leq 2^{-m}} |M_t - M_s| \to 0\text{ as }m \to \infty\text{ by uniform continuity}.
  \]
  So we are done by the dominated convergence theorem.

  To show the second term is bounded, we do (writing $N = \lceil 2^n T\rceil$)
  \begin{align*}
    \E\left((\bra M\ket_T^{(n)})^2\right) &= \E \left(\left(\sum_{i = 1}^{N} (M_{i 2^{-n}} - M_{(i - 1)2^{-n}})^2 \right)^2\right)\\
    &= \sum_{i = 1}^N \E \left( (M_{i 2^{-n}}- M_{(i - 1)2^{-n}})^4\right) \\
    &\hphantom{{}={}}+ 2 \sum_{i = 1}^N\E \left((M_{i 2^{-\!n}} - M_{(i\!-\!1)2^{-\!n}})^2 \sum_{k = i + 1}^N (M_{k2^{-\!n}} - M_{(k\!-\!1)2^{-\!n}})^2\right)
  \end{align*}
  We use the martingale property and orthogonal increments to rearrange the off-diagonal term as
  \[
    \E\left((M_{i2^{-n}} - M_{(i - 1)2^{-n}})^2(M_{N2^{-n}} - M_{i 2^{-n}})^2\right).
  \]
  Taking some sups, we get
  \begin{align*}
    \E\left((\bra M\ket_T^{(n)})^2\right) &\leq 12 C^2 \E \left(\sum_{i = 1}^N (M_{i 2^{-n}} - M_{(i - 1)2^{-n}})^2\right)\\
    &= 12C^2 \E \left((M_{N 2^{-n}} - M_0)^2\right)\\
    &\leq 12 C^2 \cdot 4 C^2.
  \end{align*}
  So done.

  So we now have $X^n \to X$ in $\mathcal{M}^2_c$ for some $X \in \mathcal{M}_c^2$. In particular, we have
  \[
    \left\|\sup_t |X_t^n - X_t|\right\|_{L^2} \to 0.
  \]
  So we know that
  \[
    \sup_t |X_t^n - X_t| \to 0
  \]
  almost surely along a subsequence $\Lambda$.

  Let $N \subseteq \Omega$ be the event on which this convergence fails. We define
  \[
    A_t^{(T)} =
    \begin{cases}
      M_t^2 - 2X_t& \omega \in \Omega \setminus N\\
      0 & \omega \in N
    \end{cases}.
  \]
  Then $A^{(T)}$ is continuous, adapted since $M$ and $X$ are, and $(M_{t \wedge T}^2 - A^{(T)}_{t \wedge T})_t$ is a martingale since $X$ is. Finally, $A^{(T)}$ is increasing since $M_t^2 - 2X_t^n$ is increasing on $2^{-n} \Z \cap [0, T]$ and the limit is uniform. So this $A^{(T)}$ basically satisfies all the properties we want $\bra M\ket_t$ to satisfy, except that it is only the right process up to the fixed time $T$.

  We next observe that for any $T \geq 1$, $A_{t \wedge T}^{(T)} = A_{t \wedge T}^{(T + 1)}$ for all $t$ almost surely. This essentially follows from the same uniqueness argument as we had at the beginning of the proof. Thus, there is a process $(\bra M\ket_t)_{t \geq 0}$ such that
  \[
    \bra M\ket_t = A_t^{(T)}
  \]
  for all $t \in [0, T]$ and $T \in \N$, almost surely. Then this is the desired process. So we have constructed $\bra M\ket$ in the case where $M$ is bounded.

  \begin{claim}
    $\bra M\ket^{(n)} \to \bra M\ket$ u.c.p.
  \end{claim}
  Recall that
  \[
    \bra M\ket_t^{(n)} = M^2_{2^{-n}\lfloor 2^n t\rfloor} - 2 X^n_{2^{-n} \lfloor 2^n t\rfloor}.
  \]
  We also know that
  \[
    \sup_{t \leq T} |X_t^n - X_t| \to 0
  \]
  in $L^2$, hence also in probability. So we have
  \begin{multline*}
    \sup_{t \leq T} |\bra M\ket_t - \bra M\ket^{(n)}_t| \leq \sup_{t \leq T} |M^2_{2^{-n}\lfloor 2^n t\rfloor} - M_t^2| \\
    + 2\sup_{t \leq T} |X^n_{2^{-n}\lfloor 2^n t\rfloor} - X_{2^{-n} \lfloor 2^n t\rfloor}| + 2\sup_{t \leq T} |X_{2^{-n}\lfloor 2^n t\rfloor} - X_t|.
  \end{multline*}
  The first and last terms $\to 0$ in probability since $M$ and $X$ are uniformly continuous on $[0, T]$. The second term converges to zero by our previous assertion. So we are done.

  \begin{claim}
    The theorem holds for $M$ any continuous local martingale.
  \end{claim}
  We let $T_n = \inf\{t \geq 0 : |M_t| \geq n\}$. Then $(T_n)$ reduces $M$, and $M^{T_n}$ is a bounded continuous martingale. We set
  \[
    A^n = \bra M^{T_n}\ket.
  \]
  Then $(A_t^n)$ and $(A_{t \wedge T_n}^{n + 1})$ are indistinguishable for $t < T_n$ by the uniqueness argument. Thus there is a process $\bra M\ket$ such that $\bra M \ket_{t \wedge T_n}$ and $A_t^n$ are indistinguishable for all $n$. Clearly, $\bra M\ket$ is increasing since the $A^n$ are, and $M^2_{t \wedge T_n} - \bra M\ket_{t \wedge T_n}$ is a martingale for every $n$, so $M^2_t - \bra M\ket_t$ is a continuous local martingale.

  \begin{claim}
    $\bra M\ket^{(n)} \to \bra M\ket$ u.c.p.
  \end{claim}
  We have seen
  \[
    \bra M^{T_k}\ket^{(n)} \to \bra M^{T_k}\ket\text{ u.c.p.}
  \]
  for every $k$. So
  \[
    \P\left(\sup_{t \leq T} |\bra M\ket_t^{(n)} - \bra M\ket_t| > \varepsilon\right) \leq \P(T_k < T) + \P\left(\sup_{t \leq T} |\bra M^{T_k}\ket_t^{(n)} - \bra M^{T_k}\ket_t| > \varepsilon\right).
  \]
  So we can first pick $k$ large enough such that the first term is small, then pick $n$ large enough so that the second is small.
\end{proof}

There are a few easy consequences of this theorem.

\begin{fact}
  Let $M$ be a continuous local martingale, and let $T$ be a stopping time. Then almost surely, for all $t \geq 0$,
  \[
    \bra M^T\ket_t = \bra M\ket_{t \wedge T}.
  \]
\end{fact}

\begin{proof}
  Since $M_t^2 - \bra M\ket_t$ is a continuous local martingale, so is $M^2_{t \wedge T} - \bra M\ket_{t \wedge T} = (M^T)_t^2 - \bra M\ket_{t \wedge T}$. So we are done by uniqueness.
\end{proof}

\begin{fact}
  Let $M$ be a continuous local martingale with $M_0 = 0$. Then $M = 0$ iff $\bra M\ket = 0$.
\end{fact}

\begin{proof}
  If $M = 0$, then $\bra M \ket = 0$. Conversely, if $\bra M\ket = 0$, then $M^2$ is a non-negative continuous local martingale, hence a supermartingale. Thus $\E M_t^2 \leq \E M_0^2 = 0$, so $M_t = 0$ almost surely for every $t$, and $M = 0$ by continuity.
\end{proof}

\begin{prop}
  Let $M \in \mathcal{M}_c^2$. Then $M^2 - \bra M\ket$ is a uniformly integrable martingale, and
  \[
    \|M - M_0\|_{\mathcal{M}^2} = (\E \bra M\ket_\infty)^{1/2}.
  \]
\end{prop}

\begin{proof}
  We will show that $\bra M\ket_\infty \in L^1$. This then implies
  \[
    |M_t^2 - \bra M\ket_t| \leq \sup_{t \geq 0} M_t^2 + \bra M\ket_\infty.
  \]
  Then the right hand side is in $L^1$. Since $M^2 - \bra M\ket$ is a local martingale, this implies that it is in fact a uniformly integrable martingale.

  To show $\bra M\ket_\infty \in L^1$, we let
  \[
    S_n = \inf \{t \geq 0: \bra M\ket_t \geq n\}.
  \]
  Then $S_n \to \infty$, $S_n$ is a stopping time and moreover $\bra M\ket_{t \wedge S_n} \leq n$. So we have
  \[
    |M_{t \wedge S_n}^2 - \bra M\ket_{t \wedge S_n}| \leq n + \sup_{t \geq 0} M_t^2,
  \]
  and the second term is in $L^1$. So $M_{t \wedge S_n}^2 - \bra M\ket_{t \wedge S_n}$ is a true martingale.

  So
  \[
    \E M_{t \wedge S_n}^2 - \E M_0^2 = \E \bra M\ket_{t \wedge S_n}.
  \]
  Taking the limit $t\to \infty$, we know $\E M_{t \wedge S_n}^2 \to \E M^2_{S_n}$ by dominated convergence. Since $\bra M\ket_{t \wedge S_n}$ is increasing, we also have $\E \bra M\ket_{t \wedge S_n} \to \E \bra M\ket_{S_n}$ by \emph{monotone} convergence. We can take $n \to \infty$, and by the same justification, we have
  \[
    \E \bra M\ket_\infty = \E M_\infty^2 - \E M_0^2 = \E (M_\infty - M_0)^2 < \infty.\qedhere
  \]
\end{proof}

\subsection{Covariation}
We know $\mathcal{M}_c^2$ not only has a norm, but also an inner product. This can also be reflected in the bracket by the polarization identity, and it is natural to define

\begin{defi}[Covariation]\index{covariation}
  Let $M, N$ be two continuous local martingales. Define the \emph{covariation} (or simply the \term{bracket}) between $M$ and $N$ to be the process
  \[
    \bra M, N\ket_t = \frac{1}{4} (\bra M + N\ket_t - \bra M -N\ket_t).
  \]
\end{defi}
If in fact $M, N \in \mathcal{M}_c^2$ with $M_0 = N_0 = 0$, then putting $t = \infty$ and taking expectations gives the inner product.
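
To see why this is the natural definition (a sketch, assuming $M_0 = N_0 = 0$ and using only the defining property of the quadratic variation), note that $M_t N_t = \frac{1}{4}((M_t + N_t)^2 - (M_t - N_t)^2)$, so
\[
  M_t N_t - \bra M, N\ket_t = \frac{1}{4}\left(\left((M_t + N_t)^2 - \bra M + N\ket_t\right) - \left((M_t - N_t)^2 - \bra M - N\ket_t\right)\right),
\]
and both brackets on the right are continuous local martingales. Taking $N = M$ recovers $\bra M, M\ket = \frac{1}{4} \bra 2M\ket = \bra M\ket$.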

\begin{prop}\leavevmode
  \begin{enumerate}
    \item $\bra M, N\ket$ is the unique (up to indistinguishability) finite variation process such that $M_t N_t - \bra M, N\ket_t$ is a continuous local martingale.
    \item The mapping $(M, N) \mapsto \bra M, N\ket$ is bilinear and symmetric.
    \item
      \begin{align*}
        \bra M, N\ket_t &= \lim_{n \to \infty} \bra M, N\ket_t^{(n)}\text{ u.c.p.}\\
        \bra M, N\ket^{(n)}_t &= \sum_{i = 1}^{\lceil 2^n t\rceil} (M_{i2^{-n}} - M_{(i - 1)2^{-n}})(N_{i2^{-n}} - N_{(i - 1)2^{-n}}).
      \end{align*}
    \item For every stopping time $T$,
      \[
        \bra M^T, N^T\ket_t = \bra M^T, N\ket_t = \bra M, N\ket_{t \wedge T}.
      \]
    \item If $M, N \in \mathcal{M}_c^2$, then $M_t N_t - \bra M, N\ket_t$ is a uniformly integrable martingale, and
      \[
        (M - M_0, N - N_0)_{\mathcal{M}^2} = \E \bra M, N\ket_\infty.\fakeqed
      \]
  \end{enumerate}
\end{prop}

\begin{eg}
  Let $B, B'$ be two independent Brownian motions (with respect to the same filtration). Then $\bra B, B'\ket = 0$.
\end{eg}

\begin{proof}
  Assume $B_0 = B_0' = 0$. Then $X_{\pm} = \frac{1}{\sqrt{2}}(B \pm B')$ are Brownian motions, and so $\bra X_{\pm}\ket = t$. So their difference vanishes.
\end{proof}
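
Explicitly, since $B \pm B' = \sqrt{2} X_{\pm}$, and scaling a continuous local martingale by a constant $c$ scales its quadratic variation by $c^2$, we get
\[
  \bra B, B'\ket_t = \frac{1}{4}\left(\bra B + B'\ket_t - \bra B - B'\ket_t\right) = \frac{1}{4}\left(2 \bra X_+\ket_t - 2 \bra X_-\ket_t\right) = \frac{1}{4}(2t - 2t) = 0.
\]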

An important result about the covariation is the following Cauchy--Schwarz like inequality:
\begin{prop}[Kunita--Watanabe]\index{Kunita--Watanabe}
  Let $M, N$ be continuous local martingales and let $H, K$ be two (previsible) processes. Then almost surely
  \[
    \int_0^\infty |H_s| |K_s| |\d \bra M, N\ket_s| \leq \left(\int_0^\infty H_s^2 \;\d \bra M\ket_s\right)^{1/2} \left(\int_0^\infty K_s^2 \;\d \bra N\ket_s\right)^{1/2}.
  \]
\end{prop}

In fact, this \emph{is} Cauchy--Schwarz. All we have to do is to take approximations and take limits and make sure everything works out well.
\begin{proof}
  For convenience, we write
  \[
    \bra M, N\ket_s^t = \bra M, N\ket_t - \bra M, N\ket_s.
  \]
  \begin{claim}
    For all $0 \leq s \leq t$, we have
    \[
      |\bra M, N\ket_s^t| \leq \sqrt{\bra M, M\ket_s^t} \sqrt{\bra N, N\ket_s^t}.
    \]
  \end{claim}
  By continuity, we can assume that $s, t$ are dyadic rationals. Then
  \begin{align*}
    |\bra M, N\ket_s^t| &= \lim_{n \to \infty} \left| \sum_{i = 2^ns + 1}^{2^n t} (M_{i 2^{-n}} - M_{(i - 1)2^{-n}})(N_{i 2^{-n}} -N_{(i - 1)2^{-n}})\right|\\
    &\leq \lim_{n \to \infty} \left| \sum_{i = 2^ns + 1}^{2^n t} (M_{i 2^{-n}} - M_{(i - 1)2^{-n}})^2\right|^{1/2}\times\\
    &\hphantom{aaaaaaaaaaa}\left| \sum_{i = 2^n s + 1}^{2^n t} (N_{i 2^{-n}} - N_{(i - 1)2^{-n}})^2\right|^{1/2}\tag{Cauchy--Schwarz}\\
    &= \left(\bra M, M\ket_s^t \right)^{1/2}\left(\bra N, N\ket_s^t\right)^{1/2},
  \end{align*}
  where all equalities are u.c.p.

  \begin{claim}
    For all $0 \leq s < t$, we have
    \[
      \int_s^t |\d \bra M, N\ket_u| \leq \sqrt{\bra M, M\ket_s^t}\sqrt{\bra N, N\ket_s^t}.
    \]
  \end{claim}
  Indeed, for any subdivision $s = t_0 < t_1 < \cdots < t_n = t$, we have
  \begin{align*}
    \sum_{i = 1}^n |\bra M, N\ket_{t_{i - 1}}^{t_i}| &\leq \sum_{i = 1}^n \sqrt{\bra M, M\ket_{t_{i - 1}}^{t_i}} \sqrt{\bra N, N\ket_{t_{i - 1}}^{t_i}}\\
    &\leq \left(\sum_{i = 1}^n \bra M, M\ket_{t_{i - 1}}^{t_i}\right)^{1/2} \left(\sum_{i = 1}^n \bra N, N\ket_{t_{i - 1}}^{t_i}\right)^{1/2}. \tag{Cauchy--Schwarz}
  \end{align*}
  Taking the supremum over all subdivisions, the claim follows.

  \begin{claim}
    For all bounded Borel sets $B \subseteq [0, \infty)$, we have
    \[
      \int_B |\d \bra M, N\ket_u| \leq \sqrt{\int_B \d \bra M\ket_u} \sqrt{\int_B \d \bra N\ket_u}.
    \]
  \end{claim}
  We already know this is true if $B$ is an interval. If $B$ is a finite union of intervals, then we apply Cauchy--Schwarz. By a monotone class argument, we can extend to all Borel sets.

  \begin{claim}
    The theorem holds for
    \[
      H = \sum_{\ell = 1}^n h_\ell \mathbf{1}_{B_\ell},\quad K = \sum_{\ell = 1}^n k_\ell \mathbf{1}_{B_\ell}
    \]
    for disjoint bounded Borel sets $B_\ell \subseteq [0, \infty)$.
  \end{claim}
  We have
  \begin{align*}
    \int |H_s K_s| \;|\d \bra M, N\ket_s| &\leq \sum_{\ell = 1}^n |h_\ell k_\ell| \int_{B_\ell} |\d \bra M, N\ket_s|\\
    &\leq \sum_{\ell = 1}^n |h_\ell k_\ell| \left(\int_{B_\ell} \d \bra M\ket_s\right)^{1/2} \left(\int_{B_\ell} \d \bra N\ket_s\right)^{1/2}\\
    &\leq \left(\sum_{\ell = 1}^n h_\ell^2 \int_{B_\ell} \d \bra M\ket_s\right)^{1/2} \left(\sum_{\ell = 1}^n k_\ell^2 \int_{B_\ell} \d \bra N\ket_s\right)^{1/2}
  \end{align*}
  To finish the proof, approximate general $H$ and $K$ by step functions and take the limit.
\end{proof}

\subsection{Semi-martingale}
\begin{defi}[Semi-martingale]\index{semi-martingale}
  A (continuous) adapted process $X$ is a \emph{(continuous) semi-martingale} if
  \[
    X = X_0 + M + A,
  \]
  where $X_0 \in \mathcal{F}_0$, $M$ is a continuous local martingale with $M_0 = 0$, and $A$ is a continuous finite variation process with $A_0 = 0$.
\end{defi}
This decomposition is unique up to indistinguishability.

\begin{defi}[Quadratic variation]\index{quadratic variation!semi-martingale}
  Let $X = X_0 + M + A$ and $X' = X_0' + M' + A'$ be (continuous) semi-martingales. Set
  \[
    \bra X\ket = \bra M\ket, \quad \bra X, X'\ket = \bra M, M'\ket.
  \]
\end{defi}

This definition makes sense, because continuous finite variation processes have zero quadratic variation.
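
A sketch of why the finite variation part contributes nothing: let $A$ be a continuous finite variation process with total variation process $V$. Then for each fixed $\omega$ and $t$,
\[
  \sum_{i = 1}^{\lceil 2^n t\rceil} (A_{i 2^{-n}} - A_{(i - 1)2^{-n}})^2 \leq \max_{1 \leq i \leq \lceil 2^n t\rceil} |A_{i 2^{-n}} - A_{(i - 1)2^{-n}}| \cdot V_{t + 1} \to 0
\]
as $n \to \infty$, since $A$ is uniformly continuous on $[0, t + 1]$. The cross terms with a continuous local martingale $M$ are handled in the same way, since
\[
  \left|\sum_{i = 1}^{\lceil 2^n t\rceil} (M_{i 2^{-n}} - M_{(i - 1)2^{-n}})(A_{i 2^{-n}} - A_{(i - 1)2^{-n}})\right| \leq \max_{1 \leq i \leq \lceil 2^n t\rceil} |M_{i 2^{-n}} - M_{(i - 1)2^{-n}}| \cdot V_{t + 1} \to 0.
\]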

\begin{ex}
  We have
  \[
    \bra X, Y\ket_t^{(n)} = \sum_{i = 1}^{\lceil 2^n t\rceil} (X_{i 2^{-n}} - X_{(i - 1)2^{-n}})(Y_{i2^{-n}} - Y_{(i - 1)2^{-n}}) \to \bra X, Y\ket\text{ u.c.p.}
  \]
\end{ex}

\section{The stochastic integral}
\subsection{Simple processes}
We now have all the background required to define the stochastic integral, and we can start constructing it. As in the case of the Lebesgue integral, we first define it for simple processes, and then extend to general processes by taking a limit. Recall that we have

\begin{defi}[Simple process]\index{simple process}
  The space of \emph{simple processes} $\mathcal{E}$ consists of functions $H: \Omega \times [0, \infty) \to \R$ that can be written as
  \[
    H_t(\omega) = \sum_{i = 1}^n H_{i - 1}(\omega) \mathbf{1}_{(t_{i - 1}, t_i]} (t)
  \]
  for some $0 \leq t_0 \leq t_1 \leq \cdots \leq t_n$ and bounded random variables $H_i \in \mathcal{F}_{t_i}$.
\end{defi}

\begin{defi}[$H\cdot M$]
  For $M \in \mathcal{M}^2$ and $H \in \mathcal{E}$, we set
  \[
    \int_0^t H \;\d M = (H\cdot M)_t = \sum_{i = 1}^n H_{i - 1} (M_{t_i \wedge t} - M_{t_{i - 1} \wedge t}).
  \]
\end{defi}
If $M$ were of finite variation, then this would be the same as the Lebesgue--Stieltjes integral we have previously seen.

Recall that for the Lebesgue integral, extending this definition to general functions required results like monotone convergence. Here we need some similar results that put bounds on how large the integral can be. In fact, we get something better than a bound.

\begin{prop}
  If $M \in \mathcal{M}_c^2$ and $H \in \mathcal{E}$, then $H \cdot M \in \mathcal{M}_c^2$ and
  \[
    \|H \cdot M\|_{\mathcal{M}^2}^2 = \E \left(\int_0^\infty H^2_s \;\d \bra M\ket_s\right).\tag{$*$}
  \]
\end{prop}

\begin{proof}
  We first show that $H \cdot M \in \mathcal{M}_c^2$. By linearity, we only have to check it for
  \[
    X_t^i = H_{i - 1} (M_{t_i \wedge t} - M_{t_{i - 1} \wedge t})
  \]
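  We have to check that $\E(X_t^i \mid \mathcal{F}_s) = X_s^i$ for all $t > s$. The main case is $s \geq t_{i - 1}$, where $H_{i - 1}$ is $\mathcal{F}_s$-measurable, and so
  \[
    \E (X_t^i \mid \mathcal{F}_s) = H_{i - 1} \E (M_{t_i \wedge t} - M_{t_{i - 1} \wedge t} \mid \mathcal{F}_s) = H_{i - 1} (M_{t_i \wedge s} - M_{t_{i - 1} \wedge s}) = X_s^i.
  \]
  If $s < t_{i - 1}$, we first condition on $\mathcal{F}_{t_{i - 1}}$ and use the tower property to get $\E(X_t^i \mid \mathcal{F}_s) = 0 = X_s^i$.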
  We also check that
  \[
    \|X^i\|_{\mathcal{M}^2} \leq 2 \|H\|_{\infty} \|M\|_{\mathcal{M}^2}.
  \]
  So it is bounded. So $H \cdot M \in \mathcal{M}_c^2$.

  To prove $(*)$, we note that the $X^i$ are orthogonal and that
  \[
    \bra X^i\ket_t = H_{i - 1}^2 (\bra M\ket_{t_i \wedge t} - \bra M\ket_{t_{i - 1} \wedge t}).
  \]
  So we have
  \[
    \bra H \cdot M, H \cdot M\ket_t = \sum \bra X^i, X^i\ket_t = \sum H_{i - 1}^2 (\bra M\ket_{t_i \wedge t} - \bra M\ket_{t_{i - 1} \wedge t}) = \int_0^t H_s^2 \;\d \bra M\ket_s.
  \]
  In particular,
  \[
    \|H \cdot M\|_{\mathcal{M}^2}^2 = \E \bra H \cdot M\ket_\infty = \E \left(\int_0^\infty H_s^2 \;\d \bra M\ket_s\right).\qedhere
  \]
\end{proof}
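
As an illustration (with $F$, $s$ and $u$ chosen here purely for the sake of example), take $H = F \mathbf{1}_{(s, u]}$ with $0 \leq s < u$ and $F$ a bounded $\mathcal{F}_s$-measurable random variable. Then
\[
  (H \cdot M)_t = F (M_{u \wedge t} - M_{s \wedge t}),\quad (H \cdot M)_\infty = F(M_u - M_s),
\]
and $(*)$ reads
\[
  \E\left(F^2 (M_u - M_s)^2\right) = \E \left(F^2 (\bra M\ket_u - \bra M\ket_s)\right).
\]
This can also be checked directly. Since $M^2 - \bra M\ket$ is a martingale for $M \in \mathcal{M}_c^2$, conditioning on $\mathcal{F}_s$ gives
\[
  \E\left((M_u - M_s)^2 \mid \mathcal{F}_s\right) = \E\left(M_u^2 - M_s^2 \mid \mathcal{F}_s\right) = \E \left(\bra M\ket_u - \bra M\ket_s \mid \mathcal{F}_s\right).
\]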

\begin{prop}
  Let $M \in \mathcal{M}_c^2$ and $H \in \mathcal{E}$. Then
  \[
    \bra H \cdot M, N\ket = H \cdot \bra M, N\ket
  \]
  for all $N \in \mathcal{M}_c^2$.
\end{prop}
In other words, the stochastic integral commutes with the bracket.

\begin{proof}
  Write $H \cdot M = \sum X^i = \sum H_{i - 1}(M_{t_i \wedge t} - M_{t_{i - 1} \wedge t})$ as before. Then
  \[
    \bra X^i, N\ket_t = H_{i - 1}\bra M^{t_i} - M^{t_{i - 1}}, N\ket_t = H_{i - 1} (\bra M, N\ket_{t_i \wedge t} - \bra M, N\ket_{t_{i - 1}\wedge t}).\qedhere
  \]
\end{proof}

\subsection{\tph{It\^o}{Ito}{It&ocirc;} isometry}
We now try to extend the above definition to something more general than simple processes.
\begin{defi}[$L^2(M)$]\index{$L^2(M)$}
  Let $M \in \mathcal{M}_c^2$. Define $L^2(M)$ to be the space of (equivalence classes of) previsible $H: \Omega \times [0, \infty) \to \R$ such that
  \[
    \|H\|_{L^2(M)} = \|H\|_{\mathcal{M}} = \E\left(\int_0^\infty H_s^2 \;\d \bra M\ket_s\right)^{1/2} < \infty.
  \]
  For $H, K \in L^2(M)$, we set
  \[
    (H, K)_{L^2(M)} = \E \left(\int_0^\infty H_s K_s \;\d \bra M\ket_s\right).
  \]
\end{defi}
In fact, $L^2(M)$ is equal to $L^2(\Omega \times [0, \infty), \mathcal{P}, \d P\;\d \bra M\ket)$, where $\mathcal{P}$ is the previsible $\sigma$-algebra, and in particular is a Hilbert space.

\begin{prop}
  Let $M \in \mathcal{M}_c^2$. Then $\mathcal{E}$ is dense in $L^2(M)$.
\end{prop}

\begin{proof}
  Since $L^2(M)$ is a Hilbert space, it suffices to show that if $(K, H) = 0$ for all $H \in \mathcal{E}$, then $K = 0$.

  So assume that $(K, H) = 0$ for all $H \in \mathcal{E}$, and let
  \[
    X_t = \int_0^t K_s \;\d \bra M\ket_s.
  \]
  Then $X$ is a well-defined finite variation process, and $X_t \in L^1$ for all $t$. It suffices to show that $X_t = 0$ for all $t$. To do so, we shall show that $X$ is a continuous martingale, since a continuous finite variation martingale starting at $0$ must vanish.

  Let $0 \leq s < t$ and $F \in \mathcal{F}_s$ bounded. We let $H = F 1_{(s, t]} \in \mathcal{E}$. By assumption, we know
  \[
    0 = (K, H) = \E \left(F \int_s^t K_u\; \d \bra M\ket_u\right) = \E (F (X_t - X_s)).
  \]
  Since this holds for all $\mathcal{F}_s$ measurable $F$, we have shown that
  \[
    \E(X_t \mid \mathcal{F}_s) = X_s.
  \]
  So $X$ is a (continuous) martingale of finite variation with $X_0 = 0$, hence $X = 0$, and we are done.
\end{proof}

\begin{thm}
  Let $M \in \mathcal{M}_c^2$. Then
  \begin{enumerate}
    \item The map $H \in \mathcal{E} \mapsto H \cdot M \in \mathcal{M}_c^2$ extends uniquely to an isometry $L^2(M) \to \mathcal{M}^2_c$, called the \term{It\^o isometry}.
    \item For $H \in L^2(M)$, $H \cdot M$ is the unique martingale in $\mathcal{M}_c^2$ such that
      \[
        \bra H \cdot M, N\ket = H \cdot \bra M, N\ket
      \]
      for all $N \in \mathcal{M}_c^2$, where the integral on the LHS is the stochastic integral (as above) and the RHS is the finite variation integral.
    \item If $T$ is a stopping time, then $(1_{[0, T]} H) \cdot M = (H \cdot M)^T = H \cdot M^T$.
  \end{enumerate}
\end{thm}

\begin{defi}[Stochastic integral]\index{stochastic integral}
  $H \cdot M$ is the \emph{stochastic integral} of $H$ with respect to $M$ and we also write
  \[
    (H \cdot M)_t = \int_0^t H_s \;\d M_s.
  \]
\end{defi}
It is important that the stochastic integral with respect to a martingale is still a martingale. After proving It\^o's formula, we will use this fact to show that a lot of things are in fact martingales in a rather systematic manner. For example, it will be rather effortless to show that $B_t^2 - t$ is a martingale when $B_t$ is a standard Brownian motion.

\begin{proof}\leavevmode
  \begin{enumerate}
    \item We have already shown that this map is an isometry when restricted to $\mathcal{E}$. So extend by completeness of $\mathcal{M}_c^2$ and denseness of $\mathcal{E}$.
    \item Again the equation to show is known for simple $H$, and we want to show it is preserved under taking limits. Suppose $H^n \to H$ in $L^2(M)$ with $H^n \in \mathcal{E}$. Then $H^n \cdot M \to H \cdot M$ in $\mathcal{M}_c^2$. We want to show that
      \begin{align*}
        \bra H \cdot M, N\ket_\infty &= \lim_{n \to \infty} \bra H^n \cdot M, N\ket_\infty\text{ in }L^1,\\
        (H \cdot \bra M, N\ket)_\infty &= \lim_{n \to \infty} (H^n \cdot \bra M, N\ket)_\infty\text{ in }L^1
      \end{align*}
      for all $N \in \mathcal{M}_c^2$.

      To show the first holds, we use the Kunita--Watanabe inequality to get
      \[
        \E |\bra H \cdot M - H^n \cdot M, N\ket_\infty| \leq \left(\E \bra H \cdot M - H^n \cdot M\ket_\infty\right)^{1/2} \left(\E \bra N\ket_\infty\right)^{1/2},
      \]
      and the first factor is $\|H \cdot M - H^n \cdot M\|_{\mathcal{M}^2} \to 0$, while the second is finite since $N \in \mathcal{M}_c^2$. The second follows from
      \[
        \E \left|((H - H^n) \cdot \bra M, N\ket)_\infty\right| \leq \|H - H^n\|_{L^2(M)} \|N\|_{\mathcal{M}^2} \to 0.
      \]
      So we know that $\bra H \cdot M, N\ket_\infty = (H \cdot \bra M, N\ket)_\infty$. We can then replace $N$ by the stopped process $N^t$ to get $\bra H \cdot M, N\ket_t = (H \cdot \bra M, N\ket)_t$.

      To see uniqueness, suppose $X \in \mathcal{M}_c^2$ is another such martingale. Then we have $\bra X - H \cdot M, N\ket = 0$ for all $N$. Take $N = X - H \cdot M$, and then we are done.
    \item For $N \in \mathcal{M}^2_c$, we have
      \[
        \bra (H \cdot M)^T, N\ket_t = \bra H \cdot M, N \ket_{t \wedge T} = (H \cdot \bra M, N\ket)_{t \wedge T} = (H 1_{[0, T]} \cdot \bra M, N\ket)_t.
      \]
      So we have shown that
      \[
        (H \cdot M)^T = (1_{[0, T]} H) \cdot M
      \]
      by (ii). To prove the second equality, we have
      \[
        \bra H \cdot M^T, N\ket_t = (H \cdot \bra M^T, N\ket)_t = (H \cdot \bra M, N\ket)_{t \wedge T} = (H1_{[0, T]} \cdot \bra M, N\ket)_t.\qedhere
      \]%\qedhere
  \end{enumerate}
\end{proof}

Note that (ii) can be written as
\[
  \left\bra \int_0^{(-)} H_s \;\d M_s, N\right\ket_t = \int_0^t H_s \;\d \bra M, N\ket_s.
\]
\begin{cor}
  \[
    \bra H \cdot M, K \cdot N\ket = H \cdot (K \cdot \bra M, N\ket) = (HK) \cdot \bra M, N\ket.
  \]
  In other words,
  \[
    \left\bra \int_0^{(-)} H_s \;\d M_s, \int_0^{(-)} K_s\;\d N_s\right\ket_t = \int_0^t H_s K_s \;\d \bra M, N\ket_s.\fakeqed
  \]
\end{cor}

\begin{cor}
  Since $H \cdot M$ and $(H \cdot M)(K \cdot N) - \bra H \cdot M, K \cdot N\ket$ are martingales starting at $0$, we have
  \begin{align*}
    \E \left(\int_0^t H_s\;\d M_s\right) &= 0\\
    \E \left(\left(\int_0^t H_s \;\d M_s\right) \left(\int_0^t K_s \;\d N_s \right)\right) &= \E \left(\int_0^t H_s K_s \;\d \bra M, N\ket_s\right).\fakeqed
  \end{align*}
\end{cor}
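
As a quick sanity check of the second identity, here is a worked instance under assumptions chosen purely for illustration: $M = N$ is a standard Brownian motion $B$ stopped at a fixed deterministic time $t$, so that $\bra M\ket_s = s \wedge t$, and $H_s = K_s = s$. Then
\[
  \E \left(\left(\int_0^t s \;\d B_s\right)^2\right) = \E \left(\int_0^t s^2 \;\d s\right) = \frac{t^3}{3}.
\]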

\begin{cor}
  Let $H \in L^2 (M)$, then $HK \in L^2(M)$ iff $K \in L^2(H \cdot M)$, in which case
  \[
    (KH) \cdot M = K \cdot (H \cdot M).
  \]
\end{cor}

\begin{proof}
  We have
  \[
    \E \left(\int_0^\infty K_s^2 H_s^2 \;\d \bra M\ket_s\right) = \E \left(\int_0^\infty K_s^2 \;\d \bra H \cdot M\ket_s \right),
  \]
  so $\|K\|_{L^2(H \cdot M)} = \|HK\|_{L^2(M)}$. For $N \in \mathcal{M}_c^2$, we have
  \[
    \bra (KH) \cdot M, N\ket_t = (KH \cdot \bra M, N\ket)_t = (K \cdot (H \cdot \bra M, N\ket))_t = (K \cdot \bra H \cdot M, N\ket)_t.\qedhere
  \]
\end{proof}

\subsection{Extension to local martingales}
We have now defined the stochastic integral for continuous martingales. We next go through some formalities to extend this to local martingales, and ultimately to semi-martingales. We are not doing this just for fun. Rather, when we later prove results like It\^o's formula, even when we put in continuous (local) martingales, we usually end up with some semi-martingales. So it is useful to be able to deal with semi-martingales in general.

\begin{defi}[$L_{bc}^2(M)$]\index{$L_{bc}^2(M)$}
  Let $L_{bc}^2(M)$ be the space of previsible $H$ such that
  \[
    \int_0^t H_s^2 \;\d \bra M\ket_s < \infty\text{ a.s.}
  \]
  for all finite $t > 0$.
\end{defi}

\begin{thm}
  Let $M$ be a continuous local martingale.
  \begin{enumerate}
    \item For every $H \in L_{bc}^2(M)$, there is a unique continuous local martingale $H \cdot M$ with $(H \cdot M)_0= 0 $ and
      \[
        \bra H \cdot M, N\ket = H \cdot \bra M, N\ket
      \]
      for all continuous local martingales $N$.
    \item If $T$ is a stopping time, then
      \[
        (\mathbf{1}_{[0, T]}H) \cdot M = (H \cdot M)^T = H \cdot M^T.
      \]
    \item If $H \in L_{bc}^2(M)$, $K$ is previsible, then $K \in L_{bc}^2(H \cdot M)$ iff $HK \in L_{bc}^2(M)$, and then
      \[
        K \cdot (H \cdot M) = (KH) \cdot M.
      \]
    \item Finally, if $M \in \mathcal{M}_c^2$ and $H \in L^2(M)$, then the definition is the same as the previous one.
  \end{enumerate}
\end{thm}

\begin{proof}
  Assume $M_0 = 0$, and that $\int_0^t H_s^2 \;\d \bra M\ket_s < \infty$ for all $\omega \in \Omega$ (by setting $H = 0$ when this fails). Set
  \[
    S_n = \inf \left\{ t \geq 0 : \int_0^t (1 + H_s^2) \;\d \bra M \ket_s \geq n\right\}.
  \]
  These $S_n$ are stopping times that tend to infinity. Then
  \[
    \bra M^{S_n}, M^{S_n}\ket_t = \bra M, M\ket_{t \wedge S_n} \leq n.
  \]
  So $M^{S_n} \in \mathcal{M}_c^2$. Also,
  \[
    \int_0^\infty H_s^2 \;\d \bra M^{S_n}\ket_s = \int_0^{S_n} H_s^2 \;\d \bra M\ket_s \leq n.
  \]
  So $H \in L^2(M^{S_n})$, and we have already defined what $H \cdot M^{S_n}$ is. Now notice that
  \[
    H \cdot M^{S_n} = (H \cdot M^{S_m})^{S_n}\text{ for }m \geq n.
  \]
  So it makes sense to define
  \[
    H \cdot M = \lim_{n \to \infty} H \cdot M^{S_n}.
  \]
  This is the unique process such that $(H \cdot M)^{S_n} = H \cdot M^{S_n}$. We see that $H \cdot M$ is a continuous adapted local martingale with reducing sequence $S_n$.
  \begin{claim}
    $\bra H \cdot M, N\ket = H \cdot \bra M, N\ket$.
  \end{claim}
  Indeed, assume that $N_0 = 0$. Set $S_n' = \inf \{t \geq 0: |N_t| \geq n\}$. Set $T_n = S_n \wedge S_n'$. Observe that $N^{S_n'} \in \mathcal{M}_c^2$. Then
  \[
    \bra H \cdot M, N\ket^{T_n} = \bra H \cdot M^{S_n}, N^{S_n'}\ket = H \cdot \bra M^{S_n}, N^{S_n'}\ket = H \cdot \bra M, N\ket^{T_n}.
  \]
  Taking the limit $n \to \infty$ gives the desired result.

  The proofs of the other claims are the same as before, since they only use the characterizing property $\bra H \cdot M, N\ket = H \cdot \bra M, N\ket$.
\end{proof}

\subsection{Extension to semi-martingales}
\begin{defi}[Locally bounded previsible process]\index{locally bounded previsible process}
  A previsible process $H$ is \emph{locally bounded} if for all $t \geq 0$, we have
  \[
    \sup_{s \leq t}|H_s| < \infty\text{ a.s.}
  \]
\end{defi}

\begin{fact}\leavevmode
  \begin{enumerate}
    \item Any adapted continuous process is locally bounded.
    \item If $H$ is locally bounded and $A$ is a finite variation process, then for all $t \geq 0$, we have
      \[
        \int_0^t |H_s|\;|\d A_s| < \infty\text{ a.s.}
      \]
  \end{enumerate}
\end{fact}

Now if $X = X_0 + M + A$ is a semi-martingale, where $X_0 \in \mathcal{F}_0$, $M$ is a continuous local martingale and $A$ is a finite variation process, we want to define $\int H_s \;\d X_s$. We already know what it means to define integration with respect to $\d M_s$ and $\d A_s$, using the It\^o integral and the finite variation integral respectively, and $X_0$ doesn't change, so we can ignore it.

\begin{defi}[Stochastic integral]\index{stochastic integral}\index{$H \cdot X$}
  Let $X = X_0 + M + A$ be a continuous semi-martingale, and $H$ a locally bounded previsible process. Then the \term{stochastic integral} $H \cdot X$ is the continuous semi-martingale defined by
  \[
    H \cdot X = H \cdot M + H \cdot A,
  \]
  and we write
  \[
    (H \cdot X)_t = \int_0^t H_s \;\d X_s.
  \]
\end{defi}

\begin{prop}\leavevmode
  \begin{enumerate}
    \item $(H, X) \mapsto H \cdot X$ is bilinear.
    \item $H \cdot (K \cdot X) = (HK) \cdot X$ if $H$ and $K$ are locally bounded.
    \item $(H \cdot X)^T = H1_{[0, T]} \cdot X = H \cdot X^T$ for every stopping time $T$.
    \item If $X$ is a continuous local martingale (resp.\ a finite variation process), then so is $H \cdot X$.
    \item If $H = \sum_{i = 1}^n H_{i - 1} \mathbf{1}_{(t_{i - 1}, t_i]}$ and $H_{i - 1} \in \mathcal{F}_{t_{i - 1}}$ (not necessarily bounded), then
      \[
        (H \cdot X)_t = \sum_{i = 1}^n H_{i - 1}(X_{t_i \wedge t} - X_{t_{i - 1} \wedge t}).
      \]
  \end{enumerate}
\end{prop}
\begin{proof}
  (i) to (iv) follow from analogous properties for $H \cdot M$ and $H \cdot A$. The last part is also true by definition if the $H_i$ are uniformly bounded. If $H_i$ is not bounded, then the finite variation part is still fine, since for each fixed $\omega \in \Omega$, $H_i(\omega)$ is a fixed number. For the martingale part, set
  \[
    T_m = \inf \{t \geq 0 : |H_t| \geq m\}.
  \]
  Then $T_m$ are stopping times, $T_m \to \infty$ as $m \to \infty$, and $H1_{[0, T_m]} \in \mathcal{E}$. Thus
  \[
    (H \cdot M)_{t \wedge T_m} = \sum_{i = 1}^n H_{i - 1} \mathbf{1}_{\{t_{i - 1} < T_m\}} (M_{t_i \wedge t} - M_{t_{i - 1}\wedge t}).
  \]
  Then take the limit $m \to \infty$.
\end{proof}

Before we get to It\^o's formula, we need a few more useful properties:
\begin{prop}[Stochastic dominated convergence theorem]\index{stochastic dominated convergence theorem}
  Let $X = X_0 + M + A$ be a continuous semi-martingale. Let $H, H^n$ be previsible and locally bounded, and let $K$ be previsible and non-negative. Let $t > 0$. Suppose
  \begin{enumerate}
    \item $H_s^n \to H_s$ as $n \to \infty$ for all $s \in [0, t]$.
    \item $|H_s^n| \leq K_s$ for all $s \in [0, t]$ and $n \in \N$.
    \item $\int_0^t K_s^2\;\d \bra M\ket_s < \infty$ and $\int_0^t K_s \;|\d A_s|< \infty$ (note that both conditions are okay if $K$ is locally bounded).
  \end{enumerate}
  Then
  \[
    \int_0^t H_s^n \;\d X_s \to \int_0^t H_s\;\d X_s \text{ in probability}.
  \]
\end{prop}

\begin{proof}
  For the finite variation part, the convergence follows from the usual dominated convergence theorem. For the martingale part, we set
  \[
    T_m = \inf \left\{t \geq 0: \int_0^t K_s^2 \;\d \bra M\ket_s \geq m\right\}.
  \]
  So we have
  \[
    \E \left(\left(\int_0^{T_m \wedge t}\hspace{-13pt}H_s^n \;\d M_s - \int_0^{T_m \wedge t}\hspace{-13pt}H_s\;\d M_s\right)^2\right) \leq \E \left(\int_0^{T_m \wedge t}\hspace{-13pt}(H_s^n- H_s)^2 \;\d \bra M\ket_s\right) \to 0
  \]
  as $n \to \infty$ for each fixed $m$, using the usual dominated convergence theorem, since $\int_0^{T_m \wedge t} K_s^2 \;\d \bra M\ket_s \leq m$.

  Since $T_m \wedge t = t$ eventually as $m \to \infty$ almost surely, hence in probability, we are done.
\end{proof}

\begin{prop}
  Let $X$ be a continuous semi-martingale, and let $H$ be an adapted bounded left-continuous process. Then for every sequence of subdivisions $0 = t_0^{(m)} < t_1^{(m)} < \cdots < t_{n_m}^{(m)} = t$ of $[0, t]$ with $\max_i |t_i^{(m)} - t_{i - 1}^{(m)}| \to 0$, we have
  \[
    \int_0^t H_s \;\d X_s = \lim_{m \to \infty} \sum_{i = 1}^{n_m} H_{t_{i - 1}^{(m)}} (X_{t_i^{(m)}} - X_{t_{i - 1}^{(m)}})
  \]
  in probability.
\end{prop}

\begin{proof}
  We have already proved this for the Lebesgue--Stieltjes integral, and all we used was dominated convergence. So the same proof works using stochastic dominated convergence theorem.
\end{proof}

\subsection{\tph{It\^o}{Ito}{It&ocirc;} formula}
We now prove the equivalent of the integration by parts and the chain rule, i.e.\ It\^o's formula. Compared to the world of usual integrals, the difference is that the quadratic variation, i.e.\ ``second order terms'' will crop up quite a lot, since they are no longer negligible.

\begin{thm}[Integration by parts]\index{integration by parts}
  Let $X, Y$ be continuous semi-martingales. Then almost surely,
  \[
    X_t Y_t - X_0 Y_0 = \int_0^t X_s \;\d Y_s + \int_0^t Y_s\;\d X_s + \bra X, Y\ket_t.
  \]
  The last term is called the \term{It\^o correction}.
\end{thm}
Note that if $X, Y$ are martingales, then the first two terms on the right are martingales, but the last is not. So we are forced to think about semi-martingales.

Observe that in the case of finite variation integrals, we don't have the correction.
\begin{proof}
  We have
  \[
    X_t Y_t - X_s Y_s = X_s (Y_t - Y_s) + (X_t - X_s) Y_s + (X_t - X_s)(Y_t - Y_s).
  \]
  When doing usual calculus, we can drop the last term, because it is second order. However, the quadratic variation of martingales is in general non-zero, and so we must keep track of this. We have
  \begin{align*}
    X_{k2^{-n}} Y_{k2^{-n}} - X_0 Y_0 &= \sum_{i = 1}^k (X_{i2^{-n}} Y_{i2^{-n}} - X_{(i - 1)2^{-n}}Y_{(i - 1)2^{-n}})\\
  &= \sum_{i = 1}^k \Big( X_{(i - 1)2^{-n}} (Y_{i2^{-n}} - Y_{(i - 1)2^{-n}})\\
  &\hphantom{=aaa}+ Y_{(i - 1)2^{-n}}(X_{i 2^{-n}} - X_{(i - 1)2^{-n}}) \\
  &\hphantom{=aaa} + (X_{i2^{-n}} - X_{(i - 1)2^{-n}})(Y_{i2^{-n}} - Y_{(i - 1)2^{-n}})\Big).
  \end{align*}
  Taking the limit $n \to \infty$ with $k2^{-n}$ fixed, we see that the formula holds for $t$ a dyadic rational. Then by continuity, it holds for all $t$.
\end{proof}
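
For instance, here is a worked case with choices made purely for illustration: take $X_t = t$ and $Y = B$ a standard Brownian motion with $B_0 = 0$. Since $X$ has finite variation, $\bra X, Y\ket = 0$, and integration by parts reads
\[
  t B_t = \int_0^t s \;\d B_s + \int_0^t B_s \;\d s.
\]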

The really useful formula is the following:
\begin{thm}[It\^o's formula]\index{It\^o's formula}
  Let $X^1, \ldots, X^p$ be continuous semi-martingales, and let $f: \R^p \to \R$ be $C^2$. Then, writing $X = (X^1, \ldots, X^p)$, we have, almost surely,
  \[
    f(X_t) = f(X_0) + \sum_{i = 1}^p \int_0^t \frac{\partial f}{\partial x^i}(X_s)\;\d X_s^i + \frac{1}{2} \sum_{i, j = 1}^p \int_0^t \frac{\partial^2 f}{\partial x^i \partial x^j} (X_s) \;\d \bra X^i, X^j\ket_s.
  \]
  In particular, $f(X)$ is a semi-martingale.
\end{thm}

The proof is long but not hard. We first do it for polynomials by explicit computation, and then use Weierstrass approximation to extend it to more general functions.
\begin{proof}
  \begin{claim}
    It\^o's formula holds when $f$ is a polynomial.
  \end{claim}
  It clearly does when $f$ is a constant! We then proceed by induction. Suppose It\^o's formula holds for some $f$. Then we apply integration by parts to
  \[
    g(x) = x^k f(x),
  \]
  where $x^k$ denotes the $k$th component of $x$. Then we have
  \[
    g(X_t) = g(X_0) + \int_0^t X_s^k \;\d f(X_s) + \int_0^t f(X_s) \;\d X_s^k + \bra X^k, f(X)\ket_t.
  \]
  We now apply It\^o's formula for $f$ to write
  \begin{multline*}
    \int_0^t X_s^k \;\d f(X_s) = \sum_{i = 1}^p \int_0^t X_s^k \frac{\partial f}{\partial x^i}(X_s) \;\d X_s^i\\
    + \frac{1}{2} \sum_{i, j = 1}^p \int_0^t X_s^k \frac{\partial^2f}{\partial x^i \partial x^j} (X_s)\;\d \bra X^i, X^j\ket_s.
  \end{multline*}
  We also have
  \[
    \bra X^k, f(X)\ket_t = \sum_{i = 1}^p \int_0^t \frac{\partial f}{\partial x^i}(X_s) \;\d \bra X^k, X^i\ket_s.
  \]
  So we have
  \[
    g(X_t) = g(X_0) + \sum_{i = 1}^p \int_0^t \frac{\partial g}{\partial x^i}(X_s) \;\d X_s^i + \frac{1}{2} \sum_{i, j = 1}^p \int_0^t \frac{\partial^2 g}{\partial x^i \partial x^j} (X_s) \;\d \bra X^i, X^j\ket_s.
  \]
  So by induction, It\^o's formula holds for all polynomials.
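
  To spell out the bookkeeping behind the last display, here is a sketch, with $\delta_{ik}$ denoting the Kronecker delta (notation introduced here). The product rule gives
  \[
    \frac{\partial g}{\partial x^i} = \delta_{ik} f + x^k \frac{\partial f}{\partial x^i},\quad \frac{\partial^2 g}{\partial x^i \partial x^j} = \delta_{ik} \frac{\partial f}{\partial x^j} + \delta_{jk} \frac{\partial f}{\partial x^i} + x^k \frac{\partial^2 f}{\partial x^i \partial x^j}.
  \]
  Hence the first-order terms combine as
  \[
    \sum_{i = 1}^p \int_0^t \frac{\partial g}{\partial x^i}(X_s)\;\d X_s^i = \int_0^t f(X_s) \;\d X_s^k + \sum_{i = 1}^p \int_0^t X_s^k \frac{\partial f}{\partial x^i}(X_s) \;\d X_s^i,
  \]
  and, using the symmetry of $\bra X^i, X^j\ket$, the second-order terms combine as
  \begin{multline*}
    \frac{1}{2} \sum_{i, j = 1}^p \int_0^t \frac{\partial^2 g}{\partial x^i \partial x^j}(X_s) \;\d \bra X^i, X^j\ket_s = \sum_{i = 1}^p \int_0^t \frac{\partial f}{\partial x^i}(X_s) \;\d \bra X^k, X^i\ket_s\\
    + \frac{1}{2} \sum_{i, j = 1}^p \int_0^t X_s^k \frac{\partial^2 f}{\partial x^i \partial x^j}(X_s) \;\d \bra X^i, X^j\ket_s.
  \end{multline*}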

  \begin{claim}
    It\^o's formula holds for all $f \in C^2$ if $|X_t(\omega)| \leq n$ and $\int_0^t |\d A_s| \leq n$ for all $(t, \omega)$.
  \end{claim}

  By the Weierstrass approximation theorem, there are polynomials $p_k$ such that
  \[
    \sup_{|x| \leq k} \left(|f(x) - p_k(x)| + \max_i \left|\frac{\partial f}{\partial x^i} - \frac{\partial p_k}{\partial x^i}\right| + \max_{i, j} \left|\frac{\partial^2 f}{\partial x^i \partial x^j} - \frac{\partial^2 p_k}{\partial x^i \partial x^j}\right| \right) \leq \frac{1}{k}.
  \]
  By taking limits, in probability, we have
  \begin{align*}
    f(X_t) - f(X_0) &= \lim_{k \to \infty} (p_k(X_t) - p_k(X_0))\\
    \int_0^t \frac{\partial f}{\partial x^i} (X_s) \;\d X_s^i &= \lim_{k \to \infty} \int_0^t \frac{\partial p_k}{\partial x^i} (X_s) \;\d X_s^i
  \end{align*}
  by stochastic dominated convergence theorem, and by the regular dominated convergence, we have
  \[
    \int_0^t \frac{\partial^2 f}{\partial x^i \partial x^j}(X_s) \;\d \bra X^i, X^j\ket_s = \lim_{k \to \infty} \int_0^t \frac{\partial^2 p_k}{\partial x^i \partial x^j}(X_s) \;\d \bra X^i, X^j\ket_s.
  \]
  \begin{claim}
    It\^o's formula holds for all $X$.
  \end{claim}

  Let
  \[
    T_n = \inf \left\{t \geq 0: |X_t| \geq n\text{ or }\int_0^t |\d A_s| \geq n\right\}.
  \]
  Then by the previous claim, we have
  \begin{align*}
    f(X_t^{T_n}) &= f(X_0) + \sum_{i = 1}^p \int_0^t \frac{\partial f}{\partial x^i} (X_s^{T_n})\;\d (X^i)_s^{T_n} \\
    &\hphantom{aaaaaa}+ \frac{1}{2} \sum_{i, j} \int_0^t \frac{\partial^2 f}{\partial x^i \partial x^j} (X_s^{T_n})\;\d \bra (X^i)^{T_n}, (X^j)^{T_n}\ket_s\\
    &= f(X_0) + \sum_{i = 1}^p \int_0^{t \wedge T_n} \frac{\partial f}{\partial x^i} (X_s)\;\d X^i_s \\
    &\hphantom{aaaaaa}+ \frac{1}{2} \sum_{i, j} \int_0^{t \wedge T_n} \frac{\partial^2 f}{\partial x^i \partial x^j} (X_s)\;\d \bra X^i, X^j\ket_s.
  \end{align*}
  Then take $n \to \infty$, so that $T_n \to \infty$.
\end{proof}

\begin{eg}
  Let $B$ be a standard Brownian motion, $B_0 = 0$ and $f(x) = x^2$. Then
  \[
    B_t^2 = 2 \int_0^t B_s \;\d B_s + t.
  \]
  In other words,
  \[
    B_t^2 - t = 2 \int_0^t B_s\;\d B_s.
  \]
  In particular, this is a continuous local martingale.
\end{eg}

\begin{eg}
  Let $B = (B^1, \ldots, B^d)$ be a $d$-dimensional Brownian motion, and let $f: \R_+ \times \R^d \to \R$ be $C^2$. Applying It\^o's formula to the semi-martingale $X = (t, B^1, \ldots, B^d)$, we find that
  \[
    f(t, B_t) - f(0, B_0) - \int_0^t \left(\frac{\partial}{\partial s} + \frac{1}{2} \Delta\right) f(s, B_s) \;\d s = \sum_{i = 1}^d \int_0^t \frac{\partial}{\partial x^i} f(s, B_s)\;\d B_s^i
  \]
  is a continuous local martingale.
\end{eg}
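
For instance, here is a worked case of the previous example with an illustrative choice of $f$ (the parameter $\theta \in \R$ is introduced here): take $d = 1$, $B_0 = 0$ and $f(t, x) = e^{\theta x - \theta^2 t/2}$. Then
\[
  \frac{\partial f}{\partial t} = -\frac{\theta^2}{2} f,\quad \frac{1}{2} \frac{\partial^2 f}{\partial x^2} = \frac{\theta^2}{2} f,
\]
so the $\d s$ integral vanishes and
\[
  e^{\theta B_t - \theta^2 t/2} = 1 + \theta \int_0^t e^{\theta B_s - \theta^2 s/2} \;\d B_s
\]
is a continuous local martingale.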

There are some syntactic tricks that make stochastic integrals easier to manipulate, namely by working in differential form. We can state It\^o's formula in differential form
\[
  \d f(X_t) = \sum_{i = 1}^p \frac{\partial f}{\partial x^i} \;\d X^i + \frac{1}{2} \sum_{i, j = 1}^p \frac{\partial^2 f}{\partial x^i\partial x^j} \;\d \bra X^i, X^j\ket,
\]
which we can think of as the chain rule. For example, in the case of Brownian motion, we have
\[
  \d f(B_t) = f'(B_t) \;\d B_t + \frac{1}{2} f''(B_t) \;\d t.
\]
Formally, one expands $f$ using that ``$(\d t)^2 = 0$'' but ``$(\d B)^2 = \d t$''. The following formal rules hold:
\begin{align*}
  Z_t - Z_0 = \int_0^t H_s \;\d X_s &\Longleftrightarrow \d Z_t = H_t \;\d X_t\\
  Z_t = \bra X, Y\ket_t = \int_0^t \;\d \bra X, Y\ket_s &\Longleftrightarrow \d Z_t = \d X_t \;\d Y_t.
\end{align*}
Then we have rules such as
\begin{align*}
  H_t(K_t \;\d X_t) &= (H_t K_t)\;\d X_t\\
  H_t (\d X_t \;\d Y_t) &= (H_t \;\d X_t)\;\d Y_t\\
  \d (X_t Y_t) &= X_t \;\d Y_t + Y_t \;\d X_t + \d X_t \;\d Y_t\\
  \d f(X_t) &= f'(X_t) \;\d X_t + \frac{1}{2} f''(X_t) \;\d X_t\;\d X_t.
\end{align*}
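For example, as a small worked instance of the last rule (with the illustrative choices $X = B$ a standard Brownian motion and $f(x) = x^3$), using $\d B_t \;\d B_t = \d t$ we get
\[
  \d (B_t^3) = 3 B_t^2 \;\d B_t + \frac{1}{2} \cdot 6 B_t \;\d t = 3 B_t^2 \;\d B_t + 3 B_t \;\d t.
\]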

\subsection{The \tph{L\'evy}{L\'evy}{Levy} characterization}
A major application of the stochastic integral is the following convenient characterization of Brownian motion:
\begin{thm}[L\'evy's characterization of Brownian motion]\index{L\'evy's characterization of Brownian motion}
  Let $(X^1, \ldots, X^d)$ be continuous local martingales. Suppose that $X_0 = 0$ and that $\bra X^i, X^j\ket_t = \delta_{ij} t$ for all $i, j = 1, \ldots, d$ and $t \geq 0$. Then $(X^1, \ldots, X^d)$ is a standard $d$-dimensional Brownian motion.
\end{thm}
This might seem like a rather artificial condition, but it turns out to be quite useful in practice (though less so in this course). The point is that we know that $\bra H \cdot M\ket_t = (H^2 \cdot \bra M\ket)_t$, and in particular if we are integrating things with respect to Brownian motions of some sort, we know $\bra B\ket_t = t$, and so we are left with some explicit, familiar integral to do.

\begin{proof}
  Let $0 \leq s < t$. It suffices to check that $X_t - X_s$ is independent of $\mathcal{F}_s$ and $X_t - X_s \sim N(0, (t - s) I)$.
  \begin{claim}
    $\E(e^{i\theta \cdot (X_t - X_s)} \mid \mathcal{F}_s) = e^{-\frac{1}{2} |\theta|^2 (t - s)}$ for all $\theta \in \R^d$ and $s < t$.
  \end{claim}
  This is sufficient, since the right-hand side is independent of $\mathcal{F}_s$, hence so is the left-hand side, and the Fourier transform characterizes the distribution.

  To check this, for $\theta \in \R^d$, we define
  \[
    Y_t = \theta \cdot X_t = \sum_{i = 1}^d \theta^i X_t^i.
  \]
  Then $Y$ is a continuous local martingale, and we have
  \[
    \bra Y\ket_t = \bra Y, Y\ket_t = \sum_{j, k = 1}^d \theta^j \theta^k\; \bra X^j, X^k\ket_t = |\theta|^2 t
  \]
  by assumption. Let
  \[
    Z_t = e^{iY_t + \frac{1}{2} \bra Y\ket_t} = e^{i \theta \cdot X_t + \frac{1}{2} |\theta|^2 t}.
  \]
  By It\^o's formula, with $X = i Y + \frac{1}{2} \bra Y\ket_t$ and $f(x) = e^x$, we get
  \[
    \d Z_t = Z_t \left(i \;\d Y_t + \frac{1}{2} \d \bra Y\ket_t - \frac{1}{2} \d \bra Y\ket_t\right) = i Z_t \;\d Y_t.
  \]
  So this implies $Z$ is a continuous local martingale. Moreover, since $Z$ is bounded on bounded intervals of $t$, we know $Z$ is in fact a martingale, and $Z_0 = 1$. Then by definition of a martingale, we have
  \[
    \E (Z_t \mid \mathcal{F}_s) = Z_s,
  \]
  and unwrapping the definition of $Z_t$ gives the claim.
\end{proof}
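
Explicitly, the unwrapping in the last step of the proof can be written out as follows. Since $e^{-i \theta \cdot X_s - \frac{1}{2} |\theta|^2 t}$ is bounded and $\mathcal{F}_s$-measurable, we can pull it out of the conditional expectation:
\[
  \E(e^{i\theta \cdot (X_t - X_s)} \mid \mathcal{F}_s) = e^{-i \theta \cdot X_s - \frac{1}{2} |\theta|^2 t}\; \E(Z_t \mid \mathcal{F}_s) = e^{-i \theta \cdot X_s - \frac{1}{2} |\theta|^2 t} Z_s = e^{-\frac{1}{2} |\theta|^2 (t - s)}.
\]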

In general, the quadratic variation of a process doesn't have to be linear in $t$. It turns out if the quadratic variation increases to infinity, then the martingale is still a Brownian motion up to reparametrization.
\begin{thm}[Dubins--Schwarz]\index{Dubins--Schwarz theorem}
  Let $M$ be a continuous local martingale with $M_0 = 0$ and $\bra M\ket_\infty = \infty$. Let
  \[
    T_s = \inf \{t \geq 0: \bra M\ket_t > s\},
  \]
  the right-continuous inverse of $\bra M\ket_t$. Let $B_s = M_{T_s}$ and $\mathcal{G}_s = \mathcal{F}_{T_s}$. Then $T_s$ is an $(\mathcal{F}_t)$-stopping time, $\bra M\ket_{T_s} = s$ for all $s \geq 0$, $B$ is a $(\mathcal{G}_s)$-Brownian motion, and
  \[
    M_t = B_{\bra M\ket_t}.
  \]
\end{thm}

\begin{proof}
  Since $\bra M\ket$ is continuous and adapted, and $\bra M\ket_\infty = \infty$, we know $T_s$ is a stopping time and $T_s < \infty$ for all $s \geq 0$.
  \begin{claim}
    $(\mathcal{G}_s)$ is a filtration obeying the usual conditions, and $\mathcal{G}_\infty = \mathcal{F}_\infty$
  \end{claim}
  Indeed, if $A \in \mathcal{G}_s$ and $s < t$, then
  \[
    A \cap \{T_t \leq u\} = A \cap \{T_s \leq u\} \cap \{T_t \leq u\} \in \mathcal{F}_u,
  \]
  using that $A \cap \{T_s \leq u\} \in \mathcal{F}_u$ since $A \in \mathcal{G}_s$. Then right-continuity follows from that of $(\mathcal{F}_t)$ and the right-continuity of $s \mapsto T_s$.

  \begin{claim}
    $B$ is adapted to $(\mathcal{G}_s)$.
  \end{claim}
  In general, if $X$ is c\`adl\`ag and $T$ is a stopping time, then $X_T \mathbf{1}_{\{T < \infty\}} \in \mathcal{F}_T$. Apply this with $X = M$, $T = T_s$ and $\mathcal{F}_T = \mathcal{G}_s$. Thus $B_s \in \mathcal{G}_s$.

  \begin{claim}
    $B$ is continuous.
  \end{claim}
  Here this is actually something to verify, because $s \mapsto T_s$ is only right continuous, not necessarily continuous. Thus, we only know $B_s$ is right continuous, and we have to check it is left continuous.

  Now $B$ is left-continuous at $s$ iff $B_s = B_{s-}$, iff $M_{T_s} = M_{T_{s-}}$.
  Now we have
  \[
    T_{s-} = \inf \{t \geq 0 : \bra M\ket_t \geq s\}.
  \]
  If $T_s = T_{s-}$, then there is nothing to show. Thus, we may assume $T_s > T_{s-}$. Then we have $\bra M\ket_{T_s} = \bra M\ket_{T_{s-}}$. Since $\bra M\ket$ is increasing, this means $\bra M\ket$ is constant on $[T_{s-}, T_s]$. We will later prove that
  \begin{lemma}
    $M$ is constant on $[a, b]$ iff $\bra M\ket$ is constant on $[a, b]$.
  \end{lemma}
  So we know that if $T_s > T_{s-}$, then $M_{T_s} = M_{T_{s-}}$. So $B$ is left continuous.

  We then have to show that $B$ is a martingale.
  \begin{claim}
    $(M^2 - \bra M\ket)^{T_s}$ is a uniformly integrable martingale.
  \end{claim}
  To see this, observe that $\bra M^{T_s}\ket_\infty = \bra M\ket_{T_s} = s$, and so $M^{T_s}$ is bounded in $L^2$, i.e.\ $M^{T_s} \in \mathcal{M}_c^2$. So $(M^2 - \bra M\ket)^{T_s}$ is a uniformly integrable martingale.

  We now apply the optional stopping theorem, which tells us that for $r < s$,
  \[
    \E (B_s \mid \mathcal{G}_r) = \E (M^{T_s}_\infty \mid \mathcal{F}_{T_r}) = M_{T_r} = B_r.
  \]
  So $B$ is a martingale. Moreover,
  \[
    \E (B_s^2 - s \mid \mathcal{G}_r) = \E ((M^2 - \bra M\ket)^{T_s}_\infty \mid \mathcal{F}_{T_r}) = M_{T_r}^2 - \bra M\ket_{T_r} = B^2_r - r.
  \]
  So $B^2_t - t$ is a martingale, so by the characterizing property of the quadratic variation, $\bra B\ket_t = t$. So by L\'evy's criterion, this is a Brownian motion in one dimension.
\end{proof}
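
As a simple sanity check of the theorem, consider the following illustrative case (the Brownian motion $W$ and the constant $\sigma \not= 0$ are introduced here): let $M_t = \sigma W_t$. Then
\[
  \bra M\ket_t = \sigma^2 t,\quad T_s = \frac{s}{\sigma^2},\quad B_s = M_{T_s} = \sigma W_{s/\sigma^2},
\]
and the theorem recovers the familiar Brownian scaling property that $B$ is again a Brownian motion, with $M_t = B_{\sigma^2 t}$.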

The theorem is only true for martingales in one dimension. In two dimensions, this need not be true, because the time change needed for the horizontal and vertical may not agree. However, in the example sheet, we see that the holomorphic image of a Brownian motion is still a Brownian motion up to a time change.

\begin{lemma}
  $M$ is constant on $[a, b]$ iff $\bra M\ket$ is constant on $[a, b]$.
\end{lemma}

\begin{proof}
  It is clear that if $M$ is constant, then so is $\bra M\ket$. To prove the converse, by continuity, it suffices to prove that for any fixed $a < b$,
  \[
    \{M_t = M_a \text{ for all }t \in [a, b]\} \supseteq \{\bra M\ket_b = \bra M\ket_a\}\text{ almost surely}.
  \]
  We set $N_t = M_t - M_{t \wedge a}$. Then $\bra N\ket_t = \bra M\ket_t - \bra M\ket_{t \wedge a}$. Define
  \[
    T_\varepsilon = \inf \{t \geq 0: \bra N\ket_t \geq \varepsilon\}.
  \]
  Then since $N^2 - \bra N\ket$ is a local martingale, we know that
  \[
    \E(N_{t \wedge T_\varepsilon}^2) = \E (\bra N\ket_{t \wedge T_\varepsilon})\leq \varepsilon.
  \]
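  Now observe that on the event $\{\bra M\ket_b = \bra M\ket_a\}$, we have $\bra N\ket_b = 0$, and hence $T_\varepsilon > b$. So for $t \in [a, b]$ and every $\varepsilon > 0$, we have
  \[
    \E(1_{\{\bra M\ket_b = \bra M\ket_a\}} N_t^2) = \E(1_{\{\bra M\ket_b = \bra M\ket_a\}} N_{t \wedge T_\varepsilon}^2) \leq \E(\bra N\ket_{t \wedge T_\varepsilon}) \leq \varepsilon.
  \]
  Since $\varepsilon > 0$ was arbitrary, $N_t = 0$ almost surely on this event, i.e.\ $M_t = M_a$.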
\end{proof}

\subsection{Girsanov's theorem}
Girsanov's theorem tells us what happens to our (semi)-martingales when we change the measure of our space. We first look at a simple example when we perform a shift.
\begin{eg}
  Let $X \sim N(0, C)$ be an $n$-dimensional centered Gaussian with positive definite covariance $C = (C_{ij})_{i, j = 1}^n$. Put $M = C^{-1}$. Then for any function $f$, we have
  \[
    \E f(X) = \left(\det \frac{M}{2\pi}\right)^{1/2} \int_{\R^n} f(x) e^{-\frac{1}{2} (x, Mx)}\;\d x.
  \]
  Now fix an $a \in \R^n$. The distribution of $X + a$ then satisfies
  \[
    \E f(X + a) = \left(\det \frac{M}{2\pi}\right)^{1/2} \int_{\R^n} f(x) e^{-\frac{1}{2} (x - a, M (x - a))}\;\d x = \E [Z f(X)],
  \]
  where
  \[
    Z = Z(x) = e^{-\frac{1}{2} (a, Ma) + (x, Ma)}.
  \]
  Thus, if $\P$ denotes the distribution of $X$, then the measure $\Q$ with
  \[
    \frac{\d \Q}{\d \P} = Z
  \]
  is that of an $N(a, C)$ vector.
\end{eg}
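
The formula for $Z$ in the Gaussian example comes from expanding the shifted quadratic form. Using that $M$ is symmetric,
\[
  -\frac{1}{2} (x - a, M(x - a)) = -\frac{1}{2} (x, Mx) + (x, Ma) - \frac{1}{2} (a, Ma),
\]
so the density of $X + a$ is the density of $X$ multiplied by $e^{-\frac{1}{2} (a, Ma) + (x, Ma)} = Z(x)$.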

\begin{eg}
  We can extend the above example to Brownian motion. Let $B$ be a Brownian motion with $B_0 = 0$, and $h: [0, \infty) \to \R$ a deterministic function. We then want to understand the distribution of $B + h$.

  Fix a finite sequence of times $0 = t_0 < t_1 < \cdots < t_n$. Then we know that $(B_{t_i})_{i = 1}^n$ is a centered Gaussian random vector. Thus, if $f(B) = f(B_{t_1}, \ldots, B_{t_n})$ is a function, then, with the convention $x_0 = 0$,
  \[
    \E (f(B)) = c \cdot \int_{\R^n} f(x) e^{-\frac{1}{2} \sum_{i = 1}^n \frac{(x_i - x_{i - 1})^2}{t_i - t_{i - 1}}}\;\d x_1 \cdots \d x_n.
  \]
  Thus, after a shift, we get
  \begin{align*}
    &\E (f (B + h)) = \E (Z f(B)),\\
    &Z = \exp \left(-\frac{1}{2} \sum_{i = 1}^n \frac{(h_{t_i} - h_{t_{i - 1}})^2}{t_i - t_{i - 1}} + \sum_{i = 1}^n \frac{(h_{t_i} - h_{t_{i - 1}})(B_{t_i} - B_{t_{i - 1}})}{t_i - t_{i - 1}}\right).
  \end{align*}
\end{eg}
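As a heuristic sketch of where this is heading, suppose in addition (an assumption not made in the example) that $h$ is continuously differentiable with $h_0 = 0$, and write $\dot{h}$ for its derivative. If we refine the partition of a fixed interval $[0, T]$, keeping $t_n = T$, the two sums in the exponent are Riemann-type sums, and one expects
\[
  Z \to \exp \left(\int_0^T \dot{h}_s \;\d B_s - \frac{1}{2} \int_0^T \dot{h}_s^2 \;\d s\right),
\]
that is, the exponential of a martingale minus half of its quadratic variation.
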
In general, we are interested in what happens when we change the measure by an exponential:
\begin{defi}[Stochastic exponential]\index{stochastic exponential}
  Let $M$ be a continuous local martingale. Then the \emph{stochastic exponential} (or \term{Dol\'eans--Dade exponential}) of $M$ is
  \[
    \mathcal{E}(M)_t = e^{M_t - \frac{1}{2} \bra M\ket_t}
  \]
\end{defi}
The point of introducing that quadratic variation term is the following:
\begin{prop}
  Let $M$ be a continuous local martingale with $M_0 = 0$. Then $\mathcal{E}(M) = Z$ satisfies
  \[
    \d Z_t = Z_t \;\d M_t,
  \]
  i.e.
  \[
    Z_t = 1 + \int_0^t Z_s \;\d M_s.
  \]
  In particular, $\mathcal{E}(M)$ is a continuous local martingale. Moreover, if $\bra M\ket$ is uniformly bounded, then $\mathcal{E}(M)$ is a uniformly integrable martingale.
\end{prop}
There is a more general condition for the final property, namely Novikov's condition, but we will not go into that.

\begin{proof}
  By It\^o's formula with $X = M - \frac{1}{2} \bra M\ket$, we have
  \[
    \d Z_t = Z_t \;\d \left(M_t - \frac{1}{2} \bra M\ket_t\right) + \frac{1}{2} Z_t \;\d \bra M\ket_t = Z_t \;\d M_t.
  \]
  Since $M$ is a continuous local martingale, so is $\int Z_s \;\d M_s$. So $Z$ is a continuous local martingale.

  Now suppose $\bra M\ket_\infty \leq b < \infty$. Then
  \[
    \P \left(\sup_{t \geq 0} M_t \geq a\right) = \P \left(\sup_{t \geq 0} M_t \geq a,\, \bra M\ket_\infty \leq b\right) \leq e^{-a^2/2b},
  \]
  where the final inequality is an exercise on the third example sheet, which is true for general continuous local martingales. So we get
  \begin{align*}
    \E \left(\exp \left(\sup_t M_t\right)\right) &= \int_0^\infty \P(\exp (\sup M_t)\geq \lambda)\;\d \lambda \\
    &= \int_0^\infty \P (\sup M_t \geq \log \lambda)\;\d \lambda\\
    &\leq 1 + \int_1^\infty e^{-(\log \lambda)^2/2b} \;\d \lambda < \infty.
  \end{align*}
  Since $\bra M\ket \geq 0$, we know that
  \[
    \sup_{t \geq 0} \mathcal{E}(M)_t \leq \exp \left(\sup M_t\right),
  \]
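  and the right-hand side is integrable by the above. So $\mathcal{E}(M)$ is a local martingale dominated by an integrable random variable, hence a uniformly integrable martingale.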
\end{proof}
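
For a concrete instance of the proposition (the parameters $\theta \in \R$ and $t_0 > 0$ are introduced here for illustration), let $B$ be a standard Brownian motion with $B_0 = 0$ and set $M_t = \theta B_{t \wedge t_0}$. Then $\bra M\ket_t = \theta^2 (t \wedge t_0) \leq \theta^2 t_0$ is uniformly bounded, so
\[
  \mathcal{E}(M)_t = \exp \left(\theta B_{t \wedge t_0} - \frac{1}{2} \theta^2 (t \wedge t_0)\right)
\]
is a uniformly integrable martingale. Taking expectations at time $t_0$ gives $\E(e^{\theta B_{t_0}}) = e^{\theta^2 t_0/2}$, which agrees with the moment generating function of an $N(0, t_0)$ random variable.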

\begin{thm}[Girsanov's theorem]\index{Girsanov's theorem}
  Let $M$ be a continuous local martingale with $M_0 = 0$. Suppose that $\mathcal{E}(M)$ is a uniformly integrable martingale. Define a new probability measure
  \[
    \frac{\d \Q}{\d \P} = \mathcal{E}(M)_\infty.
  \]
  Let $X$ be a continuous local martingale with respect to $\P$. Then $X - \bra X, M\ket$ is a continuous local martingale with respect to $\Q$.
\end{thm}

\begin{proof}
  Define the stopping time
  \[
    T_n = \inf\{ t \geq 0: |X_t - \bra X, M\ket_t| \geq n\},
  \]
  so that $\P(T_n \to \infty) = 1$ by continuity. Since $\Q$ is absolutely continuous with respect to $\P$, we know that $\Q(T_n \to \infty) = 1$. Thus it suffices to show that $X^{T_n} - \bra X^{T_n}, M\ket$ is a continuous martingale with respect to $\Q$ for any $n$. Let
  \[
    Y = X^{T_n} - \bra X^{T_n}, M\ket,\quad Z = \mathcal{E}(M).
  \]
  \begin{claim}
    $ZY$ is a continuous local martingale with respect to $\P$.
  \end{claim}
  We use the product rule to compute
  \begin{align*}
    \d (Z_t Y_t) &= Y_t \;\d Z_t + Z_t \;\d Y_t + \d \bra Y, Z\ket_t\\
    &= Y_t Z_t \;\d M_t + Z_t (\d X^{T_n}_t - \d \bra X^{T_n}, M\ket_t) + Z_t \;\d \bra M, X^{T_n}\ket_t\\
    &= Y_t Z_t \;\d M_t + Z_t\;\d X^{T_n}_t.
  \end{align*}
  So we see that $ZY$ is a stochastic integral with respect to a continuous local martingale. Thus $ZY$ is a continuous local martingale.

  \begin{claim}
    $ZY$ is uniformly integrable.
  \end{claim}
  Since $Z$ is a uniformly integrable martingale, $\{Z_T: T\text{ is a stopping time}\}$ is uniformly integrable. Since $Y$ is bounded, $\{Z_T Y_T: T\text{ is a stopping time}\}$ is also uniformly integrable. So $ZY$ is a true martingale (with respect to $\P$).

  \begin{claim}
    $Y$ is a martingale with respect to $\Q$.
  \end{claim}
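  By Bayes' formula for conditional expectations and the tower property, we have
  \begin{align*}
    Z_s \,\E^{\Q}(Y_t - Y_s \mid \mathcal{F}_s) &= \E^\P(Z_\infty Y_t - Z_\infty Y_s \mid \mathcal{F}_s)\\
    &= \E^{\P} (Z_t Y_t - Z_s Y_s \mid \mathcal{F}_s) = 0.
  \end{align*}
  Since $Z_s > 0$, the claim follows.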
\end{proof}
Note that the quadratic variation does not change since
\[
  \bra X - \bra X, M\ket \ket_t = \bra X \ket_t = \lim_{n \to \infty} \sum_{i = 1}^{\lfloor 2^n t \rfloor} (X_{i 2^{-n}} - X_{(i - 1)2^{-n}})^2\text{ a.s.}
\]
along a subsequence.
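
To illustrate Girsanov's theorem, here is a standard special case, sketched under assumptions chosen for the illustration: $B$ is a standard Brownian motion under $\P$ with $B_0 = 0$, the constants $\theta \in \R$ and $t_0 > 0$ are fixed, and $M_t = \theta B_{t \wedge t_0}$. Then $\bra M\ket_\infty = \theta^2 t_0$ is bounded, so $\mathcal{E}(M)$ is a uniformly integrable martingale, and $\bra B, M\ket_t = \theta (t \wedge t_0)$. Taking $X = B$, the theorem says that
\[
  \tilde{B}_t = B_t - \theta (t \wedge t_0)
\]
is a continuous local martingale under $\Q$, where
\[
  \frac{\d \Q}{\d \P} = \exp \left(\theta B_{t_0} - \frac{1}{2} \theta^2 t_0\right).
\]
Since the quadratic variation of $\tilde{B}$ is still $t$, L\'evy's characterization shows that $\tilde{B}$ is a Brownian motion under $\Q$. In other words, under $\Q$ the process $B$ is a Brownian motion with drift $\theta$ on $[0, t_0]$.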

\section{Stochastic differential equations}
\subsection{Existence and uniqueness of solutions}
After all this work, we can return to the problem we described in the introduction. We wanted to make sense of equations of the form
\[
  \dot{x}(t) = F(x(t)) + \eta(t),
\]
where $\eta(t)$ is Gaussian white noise. We can now interpret this equation as saying
\[
  \d X_t = F(X_t)\;\d t + \d B_t,
\]
or equivalently, in integral form,
\[
  X_t - X_0 = \int_0^t F(X_s) \;\d s+ B_t.
\]
In general, we can make the following definition:
\begin{defi}[Stochastic differential equation]\index{stochastic differential equation}
  Let $d, m \in \N$, $b:\R_+ \times \R^d \to \R^d$, $\sigma: \R_+ \times \R^d \to \R^{d \times m}$ be locally bounded (and measurable). A solution to the stochastic differential equation $E(\sigma, b)$ given by
  \[
    \d X_t = b(t, X_t) \;\d t + \sigma(t, X_t) \;\d B_t
  \]
  consists of
  \begin{enumerate}
    \item a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), \P)$ obeying the usual conditions;
    \item an $m$-dimensional Brownian motion $B$ with $B_0 = 0$; and
    \item an $(\mathcal{F}_t)$-adapted continuous process $X$ with values in $\R^d$ such that
      \[
        X_t = X_0 + \int_0^t \sigma(s, X_s) \;\d B_s + \int_0^t b(s, X_s) \;\d s.
      \]
  \end{enumerate}
  If $X_0 = x \in \R^d$, then we say $X$ is a \emph{(weak) solution}\index{weak solution} to $E_x(\sigma, b)$. It is a \emph{strong} solution\index{strong solution} if it is adapted with respect to the canonical filtration of $B$.
\end{defi}
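As a simple illustration of the definition, consider the following example, where $d = m = 1$ and the parameter $\lambda > 0$ is introduced here: take $b(t, x) = -\lambda x$ and $\sigma \equiv 1$, i.e.
\[
  \d X_t = -\lambda X_t \;\d t + \d B_t,\quad X_0 = x.
\]
A sketch of how one can solve it: by integration by parts, with no It\^o correction since $e^{\lambda t}$ has finite variation, any solution satisfies
\[
  \d (e^{\lambda t} X_t) = \lambda e^{\lambda t} X_t \;\d t + e^{\lambda t} \;\d X_t = e^{\lambda t} \;\d B_t,
\]
and hence
\[
  X_t = e^{-\lambda t} x + \int_0^t e^{-\lambda (t - s)} \;\d B_s.
\]
Conversely, one can check that this process solves the equation. It is adapted to the filtration generated by $B$, so it is a strong solution.
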
Our goal is to prove existence and uniqueness of solutions to a general class of SDEs. We first need to say what it means for solutions to be unique, and there are several natural notions of uniqueness:

\begin{defi}[Uniqueness of solutions]
  For the stochastic differential equation $E(\sigma, b)$, we say there is
  \begin{itemize}
    \item \term{uniqueness in law} if for every $x \in \R^d$, all solutions to $E_x(\sigma, b)$ have the same distribution.
    \item \term{pathwise uniqueness} if when $(\Omega, \mathcal{F}, (\mathcal{F}_t), \P)$ and $B$ are fixed, any two solutions $X, X'$ with $X_0 = X_0'$ are indistinguishable.
  \end{itemize}
\end{defi}

These two notions are not equivalent, as the following example shows:
\begin{eg}[Tanaka]\index{Tanaka equation}
  Consider the stochastic differential equation
  \[
    \d X_t = \sgn(X_t) \;\d B_t,\quad X_0 = x,
  \]
  where
  \[
    \sgn(x) =
    \begin{cases}
      +1 & x > 0\\
      -1 & x \leq 0
    \end{cases}.
  \]
  This has a weak solution which is unique in law, but pathwise uniqueness fails.

  To see the existence of solutions, let $X$ be a one-dimensional Brownian motion with $X_0 = x$, and set
  \[
    B_t = \int_0^t \sgn(X_s) \;\d X_s,
  \]
  which is well-defined because $\sgn(X_s)$ is previsible and bounded. Then we have
  \[
    x + \int_0^t \sgn(X_s)\;\d B_s = x + \int_0^t \sgn(X_s)^2 \;\d X_s = x + X_t - X_0 = X_t.
  \]
  So it remains to show that $B$ is a Brownian motion. We already know that $B$ is a continuous local martingale, so by L\'evy's characterization, it suffices to show its quadratic variation is $t$. We simply compute
  \[
    \bra B, B\ket_t = \int_0^t \sgn(X_s)^2 \;\d \bra X, X\ket_s = t.
  \]
  So there is weak existence. The same argument shows that any solution is a Brownian motion, so we have uniqueness in law.

  Finally, observe that if $x = 0$ and $X$ is a solution, then $-X$ is also a solution with the same Brownian motion. Indeed,
  \[
    -X_t = -\int_0^t \sgn(X_s) \;\d B_s = \int_0^t \sgn(-X_s)\;\d B_s + 2 \int_0^t \mathbf{1}_{X_s = 0} \;\d B_s,
  \]
  where the second term vanishes, since it is a continuous local martingale with quadratic variation $\int_0^t \mathbf{1}_{X_s = 0} \;\d s = 0$. So pathwise uniqueness does not hold.
\end{eg}

In the other direction, however, it turns out pathwise uniqueness implies uniqueness in law.
\begin{thm}[Yamada--Watanabe]\index{Yamada--Watanabe}
  Assume weak existence and pathwise uniqueness hold. Then
  \begin{enumerate}
    \item Uniqueness in law holds.
    \item For every $(\Omega, \mathcal{F}, (\mathcal{F}_t), \P)$ and $B$ and any $x \in \R^d$, there is a unique strong solution to $E_x(\sigma, b)$.\fakeqed
  \end{enumerate}
\end{thm}
We will not prove this, since we will not actually need it.

The key theorem we are now heading for is the existence and uniqueness of solutions to SDEs, assuming reasonable conditions. As in the case of ODEs, we need the following Lipschitz conditions:

\begin{defi}[Lipschitz coefficients]\index{Lipschitz coefficients}
  The coefficients $b: \R_+ \times \R^d \to \R^d$, $\sigma: \R_+ \times \R^d \to \R^{d \times m}$ are Lipschitz in $x$ if there exists a constant $K > 0$ such that for all $t \geq 0$ and $x, y \in \R^d$, we have
  \begin{align*}
    |b(t, x) - b(t, y)| &\leq K|x - y|\\
    |\sigma(t, x) - \sigma(t, y)| &\leq K|x - y|.
  \end{align*}
\end{defi}
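
As a quick sanity check of this definition (the choice of example is ours, using coefficients that appear elsewhere in this chapter), take $d = m = 1$, $\lambda > 0$, $b(t, x) = -\lambda x$ and $\sigma(t, x) = 1$, which are the coefficients of the Ornstein--Uhlenbeck process. Then
\[
  |b(t, x) - b(t, y)| = \lambda |x - y|,\quad |\sigma(t, x) - \sigma(t, y)| = 0,
\]
so these coefficients are Lipschitz in $x$ with $K = \lambda$. On the other hand, the coefficient $\sigma(t, x) = \sgn(x)$ in Tanaka's example is not even continuous at $0$, so it is not Lipschitz. This is consistent with the failure of pathwise uniqueness there.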

\begin{thm}
  Assume $b, \sigma$ are Lipschitz in $x$. Then there is pathwise uniqueness for $E(\sigma, b)$, and for every $(\Omega, \mathcal{F}, (\mathcal{F}_t), \P)$ satisfying the usual conditions, every $(\mathcal{F}_t)$-Brownian motion $B$ and every $x \in \R^d$, there exists a unique strong solution to $E_x(\sigma, b)$.
\end{thm}

\begin{proof}
  To simplify notation, we assume $m = d = 1$.

  We first prove pathwise uniqueness. Suppose $X, X'$ are two solutions with $X_0 = X_0'$. We would like to show that $\E[(X_t - X_t')^2] = 0$ for all $t$. To ensure that all the quantities involved are bounded, we first localize. For $n \in \N$, define the stopping time
  \[
    S = \inf \{t \geq 0: |X_t| \geq n\text{ or }|X_t'| \geq n\}.
  \]
  By continuity, $S \to \infty$ as $n \to \infty$. We also fix a deterministic time $T > 0$. Then whenever $t \in [0, T]$, we can bound, using the inequality $(a + b)^2 \leq 2a^2 + 2b^2$,
%  \begin{align*}
%    X_{t \wedge S} &= X_0 + \int_0^{t \wedge S} \sigma(s, X_s)\;\d B_s + \int_0^{t \wedge S} b(s, X_s)\;\d s\\
%    X'_{t \wedge S} &= X_0 + \int_0^{t \wedge S} \sigma(s, X_s')\;\d B_s + \int_0^{t \wedge S} b(s, X_s')\;\d s.
%  \end{align*}
%  Fix a deterministic $T > 0$. Then for $t \in [0, T]$, (using $(a + b)^2 \leq 2a^2 + 2b^2$), we have
  \begin{multline*}
    \E ((X_{t \wedge S} - X'_{t \wedge S})^2) \leq 2 \E \left(\left(\int_0^{t \wedge S} (\sigma(s, X_s) - \sigma(s, X_s'))\;\d B_s\right)^2\right) \\
    + 2 \E \left(\left(\int_0^{t \wedge S} (b(s, X_s) - b(s, X_s'))\;\d s\right)^2 \right).
  \end{multline*}
  For the second term, we apply Cauchy--Schwarz (which produces the factor of $T$) and then the Lipschitz bound, while we can simplify the first term using the (corollary of the) It\^o isometry
  \[
    \E \left(\left(\int_0^{t \wedge S}\!\!\!\!\!(\sigma(s, X_s) - \sigma(s, X_s'))\;\d B_s\right)^2\right) = \E \left(\int_0^{t \wedge S}\!\!\!\!\!(\sigma(s, X_s) - \sigma(s, X_s'))^2\;\d s\right).
  \]
  So using the Lipschitz bound, we have
  \begin{align*}
    \E ((X_{t \wedge S} - X'_{t \wedge S})^2) &\leq 2K^2 (1 + T) \E \left(\int_0^{t \wedge S} |X_s - X_s'|^2 \;\d s\right)\\
    &\leq 2K^2 (1 + T) \int_0^t \E (|X_{s \wedge S} - X'_{s \wedge S}|^2)\;\d s.
  \end{align*}
  We now use Gr\"onwall's lemma:
  \begin{lemma}
    Let $h: [0, T] \to [0, \infty)$ be bounded and measurable, and suppose that
    \[
      h(t) \leq a + c \int_0^t h(s)\;\d s
    \]
    for all $t \in [0, T]$ and some constants $a, c \geq 0$. Then
    \[
      h(t) \leq a e^{ct}.\fakeqed
    \]
  \end{lemma}
  Applying this with $a = 0$ and $c = 2K^2(1 + T)$ to
  \[
    h(t) = \E((X_{t \wedge S} - X_{t \wedge S}')^2),
  \]
  which is bounded by $4n^2$ by the definition of $S$, we deduce that $h(t) \leq 0$. So we know that
  \[
    \E(|X_{t \wedge S} - X_{t \wedge S}'|^2) = 0
  \]
  for every $t \in [0, T]$. Taking $n \to \infty$ and $T \to \infty$ gives pathwise uniqueness.

  We next prove existence of solutions. We fix $(\Omega, \mathcal{F}, (\mathcal{F}_t), \P)$ and $B$, and define
  \[
    F(X)_t = X_0 + \int_0^t \sigma(s, X_s)\;\d B_s + \int_0^t b(s, X_s) \;\d s.
  \]
  Then $X$ is a solution to $E_x(\sigma, b)$ iff $F(X) = X$ and $X_0 = x$. To find a fixed point, we use Picard iteration. We fix $T > 0$, and define the $T$-norm of a continuous adapted process $X$ as
  \[
    \|X\|_T = \E\left(\sup_{t \leq T} |X_t|^2\right)^{1/2}.
  \]
  In particular, if $X$ is a martingale, then this is equivalent to the norm on the space of $L^2$-bounded martingales by Doob's inequality. Then
  \[
    \mathcal{B} = \{X: \Omega \times [0, T] \to \R \text{ continuous and adapted}: \|X\|_T < \infty\}
  \]
  is a Banach space.
  \begin{claim}
    $\|F(0)\|_T < \infty$, and
    \[
      \|F(X) - F(Y)\|_T^2 \leq (2T + 8)K^2 \int_0^T \|X - Y\|_t^2 \;\d t.
    \]
  \end{claim}
  We first see how this claim implies the theorem. First observe that the claim implies $F$ indeed maps $\mathcal{B}$ into itself. We can then define a sequence of processes $X^i_t$ by
  \[
    X_t^0 = x,\quad
    X^{i + 1} = F(X^i).
  \]
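  Then, writing $C_T = (2T + 8)K^2$ and iterating the claim, we have
  \[
    \|X^{i + 1} - X^i\|_T^2 \leq C_T \int_0^T \|X^i - X^{i - 1}\|^2_t \;\d t \leq \cdots \leq \|X^1 - X^0\|^2_T \frac{(C_T T)^i}{i!}.
  \]
  So we find that
  \[
    \sum_{i = 1}^\infty \|X^i - X^{i - 1}\|_T < \infty
  \]
  for all $T$. So $X^i$ converges in $\|\cdot\|_T$ to some $X$, almost surely uniformly on $[0, T]$, and $F(X) = X$ since $F$ is continuous by the claim. Each $X^i$ is adapted to the (completed) filtration generated by $B$, hence so is the limit $X$, so the solution is strong. We then take $T \to \infty$ and we are done.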

  To prove the claim, we write
  \[
    \|F(0)\|_T \leq |X_0| + \left\| \int_0^t b(s, 0) \;\d s\right\|_T + \left\| \int_0^t \sigma(s, 0) \;\d B_s\right\|_T.
  \]
  The first two terms are deterministic and finite, since $b$ is locally bounded, and we can bound the last by Doob's inequality and the It\^o isometry:
  \[
    \left\| \int_0^t \sigma(s, 0) \;\d B_s\right\|_T^2 \leq 4 \E \left(\left|\int_0^T \sigma(s, 0) \;\d B_s\right|^2\right) = 4 \int_0^T \sigma(s, 0)^2 \;\d s < \infty.
  \]
  To prove the second part, we use
  \begin{multline*}
    \|F(X) - F(Y)\|_T^2 \leq 2 \E \left(\sup_{t \leq T} \left|\int_0^t (b(s, X_s) - b(s, Y_s))\;\d s\right|^2\right) \\
    + 2 \E \left(\sup_{t \leq T} \left|\int_0^t(\sigma(s, X_s) - \sigma(s, Y_s))\;\d B_s\right|^2\right).
  \end{multline*}
  We can bound the first expectation with Cauchy--Schwarz by
  \[
    T \E \left(\int_0^T |b(s, X_s) - b(s, Y_s)|^2 \;\d s\right) \leq TK^2 \int_0^T \|X -Y \|_t^2 \;\d t,
  \]
  and the second expectation with Doob's inequality and the It\^o isometry by
  \[
    4 \E \left(\int_0^T |\sigma(s, X_s) - \sigma(s, Y_s)|^2 \;\d s\right) \leq 4K^2 \int_0^T \|X - Y\|_t^2\;\d t.\fakeqed%\qedhere
  \]
\end{proof}
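
To see what the Picard iteration in the proof looks like in practice, here is a sketch for the Ornstein--Uhlenbeck coefficients $b(t, x) = -\lambda x$, $\sigma(t, x) = 1$ with $d = m = 1$. This worked example is an illustration of ours rather than part of the proof; the process itself is treated in the next section. Starting from $X^0_t = x$, the first two iterates are
\begin{align*}
  X^1_t &= x + B_t - \lambda \int_0^t x \;\d s = x(1 - \lambda t) + B_t,\\
  X^2_t &= x + B_t - \lambda \int_0^t X^1_s \;\d s = x\left(1 - \lambda t + \frac{\lambda^2 t^2}{2}\right) + B_t - \lambda \int_0^t B_s \;\d s.
\end{align*}
The deterministic parts are the Taylor partial sums of $e^{-\lambda t} x$, which is what we expect from the explicit solution $X_t = e^{-\lambda t} x + \int_0^t e^{-\lambda (t - s)}\;\d B_s$. Similarly, integrating by parts gives
\[
  \int_0^t e^{-\lambda (t - s)} \;\d B_s = B_t - \lambda \int_0^t e^{-\lambda (t - s)} B_s \;\d s,
\]
and the random part of $X^2_t$ agrees with this to first order in $\lambda$.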

\subsection{Examples of stochastic differential equations}
\begin{eg}[The \term{Ornstein--Uhlenbeck process}]
  Let $\lambda > 0$. Then the Ornstein--Uhlenbeck process is the solution to
  \[
    \d X_t = - \lambda X_t \;\d t + \d B_t.
  \]
  The solution exists by the previous theorem, but we can also explicitly find one.

  By It\^o's formula applied to $e^{\lambda t} X_t$, we get
  \[
    \d (e^{\lambda t} X_t) = e^{\lambda t} \d X_t + \lambda e^{\lambda t} X_t \;\d t = e^{\lambda t} \;\d B_t.
  \]
  So we find that
  \[
    X_t = e^{-\lambda t} X_0 + \int_0^t e^{-\lambda (t - s)}\;\d B_s.
  \]
  Observe that the integrand is deterministic. So we can in fact interpret this as a Wiener integral.
\end{eg}

\begin{fact}
  If $X_0 = x \in \R$ is fixed, then $(X_t)$ is a Gaussian process, i.e.\ $(X_{t_i})_{i = 1}^n $ is jointly Gaussian for all $t_1 < \cdots < t_n$. Any Gaussian process is determined by the mean and covariance, and in this case, we have
  \[
    \E X_t = e^{-\lambda t} x,\quad \cov(X_t, X_s) = \frac{1}{2\lambda} \left(e^{-\lambda |t - s|} - e^{-\lambda (t + s)}\right).
  \]
\end{fact}

\begin{proof}
  We only have to compute the covariance. By the It\^o isometry, we have
  \begin{align*}
    \E ((X_t - \E X_t) (X_s - \E X_s)) &= \E \left(\int_0^t e^{-\lambda (t - u)}\;\d B_u \int_0^s e^{-\lambda (s - u)}\;\d B_u\right)\\
    &= e^{-\lambda (t + s)} \int_0^{t \wedge s} e^{2\lambda u}\;\d u.\qedhere
  \end{align*}
\end{proof}
In particular,
\[
  X_t \sim N\left(e^{-\lambda t}x, \frac{1 - e^{-2\lambda t}}{2 \lambda}\right) \to N\left(0, \frac{1}{2\lambda}\right)\quad\text{as } t \to \infty.
\]
\begin{fact}
  If $X_0 \sim N(0, \frac{1}{2\lambda})$, then $(X_t)$ is a centered Gaussian process with stationary covariance, i.e.\ the covariance depends only on time differences:
  \[
    \cov(X_t, X_s) = \frac{1}{2\lambda} e^{-\lambda |t - s|}.
  \]
\end{fact}
The difference from the previous fact is that when $X_0 = x$ is deterministic, subtracting $\E X_t$ removes the term $e^{-\lambda t} X_0$ completely, whereas when $X_0$ is random, this term survives and contributes to the covariance.
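
The stationary covariance can be checked directly. The following short computation is a sketch, assuming (as is automatic in our setting) that $X_0$ is independent of $B$. Write $X_t = e^{-\lambda t} X_0 + I_t$ with $I_t = \int_0^t e^{-\lambda (t - u)} \;\d B_u$. Since $X_0$ is centered with $\E(X_0^2) = \frac{1}{2\lambda}$ and is independent of $(I_t)$, we get
\begin{align*}
  \cov(X_t, X_s) &= e^{-\lambda (t + s)} \E(X_0^2) + \cov(I_t, I_s)\\
  &= \frac{1}{2\lambda} e^{-\lambda (t + s)} + \frac{1}{2\lambda} \left(e^{-\lambda |t - s|} - e^{-\lambda (t + s)}\right)\\
  &= \frac{1}{2\lambda} e^{-\lambda |t - s|}.
\end{align*}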

This is a very nice example where we can explicitly understand the long-time behaviour of the SDE. In general, this is non-trivial.

\subsubsection*{Dyson Brownian motion}
\index{Dyson Brownian motion}
Let $\mathcal{H}_N$ be the inner product space of real symmetric $N \times N$ matrices with inner product $N \Tr(HK)$ for $H, K \in \mathcal{H}_N$. Let $H^1, \ldots, H^{\dim(\mathcal{H}_N)}$ be an orthonormal basis for $\mathcal{H}_N$.

\begin{defi}[Gaussian orthogonal ensemble]\index{Gaussian orthogonal ensemble}
  The \emph{Gaussian Orthogonal Ensemble} GOE$_N$ is the standard Gaussian measure on $\mathcal{H}_N$, i.e.\ $H \sim \mathrm{GOE}_N$ if
  \[
    H = \sum_{i = 1}^{\dim \mathcal{H}_N} H^i X^i,
  \]
  where the $X^i$ are iid standard normals.
\end{defi}

We now replace each $X^i$ by an Ornstein--Uhlenbeck process with $\lambda = \frac{1}{2}$. Then GOE$_N$ is invariant under the process.

\begin{thm}
  The eigenvalues $\lambda_1(t) \leq \cdots \leq \lambda_N(t)$ satisfy
  \[
    \d \lambda_i = \left(-\frac{\lambda_i}{2} + \frac{1}{N} \sum_{j \not= i} \frac{1}{\lambda_i - \lambda_j}\right)\;\d t + \sqrt{\frac{2}{N\beta}}\;\d B^i.
  \]
  Here $\beta = 1$, but if we replace symmetric matrices by Hermitian ones, we get $\beta = 2$; if we replace symmetric matrices by symplectic ones, we get $\beta = 4$.
\end{thm}
This follows from It\^o's formula and formulas for derivatives of eigenvalues.% read more about this.

\begin{eg}[Geometric Brownian motion]\index{Geometric Brownian motion}
  Fix $\sigma > 0$ and $r \in \R$. Then geometric Brownian motion is given by
  \[
    \d X_t = \sigma X_t \;\d B_t + r X_t \;\d t.
  \]
  We apply It\^o's formula to $\log X_t$ to find that
  \[
    X_t = X_0 \exp \left(\sigma B_t + \left(r - \frac{\sigma^2}{2}\right)t\right).
  \]
\end{eg}
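
For completeness, here is a sketch of the It\^o computation behind this formula, assuming $X_0 > 0$. Since $\d \bra X, X\ket_t = \sigma^2 X_t^2 \;\d t$, formally applying It\^o's formula to $\log X_t$ gives
\[
  \d (\log X_t) = \frac{\d X_t}{X_t} - \frac{1}{2} \frac{\d \bra X, X\ket_t}{X_t^2} = \sigma \;\d B_t + \left(r - \frac{\sigma^2}{2}\right) \;\d t,
\]
and integrating and exponentiating gives the stated formula. To avoid assuming positivity of $X$ in advance, one can instead verify the formula directly: with $Y_t = \sigma B_t + (r - \frac{\sigma^2}{2}) t$ and $X_t = X_0 e^{Y_t}$, It\^o's formula gives
\[
  \d X_t = X_t \;\d Y_t + \frac{1}{2} X_t \;\d \bra Y, Y\ket_t = \sigma X_t \;\d B_t + \left(r - \frac{\sigma^2}{2}\right) X_t \;\d t + \frac{\sigma^2}{2} X_t \;\d t = \sigma X_t \;\d B_t + r X_t \;\d t.
\]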

\begin{eg}[Bessel process]\index{Bessel process}
  Let $B = (B^1, \ldots, B^d)$ be a $d$-dimensional Brownian motion. Then
  \[
    X_t = |B_t|
  \]
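  satisfies the stochastic differential equation
  \[
    \d X_t = \frac{d - 1}{2X_t} \;\d t + \d W_t
  \]
  for $t < \inf\{s \geq 0: X_s = 0\}$, where $W_t = \sum_{i = 1}^d \int_0^t \frac{B^i_s}{|B_s|} \;\d B^i_s$ is a one-dimensional Brownian motion.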
\end{eg}
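
The equation for the Bessel process can be derived as follows. This is a sketch, valid up to the first time $|B_t|$ hits $0$. Away from the origin, the function $f(x) = |x|$ is smooth with
\[
  \partial_i f(x) = \frac{x_i}{|x|},\quad \partial_i \partial_j f(x) = \frac{\delta_{ij}}{|x|} - \frac{x_i x_j}{|x|^3},\quad \Delta f(x) = \frac{d - 1}{|x|}.
\]
So It\^o's formula gives
\[
  \d |B_t| = \sum_{i = 1}^d \frac{B^i_t}{|B_t|} \;\d B^i_t + \frac{d - 1}{2 |B_t|} \;\d t.
\]
Writing $W_t$ for the stochastic integral $\sum_{i} \int_0^t \frac{B^i_s}{|B_s|} \;\d B^i_s$, we see that $W$ is a continuous local martingale with
\[
  \bra W, W\ket_t = \int_0^t \sum_{i = 1}^d \frac{(B^i_s)^2}{|B_s|^2} \;\d s = t,
\]
so $W$ is a one-dimensional Brownian motion by L\'evy's characterization, and it plays the role of the driving noise in the equation for $X_t = |B_t|$.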

\subsection{Representations of solutions to PDEs}
Recall that in Advanced Probability, we learnt that we can represent the solution to Laplace's equation via Brownian motion, namely if $D$ is a suitably nice domain and $g: \partial D \to \R$ is a function, then the solution to Laplace's equation on $D$ with boundary conditions $g$ is given by
\[
  u(\mathbf{x}) = \E_{\mathbf{x}}[g(B_T)],
\]
where $T$ is the first hitting time of the boundary $\partial D$.

A similar statement we can make is that if we want to solve the heat equation
\[
  \frac{\partial u}{\partial t} = \nabla^2 u
\]
with initial conditions $u(x, 0) = u_0(x)$, then we can write the solution as
\[
  u(\mathbf{x}, t) = \E[u_0(\mathbf{x} + \sqrt{2} B_t)],
\]
where $B$ is a standard Brownian motion started at $0$. This is just a fancy way to say that the Green's function for the heat equation is a Gaussian, but is a good way to think about it nevertheless.

In general, we would like to associate PDEs to certain stochastic processes. Recall that a stochastic differential equation is generally of the form
\[
  \d X_t = b(X_t)\;\d t + \sigma(X_t)\;\d B_t
\]
for some $b: \R^d \to \R^d$ and $\sigma: \R^d \to \R^{d \times m}$ which are measurable and locally bounded. Here we assume these functions do not have time dependence. We can then associate to this a differential operator $L$ defined by
\[
  L = \frac{1}{2} \sum_{i, j} a_{ij} \partial_i \partial_j + \sum_i b_i \partial_i,
\]
where $a = \sigma \sigma^T$.

\begin{eg}
  If $b = 0$ and $\sigma = \sqrt{2} I$, then $L = \Delta$ is the standard Laplacian.
\end{eg}

The basic computation is the following result, which is a standard application of the It\^o formula:
\begin{prop}
  Let $x \in \R^d$, and $X$ a solution to $E_x(\sigma, b)$. Then for every $f: \R_+ \times \R^d \to \R$ that is $C^1$ in $\R_+$ and $C^2$ in $\R^d$, the process
  \[
    M_t^f = f(t, X_t) - f(0, X_0) - \int_0^t \left(\frac{\partial}{\partial s} + L\right)f(s, X_s)\;\d s
  \]
  is a continuous local martingale.\fakeqed
\end{prop}
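
The computation behind the proposition can be sketched as follows. Writing the SDE in components as $\d X^i_t = b_i(X_t) \;\d t + \sum_k \sigma_{ik}(X_t) \;\d B^k_t$, we have
\[
  \d \bra X^i, X^j\ket_t = \sum_k \sigma_{ik}(X_t) \sigma_{jk}(X_t) \;\d t = a_{ij}(X_t) \;\d t.
\]
So It\^o's formula gives
\begin{align*}
  \d f(t, X_t) &= \frac{\partial f}{\partial t} \;\d t + \sum_i \partial_i f \;\d X^i_t + \frac{1}{2} \sum_{i, j} \partial_i \partial_j f \;\d \bra X^i, X^j\ket_t\\
  &= \left(\frac{\partial}{\partial t} + L\right) f(t, X_t) \;\d t + \sum_{i, k} \partial_i f(t, X_t) \sigma_{ik}(X_t) \;\d B^k_t.
\end{align*}
Hence
\[
  M^f_t = \sum_{i, k} \int_0^t \partial_i f(s, X_s) \sigma_{ik}(X_s) \;\d B^k_s,
\]
which is a stochastic integral against Brownian motion with a locally bounded integrand, and so is a continuous local martingale.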

We first apply this to the \term{Dirichlet--Poisson problem}, which is essentially to solve $-Lu = f$. To be precise, let $U \subseteq \R^d$ be non-empty, bounded and open; $f \in C_b(U)$ and $g \in C_b(\partial U)$. We then want to find a $u \in C^2(\bar{U}) = C^2(U) \cap C(\bar{U})$ such that
\begin{align*}
  -Lu(x) &= f(x)\quad\text{ for } x \in U\\
  u(x) &= g(x)\quad\text{ for } x \in \partial U.
\end{align*}
If $f = 0$, this is called the \term{Dirichlet problem}; if $g = 0$, this is called the \term{Poisson problem}.

We will have to impose the following technical condition on $a$:
\begin{defi}[Uniformly elliptic]\index{uniformly elliptic}
  We say $a: \bar{U} \to \R^{d \times d}$ is \emph{uniformly elliptic} if there is a constant $c > 0$ such that for all $\xi \in \R^d$ and $x \in \bar{U}$, we have
  \[
    \xi^T a(x) \xi \geq c |\xi|^2.
  \]
\end{defi}
If $a$ is symmetric (which it is in our case), this is the same as asking for the smallest eigenvalue of $a$ to be bounded away from $0$.

It would be very nice if we could write down a solution to the Dirichlet--Poisson problem using a solution to $E_x(\sigma, b)$, and then simply check that it works. We can indeed do that, but it takes a bit more time than we have. Instead, we shall prove the slightly weaker result that if we happen to have a solution, it must be given by our formula involving the SDE. So we first note the following theorem without proof:

\begin{thm}
  Assume $U$ has a smooth boundary (or satisfies the exterior cone condition), $a, b$ are H\"older continuous and $a$ is uniformly elliptic. Then for every H\"older continuous $f: \bar{U} \to \R$ and any continuous $g: \partial U \to \R$, the Dirichlet--Poisson problem has a solution.\fakeqed
\end{thm}

The main theorem is the following:
\begin{thm}
  Let $\sigma$ and $b$ be bounded measurable and $\sigma \sigma^T$ uniformly elliptic, $U \subseteq \R^d$ as above. Let $u$ be a solution to the Dirichlet--Poisson problem and $X$ a solution to $E_x(\sigma, b)$ for some $x \in U$. Define the stopping time
  \[
    T_U = \inf \{t \geq 0: X_t \not \in U\}.
  \]
  Then $\E T_U < \infty$ and
  \[
    u(x) = \E_x\left(g(X_{T_U}) + \int_0^{T_U} f(X_s)\;\d s\right).
  \]
  In particular, the solution to the PDE is unique.
\end{thm}

\begin{proof}
  Our previous proposition applies to functions defined on all of $\R^d$, while $u$ is just defined on $\bar{U}$. So we set
  \[
    U_n = \left\{x \in U: \mathrm{dist}(x, \partial U) > \frac{1}{n}\right\},\quad T_n = \inf \{ t \geq 0 : X_t \not \in U_n\},
  \]
  and pick $u_n \in C_b^2(\R^d)$ such that $u|_{U_n} = u_n|_{U_n}$. Recalling our previous notation, let
  \[
    M^n_t = (M^{u_n})^{T_n}_t = u_n(X_{t \wedge T_n}) - u_n(X_0) - \int_0^{t \wedge T_n} Lu_n(X_s)\;\d s.
  \]
  By the proposition, this is a continuous local martingale. It is bounded on bounded time intervals, since $u_n$ and $Lu_n$ are bounded, and is hence a true martingale. Thus for $x \in U$ and $n$ large enough, the martingale property implies
  \begin{multline*}
    u(x) = u_n(x) =\E \left(u(X_{t \wedge T_n}) - \int_0^{t \wedge T_n} Lu(X_s)\;\d s\right) \\
    = \E \left(u(X_{t \wedge T_n}) + \int_0^{t \wedge T_n} f(X_s)\;\d s\right).
  \end{multline*}
  We would be done if we can take $n \to \infty$. To do so, we first show that $\E [T_U] < \infty$.

  Note that this does not depend on $f$ and $g$. So we can take $f = 1$ and $g = 0$, and let $v$ be a solution. Then we have
  \[
    \E (t \wedge T_n) = \E \left(- \int_0^{t \wedge T_n} Lv(X_s)\;\d s\right) = v(x) - \E(v(X_{t \wedge T_n})).
  \]
  Since $v$ is bounded, the right-hand side is at most $2\|v\|_\infty$. So by monotone convergence, we can take $t \to \infty$ and $n \to \infty$ to get
  \[
    \E (T_U) < \infty.
  \]

  Thus, we know that $t \wedge T_n \to T_U$ almost surely as $t \to \infty$ and $n \to \infty$. Since
  \[
    \E \left(\int_0^{T_U} |f(X_s)|\;\d s\right) \leq \|f\|_\infty \E[T_U] < \infty,
  \]
  the dominated convergence theorem tells us
  \[
    \E \left(\int_0^{t \wedge T_n} f(X_s)\;\d s \right) \to \E \left(\int_0^{T_U} f(X_s)\;\d s\right).
  \]
  Since $u$ is continuous on $\bar{U}$, we also have
  \[
    \E (u(X_{t \wedge T_n})) \to \E(u(X_{T_U})) = \E(g(X_{T_U})).\qedhere
  \]
\end{proof}
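
As a simple illustration of the theorem (an example of ours, not needed in what follows), take $d = 1$, $b = 0$, $\sigma = 1$ and $U = (-1, 1)$, so that $X$ is a Brownian motion started at $x$ and $L = \frac{1}{2} \frac{\d^2}{\d x^2}$. With $f = 1$ and $g = 0$, the Dirichlet--Poisson problem reads
\[
  -\frac{1}{2} u''(x) = 1 \text{ on } (-1, 1),\quad u(\pm 1) = 0,
\]
which has the solution $u(x) = 1 - x^2$. The theorem then says that the expected exit time of Brownian motion from $(-1, 1)$ is
\[
  \E_x(T_U) = \E_x\left(\int_0^{T_U} 1 \;\d s\right) = u(x) = 1 - x^2.
\]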

We can use SDEs to solve the \term{Cauchy problem} for parabolic equations as well, just like the heat equation. The problem is as follows: for $f \in C_b^2(\R^d)$, we want to find $u: \R_+ \times \R^d \to \R$ that is $C^1$ in $\R_+$ and $C^2$ in $\R^d$ such that
\begin{align*}
  \frac{\partial u}{\partial t} &= Lu \quad \text{ on }\R_+ \times \R^d\\
  u(0, \ph) &= f\hphantom{L} \quad \text{ on }\R^d
\end{align*}

Again we will need the following theorem:
\begin{thm}
  For every $f \in C_b^2(\R^d)$, there exists a solution to the Cauchy problem.\fakeqed
\end{thm}

\begin{thm}
  Let $u$ be a solution to the Cauchy problem, and let $X$ be a solution to $E_x(\sigma, b)$ for some $x \in \R^d$. Then for $0 \leq s \leq t$, we have
  \[
    \E_x(f(X_t) \mid \mathcal{F}_s) = u(t - s, X_s).
  \]
  In particular,
  \[
    u(t, x) = \E_x(f(X_t)).
  \]
\end{thm}
This implies that $X$ is a continuous Markov process.

\begin{proof}
  The martingale has $\frac{\partial}{\partial t} + L$, but the heat equation has $\frac{\partial}{\partial t} - L$. So we set $g(s, x) = u(t - s, x)$. Then
  \[
    \left(\frac{\partial}{\partial s} + L\right) g(s, x) = - \frac{\partial}{\partial t} u(t - s, x) + Lu(t - s, x) = 0.
  \]
  So $g(s, X_s) - g(0, X_0)$ is a martingale (boundedness is an exercise), and hence
  \[
    u(t - s, X_s) = g(s, X_s) = \E (g(t, X_t) \mid \mathcal{F}_s) = \E(u(0, X_t) \mid \mathcal{F}_s) = \E(f(X_t) \mid \mathcal{F}_s).\qedhere
  \]
\end{proof}
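
As a sanity check of the formula $u(t, x) = \E_x(f(X_t))$ (an example of ours), take $d = 1$, $b = 0$ and $\sigma = \sqrt{2}$, so that $L = \frac{\partial^2}{\partial x^2}$ and $X_t = x + \sqrt{2} B_t$. For $f(x) = \cos(kx)$ with $k \in \R$, which lies in $C_b^2(\R)$, we use $\E(e^{i \theta B_t}) = e^{-\theta^2 t/2}$ with $\theta = \sqrt{2} k$ to find
\[
  u(t, x) = \E\left(\cos\left(k(x + \sqrt{2} B_t)\right)\right) = \cos(kx)\, \E\left(\cos(\sqrt{2} k B_t)\right) - \sin(kx)\, \E\left(\sin(\sqrt{2} k B_t)\right) = e^{-k^2 t} \cos(kx).
\]
This indeed satisfies
\[
  \frac{\partial u}{\partial t} = -k^2 u = \frac{\partial^2 u}{\partial x^2},\quad u(0, x) = \cos(kx).
\]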

This result generalizes to the \emph{Feynman--Kac formula}.
\begin{thm}[Feynman--Kac formula]\index{Feynman--Kac formula}
  Let $f \in C_b^2(\R^d)$ and $V \in C_b(\R^d)$ and suppose that $u: \R_+ \times \R^d \to \R$ satisfies
  \begin{align*}
    \frac{\partial u}{\partial t} &= Lu + Vu \quad \text{ on }\R_+ \times \R^d\\
    u(0, \ph) &= f\hphantom{L+Vu}\quad \text{ on } \R^d,
  \end{align*}
  where $Vu = V(x) u(x)$ is given by multiplication.

  Then for all $t > 0$, all $x \in \R^d$, and any solution $X$ to $E_x(\sigma, b)$, we have
  \[
    u(t, x) = \E_x\left(f(X_t) \exp \left(\int_0^t V(X_s)\;\d s\right)\right).\fakeqed
  \]
\end{thm}
If $L$ is the Laplacian, then, up to sign and normalization conventions, this is the Schr\"odinger equation in imaginary time, which is why Feynman was thinking about this.
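
The simplest consistency check of the Feynman--Kac formula (an illustration of ours) is the case of a constant potential $V \equiv c \in \R$. Then the formula reads
\[
  u(t, x) = e^{ct}\, \E_x(f(X_t)) = e^{ct} v(t, x),
\]
where $v(t, x) = \E_x(f(X_t))$ solves the Cauchy problem $\frac{\partial v}{\partial t} = Lv$, $v(0, \ph) = f$ by the previous theorem. Differentiating, we get
\[
  \frac{\partial u}{\partial t} = c e^{ct} v + e^{ct} Lv = Vu + Lu,\quad u(0, \ph) = f,
\]
as required.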
%In quantum mechanics, a basic problem is to estimate the lowest eigenvalue of the Schr\"odinger operator, which is the ground state energy. This is hard. This gives a means to do that, as this gives a formula to calculate $e^{-tH}$, since $\frac{1}{t} (e^{-tH} - 1)$ converges to the lowest eigenvalue.

\printindex
\end{document}